[HACKERS] Atomics for heap_parallelscan_nextpage() - Mailing list pgsql-hackers
From: David Rowley
Subject: [HACKERS] Atomics for heap_parallelscan_nextpage()
Msg-id: CAKJS1f9tgsPhqBcoPjv9_KUPZvTLCZ4jy=B=bhqgaKn7cYzm-w@mail.gmail.com
List: pgsql-hackers

Hi,

A while back I did some benchmarking on a big 4-socket machine to explore the outer limits of parallel aggregates a bit. I discovered along the way that, given enough workers and a simple enough task, seq-scan workers were held up waiting for the lock to be released in heap_parallelscan_nextpage(). I've since done a little work in this area to try to improve things, and ended up posting about it yesterday in [1].

My original patch used batching to solve the issue: instead of allocating 1 block at a time, the batching patch allocated a range of 10 blocks for each worker to process. However, that implementation still needed a bit of work around reporting sync-scan locations.

Andres mentioned in [2] that it might be worth exploring using atomics to do the same job, so I went ahead and did that and came up with the attached, which is a slight variation on what he mentioned in the thread. To keep things simpler and more streamlined, I pulled the logic for setting the startblock out into another function, which is called only once, before the first call to heap_parallelscan_nextpage(). I also replaced phs_cblock with a counter that always starts at zero; the actual block is calculated from that counter plus the startblock, modulo nblocks. This makes it quite a bit simpler to detect when all the blocks have been allocated to workers, and it also works nicely for wrapping back to the start of the relation when the scan started somewhere in the middle due to piggybacking on a synchronous scan. (A small standalone sketch of this allocation scheme follows the performance numbers below.)

Performance:

With parallel_workers=71, it looks something like:

Query 1: 881 GB, ~6 billion row TPC-H lineitem table.

tpch=# select count(*) from lineitem;
   count
------------
 5999989709
(1 row)

-- Master
Time: 123421.283 ms (02:03.421)
Time: 118895.846 ms (01:58.896)
Time: 118632.546 ms (01:58.633)

-- Atomics patch
Time: 74038.813 ms (01:14.039)
Time: 73166.200 ms (01:13.166)
Time: 72492.338 ms (01:12.492)

-- Batching patch: batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 76364.215 ms (01:16.364)
Time: 75808.900 ms (01:15.809)
Time: 74927.756 ms (01:14.928)

Query 2: Single int column table with 2 billion rows.

tpch=# select count(*) from a;
   count
------------
 2000000000
(1 row)

-- Master
Time: 5853.918 ms (00:05.854)
Time: 5925.633 ms (00:05.926)
Time: 5859.223 ms (00:05.859)

-- Atomics patch
Time: 5825.745 ms (00:05.826)
Time: 5849.139 ms (00:05.849)
Time: 5815.818 ms (00:05.816)

-- Batching patch: batching 10 pages at a time in heap_parallelscan_nextpage()
Time: 5789.237 ms (00:05.789)
Time: 5837.395 ms (00:05.837)
Time: 5821.492 ms (00:05.821)

I've also attached a text file with the perf report for the lineitem query. You'll notice that heap_parallelscan_nextpage() is very visible on master, but not with either of the two patches.

With the 2nd query, heap_parallelscan_nextpage() is fairly insignificant in master's profile; it only shows up at 0.48%. This is likely because more tuples are read from each page, so more aggregation work gets done before the next page is needed. I'm uncertain why I previously saw a speedup in this case in [1].
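To make the counter-plus-modulo scheme described above a bit more concrete, here is a minimal standalone sketch using C11 atomics rather than PostgreSQL's pg_atomic_* facilities. The names (ParallelScanShared, scan_next_block, nallocated) are invented for illustration and don't necessarily match the attached patch; the point is just that a single fetch-and-add on a zero-based counter hands out blocks without any lock, makes the "all blocks allocated" test a plain comparison, and gets the sync-scan wraparound for free from the modulo:

/*
 * Illustrative sketch only -- not the actual patch.  In PostgreSQL the
 * counter would live in the shared parallel-scan state and use the
 * pg_atomic_* wrappers; a plain C11 atomic in a single process is enough
 * to show the arithmetic.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define InvalidBlockNumber ((uint32_t) 0xFFFFFFFF)

typedef struct ParallelScanShared
{
    uint32_t         nblocks;       /* number of blocks in the relation */
    uint32_t         startblock;    /* first block scanned (may be non-zero
                                     * when piggybacking on a sync scan) */
    _Atomic uint64_t nallocated;    /* blocks handed out so far; starts at 0 */
} ParallelScanShared;

/*
 * Claim the next block for this worker, or return InvalidBlockNumber once
 * every block has been handed out.  The zero-based counter makes the
 * "are we done?" check a simple comparison, and the modulo handles wrapping
 * past the end of the relation back to block 0.
 */
static uint32_t
scan_next_block(ParallelScanShared *shared)
{
    uint64_t    allocated = atomic_fetch_add(&shared->nallocated, 1);

    if (allocated >= shared->nblocks)
        return InvalidBlockNumber;  /* all blocks already allocated */

    return (uint32_t) ((shared->startblock + allocated) % shared->nblocks);
}

int
main(void)
{
    /* e.g. an 8-block relation where a synchronized scan began at block 5 */
    ParallelScanShared shared = {.nblocks = 8, .startblock = 5};
    uint32_t    blk;

    atomic_init(&shared.nallocated, 0);

    while ((blk = scan_next_block(&shared)) != InvalidBlockNumber)
        printf("next block: %u\n", blk);    /* prints 5 6 7 0 1 2 3 4 */

    return 0;
}

One subtlety the sketch shows: the fetch-and-add can run the counter past nblocks when several workers race at the end of the scan, so the counter value is only translated into a block number after the bounds check.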
I've also noticed that both the atomics patch and unpatched master do something that looks a bit weird with synchronous seq-scans: if the parallel seq-scan piggybacked on another scan, then subsequent parallel scans will start at the same non-zero block location, even when no other concurrent scans exist. I'd have expected this to go back to block 0 again, but maybe I'm just failing to understand the reason for reporting the startblock to ss_report_location() at the end of the scan.

I'll now add this to the first commitfest of PG11. I just wanted to note that I've done this so that it's less likely someone else goes and repeats the same work.

[1] https://www.postgresql.org/message-id/CAKJS1f-XhfQ2-%3D85wgYo5b3WtEs%3Dys%3D2Rsq%3DNuvnmaV4ZsM1XQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20170505023646.3uhnmf2hbwtm63lc%40alap3.anarazel.de

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers