Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
Date | |
Msg-id | CAEepm=1_LTOeGGJr2BquEwUJmBnT6PtEQx5GmD=i42GvVS3bcQ@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)
Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation) |
List | pgsql-hackers |
On Wed, Feb 1, 2017 at 8:46 PM, Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Jan 31, 2017 at 11:23 PM, Thomas Munro > <thomas.munro@enterprisedb.com> wrote: >> So I'm in favour of this patch, which is relatively simple and give us >> faster index builds soon. Eventually we might also be able to have >> approach 1. From what I gather, it's entirely possible that we might >> still need 2 to fall back on in some cases. > > Right. And it can form the basis of an implementation of 1, which in > any case seems to be much more compelling for parallel query, when a > great deal more can be pushed down, and we are not particularly likely > to be I/O bound (usually not much writing to the heap, or WAL > logging). I ran some tests today. First I created test tables representing the permutations of these choices: Table structure: int = Integer key only intwide = Integer key + wide row text = Text key only (using dictionary words) textwide = Text key + wide row Uniqueness: u = each value unique d = 10 duplicates of each value Heap physical order: rand = Random asc = Ascending order (already sorted) desc = Descending order (sorted backwards) I used 10 million rows for this test run, so that gave me 24 tables of the following sizes as reported in "\d+": int tables = 346MB each intwide tables = 1817MB each text tables = 441MB each textwide tables = 1953MB each It'd be interesting to test larger tables of course but I had a lot of permutations to get through. For each of those tables I ran tests corresponding to the permutations of these three variables: Index type: uniq = CREATE UNIQUE INDEX ("u" tables only, ie no duplicates) nonu = CREATE INDEX ("u" and "d" tables) Maintenance memory: 1M, 64MB, 256MB, 512MB Workers: from 0 up to 8 Environment: EDB test machine "cthulhu", Intel(R) Xeon(R) CPU E7- 8830 @ 2.13GHz, 8 socket, 8 cores (16 threads) per socket, CentOS 7.2, Linux kernel 3.10.0-229.7.2.el7.x86_64, 512GB RAM, pgdata on SSD. Database initialised with en_US.utf-8 collation, all defaults except max_wal_size increased to 4GB (otherwise warnings about too frequent checkpoints) and max_parallel_workers_maintenance = 8. Testing done with warm OS cache. I applied your v2 patch on top of 7ac4a389a7dbddaa8b19deb228f0a988e79c5795^ to avoid a conflict. It still had a couple of harmless conflicts that I was able to deal with (not code, just some header stuff moving around). See full results from all permutations attached, but I wanted to highlight the measurements from 'textwide', 'u', 'nonu' which show interesting 'asc' numbers (data already sorted). The 'mem' column is maintenance_work_mem in megabytes. The 'w = 0' column shows the time in seconds for parallel_workers = 0. The other 'w = N' columns show times with higher parallel_workers settings, represented as speed-up relative to the 'w = 0' time. 1. 'asc' = pre-sorted data (w = 0 shows time in seconds, other columns show speed-up relative to that time): mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 -----+--------+-------+-------+-------+-------+-------+-------+-------+------- 1 | 119.97 | 4.61x | 4.83x | 5.32x | 5.61x | 5.88x | 6.10x | 6.18x | 6.09x 64 | 19.42 | 1.18x | 1.10x | 1.23x | 1.23x | 1.16x | 1.19x | 1.20x | 1.21x 256 | 18.35 | 1.02x | 0.92x | 0.98x | 1.02x | 1.06x | 1.07x | 1.08x | 1.10x 512 | 17.75 | 1.01x | 0.89x | 0.95x | 0.99x | 1.02x | 1.05x | 1.06x | 1.07x 2. 'rand' = randomised data: mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 -----+--------+-------+-------+-------+-------+-------+-------+-------+------- 1 | 130.25 | 1.82x | 2.19x | 2.52x | 2.58x | 2.72x | 2.72x | 2.83x | 2.89x 64 | 117.36 | 1.80x | 2.20x | 2.43x | 2.47x | 2.55x | 2.51x | 2.59x | 2.69x 256 | 124.68 | 1.87x | 2.20x | 2.49x | 2.52x | 2.64x | 2.70x | 2.72x | 2.75x 512 | 115.77 | 1.51x | 1.72x | 2.14x | 2.08x | 2.19x | 2.31x | 2.44x | 2.48x 3. 'desc' = reverse-sorted data: mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 -----+--------+-------+-------+-------+-------+-------+-------+-------+------- 1 | 115.19 | 1.88x | 2.39x | 2.78x | 3.50x | 3.62x | 4.20x | 4.19x | 4.39x 64 | 112.17 | 1.85x | 2.25x | 2.99x | 3.63x | 3.65x | 4.01x | 4.31x | 4.62x 256 | 119.55 | 1.76x | 2.21x | 2.85x | 3.43x | 3.37x | 3.77x | 4.24x | 4.28x 512 | 119.50 | 1.85x | 2.19x | 2.87x | 3.26x | 3.28x | 3.74x | 4.24x | 3.93x The 'asc' effects are much less pronounced when the key is an int. Here is the equivalent data for 'intwide', 'u', 'nonu': 1. 'asc' mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 -----+-------+-------+-------+-------+-------+-------+-------+-------+------- 1 | 12.19 | 1.55x | 1.93x | 2.21x | 2.44x | 2.64x | 2.76x | 2.91x | 2.83x 64 | 7.35 | 1.29x | 1.53x | 1.69x | 1.86x | 1.98x | 2.04x | 2.07x | 2.09x 256 | 7.34 | 1.26x | 1.47x | 1.64x | 1.79x | 1.92x | 1.96x | 1.98x | 2.02x 512 | 7.24 | 1.24x | 1.46x | 1.65x | 1.80x | 1.91x | 1.97x | 2.00x | 1.92x 2. 'rand' mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 -----+-------+-------+-------+-------+-------+-------+-------+-------+------- 1 | 15.16 | 1.56x | 2.01x | 2.32x | 2.57x | 2.73x | 2.87x | 2.95x | 2.91x 64 | 12.97 | 1.55x | 1.97x | 2.25x | 2.44x | 2.58x | 2.70x | 2.74x | 2.71x 256 | 13.14 | 1.47x | 1.86x | 2.12x | 2.31x | 2.50x | 2.62x | 2.58x | 2.69x 512 | 13.61 | 1.48x | 1.91x | 2.22x | 2.37x | 2.55x | 2.65x | 2.73x | 2.73x 3. 'desc' mem | w = 0 | w = 1 | w = 2 | w = 3 | w = 4 | w = 5 | w = 6 | w = 7 | w = 8 -----+-------+-------+-------+-------+-------+-------+-------+-------+------- 1 | 13.45 | 1.51x | 1.94x | 2.31x | 2.56x | 2.75x | 2.95x | 3.05x | 3.00x 64 | 10.27 | 1.42x | 1.82x | 2.05x | 2.30x | 2.46x | 2.59x | 2.64x | 2.65x 256 | 10.52 | 1.39x | 1.70x | 2.02x | 2.24x | 2.34x | 2.39x | 2.48x | 2.56x 512 | 10.62 | 1.43x | 1.82x | 2.06x | 2.32x | 2.51x | 2.61x | 2.68x | 2.69x Full result summary and scripts used for testing attached. -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: