Re: Using quicksort for every external sort run - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Using quicksort for every external sort run |
Date | |
Msg-id | CAM3SWZRTODngPxko3vJ5CcUrGzVqpQ7_Ff75Sioxq8+Hv=d4_A@mail.gmail.com Whole thread Raw |
In response to | Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: Using quicksort for every external sort run
|
List | pgsql-hackers |
On Mon, Nov 30, 2015 at 12:29 PM, Peter Geoghegan <pg@heroku.com> wrote: >> I'm kind of curious as to why the optimal for the patched code appears >> at 1GB and not lower. If I get a chance to rebuild the test, I will >> look into that more. > > I think that the availability of abbreviated keys (or something that > allows most comparisons made by quicksort/the heap to be resolved at > the SortTuple level) could make a big difference for things like this. Using the Hydra POWER7 server [1] + the gensort benchmark [2], which uses the C collation, and has abbreviated keys that have lots of entropy, I see benefits with higher and higher maintenance_work_mem settings. I will present a variety of cases, which seemed like something Greg Stark is particularly interested in. On the whole, I am quite pleased with how things are shown to be improved in a variety of different scenarios. Looking at CREATE INDEX build times on an (unlogged) gensort table with 50 million, 100 million, 250 million, and 500 million tuples, with maintenance_work_mem settings of 512MB, 1GB, 10GB, and 15GB, there are sustained improvements as more memory is made available. I'm not saying that that would be the case with low cardinality leading attribute tuples -- probably not -- but it seems pretty nice that this case can sustain improvements as more memory is made available. The server used here has reasonably good disks (Robert goes into this in his blogpost), but nothing spectacular. This is what a 500 million tuple gensort table looks like: postgres=# \dt+ List of relations Schema | Name | Type | Owner | Size | Description --------+-----------+-------+-------+-------+------------- public | sort_test | table | pg | 32 GB | (1 row) Results: 50 million tuple table (best of 3): ------------------------------------------ 512MB: (8-way final merge) external sort ended, 171058 disk blocks used: CPU 4.11s/79.30u sec elapsed 83.60 sec 1GB: (4-way final merge) external sort ended, 171063 disk blocks used: CPU 4.29s/71.34u sec elapsed 75.69 sec 10GB: N/A 15GB: N/A 1GB (master branch): (3-way final merge) external sort ended, 171064 disk blocks used: CPU 6.19s/163.00u sec elapsed 170.84 sec 100 million tuple table (best of 3): -------------------------------------------- 512MB: (16-way final merge) external sort ended, 342114 disk blocks used: CPU 8.61s/177.77u sec elapsed 187.03 sec 1GB: (8-way final merge) external sort ended, 342124 disk blocks used: CPU 8.07s/165.15u sec elapsed 173.70 sec 10GB: N/A 15GB: N/A 1GB (master branch): (5-way final merge) external sort ended, 342129 disk blocks used: CPU 11.68s/358.17u sec elapsed 376.41 sec 250 million tuple table (best of 3): -------------------------------------------- 512MB: (39-way final merge) external sort ended, 855284 disk blocks used: CPU 19.96s/486.57u sec elapsed 507.89 sec 1GB: (20-way final merge) external sort ended, 855306 disk blocks used: CPU 22.63s/475.33u sec elapsed 499.09 sec 10GB: (2-way final merge) external sort ended, 855326 disk blocks used: CPU 21.99s/341.34u sec elapsed 366.15 sec 15GB: (2-way final merge) external sort ended, 855326 disk blocks used: CPU 23.23s/322.18u sec elapsed 346.97 sec 1GB (master branch): (11-way final merge) external sort ended, 855315 disk blocks used: CPU 30.56s/973.00u sec elapsed 1015.63 sec 500 million tuple table (best of 3): -------------------------------------------- 512MB: (77-way final merge) external sort ended, 1710566 disk blocks used: CPU 45.70s/1016.70u sec elapsed 1069.02 sec 1GB: (39-way final merge) external sort ended, 1710613 disk blocks used: CPU 44.34s/1013.26u sec elapsed 1067.16 sec 10GB: (4-way final merge) external sort ended, 1710649 disk blocks used: CPU 46.46s/772.97u sec elapsed 841.35 sec 15GB: (3-way final merge) external sort ended, 1710652 disk blocks used: CPU 51.55s/729.88u sec elapsed 809.68 sec 1GB (master branch): (20-way final merge) external sort ended, 1710632 disk blocks used: CPU 69.35s/2013.21u sec elapsed 2113.82 sec I attached a detailed account of these benchmarks, for those that really want to see the nitty-gritty. This includes a 1GB case for patch without memory prefetching (which is not described in this message). [1] http://rhaas.blogspot.com/2012/03/performance-and-scalability-on-ibm.html [2] https://github.com/petergeoghegan/gensort -- Peter Geoghegan
Attachment
pgsql-hackers by date: