Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore - Mailing list pgsql-hackers
| From | Peter Geoghegan |
|---|---|
| Subject | Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore |
| Date | |
| Msg-id | CAM3SWZR5rv3+F3FOKf35=dti7oTmmcdFoe2voGuR0pddg3Jb+Q@mail.gmail.com |
| In response to | Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore (Peter Geoghegan <pg@heroku.com>) |
| Responses | Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore; Re: Memory prefetching while sequentially fetching from SortTuple array, tuplestore |
| List | pgsql-hackers |
On Sun, Nov 29, 2015 at 10:14 PM, Peter Geoghegan <pg@heroku.com> wrote:
> I'm currently running some benchmarks on my external sorting patch on
> the POWER7 machine that Robert Haas and a few other people have been
> using for some time now [1]. So far, the benchmarks look very good
> across a variety of scales.
>
> I'll run a round of tests without the prefetching enabled (which the
> patch series makes further use of -- they're also used when writing
> tuples out). If there is no significant impact, I'll completely
> abandon this patch, and we can move on.

I took a look at this. It turns out that prefetching significantly helps on the POWER7 system when sorting gensort tables of 50 million, 100 million, 250 million, and 500 million tuples (3 CREATE INDEX tests for each case, 1GB maintenance_work_mem):

[pg@hydra gensort]$ cat test_output_patch_1gb.txt | grep "sort ended"
LOG: external sort ended, 171063 disk blocks used: CPU 4.33s/71.28u sec elapsed 75.75 sec
LOG: external sort ended, 171063 disk blocks used: CPU 4.30s/71.32u sec elapsed 75.91 sec
LOG: external sort ended, 171063 disk blocks used: CPU 4.29s/71.34u sec elapsed 75.69 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.10s/165.56u sec elapsed 174.35 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.07s/165.15u sec elapsed 173.70 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.01s/164.73u sec elapsed 174.84 sec
LOG: external sort ended, 855306 disk blocks used: CPU 23.65s/491.37u sec elapsed 522.44 sec
LOG: external sort ended, 855306 disk blocks used: CPU 21.13s/508.02u sec elapsed 530.48 sec
LOG: external sort ended, 855306 disk blocks used: CPU 22.63s/475.33u sec elapsed 499.09 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 47.99s/1016.78u sec elapsed 1074.55 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 46.52s/1015.25u sec elapsed 1078.23 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 44.34s/1013.26u sec elapsed 1067.16 sec

[pg@hydra gensort]$ cat test_output_patch_noprefetch_1gb.txt | grep "sort ended"
LOG: external sort ended, 171063 disk blocks used: CPU 4.79s/78.14u sec elapsed 83.03 sec
LOG: external sort ended, 171063 disk blocks used: CPU 3.85s/77.71u sec elapsed 81.64 sec
LOG: external sort ended, 171063 disk blocks used: CPU 3.94s/77.71u sec elapsed 81.71 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.88s/180.15u sec elapsed 189.69 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.30s/179.07u sec elapsed 187.92 sec
LOG: external sort ended, 342124 disk blocks used: CPU 8.29s/179.06u sec elapsed 188.02 sec
LOG: external sort ended, 855306 disk blocks used: CPU 22.16s/516.86u sec elapsed 541.35 sec
LOG: external sort ended, 855306 disk blocks used: CPU 21.66s/513.59u sec elapsed 538.00 sec
LOG: external sort ended, 855306 disk blocks used: CPU 22.56s/499.63u sec elapsed 525.53 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 45.00s/1062.26u sec elapsed 1118.52 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 44.42s/1061.33u sec elapsed 1117.27 sec
LOG: external sort ended, 1710613 disk blocks used: CPU 44.47s/1064.93u sec elapsed 1118.79 sec

For example, the 50 million tuple test has over 8% of its runtime shaved off. This seems to be a consistent pattern. Note that only the writing of tuples uses prefetching here, because that happens to be the only codepath affected by prefetching (note also that this is the slightly different, external-specific version of the patch).
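To make the mechanism being benchmarked concrete, here is a minimal, self-contained sketch of the idea, not code from the patch: while sequentially consuming an array whose elements point at out-of-line tuples, as tuplesort's memtuples array does, issue a prefetch for the payload a fixed distance ahead, so the cache miss overlaps with useful work on the current element. The names `SortTupleSketch` and `write_all`, the distance of 2, and the direct use of the GCC/Clang `__builtin_prefetch()` builtin are all illustrative assumptions; real PostgreSQL code would go through a portability wrapper.

```c
#include <stdio.h>

/*
 * Illustrative stand-in for tuplesort.c's SortTuple: a small array
 * element holding a pointer to an out-of-line payload, which is where
 * the cache miss happens during a sequential pass over the array.
 */
typedef struct
{
	int			key;		/* comparison data kept inline in the array */
	const char *tuple;		/* out-of-line data, likely not yet in cache */
} SortTupleSketch;

#define PREFETCH_DISTANCE 2	/* illustrative; tuning this is the hard part */

static void
write_all(const SortTupleSketch *tuples, size_t n, FILE *out)
{
	for (size_t i = 0; i < n; i++)
	{
		/*
		 * Hint the payload a fixed distance ahead into cache, so the
		 * memory fetch overlaps with writing the current tuple.
		 * __builtin_prefetch() is a GCC/Clang builtin (arguments:
		 * address, 0 = read, 3 = high temporal locality).
		 */
		if (i + PREFETCH_DISTANCE < n)
			__builtin_prefetch(tuples[i + PREFETCH_DISTANCE].tuple, 0, 3);

		fputs(tuples[i].tuple, out);	/* stand-in for WRITETUP() */
	}
}

int
main(void)
{
	SortTupleSketch sorted[] = {{1, "a\n"}, {2, "b\n"}, {3, "c\n"}};

	write_all(sorted, sizeof(sorted) / sizeof(sorted[0]), stdout);
	return 0;
}
```

Whether this wins depends on the per-element work being long enough to hide the fetch, which is consistent with the observation below that the benefit shrinks once the quicksort itself dominates.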
I hesitate to give that up, although it is noticeable that prefetching matters less at higher scales, where we're bottlenecked on the quicksort itself more than on writing. Those costs grow at different rates, of course. Perhaps we can consider applying prefetching more selectively, in the context of writing out tuples. After all, the amount of useful work that we can do while waiting on a fetch from memory ought to be more consistent and manageable there, which could make it a consistent win. I will need to think about this some more.

--
Peter Geoghegan