Re: Using quicksort for every external sort run - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Using quicksort for every external sort run
Date:
Msg-id: CAM3SWZRY9vQx1P+YKPedKyNwd6cf4_KhDntfir=ReKW60C5L3g@mail.gmail.com
In response to: Re: Using quicksort for every external sort run (Jeff Janes <jeff.janes@gmail.com>)
Responses: Re: Using quicksort for every external sort run
List: pgsql-hackers
On Mon, Nov 30, 2015 at 9:51 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> As I said, it seems a little bit unfair to hand-tune work_mem or
>> maintenance_work_mem like that. Who can afford to do that? I think you
>> agree that it's untenable to have DBAs allocate work_mem differently
>> for cases where an internal sort or external sort is expected;
>> workloads are just far too complicated and changeable.
>
> Right, I agree with all that. But I think it is important to know
> where the benefits come from. It looks like about half comes from
> being more robust to overly-large memory usage, and half from absolute
> improvements which you get at each implementation's own best setting.
> Also, if someone had previously restricted work_mem (or more likely
> maintenance_work_mem) simply to avoid the large memory penalty, they
> need to know to revisit that decision. Although they still don't get
> any actual benefit from using too much memory, just a reduced penalty.

Well, to be clear, they do get a benefit with much larger memory
sizes. It's just that the benefit does not continue indefinitely. I
agree with this assessment, though.

> I'm kind of curious as to why the optimal for the patched code appears
> at 1GB and not lower. If I get a chance to rebuild the test, I will
> look into that more.

I think that the availability of abbreviated keys (or something that
allows most comparisons made by quicksort/the heap to be resolved at
the SortTuple level) could make a big difference for things like this
(there is a small illustrative sketch at the end of this message).
Bear in mind that the merge phase has better cache characteristics
when many attributes must be compared, and not mostly just leading
attributes. AlphaSort [1] merges in-memory runs (built with quicksort)
to create on-disk runs for this reason. (I tried that, and it didn't
help -- maybe I already get that benefit from merging on-disk runs,
since modern machines have so much more memory than they did in 1994.)

> It has a Perc H710 RAID controller with 15,000 RPM drives, but it is
> also a virtualized system that has other stuff going on. The disks
> are definitely better than your average household computer, but I
> don't think they are anything special as far as real database hardware
> goes.

What I meant was that it's better than my laptop. :-)

> What would be next in reviewing the patches? Digging into the C-level
> implementation?

Yes, certainly, but let me post a revised version first. I have
improved the comments and performed some consolidation of commits.
Also, I am going to get a bunch of test results from the POWER7
system. I think I might see more benefit at higher
maintenance_work_mem settings than you saw, primarily because my case
can mostly just use abbreviated keys during the quicksort operations.

Also, I find it very useful that while (for example) your 3GB test
case was slower than your 1GB test case, it was only 5% slower. For
this reason, I have a lot of hope that we can have a cost model for
sizing an effective maintenance_work_mem -- the consequences of being
wrong are really not that severe. It's unfortunate that we currently
waste so much memory by blindly adhering to
work_mem/maintenance_work_mem. This will matter a lot more once we
have parallel sort.

[1] http://www.cs.berkeley.edu/~rxin/db-papers/alphasort.pdf

--
Peter Geoghegan
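To make the abbreviated-key point above concrete, here is a minimal,
self-contained sketch of the idea: pack a fixed-width prefix of each
value next to its pointer so that most quicksort comparisons never
dereference the full datum, falling back to the authoritative
comparator only when the abbreviated keys tie. This is an
illustration only -- the names SortItem, abbreviate(), and
cmp_items() are invented for the example and are not PostgreSQL's
tuplesort.c code.

/*
 * Abbreviated-key sketch: resolve most comparisons on a packed
 * 8-byte prefix; consult the full string only on ties.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    uint64_t    abbrev;     /* first 8 bytes, packed big-endian */
    const char *datum;      /* full value, used only to break ties */
} SortItem;

static uint64_t
abbreviate(const char *s)
{
    uint64_t    key = 0;

    for (int i = 0; i < 8 && s[i] != '\0'; i++)
        key |= ((uint64_t) (unsigned char) s[i]) << (56 - 8 * i);
    return key;
}

static int
cmp_items(const void *a, const void *b)
{
    const SortItem *x = a;
    const SortItem *y = b;

    /* Cheap integer comparison resolves the common case. */
    if (x->abbrev < y->abbrev)
        return -1;
    if (x->abbrev > y->abbrev)
        return 1;
    /* Abbreviated keys tied: fall back to the full comparison. */
    return strcmp(x->datum, y->datum);
}

int
main(void)
{
    const char *values[] = {"banana", "apple", "applesauce", "cherry"};
    SortItem    items[4];

    for (int i = 0; i < 4; i++)
    {
        items[i].datum = values[i];
        items[i].abbrev = abbreviate(values[i]);
    }

    qsort(items, 4, sizeof(SortItem), cmp_items);

    for (int i = 0; i < 4; i++)
        printf("%s\n", items[i].datum);
    return 0;
}

Because the big-endian packing preserves lexicographic order of the
prefix, the fallback path only runs when the first eight bytes are
identical, which is the property that lets quicksort stay within the
SortTuple array for most comparisons.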