Re: Using quicksort for every external sort run - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Using quicksort for every external sort run |
Date | |
Msg-id | CAM3SWZSsgVvriLsaPYUec-iSGfHzMXsWrnQ4ZhUTtBaGjRbWBg@mail.gmail.com Whole thread Raw |
In response to | Re: Using quicksort for every external sort run (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: Using quicksort for every external sort run
Re: Using quicksort for every external sort run Re: Using quicksort for every external sort run |
List | pgsql-hackers |
On Tue, Nov 24, 2015 at 3:32 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > My feeling is that numbers rarely speak for themselves, without LSD. (Which > numbers?) Guffaw. > How are we doing here? Keen to see this work get committed, so we can move > onto parallel sort. What's the summary? I showed a test case where a CREATE INDEX sort involving 5 runs and a merge only took about 18% longer than an equivalent fully internal sort [1] using over 5 times the memory. That's about 2.5X faster than the 9.5 performance on the same system with the same amount of memory. Overall, the best cases I saw were the original "quicksort with spillover" cases [2]. They were just under 4X faster. I care about that less, though, because that will happen way less often, and won't help with larger sorts that are even more CPU bound. There is a theoretical possibility that this is slower on systems where multiple merge passes are required as a consequence of not having runs as long as possible (due to not using replacement selection heap). That will happen very infrequently [3], and is very probably still worth it. So, the bottom line is: This patch seems very good, is unlikely to have any notable downside (no case has been shown to be regressed), but has yet to receive code review. I am working on a new version with the first two commits consolidated, and better comments, but that will have the same code, unless I find bugs or am dissatisfied. It mostly needs thorough code review, and to a lesser extent some more performance testing. Parallel sort is very important. Robert, Amit and I had a call about this earlier today. We're all in agreement that this should be extended in that direction, and have a rough idea about how it ought to fit together with the parallelism primitives. Parallel sort in 9.6 could certainly happen -- that's what I'm aiming for. I haven't really done preliminary research yet; I'll know more in a little while. > How about we commit it with a sort_algorithm = 'foo' parameter so we can > compare things before release of 9.6? I had a debug GUC (like the existing one to disable top-N heapsorts) that disabled "quicksort with spillover". That's almost the opposite of what you're asking for, though, because that makes us never use a heap. You're asking for me to write a GUC to always use a heap. That's not a good way of testing this patch, because it's inconvenient to consider the need to use a heap beyond the first run (something that now exists solely for the benefit of "quicksort with spillover"; a heap will often never be used even for the first run). Besides, the merge optimization is a big though independent part of this, and doesn't make sense to control with the same GUC. If I haven't gotten this right, we should not commit the patch. If the patch isn't superior to the existing approach in virtually every way, then there is no point in making it possible for end-users to disable with messy GUCs -- it should be reverted. [1] Message: http://www.postgresql.org/message-id/CAM3SWZRiHaF7jdf923ZZ2qhDJiErqP5uU_+JPuMvUmeD0z9fFA@mail.gmail.com Attachment: http://www.postgresql.org/message-id/attachment/39660/quicksort_external_test.txt [2] http://www.postgresql.org/message-id/CAM3SWZTzLT5Y=VY320NznAyz2z_em3us6x=7rXMEUma9Z9yN6Q@mail.gmail.com [3] http://www.postgresql.org/message-id/CAM3SWZTX5=nHxPpogPirQsH4cR+BpQS6r7Ktax0HMQiNLf-1qA@mail.gmail.com -- Peter Geoghegan
pgsql-hackers by date: