Progress on fast path sorting, btree index creation time - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Progress on fast path sorting, btree index creation time |
Date | |
Msg-id | CAEYLb_VkoXJkG4Ehb=ah2yK6cqSCZLb-YHUCOuFNxgT7hENrrA@mail.gmail.com |
Responses | Re: Progress on fast path sorting, btree index creation time |
| Re: Progress on fast path sorting, btree index creation time |
List | pgsql-hackers |
I'll try and keep this terse. I've promised to justify the number of specialisations that are in my fast-path sorting patch, and I may yet conclude that a reduction is appropriate. Not today though - there are quite a few ideas in the patch (even more now), and not all have been exhaustively evaluated. Maybe that isn't what some had hoped for right now, but deferring a public, definitive answer to those questions in order to focus on tweaking the patch has brought additional benefits. I felt that I went too long without checking in with the community, though.

The purpose of this e-mail is to:

1. Report back a further improvement in the performance benefits seen when sorting with multiple sortkeys. The performance benefits seen for that case were previously relatively modest; they're better now.

2. Report improvements from applying these techniques to btree index tuplesorting (short version: they're also quite good). The data used is exactly the same as in my previous benchmark; orderlines is 1538MB and has lots of duplicates. The environment is also identical.

3. Resolve two anticipated controversies that are, respectively, somewhat orthogonal and completely orthogonal to the binary bloat controversy. The first (controversy A) is that I have added a new piece of infrastructure, pg_always_inline, which, as the name suggests, is a portable way of insisting that a function be invariably inlined. Portable libraries like libc have been doing this for a long time now, and I actually use the libc macro (which expands to __attribute__((always_inline))) where possible. The second possible point of contention (controversy B) is that I have jettisoned various protections against bad qsort implementations that I believe are a legacy of when we used the system qsort, pre-2006, and that can no longer be justified. For example, quicksort performing badly in the face of lots of duplicates is a well-understood problem today (partitioning should stop on equal keys), and ISTM that protecting against that outside the qsort implementation (but only for index sorts) is wrong-headed.

The first possibly controversial adjustment (controversy A) is clearly also the reason for (1) - that has been well isolated. I haven't got around to quantifying the performance improvement seen due to the second possibly controversial adjustment (controversy B), but I believe that it can be justified as a case of effectively removing redundant code. After all, we now assert against comparing a tuple to itself anyway, and the "cheap insurance" that existed in comparetup_index_btree was never present in any form in comparetup_index_heap, and we heard no complaints, AFAIK.
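For anyone unfamiliar with the technique, a macro along these lines illustrates the general pattern - this is only a sketch, not necessarily the exact definition in the patch, and the MSVC and fallback branches are shown purely for completeness:

    #include <stdint.h>

    /*
     * Sketch of a portable always-inline macro, in the spirit of
     * pg_always_inline.  GCC and clang honour the always_inline attribute;
     * MSVC has __forceinline; anything else degrades to an ordinary hint.
     */
    #if defined(__GNUC__) || defined(__clang__)
    #define pg_always_inline __attribute__((always_inline)) inline
    #elif defined(_MSC_VER)
    #define pg_always_inline __forceinline
    #else
    #define pg_always_inline inline     /* best-effort fallback */
    #endif

    /* Example use: a trivial comparator we want folded into its caller. */
    static pg_always_inline int
    int32_cmp(int32_t a, int32_t b)
    {
        return (a < b) ? -1 : ((a > b) ? 1 : 0);
    }

The point of insisting on inlining here is that the comparator sits on the innermost loop of the sort, where the function-call overhead and the lost optimisation opportunities are what the specialisations exist to avoid.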
Attached are:

* A detailing of libc's use of __always_inline / __attribute__((always_inline)). Note that it often appears in code that is built for all platforms.

* The WIP patch itself, rebased to integrate with the new SortSupport infrastructure. I've gone a bit crazy with btree specialisations, but I suspect that's where it matters least and makes most sense.

* A new Python script for benchmarking index creation, similar to the one I previously posted for benchmarking the sorting of heap tuples.

* A spreadsheet that shows the results of re-running my earlier heap tuple sorting benchmark with this new patch. The improvement in the query that orders by 2 columns is all that is pertinent there when considering the value of (1) and the sense in standing still for controversy A.

* A spreadsheet that shows the difference in index creation times, generated with the help of the new Python script.

Thoughts?

I had another idea when writing this patch that I haven't developed at all, but I'll share anyway: it might be a good idea to use a balanced quicksort:

http://xlinux.nist.gov/dads//HTML/balancedqsrt.html

It might be possible to get a reasonable approximation of the actual median value of a given column from existing statistics, which could be hinted to qsort_arg. This would do a better job of guessing an appropriate initial pivot value than the med3 sampling technique (what we do now, advocated by Sedgewick: use the median of the first, middle and last elements of the current partition). We'd still use that med3 technique to select all other pivots, and could perhaps signal to qsort_arg "you're on your own, fall back on med3 for the initial pivot" with a null pointer, according to some heuristic. There's obviously a not inconsiderable impedance mismatch to resolve if we're to do that, though, so that Tuplesortstate has a pointer to the median SortTuple.
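To make the idea concrete, here is a self-contained toy sketch over plain ints. The quicksort_hinted name, its signature, the "closest element to the hint" pivot choice and the pretend statistics value are all hypothetical - this is not what the patch does, just an illustration of using a hint for the first partition and falling back on med3 everywhere else:

    #include <stdio.h>
    #include <stdlib.h>

    /* index of the median of a[i], a[j], a[k] */
    static int
    med3(const int *a, int i, int j, int k)
    {
        if (a[i] < a[j])
            return a[j] < a[k] ? j : (a[i] < a[k] ? k : i);
        else
            return a[j] > a[k] ? j : (a[i] > a[k] ? k : i);
    }

    static void
    swap(int *a, int i, int j)
    {
        int     tmp = a[i];

        a[i] = a[j];
        a[j] = tmp;
    }

    /*
     * Hypothetical "hinted" quicksort: a non-NULL hint is treated as an
     * estimate of the median and used only to choose the very first pivot;
     * a NULL hint, and every recursive call, falls back on med3.
     */
    static void
    quicksort_hinted(int *a, int lo, int hi, const int *hint)
    {
        int     pivot_idx, pivot, i, j;

        if (lo >= hi)
            return;

        if (hint != NULL)
        {
            /* use the element closest to the hinted median as the pivot */
            pivot_idx = lo;
            for (i = lo + 1; i <= hi; i++)
                if (abs(a[i] - *hint) < abs(a[pivot_idx] - *hint))
                    pivot_idx = i;
        }
        else
            pivot_idx = med3(a, lo, lo + (hi - lo) / 2, hi);

        pivot = a[pivot_idx];
        swap(a, pivot_idx, hi);         /* park the pivot at the end */

        /* simple Lomuto partition around the chosen pivot */
        i = lo;
        for (j = lo; j < hi; j++)
            if (a[j] < pivot)
                swap(a, i++, j);
        swap(a, i, hi);                 /* pivot lands in its final position */

        quicksort_hinted(a, lo, i - 1, NULL);   /* "you're on your own" below here */
        quicksort_hinted(a, i + 1, hi, NULL);
    }

    int
    main(void)
    {
        int     data[] = {7, 3, 9, 1, 8, 2, 6, 5, 4, 0};
        int     n = (int) (sizeof(data) / sizeof(data[0]));
        int     median_estimate = 5;    /* pretend the column statistics said so */
        int     i;

        quicksort_hinted(data, 0, n - 1, &median_estimate);
        for (i = 0; i < n; i++)
            printf("%d ", data[i]);
        printf("\n");
        return 0;
    }

The real code would of course operate on SortTuples through a comparator rather than on ints, which is exactly the impedance mismatch mentioned above.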
Can I get a straw poll on how much of a problem worst-case performance of qsort is believed to be?

In a perfect world, if it were somehow possible to know the perfect pivots ahead of time from a histogram or something, we'd have a quicksort variant with worst-case performance of O(n log(n)). That, or the limited approximation that I've sketched, would perhaps be worthwhile, even if it were subject to a number of caveats. Wikipedia claims that the worst case for quicksort is O(n log(n)) with the refinements recommended by Sedgewick's 1978 paper, but this seems like nonsense to me - the med3 technique is a good heuristic in practice, but it's perfectly possible in theory for its sampling to always get things completely wrong (this is not an unfamiliar problem). How often it messes up in the real world, and how much it matters, is something that I wouldn't like to speculate on, though it is the main factor that will determine whether the idea is worth pursuing. I can tell you that I have heard one person observe that it had an unpredictable runtime. However, they might not have noticed if this patch was applied, because in general the worse quicksort does, the better this patch does. Also, the fact that we use a median of "medians" when n > 40 (Dr. Sedgewick again, if I'm not mistaken) makes me a little skeptical of that claim.

Come to think of it, it might not be a bad idea to add a bunch of comments to qsort_arg while I have this swapped into my head, as it currently has no comments at all.
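For reference, the pivot-selection logic that such comments would have to describe looks roughly like this. This is a simplified sketch over ints in the style of the Bentley & McIlroy scheme our qsort descends from, not the actual code; the med3val helper and the exact thresholds are shown for illustration only:

    #include <stddef.h>

    /* median of three values */
    static int
    med3val(int a, int b, int c)
    {
        if (a < b)
            return b < c ? b : (a < c ? c : a);
        else
            return b > c ? b : (a > c ? c : a);
    }

    /*
     * Choose a pivot for a[0..n-1], Bentley & McIlroy style (sketch):
     *  - small partitions: just take the middle element;
     *  - medium partitions (n > 7): median of first, middle and last ("med3");
     *  - large partitions (n > 40): the "ninther", a median of three med3s
     *    sampled across the partition.
     */
    static int
    choose_pivot(const int *a, size_t n)
    {
        size_t  mid = n / 2;

        if (n <= 7)
            return a[mid];

        if (n > 40)
        {
            size_t  d = n / 8;
            int     lo = med3val(a[0], a[d], a[2 * d]);
            int     m  = med3val(a[mid - d], a[mid], a[mid + d]);
            int     hi = med3val(a[n - 1 - 2 * d], a[n - 1 - d], a[n - 1]);

            return med3val(lo, m, hi);
        }

        return med3val(a[0], a[mid], a[n - 1]);
    }

In the real qsort_arg the samples are pointers into the array and the comparisons go through the caller-supplied comparator, but as far as I can tell the shape of the logic is the same.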
While I'm thinking out loud, here's another idea: have a GUC which, when enabled (perhaps at various granularities), makes Timsort the in-memory sorting algorithm, as it now is by default for Python, Java SE 7 (for arrays) and the Android platform. For certain applications this could be a real win, and I don't think that it would have to be invasive: it could simply be dropped in.

--
Peter Geoghegan
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services