Re: Progress on fast path sorting, btree index creation time - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Progress on fast path sorting, btree index creation time |
Date | |
Msg-id | CA+TgmoYOsN3ZLmJCY1ttA6-WwtC6PfOXrFtxxpts=TD9HxQCYw@mail.gmail.com Whole thread Raw |
In response to | Re: Progress on fast path sorting, btree index creation time (Peter Geoghegan <peter@2ndquadrant.com>) |
Responses |
Re: Progress on fast path sorting, btree index creation time
Re: Progress on fast path sorting, btree index creation time |
List | pgsql-hackers |
On Thu, Feb 9, 2012 at 9:37 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 9 February 2012 13:50, Robert Haas <robertmhaas@gmail.com> wrote: >> I'm also more than slightly concerned that we are losing sight of the >> forest for the trees. I have heard reports that sorting large amounts >> of data is many TIMES slower for us than for a certain other database >> product. I frankly wonder if this entire patch is a distraction. >> When inlining is the tool of choice for further performance >> improvement, it suggests that we're pretty much out of ideas, and that >> whatever we get out of inlining - be it 6% or 30% - will be the end. >> I don't like that idea very much. > > If you're talking about the product I think you're talking about, > well, their contracts forbid any actual benchmarks, so we won't have > any hard figures, but I find it intuitively difficult to believe that > the difference is that large, mostly because I've demonstrated that > the performance of qsort is a pretty good proxy for the performance of > a query like "select * from foo order by bar", and qsort can't be too > far from optimal. Besides, didn't you yourself say "I've been > skeptical of this patch for a number of reasons, the major one of > which is that most workloads spend only a very small amount of time in > doing qucksorts"? Emphasis on "large" amounts of data. I've said from the beginning that the small-sort case is less interesting to me: it's nice to make it faster, but nobody's going to quit using PostgreSQL because a quicksort takes 8ms instead of 6ms. It's when we start talking about operations that take minutes or hours that the differences become material. > I have a new plan. I think you'll like it: > > * drop the type specific specialisations entirely. Check. > * drop qsort_arg (the original, unspecialised version). We now have > exactly the same number of source code copies of qsort as before: 2. This may be useful elsewhere some day, but I suppose we can always pull it back out of git if we need it. > * gut the comparison of the leading sort key only from > comparetup_heap, comparetup_index_btree, etc (I collectively refer to > these functions as tuple-class encapsulating comparators, distinct > from their comparator proper). > > * Instantiate two specialisations of qsort_arg: single key and > multiple key (exactly the same number as has already been agreed to, > since we lost the generic version). However, there is one key > difference now: we call one of the tuple class encapsulating functions > through a function pointer if the first comparison gives equality - . > Early indications are that this is almost as much of a win as the > single-key case. So the code effectively does this (but through a > function pointer): > > #define MULTI_ADDITIONAL_CODE(COMPAR) \ > { \ > return comparetup_heap(aT, bT, state); \ > } > > This works because most comparisons won't actually need to look at the > second key, and because the cut-down tuple-class encapsulating > comparator that is used in the last revision of the patch doesn't > actually make any assumption about the particular tuple-class > encapsulating comparator in use. > > It may not even be worth-while having a separate copy of the qsort > function for the multiple key case, so the question of binary bloat > becomes entirely irrelevant, as there would be a net gain of zero > copies of qsort_arg. I'm not sure I entirely follow all this, but I'll look at the code once you have it. Are you saying that all the comparetup_yadda functions are redundant to each other in the single-key case? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: