Re: Progress on fast path sorting, btree index creation time - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Progress on fast path sorting, btree index creation time
Date:
Msg-id: CA+TgmobRRiZ7H1OuQtFnFS=_cwtJUxMMyL3mcN6-uw8X5F4oWA@mail.gmail.com
In response to: Re: Progress on fast path sorting, btree index creation time (Jay Levitt <jay.levitt@gmail.com>)
Responses: Re: Progress on fast path sorting, btree index creation time (×3)
List: pgsql-hackers
On Tue, Feb 7, 2012 at 2:39 PM, Jay Levitt <jay.levitt@gmail.com> wrote:
> Jim "Decibel!" Nasby wrote:
>>
>> I agree that it's probably pretty unusual to index floats.
>
> FWIW: Cubes and points are floats, right? So would spatial indexes benefit
> from this optimization, or is it only raw floats?

Cubes are not floats, nor are points.

In general, what's being proposed here is to make a separate copy of quicksort for each of N datatypes, with the comparator function inlined. In order for this to benefit multiple types, they'd have to use the same set of machine instructions to compare values on disk. So in general you can make this apply to as many datatypes as you want: it's just that each one will require another copy of the quicksort code in the binary. The more complex the comparator is, the less you'll save, because the savings presumably come largely from things like saving and restoring registers and pushing/popping stack frames, and that's really only going to be material if the underlying comparator is pretty darn cheap.

Actually, to be more precise, the patch is proposing to create TWO copies of the quicksort code for each of N datatypes: one for the case where there is but a single sort key, and the other for the case where there are multiple sort keys. Having separate code for the single-sort-key case saves more because it lets you get rid of a control loop - you don't have to iterate down the list of keys, because there's only one. (A rough sketch of both the generic and the specialized single-key paths appears at the end of this page.)

I've been skeptical of this patch for a number of reasons, the major one of which is that most workloads spend only a very small amount of time doing quicksorts, and most quicksorts are of very small amounts of data and are therefore fast anyway. It is easy to construct an artificial test case that does lots and lots of in-memory sorting, but in real life I think that's not the bulk of what people use databases for. The time gets spent on things like groveling through big tables or doing joins, and then maybe we sort the rows at the end, but that's not where most of the time goes.

It may rightly be pointed out, of course, that a penny saved is a penny earned: the fact that most people don't spend much time quicksorting doesn't mean that we shouldn't try to make quicksorting as fast as possible. But there are a couple of additional considerations.

First, the code is, at present, uglier than I would like. I mean to spend some time trying to clean that up, but there are 102 patches in this CommitFest (the number seems to keep slowly growing, despite the deadline being three weeks gone) and this isn't the only one I care about, so I haven't quite gotten there yet. But hopefully soon.

Second, there's a concern about binary bloat: duplicating lots of code with different comparators inlined generates, well, a lot of machine code. Of course, a 0.8% increase in the size of the resulting binary is very unlikely to produce any measurable performance regression on its own, but that's not really the issue. The issue is rather that we could easily go nuts applying this technique in lots of different places - or even just in one place to lots of different types. Let's say that we find that even for types with very complex comparators, we can get a 5% speedup in quicksorting using this technique. Should we, then, apply the technique to every type we have? Perhaps the binary would grow by, I don't know, 10%.
Maybe that's still not enough to start hurting performance (or making packagers complain), but surely it's getting there, and what happens when we discover that similar speedups are possible using the same trick in five other scenarios? I think it's a bit like going out to dinner and spending $50. It's nicer than eating at home, and many people can afford to do it occasionally, but if you do it every night, it gets expensive (and possibly fattening).

So we need some principled way of deciding how much inlining is reasonable, because I am 100% certain this is not going to be the last time someone discovers that a massive exercise in inlining can yield a nifty performance benefit in some case or another: index builds and COPY have already been mentioned on this thread, and I expect that people will find other cases as well. I'm not really sure what the "budget" is - i.e. how much binary bloat we can afford to add - or how many cases there are that can benefit, but the first isn't infinite and the second is more than the first.

Having said all that, I am inclined to commit at least some portion of this, but I wanted to knock off a few other patches that have been lingering for a while first.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
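
To make the tradeoff concrete, here is a minimal, self-contained sketch of the kind of specialization discussed above. It is not the patch under review, and the names (row_t, sort_key_t, cmp_int32, qsort_row_int32_col0) are invented for illustration rather than taken from PostgreSQL's sorting code in tuplesort.c. It contrasts a generic comparator that walks a list of sort keys and calls each key's comparator through a function pointer with a per-type copy of quicksort specialized for a single int32 key, where the comparison is inlined and the key loop is gone.

/*
 * Sketch only -- names are invented for illustration and are not
 * PostgreSQL's.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct row_t
{
    int32_t col[2];                     /* a "row" with two int32 columns */
} row_t;

typedef struct sort_key_t
{
    int     attno;                      /* which column this key compares */
    int     (*cmp) (int32_t a, int32_t b);  /* comparator, reached indirectly */
} sort_key_t;

static int
cmp_int32(int32_t a, int32_t b)
{
    return (a < b) ? -1 : (a > b) ? 1 : 0;
}

/*
 * Generic path: walk the list of sort keys and call each comparator
 * through a function pointer.  The per-comparison overhead (indirect
 * call, saving/restoring registers, loop control) is what the
 * specialized copies avoid.
 */
static int
cmp_generic(const row_t *x, const row_t *y, const sort_key_t *keys, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
    {
        int     c = keys[i].cmp(x->col[keys[i].attno], y->col[keys[i].attno]);

        if (c != 0)
            return c;
    }
    return 0;
}

/*
 * Specialized copy for "single int32 key on column 0": the key loop is
 * gone and the comparison is a plain '<' / '>', so no call happens per
 * comparison.  Each additional datatype (or key-count case) handled this
 * way requires another copy of this loop in the binary.
 */
static void
qsort_row_int32_col0(row_t *a, long lo, long hi)
{
    int32_t pivot;
    long    i,
            j;

    if (lo >= hi)
        return;

    pivot = a[lo + (hi - lo) / 2].col[0];
    i = lo - 1;
    j = hi + 1;

    for (;;)
    {
        do { i++; } while (a[i].col[0] < pivot);
        do { j--; } while (a[j].col[0] > pivot);
        if (i >= j)
            break;
        {
            row_t   tmp = a[i];

            a[i] = a[j];
            a[j] = tmp;
        }
    }
    qsort_row_int32_col0(a, lo, j);
    qsort_row_int32_col0(a, j + 1, hi);
}

int
main(void)
{
    row_t       rows[] = {{{3, 0}}, {{1, 2}}, {{2, 1}}};
    sort_key_t  key = {0, cmp_int32};

    /* Generic path: compares 3 vs 1 on column 0, returns 1. */
    printf("generic: %d\n", cmp_generic(&rows[0], &rows[1], &key, 1));

    /* Specialized path: sort the three rows by column 0. */
    qsort_row_int32_col0(rows, 0, 2);
    for (int i = 0; i < 3; i++)
        printf("%d\n", rows[i].col[0]); /* prints 1, 2, 3 */
    return 0;
}

The point of the sketch is only to show where the savings come from: the specialized loop reduces each comparison to a plain integer compare, while the generic path pays an indirect call plus key-loop overhead per comparison. Every additional datatype (or single-key/multi-key case) handled this way adds another copy of the sort loop to the binary, which is exactly the speed-versus-bloat budget being weighed in the message above.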