Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates |
Date | |
Msg-id | CA+TgmoY7LdpFSanui2PawqZyCe6TxCL47Mb3yOtnSE=gfXyohw@mail.gmail.com Whole thread Raw |
In response to | Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: Re: Reusing abbreviated keys during second pass of
ordered [set] aggregates
Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates |
List | pgsql-hackers |
On Mon, Dec 21, 2015 at 2:55 PM, Peter Geoghegan <pg@heroku.com> wrote: > On Mon, Dec 21, 2015 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> PFA my proposal for comment changes for 9.5 and master. This is based >> on your 0001, but I edited somewhat. Please let me know your >> thoughts. I am not willing to go further and rearrange actual code in >> 9.5 at this point; it just isn't necessary. > > Fine by me. But this revision hasn't made the important point at all > -- which is that 0002 is safe. That's a stronger guarantee than the > abbreviated key representation being pass-by-value. Right. I don't think that we should back-patch that stuff into 9.5. >> Looking at this whole system again, I wonder if we're missing a trick >> here. How about if we decree from on high that the abbreviated-key >> comparator is always just the application of an integer comparison >> operator? The only abbreviation function that we have right now that >> wouldn't be happy with that is the one for numeric, and that looks >> like it could be fixed. > > I gather you mean that we'd decree that they were always compared > using a text or uuid style 3-way unsigned integer comparison, allowing > us to hardcode that. Right. >> Then we could hard-code knowledge of this >> representation in tuplesort.c in such a way that it wouldn't need to >> call a comparator function at all - instead of doing >> ApplySortComparator() and then maybe ApplySortAbbrevFullComparator(), >> it would do something like: >> >> if (using_abbreviation && (compare = ApplyAbbrevComparator(...)) != 0) >> return compare; >> >> I'm not sure if that would save enough cycles vs. the status quo to be >> worth bothering with, but it seems like it might. You may have a >> better feeling for that than I do. > > I like this idea. I thought of it myself already, actually. > > It has some nice properties, because unsigned integers are so simple > and flexible. For example, we can do it with int4 sometimes, and then > compare two attributes at once on 64-bit platforms. Maybe the second > attribute (the second set of 4 bytes) isn't an int4, though -- it's > any other type's abbreviated key (which can be assumed to have a > compatible representation). That's one more advanced possibility. Yikes. > It seems worthwhile because numeric is the odd one out. Once I or > someone else gets around to adding network types, which could happen > in 9.6, then we are done (because there is already a pending patch > that adds support for remaining character-like types and alternative > opclasses). I really don't foresee much additional use of abbreviated > keys beyond these cases. There'd be a small but nice boost by not > doing pointer chasing for the abbreviated key comparison, I imagine, > but that benefit would be felt everywhere. Numeric is the odd one out > currently, but as you say it can be fixed, and I foresee no other > trouble from any other opclass/type that cares about abbreviation. > > Another issue is that abbreviated keys in indexes are probably not > going to take 8 bytes, because they'll go in the ItemId array. It's > likely to be very useful to be able to take the first two bytes, and > know that a uint16 comparison is all that is needed, and have a basis > to perform an interpolation search rather than a binary search > (although that needs to be adaptive to different distributions, I > think it could be an effective technique -- there are many cache > misses for binary searches on internal B-Tree pages). Hmm, that seems iffy to me. There are plenty of data sets where 8 bytes is enough to get some meaningful information about what part of the keyspace you're in, but 2 bytes isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: