Home > mailing lists

Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Date	December 22, 2015 17:31:39
Msg-id	CA+TgmoY7LdpFSanui2PawqZyCe6TxCL47Mb3yOtnSE=gfXyohw@mail.gmail.com Whole thread Raw
In response to	Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
List	pgsql-hackers

Tree view

On Mon, Dec 21, 2015 at 2:55 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Dec 21, 2015 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> PFA my proposal for comment changes for 9.5 and master.  This is based
>> on your 0001, but I edited somewhat.  Please let me know your
>> thoughts.  I am not willing to go further and rearrange actual code in
>> 9.5 at this point; it just isn't necessary.
>
> Fine by me. But this revision hasn't made the important point at all
> -- which is that 0002 is safe. That's a stronger guarantee than the
> abbreviated key representation being pass-by-value.

Right.  I don't think that we should back-patch that stuff into 9.5.

>> Looking at this whole system again, I wonder if we're missing a trick
>> here.  How about if we decree from on high that the abbreviated-key
>> comparator is always just the application of an integer comparison
>> operator?  The only abbreviation function that we have right now that
>> wouldn't be happy with that is the one for numeric, and that looks
>> like it could be fixed.
>
> I gather you mean that we'd decree that they were always compared
> using a text or uuid style 3-way unsigned integer comparison, allowing
> us to hardcode that.

Right.

>> Then we could hard-code knowledge of this
>> representation in tuplesort.c in such a way that it wouldn't need to
>> call a comparator function at all - instead of doing
>> ApplySortComparator() and then maybe ApplySortAbbrevFullComparator(),
>> it would do something like:
>>
>> if (using_abbreviation && (compare = ApplyAbbrevComparator(...)) != 0)
>>     return compare;
>>
>> I'm not sure if that would save enough cycles vs. the status quo to be
>> worth bothering with, but it seems like it might.  You may have a
>> better feeling for that than I do.
>
> I like this idea. I thought of it myself already, actually.
>
> It has some nice properties, because unsigned integers are so simple
> and flexible. For example, we can do it with int4 sometimes, and then
> compare two attributes at once on 64-bit platforms. Maybe the second
> attribute (the second set of 4 bytes) isn't an int4, though -- it's
> any other type's abbreviated key (which can be assumed to have a
> compatible representation). That's one more advanced possibility.

Yikes.

> It seems worthwhile because numeric is the odd one out. Once I or
> someone else gets around to adding network types, which could happen
> in 9.6, then we are done (because there is already a pending patch
> that adds support for remaining character-like types and alternative
> opclasses). I really don't foresee much additional use of abbreviated
> keys beyond these cases. There'd be a small but nice boost by not
> doing pointer chasing for the abbreviated key comparison, I imagine,
> but that benefit would be felt everywhere. Numeric is the odd one out
> currently, but as you say it can be fixed, and I foresee no other
> trouble from any other opclass/type that cares about abbreviation.
>
> Another issue is that abbreviated keys in indexes are probably not
> going to take 8 bytes, because they'll go in the ItemId array. It's
> likely to be very useful to be able to take the first two bytes, and
> know that a uint16 comparison is all that is needed, and have a basis
> to perform an interpolation search rather than a binary search
> (although that needs to be adaptive to different distributions, I
> think it could be an effective technique -- there are many cache
> misses for binary searches on internal B-Tree pages).

Hmm, that seems iffy to me.  There are plenty of data sets where 8
bytes is enough to get some meaningful information about what part of
the keyspace you're in, but 2 bytes isn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Robert Haas
Date: 22 December 2015, 17:13:38
Subject: Re: Speed up Clog Access by increasing CLOG buffers

From: Robert Haas
Date: 22 December 2015, 17:34:31
Subject: Re: Possible marginally-incompatible change to array subscripting

Re: Re: Reusing abbreviated keys during second pass of ordered [set] aggregates - Mailing list pgsql-hackers

Previous

Next