Re: Abbreviated keys for Numeric - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Abbreviated keys for Numeric |
Date | |
Msg-id | CAM3SWZQ_ZyuSjNwXYvdvAoW8otTFhXYs7jQ1HLnZtuHYXzoMzg@mail.gmail.com |
In response to | Re: Abbreviated keys for Numeric (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses | Re: Abbreviated keys for Numeric |
List | pgsql-hackers |
Attached is a revision of this patch, which I'm calling v2. What do you think, Andrew?

I've cut the int32 representation/alternative !USE_FLOAT8_BYVAL encoding scheme entirely, which basically means that 32-bit systems don't get to have this optimization. 64-bit systems have been commonplace now for about a decade. This year, new phones came out with 64-bit architectures, so increasingly even people who work with embedded systems don't care about 32-bit. I'm not suggesting that legacy doesn't matter - far from it - but I care much less about having the latest performance improvements on what are largely legacy systems.

Experience suggests that this is a good time of the cycle to cut scope. The last commitfest has a way of clarifying what is actually important. It seems unwise to include what is actually a fairly distinct encoding scheme, which the int32/!USE_FLOAT8_BYVAL variant really was (the same can't really be said for text abbreviation, since that can basically work the same way on 32-bit systems, with very little extra code). This isn't necessarily the right decision in general, but I feel it's the right decision in the present climate of everyone frantically closing things out, and feeling burnt out.

I'm sorry that I threw some of your work away, but since we both have other pressing concerns, perhaps this is understandable. It may be revisited, or I may lose the argument on this point, but going this way cuts the code by about 30%, and makes me feel a lot better about the risk of regressing marginal cases, since I know we always have 8 bytes to work with. There might otherwise be a danger of regressing under-tested 32-bit platforms, or indeed missing other bugs, and frankly I don't have time to think about that right now.

Other than that, I've tried to keep things closer to the text opclass. For example, the cost model now has a few debugging traces (disabled by default).
I have altered the ad-hoc cost model so that it no longer concerns itself with NULL inputs, which seemed questionable, not least since the abbreviation conversion function isn't actually called for NULL inputs. Any attempt to track the full count within numeric code therefore cannot work.

I also now allocate a buffer of scratch memory separately from the main sortsupport object. Doing one allocation for all sortsupport state, bunched together as a buffer, seemed like a questionable micro-optimization. For similar reasons, I avoid playing tricks in the VARATT_IS_SHORT() case. My preferred approach to avoiding palloc()/pfree() cycles is to simply re-use the same buffer across calls to numeric_abbrev_convert(), and maybe risk having to enlarge the relatively tiny buffer once or twice. In other words, it works more or less the same way as it does with text abbreviation.

It seemed unwise to silently disable abbreviation when someone happened to build with DEC_DIGITS != 4. A static assertion now forces anyone building that way to make an informed decision, in a self-documenting way: either disable abbreviation, or leave DEC_DIGITS unchanged in light of the performance penalty.

The encoding scheme is unchanged. I think that your conclusions on those details were sound.

Numeric abbreviation has a more compelling cost/benefit ratio than even that of text. I easily managed to get the same 6x-7x improvement that you reported when sorting 10 million random numeric rows.

Thanks
--
Peter Geoghegan
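As an illustration of two points made in the message (the compile-time check on DEC_DIGITS and the scratch buffer that is re-used across conversion calls), here is a minimal standalone sketch. It is not the code from the attached patch. `ScratchBuf`, `scratch_reserve()`, and `scratch_copy()` are hypothetical names, and plain `realloc()` stands in for PostgreSQL's memory-context allocator so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the constant named in the message; assumed here to be 4. */
#define DEC_DIGITS 4

/* Fail the build loudly instead of silently disabling abbreviation. */
static_assert(DEC_DIGITS == 4,
              "numeric abbreviation assumes DEC_DIGITS == 4");

/* Hypothetical scratch state, kept apart from the main sort state. */
typedef struct ScratchBuf
{
    char   *buf;        /* re-used across calls; NULL until first use */
    size_t  buflen;     /* current allocated size in bytes */
} ScratchBuf;

/*
 * Return a buffer of at least `need` bytes. The buffer only grows, by
 * doubling, so a sort over many values reallocates at most a few times.
 * Returns NULL if allocation fails (or if nothing was ever requested).
 */
static char *
scratch_reserve(ScratchBuf *s, size_t need)
{
    if (need > s->buflen)
    {
        size_t  newlen = s->buflen ? s->buflen : 64;
        char   *newbuf;

        while (newlen < need)
            newlen *= 2;

        newbuf = realloc(s->buf, newlen);
        if (newbuf == NULL)
            return NULL;

        s->buf = newbuf;
        s->buflen = newlen;
    }
    return s->buf;
}

/*
 * Copy one input value into the shared buffer, as a conversion routine
 * might do before decoding it. No per-call allocate/free cycle occurs
 * once the buffer is large enough.
 */
static const char *
scratch_copy(ScratchBuf *s, const void *src, size_t len)
{
    char *dst = scratch_reserve(s, len);

    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}
```

A caller would initialize one `ScratchBuf` to `{NULL, 0}` when the sort starts, call `scratch_copy()` once per value, and free `buf` when the sort ends. The trade-off is the one the message describes: a small buffer may need enlarging once or twice, in exchange for no allocation on every other call.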