Re: Fixed length data types issue - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Fixed length data types issue |
Date | |
Msg-id | 6043.1157937411@sss.pgh.pa.us |
In response to | Re: Fixed length data types issue (Gregory Stark <gsstark@mit.edu>) |
Responses |
Re: Fixed length data types issue
Re: Fixed length data types issue
Re: Fixed length data types issue
Re: Fixed length data types issue |
List | pgsql-hackers |
Gregory Stark <gsstark@mit.edu> writes:
> I'm a bit confused by this and how it would be handled in your sketch. I
> assumed we needed a bit pattern dedicated to 4-byte length headers because
> even though it would never occur on disk it would be necessary for the
> uncompressed and/or detoasted data.

> In your scheme what would PG_GETARG_TEXT() give you if the data was detoasted
> to larger than 16k?

I'm imagining that it would give you the same old uncompressed in-memory representation as it does now, i.e., a 4-byte length word and uncompressed data.

The weak spot of the scheme is that it assumes different, incompatible in-memory and on-disk representations. This seems to require either (a) coercing values to in-memory form before they ever get handed to any datatype manipulation function, or (b) thinking of some magic way to pass out-of-band info about the contents of the datum. (b) is the same stumbling block we have in connection with making typmod available to datatype manipulation functions. I don't want to reject (b) entirely, but it seems to require some pretty major structural changes. OTOH (a) is not very pleasant either, and so what would be nice is if we could tell by inspection of the Datum alone which format it's in.

After further thought I have an alternate proposal that does that, but it's got its own disadvantage: it requires storing uncompressed 4-byte length words in big-endian byte order everywhere. This might be a showstopper (does anyone know the cost of ntohl() on modern Intel CPUs?), but if it's not then I see things working like this:

* If the high-order bit of the datum's first byte is 0, then it's an uncompressed datum in what's essentially our current in-memory format, except that the 4-byte length word must be big-endian (to ensure that the leading bit can be kept zero). In particular, this format will be aligned on a 4- or 8-byte boundary as called for by the datatype definition.
* If the high-order bit of the first byte is 1, then it's some compressed variant. I'd propose divvying up the code space like this:

  * 0xxxxxxx  uncompressed 4-byte length word, as stated above
  * 10xxxxxx  1-byte length word, up to 62 bytes of data
  * 110xxxxx  2-byte length word, uncompressed inline data
  * 1110xxxx  2-byte length word, compressed inline data
  * 1111xxxx  1-byte length word, out-of-line TOAST pointer

This limits us to 8K uncompressed or 4K compressed inline data without toasting, which is slightly annoying but probably still an insignificant limitation. It also means more distinct cases for the heap_deform_tuple inner loop to think about, which might be a problem.

Since the compressed forms would not be aligned to any boundary, there's an important special case here: how can heap_deform_tuple tell whether the next field is compressed or not? The answer is that we'll have to require pad bytes between fields to be zero. (They already are zeroed by heap_form_tuple, but now it'd be a requirement.) So the algorithm for decoding a non-null field is:

* If looking at a byte with high bit 0, then we are either on the start of an uncompressed field, or on a pad byte before such a field. Advance to the declared alignment boundary for the datatype, read a 4-byte length word, and proceed.

* If looking at a byte with high bit 1, then we are at the start of a compressed field (which will never have any preceding pad bytes). Decode the length as per the rules above.

The good thing about this approach is that it requires zero changes to fundamental system structure. The pack/unpack rules in heap_form_tuple and heap_deform_tuple change a bit, and the mechanics of PG_DETOAST_DATUM change, but a Datum is still just a pointer and you can always tell what you've got by examining the pointed-to data.

regards, tom lane
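The header-byte code space proposed above can be illustrated with a small C sketch that classifies a datum by its first byte and reads its length bits. This is a sketch under stated assumptions, not PostgreSQL code: the names VarFormat, classify_header, header_size, length_field and write_length_word are hypothetical; the length bits are assumed to follow the tag bits, with 2-byte headers stored most significant byte first like the 4-byte word; and since the message does not say what the 4-bit field of a TOAST-pointer header measures, that value is returned raw. ntohl()/htonl() come from POSIX <arpa/inet.h>.

```c
#include <arpa/inet.h> /* POSIX ntohl(), htonl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for the five header formats in the proposed code space. */
typedef enum
{
    VF_UNCOMPRESSED_4B, /* 0xxxxxxx: 4-byte big-endian length word        */
    VF_SHORT_1B,        /* 10xxxxxx: 1-byte length word, <= 62 data bytes */
    VF_INLINE_2B,       /* 110xxxxx: 2-byte length word, uncompressed     */
    VF_COMPRESSED_2B,   /* 1110xxxx: 2-byte length word, compressed       */
    VF_TOAST_1B         /* 1111xxxx: 1-byte length word, TOAST pointer    */
} VarFormat;

/* Classify a datum by inspecting only its first byte. */
static VarFormat
classify_header(uint8_t first)
{
    if ((first & 0x80) == 0x00)
        return VF_UNCOMPRESSED_4B;
    if ((first & 0xC0) == 0x80)
        return VF_SHORT_1B;
    if ((first & 0xE0) == 0xC0)
        return VF_INLINE_2B;
    if ((first & 0xF0) == 0xE0)
        return VF_COMPRESSED_2B;
    return VF_TOAST_1B;
}

/* Number of header bytes used by each format. */
static size_t
header_size(VarFormat f)
{
    switch (f)
    {
        case VF_UNCOMPRESSED_4B:
            return 4;
        case VF_INLINE_2B:
        case VF_COMPRESSED_2B:
            return 2;
        default:
            return 1;
    }
}

/* Value of the length bits (assumed layout: tag bits, then length bits). */
static uint32_t
length_field(const uint8_t *p)
{
    uint32_t be;

    switch (classify_header(p[0]))
    {
        case VF_UNCOMPRESSED_4B:
            memcpy(&be, p, sizeof(be)); /* avoid an unaligned 4-byte load */
            return ntohl(be);
        case VF_SHORT_1B:
            return p[0] & 0x3F; /* 6 length bits */
        case VF_INLINE_2B:
            return ((uint32_t) (p[0] & 0x1F) << 8) | p[1]; /* 13 bits */
        case VF_COMPRESSED_2B:
            return ((uint32_t) (p[0] & 0x0F) << 8) | p[1]; /* 12 bits */
        case VF_TOAST_1B:
            return p[0] & 0x0F; /* meaning unspecified; returned raw */
    }
    return 0; /* not reached */
}

/* Store a 4-byte length word big-endian; its high bit must stay 0. */
static void
write_length_word(uint8_t *p, uint32_t len)
{
    uint32_t be;

    assert(len < 0x80000000u);
    be = htonl(len);
    memcpy(p, &be, sizeof(be));
}
```

The largest 2-byte length fields are 8191 (13 bits) and 4095 (12 bits), consistent with the 8K uncompressed and 4K compressed inline limits mentioned in the message. Because the 4-byte word is big-endian and below 2^31, its first byte always has the high bit clear, which is what lets the first byte alone identify the format.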
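The decoding rule for a non-null field, including the requirement that pad bytes be zero, can be sketched the same way. Assumptions not taken from the message: offsets are relative to a data area whose start is maximally aligned; both length fields count their own header bytes, as varlena lengths do today (which fits 62 data bytes in a 6-bit field); and only the 4-byte and 1-byte forms are handled, to keep the example short. decode_field and align_up are hypothetical helpers, not heap_deform_tuple itself.

```c
#include <arpa/inet.h> /* POSIX ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Round off up to the next multiple of align (a power of 2). */
static size_t
align_up(size_t off, size_t align)
{
    return (off + align - 1) & ~(align - 1);
}

/*
 * Decode one non-null field starting at byte offset off. Returns the offset
 * just past the field and reports where its payload starts and how long it
 * is. Handles only the 4-byte form (high bit 0) and the 1-byte form
 * (10xxxxxx); both length values are assumed to include the header.
 */
static size_t
decode_field(const uint8_t *data, size_t off, size_t typalign,
             size_t *payload_off, uint32_t *payload_len)
{
    if ((data[off] & 0x80) == 0)
    {
        /*
         * Either the start of an uncompressed field or a zero pad byte
         * before one: move to the type's alignment boundary (a no-op if
         * already aligned) and read the big-endian length word there.
         */
        uint32_t be;

        off = align_up(off, typalign);
        memcpy(&be, data + off, sizeof(be));
        uint32_t total = ntohl(be);
        assert(total >= 4);
        *payload_off = off + 4;
        *payload_len = total - 4;
        return off + total;
    }
    else
    {
        /* Short unaligned field: never preceded by pad bytes. */
        assert((data[off] & 0xC0) == 0x80);
        uint32_t total = data[off] & 0x3F;
        assert(total >= 1);
        *payload_off = off + 1;
        *payload_len = total - 1;
        return off + total;
    }
}
```

For example, a short field "ab" at offset 0 ends at offset 3; a following int4-aligned field then starts with one zero pad byte at offset 3, and rounding up to 4 lands on its length word. This works because a pad byte and the first byte of a big-endian length word both have the high bit clear, so the same branch handles both cases.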