Short varlena header bit-packing options - Mailing list pgsql-hackers
From | Gregory Stark |
---|---|
Subject | Short varlena header bit-packing options |
Date | |
Msg-id | 87ejoprit4.fsf@stark.xeocode.com |
List | pgsql-hackers |
I've sent a message on the subject of 2-byte headers and choice of bitpatterns twice during the mail outage last night. They may come through eventually, but please ignore them, as the situation has changed now that new evidence has come to light. I was previously doing the second of the two bitpatterns mentioned in Tom's email:

> Gregory Stark <stark@enterprisedb.com> writes:
>> In any case it seems a bit backwards to me. Wouldn't it be better to
>> preserve bits in the case of short length words where they're precious
>> rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...
>
> Well, I don't find that real persuasive: you're saying that it's important
> to have a 1-byte not 2-byte header for datums between 64 and 127 bytes
> long. Which is by definition less than a 2% savings for those values.
> I think it's more important to pick bitpatterns that reduce the number of
> cases heap_deform_tuple has to think about while decoding the length of a
> field --- every "if" in that inner loop is expensive.
>
> I realized this morning that if we are going to preserve the rule that
> 4-byte-header and compressed-header cases can be distinguished from the
> data alone, there is no reason to be very worried about whether the
> 2-byte cases can represent the maximal length of an in-line datum. If
> you want to do 16K inline (and your page is big enough for that) you can
> just fall back to the 4-byte-header case. So there's no real
> disadvantage if the 2-byte headers can only go up to 4K or so. This
> gives us some more flexibility in the bitpattern choices.
>
> Another thought that occurred to me is that if we preserve the
> convention that a length word's value includes itself, then for a 1-byte
> header the bit pattern 10000000 is meaningless --- the count has to be
> at least 1. So one trick we could play is to take over this value as
> the signal for "toast pointer follows", with the assumption that the
> tuple-decoder code knows a-priori how big a toast pointer is.
>
> I am not real enamored of this, because it certainly adds one case to
> the inner heap_deform_tuple loop and it'll give us problems if we ever
> want more than one kind of toast pointer. But it's a possibility.
>
> Anyway, a couple of encodings that I'm thinking about now involve
> limiting uncompressed data to 1G (same as now), so that we can play
> with the first 2 bits instead of just 1:
>
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> 01xxxxxx 4-byte length word, aligned, compressed data (up to 1G)
> 100xxxxx 1-byte length word, unaligned, TOAST pointer
> 1010xxxx 2-byte length word, unaligned, uncompressed data (up to 4K)
> 1011xxxx 2-byte length word, unaligned, compressed data (up to 4K)
> 11xxxxxx 1-byte length word, unaligned, uncompressed data (up to 63b)
>
> or
>
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> 010xxxxx 2-byte length word, unaligned, uncompressed data (up to 8K)
> 011xxxxx 2-byte length word, unaligned, compressed data (up to 8K)
> 10000000 1-byte length word, unaligned, TOAST pointer
> 1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 127b)
>          (xxxxxxx not all zero)
>
> This second choice allows longer datums in both the 1-byte and 2-byte
> header formats, but it hardwires the length of a TOAST pointer and
> requires four cases to be distinguished in the inner loop; the first
> choice only requires three cases, because TOAST pointer and 1-byte
> header can be handled by the same rule "length is low 6 bits of byte".
> The second choice also loses the ability to store in-line compressed
> data above 8K, but that's probably an insignificant loss.
>
> There's more than one way to do it ...
>
> 			regards, tom lane

There had been some mention on the list about ditching the 2-byte cases, which I didn't understand, because I don't think they add any overhead or complexity to the critical heap_form/deform_tuple code, or much complexity to the typedef or macro interface. However, there's another factor I hadn't considered.
It becomes quite awkward in tuptoaster.c to use pg_lzcompress, since that generates a regular 4-byte datum which tuptoaster then needs to convert to a 2-byte datum. Worse, if it's over 4kB it *can't*. It's going to push it out as an external toasted value, but how does it represent it in the meantime?

This could all be fixed; it's just a SMOP to restructure tuptoaster.c to keep things in regular datums, with an array of flags somewhere else indicating which have been compressed, right up until the moment before it reruns heap_form_tuple. But I just punted for now. We can do this in a second patch and rejigger the selected bitpatterns at that time. It doesn't look like everyone's convinced we want them anyway.

So this is the set of bitpatterns I'm working with now:

00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
01xxxxxx 4-byte length word, aligned, compressed data (up to 1G)
10000000 1-byte length word, unaligned, TOAST pointer
1xxxxxxx 1-byte length word, unaligned, uncompressed data (up to 127b)
         (xxxxxxx not all zero)

There's no code space to indicate whether the out-of-line TOAST pointer is compressed or not. We could just compare the rawsize with the extsize, since the toaster guarantees not to compress if there's no gain. Alternatively, because of padding we have 24 bits to play with inside the toast pointer; we could set a flag in there.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com