Pie-in-sky dreaming about reworking tuple layout entirely - Mailing list pgsql-hackers
From | Gregory Stark |
---|---|
Subject | Pie-in-sky dreaming about reworking tuple layout entirely |
Date | |
Msg-id | 87irj16umm.fsf@enterprisedb.com Whole thread Raw |
Responses |
Re: Pie-in-sky dreaming about reworking tuple layout entirely
|
List | pgsql-hackers |
I can't shake the feeling that merely tweaking the way our varlenas work with a shortvarlena or with compressed varlena headers is missing the real source of our headaches. It seems very strange to me to be trying to step through a tuple with length bits at the head of every field. It's a lot of work spent dealing with a terribly inconvenient format when we can pick the format to be whatever we like. If we were designing this from scratch I don't think would have picked this approach at all. We would instead store offsets to every field whose offset isn't static in the header so we can jump straight to a field without having to look at all the previous fields at all. I feel this would be far to invasive a change for me to contemplate currently though. I don't know enough about what consequences it would have. I just wanted to get the idea out there in case sometime down the line, perhaps even not for several releasees, I might be prepared or someone else might be prepared to tackle such a project. It might make a lot of things cleaner and easier such as dealing with user defined fixed length fields whose length depends on the typmod. What I picture would be basically be an array of offsets into the tuple. A field would be the data from its offset to the next. We would have to store an offset to the end of the tuple as well, and probably another field with the length of the array so we can find the end quickly without counting bits in the null bitmap. I also picture a kind of compression that store offsets modulo 256 as 1-byte offsets with a separate list of where the 256 byte boundaries are. That means we would probably still need a heap_deform tuple that steps through the list of offsets, null bitmap, and 256-boundary list, and builds a in-memory list of pointers to the fields with nulls for the null values. Actually for the in-memory storage instead of a simple array of pointers I picture an array of structs where each struct contains a pointer to the field, a 4-byte length value, and the typmod (for a total of 12 bytes on a 32-bit platform and 16 bytes on a 64-bit platform). Then instead of passing a Datum directly to functions we would pass one of these structs (possibly by reference?). That means no function would ever have to store the length of a field internal to the field itself, and it could always depend on having the typmod available. In real terms this actually doesn't reduce space usage, in fact it probably increases it. It's effectively storing a 1-byte header for every field even if it's something like an integer. But you could imagine some further refinements. If you store all the non-varlena fields first then you can just start the list of offsets at the first varlena. heap_deform tuple could step through the tupldescriptor and null bitmap first setting up the offsets to any field with an attlen without the help of any offsets. (Incidentally, to do that there's a tricky bit. You have to be able to put new fixed length fields before varlena fields even if they're added later. But that means you need to know they're missing on the old records so you can't insert them out of order in the null bitmap. So the null bitmap would have to be in the order they were added, and even if that's not the order they appear physically or the order they appear logically. Even more confusing than the original plan that got shot down as too confusing already.) The main inefficiency in this is it would mean passing 3x as much data (only 2x as much data on 64-bit platforms) to functions. But given the way functions are called with a pointer to 100-element arrays it doesn't seem like that should be a big concern. The upside is that data types could use different size storage for different typmods without having to define whole new types based on internal implementation details the user doesn't know about. NUMERIC could internally use integer for some typmods and just perform integer arithmetic. It wouldn't have to reserve any bits to indicate the scale or precision of the number since the typmod is available whenever it's processing an element from the table and it's passed around with the field by the system. There's no security concern because the system is responsible for passing the typmod, not the caller. Only C code would be able to construct arbitrary typmods; SQL or Pl/PgSQL would have to use casts to safely set the typmod on their output. On advantage of passing around this meta information automatically is that it obviates the need for many core data types whose main advantage is that they use more efficient storage than is possible with domains or user defined types. The recently discussed UUID type, for example, offers nothing that couldn't be implemented with a user-defined type using bytea storage internally -- except for the fact that it doesn't store an extra 4 bytes of length header in every value. Another advantage is that we wouldn't need a whole slew of functions in the catalogs that do the same thing slightly differently depending on the input type. Currently each one has to be declared separately since there's no way to know what type of datum they've been passed. I suspect something like this will be necessary when it comes to implementing per-column encoding anyways. Unless we adopt some sort of universal encoding as the "preferred" database encoding and always convert to and from it, we'll need to pass around the encoding of any text datum along with the datum too. Well I'm happier having gotten these ideas down. Even if, as I said, I have no plans to implement these ideas in the near future. It would mean totally rewriting some pretty hair internal parts that I'm only starting to have a basic understanding of. I am planning to experiment with the shortvarlena form for shortvarchar et al but I hope that's something we only use for a few versions. I still hope we move to something like the above someday instead. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: