Re: Fixed length data types issue - Mailing list pgsql-hackers
From | Martijn van Oosterhout |
---|---|
Subject | Re: Fixed length data types issue |
Date | |
Msg-id | 20060908140011.GG5479@svana.org |
In response to | Re: Fixed length data types issue (mark@mark.mielke.cc) |
Responses |
Re: Fixed length data types issue
Re: Fixed length data types issue |
List | pgsql-hackers |
On Fri, Sep 08, 2006 at 09:28:21AM -0400, mark@mark.mielke.cc wrote:
> > But that won't help in the example you posted upthread, because char(N)
> > is not fixed-length.
>
> It can be fixed-length, or at least, have an upper bound. If marked
> up to contain only ascii characters, it doesn't, at least in theory,
> and even if it is unicode, it's not going to need more than 4 bytes
> per character. char(2) through char(16) only require 4 bits to
> store the length header, leaving 4 bits for encoding information.
> bytea(2) through bytea(16), at least in theory, should require none.

If you're talking about an upper bound, then it's not fixed length
anymore, and you need to expend bytes storing the length. ASCII
characters take only one byte in most encodings, including UTF-8.

Doodling this morning, I remembered why the simple approach didn't
work. If you look at the varlena header, 2 bits are reserved. Say you
take one more bit to indicate "short header". Then lengths of 0-31
bytes can be represented with a one-byte header, yay! However, now you
only have 29 bits left over to store the length, so we've just cut the
maximum datum size from 1GB to 512MB. Is that a fair trade? Probably
not, so you'd need a more sophisticated scheme.

> For my own uses, I would like for bytea(16) to have no length header.
> The length is constant. UUID or MD5SUM. Store the length at the head
> of the table, or look up the information from the schema.

I'm still missing the argument for why you can't just make a 16-byte
type. Around half the datatypes in PostgreSQL are fixed-length and
have no header. I'm completely confused about why people are hung up
on bytea(16) not being fixed length when it's trivial to create a
type that is.

> I see the complexity argument. Existing code is too heavy to change
> completely. People talking about compromises such as allowing the
> on disk layout to be different from the in memory layout.

The biggest cost of having differing memory and disk layouts is that
you have to "unpack" each disk page as it's read in. This means an
automatic doubling of memory usage for the buffer cache. If you're RAM
limited, that's the last thing you want.

Currently, the executor will use the contents of the actual disk page
when possible, saving a lot of copying.

Have a nice day,
-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
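A small sketch of the header arithmetic in the message above may help in checking the numbers. It assumes a 4-byte (32-bit) varlena header with 2 reserved bits, as the message describes, and it assumes that a one-byte short header would carry the same 2 reserved bits plus the new "short header" flag. The function names are hypothetical and are not taken from the PostgreSQL source.

```python
# Illustrative arithmetic only; not PostgreSQL code.
# Assumption: the long header is 32 bits wide and 2 of them are reserved.
HEADER_BITS = 32
RESERVED_BITS = 2


def size_limit(extra_flag_bits: int = 0) -> int:
    """Exclusive upper bound on datum length, in bytes, for the long header.

    extra_flag_bits is how many additional bits are taken away from the
    length field, e.g. 1 for a "short header" flag.
    """
    length_bits = HEADER_BITS - RESERVED_BITS - extra_flag_bits
    return 1 << length_bits


def short_header_max_length(header_bytes: int = 1, flag_bits: int = 3) -> int:
    """Largest length a short header can hold.

    Assumption: flag_bits = 2 reserved bits + 1 "short header" bit.
    """
    length_bits = header_bytes * 8 - flag_bits
    return (1 << length_bits) - 1


MIB = 1024 * 1024

if __name__ == "__main__":
    print(size_limit(0) // MIB, "MiB with the existing header")
    print(size_limit(1) // MIB, "MiB after taking one more flag bit")
    print("short header holds lengths 0 to", short_header_max_length())
```

Under these assumptions the long header drops from 30 to 29 length bits, which is the 1GB to 512MB reduction mentioned in the message, while a one-byte header is left with 5 length bits.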