Re: Variable length varlena headers redux - Mailing list pgsql-hackers
From | Gregory Stark |
---|---|
Subject | Re: Variable length varlena headers redux |
Date | |
Msg-id | 87fy9agirq.fsf@stark.xeocode.com |
In response to | Re: Variable length varlena headers redux (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Variable length varlena headers redux
Re: Variable length varlena headers redux |
List | pgsql-hackers |
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
> > I don't really see a way around it though. Places that fill in VARDATA before
> > the size (formatting.c seems to be the worst case) will just have to be
> > changed and it'll be a fairly fragile point.
>
> No, we're not going there: it'd break too much code now and it'd be a
> continuing source of bugs for the foreseeable future. The sane way to
> design this is that
>
> (1) code written to existing practice will always generate 4-byte
> headers. (Hence, VARDATA() acts the same as now.) That's the format
> that generally gets passed around in memory.

So then we don't need to replace VARSIZE with SET_VARLENA_LEN at all.

> (2) creation of a short header is handled by the TOAST code just before
> the tuple goes to disk.
>
> (3) replacement of a short header with a 4-byte header is considered
> part of de-TOASTing.

So (nigh) every tuple will get deformed and reformed once before it goes to
disk? Currently the toast code doesn't even look at a tuple if it's small
enough, but in this case we would want it to fire even on very narrow rows.

One design possibility I considered was doing this in heap_deform_tuple and
heap_form_tuple, basically skipping the extra deform/form_tuple cycle in the
toast code. I had considered having heap_deform_tuple palloc copies of these
data before returning them. But that has the same problems.

The other problem is that there may be places in the code that receive a datum
from someplace where they have every right to expect it not to be toasted. For
example, plpgsql deforming a tuple it just formed, or even the return value
from a function. They might be quite surprised to receive a toasted tuple.

Note also that that's going to force us to palloc and memcpy these data. Are
there going to be circumstances where this changes the memory context lifetime
of some data in existing code? For example, something like the inet code may
know its arguments can never be large enough to get toasted and so not do a
FREE_IF_COPY on its btree operator arguments.

> After we have that working, we can work on offering alternative macros
> that let specific functions avoid the overhead of conversion between
> 4-byte headers and short ones, in much the same way that there are TOAST
> macros now that let specific functions get down-and-dirty with the
> out-of-line TOAST representation. But first we have to get to the point
> where 4-byte-header datums can be distinguished from short-header datums
> by inspection; and that requires either network byte order in the 4-byte
> length word or some other change in its representation.
>
> > Actually I think neither htonl nor bitshifting the entire 4-byte word is going
> > to really work here. Both will require 4-byte alignment.
>
> And your point is what? The 4-byte form can continue to require
> alignment, and *will* require it in any case, since many of the affected
> datatypes expect alignment of the data within the varlena. The trick is
> that when we are examining a non-aligned address within a tuple, we have
> to be able to tell whether we are looking at the first byte of a
> short-header datum (not aligned) or a pad byte. This is easily done,
> for instance by decreeing that pad bytes must be zeroes.

Well if we're doing it in toast then the alignment of the payload really
doesn't matter at all. It'll be realigned after detoasting anyways.

What I had had in mind was to prohibit using smaller headers than the
alignment of the data type. But that was on the assumption we would continue
to use the compressed header in memory and not copy it.

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
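
To make the pad-byte rule quoted above concrete, here is a minimal sketch of how a reader could tell a short-header datum from padding at an unaligned offset. Everything in it is an assumption made for illustration and is not PostgreSQL's actual on-disk format: the names `put_datum` and `get_datum`, the choice of a high bit (`SHORT_FLAG`) to mark a one-byte header, and storing the 4-byte length in network byte order so its first byte never has that bit set.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoding, for illustration only:
 *   short header: 1 byte, high bit set, low 7 bits = total length (header + data)
 *   long header:  4 bytes at a 4-byte-aligned offset, big-endian total length
 *   pad bytes:    always zero
 */
#define SHORT_FLAG      0x80u
#define SHORT_MAX_TOTAL 0x7Fu
#define ALIGN4(off)     (((off) + 3u) & ~(size_t) 3u)

/* Append one datum at offset off; returns the offset just past it. */
static size_t
put_datum(uint8_t *buf, size_t off, const void *data, size_t len)
{
    if (len + 1 <= SHORT_MAX_TOTAL)
    {
        /* Short form: no alignment, so no padding is ever written. */
        buf[off] = (uint8_t) (SHORT_FLAG | (len + 1));
        memcpy(buf + off + 1, data, len);
        return off + 1 + len;
    }

    /* Long form: zero-fill up to the next 4-byte boundary. */
    size_t   aligned = ALIGN4(off);
    uint32_t total = (uint32_t) (len + 4);

    memset(buf + off, 0, aligned - off);
    buf[aligned]     = (uint8_t) (total >> 24);
    buf[aligned + 1] = (uint8_t) (total >> 16);
    buf[aligned + 2] = (uint8_t) (total >> 8);
    buf[aligned + 3] = (uint8_t) total;
    memcpy(buf + aligned + 4, data, len);
    return aligned + 4 + len;
}

/* Read the datum starting at or after off; returns the offset just past it. */
static size_t
get_datum(const uint8_t *buf, size_t off, const uint8_t **data, size_t *len)
{
    /* A zero byte at an unaligned offset can only be padding here. */
    if ((off & 3u) != 0 && buf[off] == 0)
        off = ALIGN4(off);

    if (buf[off] & SHORT_FLAG)
    {
        size_t total = buf[off] & SHORT_MAX_TOTAL;

        *data = buf + off + 1;
        *len = total - 1;
        return off + total;
    }

    /* Otherwise an aligned 4-byte big-endian length word. */
    uint32_t total = ((uint32_t) buf[off] << 24) | ((uint32_t) buf[off + 1] << 16) |
                     ((uint32_t) buf[off + 2] << 8) | (uint32_t) buf[off + 3];

    *data = buf + off + 4;
    *len = total - 4;
    return off + total;
}
```

Writing a 2-byte value followed by a 200-byte value with `put_datum` leaves the first datum in offsets 0 to 2, one zero pad byte at offset 3, and the long header at offset 4. `get_datum` called at offset 3 sees the zero byte, skips to the boundary, and reads the 4-byte length there. The sketch assumes offset 0 of the buffer is itself suitably aligned and does no bounds checking.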