Re: Variable length varlena headers redux - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Variable length varlena headers redux
Date	February 9, 2007 07:45:13
Msg-id	200702090535.l195Z8C07463@momjian.us
In response to	Re: Variable length varlena headers redux (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Variable length varlena headers redux
List	pgsql-hackers
Bruce Momjian wrote:
> 
> Uh, I thought the approach was to create type-specific in/out functions,
> and add casting so every time they were referenced, they would expand
> to a varlena structure in memory.

Oh, one more thing.  You are going to need to teach the code that walks
through a tuple's attributes about the short header types.  I think you
should set pg_type.typlen = -3 (vs -1 for varlena) and put your macro
code there too.  (As an example, see the macro att_addlength().)
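
A minimal sketch of what that dispatch could look like, written as a
standalone function and not as the real att_addlength() macro.  The
name sketch_att_addlength() is hypothetical, typlen = -3 is only the
value proposed above, and the layout assumed for it (a single length
byte whose value includes itself) is an illustration, not a settled
format.  Alignment padding and the flag bits in the 4-byte length word
are ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for att_addlength(): given the offset of an
 * attribute, its pg_type.typlen, and a pointer to its first byte,
 * return the offset just past the attribute.
 */
static size_t
sketch_att_addlength(size_t off, int typlen, const unsigned char *attptr)
{
    if (typlen > 0)
        return off + (size_t) typlen;       /* fixed-width type */

    if (typlen == -1)
    {
        /* varlena: 4-byte length word that counts itself
         * (assumption: flag bits are zero, native byte order) */
        uint32_t len;

        memcpy(&len, attptr, sizeof(len));
        return off + len;
    }

    if (typlen == -2)
        return off + strlen((const char *) attptr) + 1;  /* cstring */

    /* Proposed short-header type.  ASSUMED layout: one length byte
     * whose value includes the byte itself. */
    assert(typlen == -3);
    return off + attptr[0];
}
```

The only new work for the tuple-walking code in this sketch is the
final branch; the existing -1 and -2 cases are left as they were.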

I know it is kind of odd to have a data type that is only used on disk,
and not in memory, but I see this as a baby varlena type, used only to
store and get varlena values using less disk space.

---------------------------------------------------------------------------
> 
> Gregory Stark wrote:
> > 
> > I've been looking at this again and had a few conversations about it. This may
> > be easier than I had originally thought but there's one major issue that's
> > bugging me. Do you see any way to avoid having every user function everywhere
> > use a new macro api instead of VARDATA/VARATT_DATA and VARSIZE/VARATT_SIZEP?
> > 
> > The two approaches I see are either 
> > 
> > a) To have two sets of macros. One set, VARATT_DATA and VARATT_SIZEP, is
> > for constructing new tuples and behaves exactly as it does now, so you always
> > construct a four-byte header datum. Then in heap_form*tuple we check if you
> > can use a shorter header and convert. VARDATA/VARSIZE would be for looking at
> > existing datums and would interpret the header bits.
> > 
> > This seems very fragile since one stray call site using VARATT_DATA to find
> > the data in an existing datum would cause random bugs that only occur rarely
> > in certain circumstances. It would even work as long as the size is filled in
> > with VARATT_SIZEP first which it usually is, but fail if someone changes the
> > order of the statements.
> > 
> > or 
> > 
> > b) throw away VARATT_DATA and VARATT_SIZEP and make all user functions
> > everywhere change over to a new macro api. That seems like a pretty big
> > burden. It's safer but means every contrib module would have to be updated and
> > so on.
> > 
> > I'm hoping I'm missing something and there's a way to do this without breaking
> > the api for every user function.
> > 
> > 
> 
> -- Start of included mail From: Tom Lane <tgl@sss.pgh.pa.us>
> 
> > To: Gregory Stark <stark@enterprisedb.com>
> > cc: Gregory Stark <gsstark@mit.edu>, Bruce Momjian <bruce@momjian.us>, 
> >             Peter Eisentraut <peter_e@gmx.net>, pgsql-hackers@postgresql.org, 
> >             Martijn van Oosterhout <kleptog@svana.org>
> > Subject: Re: [HACKERS] Fixed length data types issue 
> > Date: Mon, 11 Sep 2006 13:15:43 -0400
> > Lines: 64
> > Xref: stark.xeocode.com work.enterprisedb:683
> 
> > Gregory Stark <stark@enterprisedb.com> writes:
> > > In any case it seems a bit backwards to me. Wouldn't it be better to
> > > preserve bits in the case of short length words where they're precious
> > > rather than long ones? If we make 0xxxxxxx the 1-byte case it means ...
> > 
> > Well, I don't find that real persuasive: you're saying that it's
> > important to have a 1-byte not 2-byte header for datums between 64 and
> > 127 bytes long.  Which is by definition less than a 2% savings for those
> > values.  I think its's more important to pick bitpatterns that reduce
> > the number of cases heap_deform_tuple has to think about while decoding
> > the length of a field --- every "if" in that inner loop is expensive.
> > 
> > I realized this morning that if we are going to preserve the rule that
> > 4-byte-header and compressed-header cases can be distinguished from the
> > data alone, there is no reason to be very worried about whether the
> > 2-byte cases can represent the maximal length of an in-line datum.
> > If you want to do 16K inline (and your page is big enough for that)
> > you can just fall back to the 4-byte-header case.  So there's no real
> > disadvantage if the 2-byte headers can only go up to 4K or so.  This
> > gives us some more flexibility in the bitpattern choices.
> > 
> > Another thought that occurred to me is that if we preserve the
> > convention that a length word's value includes itself, then for a
> > 1-byte header the bit pattern 10000000 is meaningless --- the count
> > has to be at least 1.  So one trick we could play is to take over
> > this value as the signal for "toast pointer follows", with the
> > assumption that the tuple-decoder code knows a-priori how big a
> > toast pointer is.  I am not real enamored of this, because it certainly
> > adds one case to the inner heap_deform_tuple loop and it'll give us
> > problems if we ever want more than one kind of toast pointer.  But
> > it's a possibility.
> > 
> > Anyway, a couple of encodings that I'm thinking about now involve
> > limiting uncompressed data to 1G (same as now), so that we can play
> > with the first 2 bits instead of just 1:
> > 
> > 00xxxxxx    4-byte length word, aligned, uncompressed data (up to 1G)
> > 01xxxxxx    4-byte length word, aligned, compressed data (up to 1G)
> > 100xxxxx    1-byte length word, unaligned, TOAST pointer
> > 1010xxxx    2-byte length word, unaligned, uncompressed data (up to 4K)
> > 1011xxxx    2-byte length word, unaligned, compressed data (up to 4K)
> > 11xxxxxx    1-byte length word, unaligned, uncompressed data (up to 63b)
> > 
> > or
> > 
> > 00xxxxxx    4-byte length word, aligned, uncompressed data (up to 1G)
> > 010xxxxx    2-byte length word, unaligned, uncompressed data (up to 8K)
> > 011xxxxx    2-byte length word, unaligned, compressed data (up to 8K)
> > 10000000    1-byte length word, unaligned, TOAST pointer
> > 1xxxxxxx    1-byte length word, unaligned, uncompressed data (up to 127b)
> >         (xxxxxxx not all zero)
> > 
> > This second choice allows longer datums in both the 1-byte and 2-byte
> > header formats, but it hardwires the length of a TOAST pointer and
> > requires four cases to be distinguished in the inner loop; the first
> > choice only requires three cases, because TOAST pointer and 1-byte
> > header can be handled by the same rule "length is low 6 bits of byte".
> > The second choice also loses the ability to store in-line compressed
> > data above 8K, but that's probably an insignificant loss.
> > 
> > There's more than one way to do it ...
> > 
> >             regards, tom lane
> > 
> -- End of included mail.
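
For illustration only, the first encoding table in the included mail
(00/01 for 4-byte headers, 100 for a TOAST pointer, 1010/1011 for
2-byte headers, 11 for a 1-byte header) can be decoded with three
branches, as that mail claims.  The sketch below is not PostgreSQL
code: SketchHeader and sketch_decode_header() are hypothetical names,
and it assumes the flag bits sit in the first byte in memory with the
remaining length bits following most-significant first, which the mail
does not specify.  The length is returned exactly as stored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct SketchHeader
{
    unsigned hdr_bytes;      /* size of the length word: 1, 2, or 4 */
    uint32_t length;         /* value held in the length bits */
    int      compressed;     /* nonzero for the "compressed" patterns */
    int      toast_pointer;  /* nonzero for the 100xxxxx pattern */
} SketchHeader;

static SketchHeader
sketch_decode_header(const unsigned char *p)
{
    SketchHeader  h = {0, 0, 0, 0};
    unsigned char b;

    assert(p != NULL);
    b = p[0];

    if ((b & 0x80) == 0)
    {
        /* Case 1: 00xxxxxx / 01xxxxxx, 4-byte word, 30 length bits */
        h.hdr_bytes = 4;
        h.compressed = (b & 0x40) != 0;
        h.length = ((uint32_t) (b & 0x3F) << 24) |
                   ((uint32_t) p[1] << 16) |
                   ((uint32_t) p[2] << 8) |
                   (uint32_t) p[3];
    }
    else if ((b & 0xE0) == 0xA0)
    {
        /* Case 2: 1010xxxx / 1011xxxx, 2-byte word, 12 length bits */
        h.hdr_bytes = 2;
        h.compressed = (b & 0x10) != 0;
        h.length = ((uint32_t) (b & 0x0F) << 8) | (uint32_t) p[1];
    }
    else
    {
        /* Case 3: 100xxxxx (TOAST pointer) or 11xxxxxx (short datum);
         * both use the rule "length is low 6 bits of byte" */
        h.hdr_bytes = 1;
        h.toast_pointer = (b & 0x40) == 0;
        h.length = (uint32_t) (b & 0x3F);
    }
    return h;
}
```

The second table would need a fourth branch in this style, since
10000000 has to be tested separately from the other 1xxxxxxx values.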
> 
> > 
> > 
> > -- 
> >   Gregory Stark
> >   EnterpriseDB          http://www.enterprisedb.com
> 
> -- 
>   Bruce Momjian  <bruce@momjian.us>          http://momjian.us
>   EnterpriseDB                               http://www.enterprisedb.com
> 
>   + If your life is a hard drive, Christ can be your backup. +
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +