Re: [RFC] indirect toast tuple support - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [RFC] indirect toast tuple support |
Date | |
Msg-id | 20130219140055.GA4582@awork2.anarazel.de |
In response to | Re: [RFC] indirect toast tuple support (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [RFC] indirect toast tuple support
Re: [RFC] indirect toast tuple support |
List | pgsql-hackers |
On 2013-02-19 08:48:05 -0500, Robert Haas wrote:
> On Sat, Feb 16, 2013 at 11:42 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Given that there have been wishes to support something like b) for quite
> > some time, independent from logical decoding, it seems like a good idea
> > to add support for it. It's e.g. useful for avoiding repeated detoasting
> > or decompression of tuples.
> >
> > The problem with b) is that there is no space in varlena's flag bits to
> > directly denote that a varlena points into memory instead of either
> > directly containing the data or a varattrib_1b_e containing a
> > varatt_external pointing to an on-disk toasted tuple.
>
> So the other way that we could do this is to use something that's the
> same size as a TOAST pointer but has different content - the
> seemingly-obvious choice being va_toastrelid == 0.

Unfortunately that would mean you need to copy the varatt_external (or
whatever it would be called) to aligned storage to check what it is.
That's why I went the other way.

It's a bit sad that varatt_1b_e only contains a length and not a type
byte. I would like to change the storage of existing toast types, but
that's not going to work for pg_upgrade reasons...

> I'd be a little
> reluctant to do it the way you propose because we might, at some
> point, want to try to reduce the size of toast pointers. If you have
> a tuple with many attributes, the size of the TOAST pointers
> themselves starts to add up. It would be nice to be able to have 8
> byte or even 4 byte toast pointers to handle those situations. If we
> steal one or both of those lengths to mean "the data is cached in
> memory somewhere" then we can't use those lengths in a smaller on-disk
> representation, which would seem a shame.

I agree. As I said above, having the type overlaid into the length was
and is a bad idea, I just haven't found a better one that's compatible
yet. Except inventing typlen=-3 aka "toast2" or something. But even that
wouldn't help getting rid of existing pg_upgraded tables. Besides being
a maintenance nightmare.

The only reasonable thing I can see us doing is renaming
varattrib_1b_e.va_len_1be to va_type and redefining VARSIZE_1B_E as a
switch that maps types to lengths. But I think I would put this off,
except for placing a comment somewhere, until it becomes necessary.

> But having said that, +1 on the general idea of getting something like
> this done. We really need a better infrastructure to avoid copying
> large values around repeatedly in memory - a gigabyte is a lot of data
> to be slinging around.
>
> Of course, you will not be surprised to hear that I think this is 9.4 material.

Yes, obviously. But I need time to actually propose a working patch (I
already found 2 bugs in what I had submitted), that's why I brought it up
now. No point in wasting time if there's an obviously better idea around.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
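
For illustration, the va_type idea from the message above could look roughly like the following. This is only a sketch under assumptions, not the patch under discussion: every `sketch_`/`SKETCH_` name is hypothetical, the struct layouts are simplified stand-ins for varatt_external and an in-memory pointer, and the tag value 18 assumes that existing on-disk pointers already store 18 (2 header bytes plus a 16-byte payload) in the length byte, so old data would keep its meaning. The last function shows the aligned copy that a va_toastrelid == 0 check would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag values stored in the byte formerly used as a length. */
enum
{
    SKETCH_VARTAG_INDIRECT = 1,   /* payload is a pointer into memory */
    SKETCH_VARTAG_ONDISK = 18     /* assumed value of the old length byte */
};

/* Simplified stand-in for varatt_external (four 4-byte fields). */
typedef struct
{
    int32_t  va_rawsize;
    int32_t  va_extsize;
    uint32_t va_valueid;
    uint32_t va_toastrelid;
} sketch_external;

/* Simplified stand-in for an in-memory (indirect) pointer payload. */
typedef struct
{
    void *pointer;
} sketch_indirect;

/* Header byte plus tag byte precede the (unaligned) payload. */
#define SKETCH_HDRSZ_1B_E 2

/* Map a tag to its payload size; 0 means "unknown tag". */
static size_t
sketch_tag_size(uint8_t tag)
{
    switch (tag)
    {
        case SKETCH_VARTAG_INDIRECT:
            return sizeof(sketch_indirect);
        case SKETCH_VARTAG_ONDISK:
            return sizeof(sketch_external);
        default:
            return 0;
    }
}

/* Total datum size derived from the tag instead of a stored length. */
static size_t
sketch_varsize_1b_e(const unsigned char *datum)
{
    assert(datum != NULL);
    return SKETCH_HDRSZ_1B_E + sketch_tag_size(datum[1]);
}

/*
 * Reading a field of the payload requires copying it to aligned storage
 * first, because the payload starts at an odd offset inside the datum.
 */
static uint32_t
sketch_toastrelid(const unsigned char *datum)
{
    sketch_external aligned_copy;

    memcpy(&aligned_copy, datum + SKETCH_HDRSZ_1B_E, sizeof(aligned_copy));
    return aligned_copy.va_toastrelid;
}
```

With a tag byte, telling the two pointer kinds apart only needs a look at the second byte, whereas distinguishing them by content means paying for the memcpy in `sketch_toastrelid` on every check.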