Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)
Date
Msg-id CAMT0RQQvki5v5FPZMXSR3cn4Y2gMUUtEYb+Hf_OQmqUbeBrUeQ@mail.gmail.com
Whole thread Raw
In response to Support for 8-byte TOAST values (aka the TOAST infinite loop problem)  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)
List pgsql-hackers
Hi Michael

I'll take a look at the patch set.

While digging around in the TOAST code did you have any ideas on how
one could extract the TOAST APIs in a way that they can be added in
Table Access Method definition ?

Not all TAMs need TOAST, but the ones that do could also be the ones
that still like to do something different when materializing toasted
values.

And TOAST is actually a nice abstraction which could be used as basis
for both offloading more columns into separate forks and files as well
as implementing some kinds of vectored, columnar and compressed
storages.

----
Hannu

On Thu, Jun 19, 2025 at 7:59 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> Hi all,
>
> I have been looking at $subject and the many past reviews recently,
> also related to some of the work related to the potential support for
> zstandard compression in TOAST values, and found myself pondering
> about the following message from Tom, to be reminded that nothing has
> been done regarding the fact that the backend may finish in an
> infinite loop once a TOAST table reaches 4 billion values:
> https://www.postgresql.org/message-id/764273.1669674269%40sss.pgh.pa.us
>
> Spoiler: I have heard of users that are in this case, and the best
> thing we can do currently except raising shoulders is to use
> workarounds with data externalization AFAIK, which is not nice,
> usually, and users notice the problem once they see some backends
> stuck in the infinite loop.  I have spent some time looking at the
> problem, and looked at all the proposals in this area like these ones
> (I hope so at least):
> https://commitfest.postgresql.org/patch/4296/
> https://www.postgresql.org/message-id/flat/224711f9-83b7-a307-b17f-4457ab73aa0a@sigaev.ru
>
> Anyway, it seems like nothing goes in a direction that I think would
> be suited to fix the two following problems (some of the proposed
> patches broke backward-compatibility, as well, and that's critical):
> - The limit of TOAST values to 4 billions, because external TOAST
> pointers want OIDs.
> - New compression methods, see the recent proposal about zstandard at
> [1].  ToastCompressionId is currently limited to 4 values because the
> extinfo field of varatt_external has only this much data remaining.
> Spoiler: I want to propose a new varatt_external dedicated to
> zstandard-compressed external pointers, but that's not for this
> thread.
>
> Please find attached a patch set I have finished with while poking at
> the problem, to address points 1) and 2) in the first email mentioned
> at the top of this message.  It is not yet ready for prime day yet
> (there are a couple of things that would need adjustments), but I have
> reached the point where I am going to need a consensus about what
> people would be OK to have in terms of design to be able to support
> multiple types of varatt_external to address these issues.  And I'm OK
> to consume time on that for the v19 cycle.
>
> While hacking at (playing with?) the whole toasting and detoasting
> code to understand the blast radius that this would involve, I have
> quickly found that it is very annoying to have to touch at many places
> of varatt.h to make variants of the existing varatt_external structure
> (what we store on-disk as varlena Datum for external TOAST pointers).
> Spoiler: it's my first time touching the internals of this code area
> so deeply.  Most of the code churns happen because we need to make the
> [de]toast code aware of what to do depending on the vartags of the
> external varlenas.  It would be simple to hardcode a bunch of new
> VARATT_IS_EXTERNAL_ONDISK() variants to plug in the new structures.
> While it is efficient, this has a cost for out-of-core code and in
> core because all the places that touch external TOAST pointers need to
> be adjusted.  Point is: it can be done.  But if we introduce more
> types of external TOAST pointers we need to always patch all these
> areas, and there's a cost in that each time one or more new vartags
> are added.
>
> So, I have invented a new interface aimed at manipulating on-disk
> external TOAST pointers, called toast_external, that is an on-memory
> structure that services as an interface between on-disk external TOAST
> pointers and what the backend wants to look at when retrieving chunks
> of data from the TOAST relations.  That's the main proposal of this
> patch set, with a structure looking like that:
> typedef struct toast_external_data
> {
>         /* Original data size (includes header) */
>         int32       rawsize;
>         /* External saved size (without header) */
>         uint32      extsize;
>         /* compression method */
>         ToastCompressionId compression_method;
>         /* Relation OID of TOAST table containing the value */
>         Oid                     toastrelid;
>         /*
>          * Unique ID of value within TOAST table.  This could be an OID or an
>          * int8 value.  This field is large enough to be able to store any of
>          * them.
>          */
>         uint64          value;
> } toast_external_data;
>
> This is a bit similar to what the code does for R/W and R/O vartags,
> only applying to the on-disk external pointers.  Then, the [de]toast
> code and extension code is updated so as varlenas are changed into
> this structure if we need to retrieve some of its data, and these
> areas of the code do not need to know about the details of the
> external TOAST pointers.  When saving an external set of chunks, this
> structure is filled with information depending on what
> toast_save_datum() deals with, be it a short varlena, a non-compressed
> external value, or a compressed external value, then builds a varlena
> with the vartag we want.
>
> External TOAST pointers have three properties that are hardcoded in
> the tree, bringing some challenges of their own:
> - The maximum size of a chunk, TOAST_MAX_CHUNK_SIZE, tweaked at close
> to 2k to make 4 chunks fit on a page.  This depends on the size of the
> external pointer.  This one was actually easy to refactor.
> - The varlena header size, based on VARTAG_SIZE(), which is kind of
> tricky to refactor out in the new toast_external.c, but that seems OK
> even if this knowledge stays in varatt.h.
> - The toast pointer size, aka TOAST_POINTER_SIZE.  This one is
> actually very interesting (tricky): we use it in one place,
> toast_tuple_find_biggest_attribute(), as a lower bound to decide if an
> attribute should be toastable or not.  I've refactored the code to use
> a "best" guess depending on the value type in the TOAST relation, but
> that's not 100% waterproof.  That needs more thoughts.
>
> Anyway, the patch set is able to demonstrate how much needs to be done
> in the tree to support multiple chunk_id types, and the CI is happy
> with the attached.  Some of the things done:
> - Introduction of a user-settable GUC called default_toast_type, that
> can be switched between two modes "oid" and "int8", to force the
> creation of a TOAST relation using one type or the other.
> - Dump, restore and upgrade support are integrated, relying on a GUC
> makes the logic a breeze.
> - 64b values are retrieved from a single counter in the control file,
> named a "TOAST counter", which has the same reliability and properties
> as an OID, with checkpoint support, WAL records, etc.
> - Rewrites are soft, so I have kicked the can down the toast on this
> point to not make the proposal more complicated than it should be: a
> VACUUM FULL retains the same TOAST value type as the original.  We
> could extend rewrites so as the type of TOAST value is changed.  It is
> possible to setup a new cluster with default_toast_type = int8 set
> after an upgrade, with the existing tables still using the OID mode.
> This relates to the recent proposal with a concurrent VACUUM FULL
> (REPACK discussion).
>
> The patch set keeps the existing vartag_external with OID values for
> backward-compatibility, and adds a second vartag_external that can
> store 8-byte values.  This model is the simplest one, and
> optimizations are possible, where the Datum TOAST pointer could be
> shorter depending on the ID type (OID or int8), the compression method
> and the actual value to divide in chunks.  For example, if you know
> that a chunk of data to save has a value less than UINT32_MAX, we
> could store 4 bytes worth of data instead of 8.  This design has the
> advantage to allow plugging in new TOAST external structures easily.
> Now I've not spent extra time in this tuning, because there's no point
> in spending more time without an actual agreement about three things,
> and *that's what I'm looking for* as feedback for this upcoming CF:
> - The structures of the external TOAST pointers.  Variable-sized
> pointers could be one possibility, across multiple vartags.  Ideas are
> welcome.
> - How many vartag_external types we want.
> - If people are actually OK with this translation layer or not, and I
> don't disagree that there may be some paths hot enough where the
> translation between the on-disk varlenas and this on-memory
> toast_external_data hurts.  Again, it is possible to hardcode more
> types of vartags in the tree, or just bypass the translation in the
> paths that are too hot.  That's doable still brutal, but if that's the
> final consensus reached I'm OK with that as well.  (See for example
> the changes in amcheck to see how simpler things get.)
>
> The patch set has been divided into multiple pieces to ease its
> review.  Again, I'm not completely happy with everything in it, but
> it's a start.  Each patch has its own commit message, so feel free to
> refer to them for more details:
> - 0001 introduces the GUC default_toast_type.  It is just defined, not
> used in the tree at this stage.
> - 0002 adds support for catcache lookups for int8 values, required to
> allow TOAST values with int8 and its indexes.  Potentially useful for
> extensions.
> - 0003 introduces the "TOAST counter", 8 bytes in the control file to
> allocate values for the int8 chunk_id.  That's cheap, reliable.
> - 0004 is a mechanical change, that enlarges a couple of TOAST
> interfaces to use values of uint64 instead of OID.
> - 0005, again a mechanical change, reducing a bit the footprint of
> TOAST_MAX_CHUNK_SIZE because OID and int8 values need different
> values.
> - 0006 tweaks pg_column_toast_chunk_id() to use int8 as return type.
>
> Then comes the "main" patches:
> - 0007 adds support for int8 chunk_id in TOAST tables.  This is mostly
> a mechanical change.  If applying the patches up to this point,
> external Datums are applied to both OID and int8 values.  Note that
> there is one tweak I'm unhappy with: the toast counter generation
> would need to be smarter to avoid concurrent values because we don't
> cross-check the TOAST index for existing values.  (Sorry, got slightly
> lazy here).
> - 0008 adds tests for external compressed and uncompressed TOAST
> values for int8 TOAST types.
> - 0009 adds support for dump, restore, upgrades of the TOAST table
> types.
> - 0010 is the main dish: refactoring of the TOAST code to use
> toast_external_data, with OID vartags as the only type defined.
> - 0011 adds a second vartag_external: the one with int8 values stored
> in the external TOAST pointer.
> - 0012 is a bonus for amcheck: what needs to be done in its TAP tests
> to allow the corruption cases to work when supporting a new vartag.
>
> That was a long message.  Thank you for reading if you have reached
> this point.
>
> Regards,
>
> [1]: https://www.postgresql.org/message-id/CAFAfj_HX84EK4hyRYw50AOHOcdVi-%2BFFwAAPo7JHx4aShCvunQ%40mail.gmail.com
> --
> Michael



pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Re: Remove unused wait_event_info parameter in FileStartReadV()
Next
From: Jakub Wartak
Date:
Subject: Re: Adding basic NUMA awareness