Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |
Date | |
Msg-id | aJb5OQsJWCKB5CjU@paquier.xyz Whole thread Raw |
In response to | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Fri, Aug 08, 2025 at 03:32:16PM -0400, Tom Lane wrote: > I found some time to look at the v4 patchset, and have a bunch of > comments of different sizes. Thanks for the comments. I have replied to some of the items here: https://www.postgresql.org/message-id/aJbygEBqJgmLS0wF%40paquier.xyz Will try to answer the rest here. > 0001: > > I'm good with widening all these values to 64 bits, but I wonder > if it's a great idea to use unadorned "uint64" as the data type. > That's impossible to grep for, doesn't convey anything much about > what the variables are, etc. I'm tempted to propose instead > inventing a typedef "BigOid" or some such name (bikeshedding > welcome). The elog's could be handled with, say, > #define PRIBO PRIu64 > This suggestion isn't made with the idea that we'd someday switch > to an even wider type, but just with the idea of making it clearer > what these values are being used for. When you see "Oid" you > know it's some sort of object identifier, and I'm sad to give > that up here. Already mentioned on the other message, but I'm OK with a "bigoid" type and a BigOid type. Honestly, I'm not sure about a replacement for PRIu64, as we don't really do that for Oids with %u. > toast_pointer isn't initialized at this point. I see you fixed that > in 0004, but it doesn't help to split the patch series if the > intermediate steps are broken. Oops. FWIW, I've rebased this patch set much more than 4 times. It looks like I've messed up some of the diffs. Sorry about that. > 0003: > > No objection to the struct renaming, but does this go far > enough? Aren't we going to need to rename TOAST_POINTER_SIZE > to TOAST_OID_POINTER_SIZE, etc, so that we can have similar > symbols for the wider version? I'm suspicious of not renaming > the functions that work on these, too. (Oh, it looks like you > did some of that in later parts.) Yeah. I've stuck that into the later parts where the int8 bits have been added. Perhaps I've not been ambitious enough. > BTW, given that varatt.h has "typedef struct varatt_external {...} > varatt_external", I'm certain that the vast majority of uses of > "struct varatt_external" could be shortened to "varatt_external". > And I think we should do that, because using "struct foo" not "foo" > is not project style. This patch would be a fine time to do that. Good point. > 0004: > > Shouldn't VARATT_EXTERNAL_GET_POINTER go away entirely? > It looks to me like every use of that should be replaced by > toast_external_info_get_data(). Point about indirect pointers mentioned on the other message, where we could rename VARATT_EXTERNAL_GET_POINTER to VARATT_EXTERNAL_INDIRECT_GET_POINTER or equivalent to limit the confusion? > I wonder if we shouldn't try to get rid of the phraseology "standard > TOAST pointer", and instead write something like "short TOAST pointer" > or "small TOAST pointer". These aren't really going to be more > "standard" than the wider ones, IMO. I'm seeing one place in arrayfuncs.c and one in detoast.h using this term on HEAD. I would do simpler: no standard and no short, just with a removal of the "standard" part if I were to change something. > I don't like replacing "va_valueid" with just "value". Dropping > the "id" is not an improvement, because now a reader might be > confused about whether this is somehow the actual value of the > toasted datum. Okay, sure. One reason behind the field renaming was also to track down all the areas in need to be updated. > Functions in toast_external.c are under-documented. Some lack > any header comment whatever, and others have one that doesn't > actually say what they do. Okay. > I kind of wonder whether the run-time indirection this design causes > is going to be a performance problem. Perhaps not, given the expenses > involved in accessing a toasted value, but it has a faint feeling > of overdesign. Mentioned on the other message, linked to this message: https://www.postgresql.org/message-id/aGzLiDUB_18-8aVQ%40paquier.xyz > It looks like a lot of this code would just flat out dump core if > passed an invalid vartag value. Probably not good enough, given > that we might look at corrupted data. Right. That's where we would need an error path in toast_external_get_info() if nothing is found. That's a cheap defense, I guess. Good thing is that the lookups are centralized in a single code path, at least with this design. > I kind of feel that treating this operation as a toast_external_info > method is the wrong thing, since where it looks like you are going > is to fetch an identifier and only then decide which vartag you > need to use. That ugliness comes out here: > > + /* > + * Retrieve the external TOAST information, with the value still unknown. > + * We need to do this again once we know the actual value assigned, to > + * define the correct vartag_external for the new TOAST tuple. > + */ Yeah, I was feeling a bit depressed when writing the concept of what a "default" method should be before assigning the value, but that was still feeling right. > 0007: > > Surely we do not need the cast in this: > > + return murmurhash64((int64) DatumGetInt64(datum)); > > I see you copied that from int4hashfast, but that's not good > style either. Okay. > > More generally, though, why do we need catcache support for int8? > There aren't any caches on toast chunks, and I doubt we are going to > introduce any. Sure there might be a need for this down the road, > but I don't see that this patch series is the time to add it. I recall that this part was required for the value conflict lookups and also the TOAST value retrievals, so we are going to need it. > 0008: > > I totally hate the idea of introducing a GUC for this. This should be > user-transparent. Or if it isn't user-transparent, a GUC is still > the wrong thing; some kind of relation option would be more sensible. Per other email, I'm not sure what you entirely mean here: should 8B values be the default with existing TOAST OID values kept as they are across upgrades? Or something else? > 0009: > > I do not love this either. I think the right thing is just to widen > the existing nextOid counter to 64 bits, and increment it in-memory as > 64 bits, and then return either all 64 bits when a 64-bit Oid is asked > for, or just the low-order 32 bits when a 32-bit OID is asked for > (with appropriate hacking for 32-bit wraparound). Having a separate > counter will make it too easy for the same value to be produced by > nearby calls to the 32-bit and 64-bit Oid generators, which is likely > a bad thing, if only because of potential confusion. Acknowledged on other message, fine my be. > 0011: > > Not reviewing this yet, because I disagree with the basic design. > I didn't look at the later patches either. Without an agreement about the design choices related to the first patches up to 0006, I doubt there this is any need to review any of the follow-up patches yet because the choices of the first patches influence the next patches in the series. Thanks for the feedback! -- Michael
Attachment
pgsql-hackers by date: