Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |
Date | |
Msg-id | 1891064.1754681536@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem)
Re: Support for 8-byte TOAST values (aka the TOAST infinite loop problem) |
List | pgsql-hackers |
Michael Paquier <michael@paquier.xyz> writes: > Attached is a v4, due to conflicts mainly caused by the recent changes > in varatt.h done by e035863c9a04. I found some time to look at the v4 patchset, and have a bunch of comments of different sizes. 0001: I'm good with widening all these values to 64 bits, but I wonder if it's a great idea to use unadorned "uint64" as the data type. That's impossible to grep for, doesn't convey anything much about what the variables are, etc. I'm tempted to propose instead inventing a typedef "BigOid" or some such name (bikeshedding welcome). The elog's could be handled with, say, #define PRIBO PRIu64 This suggestion isn't made with the idea that we'd someday switch to an even wider type, but just with the idea of making it clearer what these values are being used for. When you see "Oid" you know it's some sort of object identifier, and I'm sad to give that up here. This hunk is flat out buggy: @@ -1766,6 +1774,7 @@ check_tuple_attribute(HeapCheckContext *ctx) return true; /* It is external, and we're looking at a page on disk */ + toast_pointer_valueid = toast_pointer.va_valueid; /* * Must copy attr into toast_pointer for alignment considerations toast_pointer isn't initialized at this point. I see you fixed that in 0004, but it doesn't help to split the patch series if the intermediate steps are broken. 0002: OK, although some of the hunks in 0001 seem to belong here. 0003: No objection to the struct renaming, but does this go far enough? Aren't we going to need to rename TOAST_POINTER_SIZE to TOAST_OID_POINTER_SIZE, etc, so that we can have similar symbols for the wider version? I'm suspicious of not renaming the functions that work on these, too. (Oh, it looks like you did some of that in later parts.) BTW, given that varatt.h has "typedef struct varatt_external {...} varatt_external", I'm certain that the vast majority of uses of "struct varatt_external" could be shortened to "varatt_external". And I think we should do that, because using "struct foo" not "foo" is not project style. This patch would be a fine time to do that. 0004: Shouldn't VARATT_EXTERNAL_GET_POINTER go away entirely? It looks to me like every use of that should be replaced by toast_external_info_get_data(). I wonder if we shouldn't try to get rid of the phraseology "standard TOAST pointer", and instead write something like "short TOAST pointer" or "small TOAST pointer". These aren't really going to be more "standard" than the wider ones, IMO. I don't like replacing "va_valueid" with just "value". Dropping the "id" is not an improvement, because now a reader might be confused about whether this is somehow the actual value of the toasted datum. Functions in toast_external.c are under-documented. Some lack any header comment whatever, and others have one that doesn't actually say what they do. I kind of wonder whether the run-time indirection this design causes is going to be a performance problem. Perhaps not, given the expenses involved in accessing a toasted value, but it has a faint feeling of overdesign. It looks like a lot of this code would just flat out dump core if passed an invalid vartag value. Probably not good enough, given that we might look at corrupted data. 0005: Nice! 0006: get_new_value is very confusingly documented. Is the indexid that of the toastrel's index? What attribute is the attnum for? Why should the caller need to provide either, rather than get_new_value knowing that internally? Also, again you are using "value" for something that is not the value of the to-be-toasted datum. I'd suggest something more like "get_new_toast_identifier". I kind of feel that treating this operation as a toast_external_info method is the wrong thing, since where it looks like you are going is to fetch an identifier and only then decide which vartag you need to use. That ugliness comes out here: + /* + * Retrieve the external TOAST information, with the value still unknown. + * We need to do this again once we know the actual value assigned, to + * define the correct vartag_external for the new TOAST tuple. + */ 0007: Surely we do not need the cast in this: + return murmurhash64((int64) DatumGetInt64(datum)); I see you copied that from int4hashfast, but that's not good style either. More generally, though, why do we need catcache support for int8? There aren't any caches on toast chunks, and I doubt we are going to introduce any. Sure there might be a need for this down the road, but I don't see that this patch series is the time to add it. 0008: I totally hate the idea of introducing a GUC for this. This should be user-transparent. Or if it isn't user-transparent, a GUC is still the wrong thing; some kind of relation option would be more sensible. 0009: I do not love this either. I think the right thing is just to widen the existing nextOid counter to 64 bits, and increment it in-memory as 64 bits, and then return either all 64 bits when a 64-bit Oid is asked for, or just the low-order 32 bits when a 32-bit OID is asked for (with appropriate hacking for 32-bit wraparound). Having a separate counter will make it too easy for the same value to be produced by nearby calls to the 32-bit and 64-bit Oid generators, which is likely a bad thing, if only because of potential confusion. 0010: OK 0011: Not reviewing this yet, because I disagree with the basic design. I didn't look at the later patches either. regards, tom lane
pgsql-hackers by date: