Better support for whole-row operations and composite types - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Better support for whole-row operations and composite types |
Date | |
Msg-id | 3072.1080587912@sss.pgh.pa.us |
List | pgsql-hackers |
We have a number of issues revolving around the fact that composite types
(row types) aren't first-class objects. I think it's past time to fix that.
Here are some notes about doing it. I am not sure all these ideas are
fully-baked ... comments appreciated.

When represented as a Datum, the format of a row-type object needs to be
something like this:

    * overall length: int4 (this makes the Datum a valid varlena item)
    * row type id: Oid (either a composite type id or RECORDOID)
    * row type typmod: int4 (see below for usage)
    -- pad if needed to MAXALIGN boundary
    * heap tuple representation, beginning with a HeapTupleHeaderData struct

If we do it exactly as above then we will be wasting some space, because
the xmin/xmax/cmax and ctid fields of HeapTupleHeaderData are of no use in
a row that isn't actually a table member row. It is very tempting to
overlay the length and rowtype fields with the HeapTupleHeaderData struct.
This would save some code as well as space --- see discussion below.

Only named composite types, not RECORD, will be allowed to be used as table
column types. This ensures that any row object stored on disk will have a
valid composite type ID embedded in it, so that the row structure can be
retrieved when the row is read. However, we want to be able to support row
objects in memory that are of transient record types (for example, the
output of a function returning RECORD will have a record type determined by
the query itself). I propose that we handle this case by setting the type
id to RECORDOID and using the typmod to identify the particular record type
--- the typmod will essentially be an index into a backend-local cache of
record types. More detail below.

We'll add "tdtypeid" and "tdtypmod" fields to TupleDesc structs. This will
make it easy to set the embedded type information correctly when
manufacturing a row datum using a TupleDesc. For TupleDescs associated
with relations, tdtypeid is just the relation's row type OID, and tdtypmod
is -1.
For TupleDescs representing transient row types, we initially set
tdtypeid to RECORDOID and tdtypmod to -1 (indicating a completely anonymous
row type). If the row type actually needs to be identifiable then we
establish a cache entry for it and set the typmod to an index for the cache
entry. I think this will only need to happen when the query contains a
function-returning-RECORD or a whole-row variable referencing what would
otherwise be an anonymous row type, such as a JOIN result.

Composite types, as well as the RECORD type, will be marked in pg_type as
pass-by-ref, varlena (typlen -1), typalign 'd'. (We will use the maximum
alignment always to avoid any dependency on types of the contained columns.)

The present function call and return conventions involving TupleTableSlots
will be replaced by simply passing and returning these row objects as
pass-by-reference Datums. In the case of functions returning rowtypes,
we'll continue to support the present ReturnSetInfo convention for
returning a separate TupleDesc describing the result type --- but this will
just be a crosscheck.

We will be able to make generic I/O routines for composite types,
comparable to those used now for arrays. Not sure what a convenient
external format would look like. (Possibly use the same conventions as for
a 1-D array?) We will need to make the convention that the type OID of a
composite type is passed to the input routine, in the same way that an
array input routine gets the typelem OID; else the input routine won't know
what to do.

We could also think about allowing functions that are declared as accepting
RECORD (ie, polymorphic-across-row-types functions). They would use the
same methods already used by polymorphic functions to find out the true
types of their inputs. (Might be best to invent a separate pseudotype, say
ANYRECORD, rather than overloading RECORD for this purpose.)
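To make the row datum layout and the proposed tdtypeid/tdtypmod fields a bit
more concrete, here is a minimal standalone sketch of the non-overlaid
variant (length, type id, typmod, padding, then the tuple bytes). It is not
backend code: SketchTupleDesc, RowDatumHeader, row_datum_build and the fixed
8-byte SKETCH_MAXALIGN are all hypothetical names and assumptions made up
for illustration, and the tuple body is treated as an opaque byte string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Oid;

#define RECORDOID 2249          /* pg_type OID of the RECORD pseudotype */
#define SKETCH_MAXALIGN 8       /* assumption; really platform-dependent */
#define SKETCH_ALIGN(len) \
    (((size_t) (len) + (SKETCH_MAXALIGN - 1)) & ~((size_t) (SKETCH_MAXALIGN - 1)))

/* Hypothetical stand-in for a TupleDesc carrying the two proposed fields. */
typedef struct SketchTupleDesc
{
    Oid     tdtypeid;           /* composite type OID, or RECORDOID */
    int32_t tdtypmod;           /* -1, or an index into the record-type cache */
} SketchTupleDesc;

/* Hypothetical non-overlaid header that precedes the heap tuple bytes. */
typedef struct RowDatumHeader
{
    int32_t length;             /* total size in bytes (varlena-style) */
    Oid     type_id;            /* copied from tdtypeid */
    int32_t typmod;             /* copied from tdtypmod */
} RowDatumHeader;

/* Header size after padding to the assumed MAXALIGN boundary. */
#define ROW_DATUM_HDRSZ SKETCH_ALIGN(sizeof(RowDatumHeader))

/*
 * Build a row datum by copying the tuple bytes behind a freshly stamped
 * header.  Returns malloc'd memory, or NULL if allocation fails.
 */
static void *
row_datum_build(const SketchTupleDesc *desc, const void *tuple, size_t tuple_len)
{
    size_t          total = ROW_DATUM_HDRSZ + tuple_len;
    char           *result;
    RowDatumHeader *hdr;

    assert(desc != NULL && tuple != NULL);

    result = calloc(1, total);  /* calloc zeroes the padding bytes */
    if (result == NULL)
        return NULL;

    hdr = (RowDatumHeader *) result;
    hdr->length = (int32_t) total;
    hdr->type_id = desc->tdtypeid;
    hdr->typmod = desc->tdtypmod;
    memcpy(result + ROW_DATUM_HDRSZ, tuple, tuple_len);
    return result;
}
```

Note that this variant has to copy the tuple data, which is the extra cost
that overlaying the header onto HeapTupleHeaderData would avoid.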

The recently developed SRF API is a bit unfortunate since it exposes the
assumption that a TupleTableSlot must be involved in returning a tuple.
If we don't overlay the Datum header with HeapTupleHeader then I think we
have to make TupleGetDatum copy the passed tuple and insert the row type
info from the slot's tupledesc, which'd be pretty inefficient because it
means making an extra copy of the row data. But if we do overlay the
header fields, then I think we can set up backwards-compatibility
definitions in which the slot is simply ignored. Specifically:

    TupleDescGetSlot: no-op, returns NULL
    TupleGetDatum: ignore slot, return tuple t_data pointer as datum

This will work because heap_formtuple and BuildTupleFromCStrings can return
a HeapTuple whose t_data part is already a valid row Datum, simply by
setting the appropriate length and type fields in it. (If the tuple is
ever stored to disk as a regular table row, these fields will be
overwritten with xmin/cmin info at that time.)

To convert a row Datum into something that can be passed to heap_getattr,
one could use a local variable of type HeapTupleData and set its t_data
field to the datum's pointer value. t_len is copied from the datum
contents, while the other fields of HeapTupleData can just be set to zeroes.

ExecEvalVar for a whole-row reference will need to copy the scan tuple so
that it can insert the correct length and tuple type fields. (We cannot
scribble on the tuple as it sits in the disk buffer, of course.)
Fortunately this shouldn't be a major memory leak anymore since the copy
can be made in the current short-lived memory context.


Handling anonymous RECORD types
-------------------------------

I envision expanding typcache.c to be able to store TupleDesc structures
for composite and record types. In the case of regular composite types
this is not especially difficult. For record types, we are essentially
trying to make a backend-local mapping from typmod values to TupleDescs.
There are a couple of interesting points:

* We have to be able to re-use an already-existing cache entry if it
matches a requested TupleDesc. This avoids indefinite growth of the type
cache over many queries. There could still be issues with memory leakage
if a single backend session uses a huge number of distinct record types
over its lifetime, but that doesn't seem likely to be an issue in practice.
(We could avoid this problem by recycling no-longer-needed cache entries,
but what with plan caching I'm not sure there's any pleasant way to do
that. For the moment I intend that cache entries for record types will
live for the life of the backend.)

* Since record typmod values are backend-local, they aren't meaningful in
query structures stored on disk. When a stored rule is read in, we'll need
to be able to replace any embedded typmod values with correct assignments
for the current backend.


Safely storing composite types on disk
--------------------------------------

If a composite row value contains any out-of-line TOAST references, we'd
have to expand those references before we could safely store the value on
disk. This can be handled by the same tuptoaster.c routines that are
already concerned with replacing unsafe references.


ALTER TABLE issues
------------------

If an ALTER TABLE command does something that requires examining or
changing every row of a table, it would presumably have to do the same to
all entries in any composite-type column of the table's rowtype. To avoid
surprises and interesting debates about who has permissions to do this, it
might be wise to restrict on-disk composite columns to be only of
standalone composite types (ie, those made with CREATE TYPE AS). This
restriction would also avoid debates about whether table constraints apply
to composite-type columns.


Notes
-----

While doing this, we should once and for all rip out the last vestiges of
the "attisset" feature.
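Returning briefly to the record-type cache from "Handling anonymous RECORD
types": the typmod-as-index idea, including re-use of a matching entry,
could be sketched roughly as follows. This is a standalone illustration
under loud assumptions, not typcache.c code. SketchRecordDesc,
assign_record_typmod and lookup_record_typmod are hypothetical names, a
record type is reduced to a short list of column type OIDs, and matching is
a linear scan where a real cache would more plausibly hash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uint32_t Oid;

#define SKETCH_MAX_ATTS 8       /* arbitrary limit for this sketch only */

/* Hypothetical, much-reduced TupleDesc: just the column type OIDs. */
typedef struct SketchRecordDesc
{
    int     natts;
    Oid     atttypid[SKETCH_MAX_ATTS];
} SketchRecordDesc;

/* Backend-local cache; entries are never freed, as in the proposal. */
static SketchRecordDesc *record_cache = NULL;
static int32_t record_cache_len = 0;
static int32_t record_cache_cap = 0;

static int
record_desc_equal(const SketchRecordDesc *a, const SketchRecordDesc *b)
{
    if (a->natts != b->natts)
        return 0;
    return memcmp(a->atttypid, b->atttypid,
                  (size_t) a->natts * sizeof(Oid)) == 0;
}

/*
 * Return the typmod (cache index) for a record type, re-using an existing
 * entry when one matches.  Returns -1 if the cache cannot be enlarged.
 */
static int32_t
assign_record_typmod(const SketchRecordDesc *desc)
{
    assert(desc->natts >= 0 && desc->natts <= SKETCH_MAX_ATTS);

    for (int32_t i = 0; i < record_cache_len; i++)
    {
        if (record_desc_equal(&record_cache[i], desc))
            return i;           /* already known: same typmod as before */
    }

    if (record_cache_len == record_cache_cap)
    {
        int32_t newcap = record_cache_cap ? record_cache_cap * 2 : 16;
        SketchRecordDesc *grown =
            realloc(record_cache, (size_t) newcap * sizeof(SketchRecordDesc));

        if (grown == NULL)
            return -1;
        record_cache = grown;
        record_cache_cap = newcap;
    }

    record_cache[record_cache_len] = *desc;
    return record_cache_len++;
}

/*
 * Map a typmod back to its descriptor, or NULL if it is not a valid index.
 * The pointer is only good until the next call that grows the cache.
 */
static const SketchRecordDesc *
lookup_record_typmod(int32_t typmod)
{
    if (typmod < 0 || typmod >= record_cache_len)
        return NULL;
    return &record_cache[typmod];
}
```

Because the indexes are handed out in whatever order a given backend
happens to see record types, two backends will generally disagree about
them, which is why typmods embedded in stored rules would need to be
reassigned on read.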

Add an Assert to ExecEvalVar that checks that whole-row vars (and, I guess,
any system column as well) are fetched from a scan tuple, never the inner
or outer side of a join. If they've not been converted into ordinary field
references in a join, it's too late.

The current API for TypeGetTupleDesc is somewhat bogus --- I don't think
the "column alias" option is really appropriate, and it is lacking a typmod
argument so it can't be used with record types. We shall have to deprecate
it in favor of a new routine.

            regards, tom lane