Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CA+TgmoZevF_DAhqbo4j7VhwmEzSZT3wprZviqW4zvao1qgC_wA@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Parallel Seq Scan
|
List | pgsql-hackers |
On Thu, Sep 3, 2015 at 6:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote: > [ new patches ] Still more review comments: + /* Allow space for terminating zero-byte */ + size = add_size(size, 1); This is pointless. The length is already stored separately, and if it weren't, this wouldn't be adequate anyway because a varlena can contain NUL bytes. It won't if it's text, but it could be bytea or numeric or whatever. RestoreBoundParams is broken, because it can do unaligned reads, which will core dump on some architectures (and merely be slow on others). If there are two or more parameters, and the first one is a varlena with a length that is not a multiple of MAXIMUM_ALIGNOF, the second SerializedParamExternData will be misaligned. Also, it's pretty lame that we send the useless pointer even for a pass-by-reference data type and then overwrite the bad pointer with a good one a few lines later. It would be better to design the serialization format so that we don't send the bogus pointer over the wire in the first place. It's also problematic in my view that there is so much duplicated code here. SerializedParamExternData and SerializedParamExecData are very similar and there are large swaths of very similar code to handle each case. Both structures contain Datum value, Size length, bool isnull, and Oid ptype, albeit not in the same order for some reason. The only difference is that SerializedParamExternData contains uint16 pflags and SerializedParamExecData contains int paramid. I think we need to refactor this code to get rid of all this duplication. I suggest that we decide to represent a datum here in a uniform fashion, perhaps like this: First, store a 4-byte header word. If this is -2, the value is NULL and no data follows. If it's -1, the value is pass-by-value and sizeof(Datum) bytes follow. If it's >0, the value is pass-by-reference and the value gives the number of following bytes that should be copied into a brand-new palloc'd chunk. Using a format like this, we can serialize and restore datums in various contexts - including bind and exec params - without having to rewrite the code each time. For example, for param extern data, you can dump an array of all the ptypes and paramids and then follow it with all of the params one after another. For param exec data, you can dump an array of all the ptypes and paramids and then follow it with the values one after another. The code that reads and writes the datums in both cases can be the same. If we need to send datums in other contexts, we can use the same code for it. The attached patch - which I even tested! - shows what I have in mind. It can save and restore the ParamListInfo (bind parameters). I haven't tried to adapt it to the exec parameters because I don't quite understand what you are doing there yet, but you can see that the datum-serialization logic is separated from the stuff that knows about the details of ParamListInfo, so datumSerialize() should be reusable for other purposes. This also doesn't have the other problems mentioned above. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: