Thread: Garbage pad bytes within datums are bad news
I tracked down the problem reported here:
http://archives.postgresql.org/pgsql-admin/2008-04/msg00038.php

What it boils down to is that equal() doesn't see these two Consts as equal:

{CONST :consttype 1009 :consttypmod -1 :constlen -1 :constbyval false
 :constisnull false :constvalue 48
 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0 0 1
   0 0 0 5 49 127 127 127 0 0 0 5 50 127 127 127 0 0 0 5 51 127 127 127 ] }

{CONST :consttype 1009 :consttypmod -1 :constlen -1 :constbyval false
 :constisnull false :constvalue 48
 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0 0 1
   0 0 0 5 49 0 0 0 0 0 0 5 50 0 0 0 0 0 0 5 51 0 0 0 ] }

The datums are arrays of text, and the bytes that differ are garbage pad bytes between array entries.  Since equal() uses simple bytewise equality (cf. datumIsEqual()), it sees the constants as unequal.  The reason the behavior is a bit erratic is that the array constructor isn't bothering to initialize these bytes, so you might or might not get a failure depending on what happened to be there before.

Now, in large chunks of the system a false not-equal result doesn't cause anything worse than inefficiency, but in the particular case here you actually get an error :-(.  I'm surprised that we've not seen something like this reported before, because this has been busted since forever.

From a semantic point of view it would be nicer if equal() used a type-specific equality operator to compare Datums, but that idea runs up against the same problem we saw in connection with HOT comparison of index-column values: how do you know which equality operator to use, if a data type has more than one?  Not to mention it'd be slow.

The alternative seems to be to forbid uninitialized pad bytes within Datums.  That's not very pleasant to contemplate either, since it'll forever be vulnerable to sins of omission.

Thoughts?

			regards, tom lane
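To make the failure mode concrete, here is a minimal, hypothetical sketch (toy code, not the actual PostgreSQL implementation) of why a bytewise comparison such as datumIsEqual() trips over uninitialized padding:

```c
#include <stdbool.h>
#include <string.h>

/* Bytewise datum comparison, as equal()/datumIsEqual() effectively does
 * for pass-by-reference types: a plain memcmp over the full datum
 * length, alignment padding included. */
static bool
datum_bytes_equal(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

/* Two buffers with identical logical content (a length byte plus "abc")
 * but different garbage left in the trailing alignment padding. */
static const unsigned char padded_garbage[8] =
    {3, 'a', 'b', 'c', 0x7F, 0x7F, 0x7F, 0x7F};
static const unsigned char padded_zeroed[8] =
    {3, 'a', 'b', 'c', 0x00, 0x00, 0x00, 0x00};
```

Here datum_bytes_equal(padded_garbage, padded_zeroed, 8) reports "not equal" even though every meaningful byte matches, which is exactly the false not-equal behavior described above.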
> The alternative seems to be to forbid uninitialized pad bytes within
> Datums.  That's not very pleasant to contemplate either, since it'll
> forever be vulnerable to sins of omission.

Hmm, we could have palloc fill allocated memory with random bytes under --enable-cassert.  That would let us catch such bugs in most cases.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
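A sketch of that idea, with malloc standing in for palloc and a fixed sentinel byte instead of true randomness (the names here are hypothetical, not PostgreSQL's actual debugging machinery):

```c
#include <stdlib.h>
#include <string.h>

/* Under an assert-enabled build, scribble a recognizable garbage
 * pattern over every freshly allocated chunk, so code that forgets to
 * initialize pad bytes fails reproducibly instead of only when the
 * heap happens to contain nonzero leftovers.  0x7F is an arbitrary
 * sentinel choice. */
#define CLOBBER_BYTE 0x7F

static void *
palloc_clobbered(size_t size)
{
    void *chunk = malloc(size);            /* stand-in for real palloc */

    if (chunk != NULL)
        memset(chunk, CLOBBER_BYTE, size); /* deterministic "garbage"  */
    return chunk;
}
```

Any code path that leaves pad bytes untouched now deterministically produces 0x7F bytes there, so a bytewise comparison against a second, independently built value fails every time rather than erratically.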
Teodor Sigaev <teodor@sigaev.ru> writes:
>> The alternative seems to be to forbid uninitialized pad bytes within
>> Datums.  That's not very pleasant to contemplate either, since it'll
>> forever be vulnerable to sins of omission.

> Hmm, we could have palloc fill allocated memory with random bytes under
> --enable-cassert.  That would let us catch such bugs in most cases.

Based on the fact that this problem went undetected for so long, I'm not real convinced of that.  The difficulty is exactly that not very many system behaviors will fail obviously if nodetrees that should be seen as equal are not.

Perhaps we could add some additional tests to help exercise it?  But how do you get them applied to all datatypes?

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> The alternative seems to be to forbid uninitialized pad bytes within
> Datums.  That's not very pleasant to contemplate either, since it'll
> forever be vulnerable to sins of omission.

Just brainstorming here; I don't think this is a good solution, but perhaps it could lead somewhere interesting...

We could have the actual equality operators include an assertion that the datums are also datumIsEqual?  That isn't guaranteed to catch every case, but it would be good for complex data types like arrays.

I suppose if all we want to do is assert that constructors don't create this situation, we could have an assertion that calls the constructor a second time (with palloc generating garbage data) and compares the results with datumIsEqual.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
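A rough sketch of that first suggestion, using a toy fixed-layout datum (4 meaningful bytes followed by 4 pad bytes); every name here is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy datum layout: 4 meaningful bytes followed by 4 alignment pad
 * bytes.  logical_eq() stands in for a type's real equality operator,
 * which only examines the meaningful bytes. */
#define MEANINGFUL_LEN 4
#define TOTAL_LEN      8

static bool
logical_eq(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, MEANINGFUL_LEN) == 0;
}

/* The proposed debugging aid: whenever the real operator reports
 * "equal", the datums should also be bytewise identical; if they
 * aren't, some constructor left uninitialized padding behind. */
static bool
checked_eq(const unsigned char *a, const unsigned char *b)
{
    bool result = logical_eq(a, b);

    assert(!result || memcmp(a, b, TOTAL_LEN) == 0);
    return result;
}
```

As noted, this only fires when the operator actually gets called on two logically equal values, so it cannot catch every case; but for constructed types like arrays it would surface pad-byte garbage in ordinary testing.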
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> The alternative seems to be to forbid uninitialized pad bytes within
>> Datums.  That's not very pleasant to contemplate either, since it'll
>> forever be vulnerable to sins of omission.

> Just brainstorming here; I don't think this is a good solution, but
> perhaps it could lead somewhere interesting...

> We could have the actual equality operators include an assertion that
> the datums are also datumIsEqual?  That isn't guaranteed to catch every
> case, but it would be good for complex data types like arrays.

That still puts the responsibility on the individual datatype author to get it right.  The case I'm most worried about is user-written datatypes that are never going to magically acquire such asserts.

There is another path we could try to take, which is to fix things so that there aren't any cases where a false not-equal report will break critical behavior.  I seem to recall that the original justification for implementing equal() this way was exactly the argument that it didn't matter if you sometimes got a false not-equal report, so long as copyObject() was guaranteed to generate equal() trees (which it is).  This also seems prone to somebody breaking the assumption in future, though, even assuming that we can make it work like that today.  (I'm not too sure how to deal with the ORDER-BY/DISTINCT semantics without it.)

I guess we could have a test mode in which datumIsEqual always returned false...

			regards, tom lane
> That still puts the responsibility on the individual datatype author to
> get it right.  The case I'm most worried about is user-written datatypes
> that are never going to magically acquire such asserts.

It seems to me that working with two assumptions at the same time (binary equality and a catalog-defined equality function) is the wrong way.  If we decide to use the binary-equality criterion, then why do we need a catalog-defined equality function at all?  If we use the catalog-defined one, why should we require binary equality?  Using both at the same time is error prone.

It's possible to say that two values are equal if they are binary-identical; if not, we should look up the catalog-defined equality operation and call it.  Binary comparison is only an optimization.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
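That scheme amounts to treating the binary test as a sound-but-incomplete fast path.  A hypothetical sketch, with the catalog lookup reduced to a function pointer:

```c
#include <stdbool.h>
#include <string.h>

/* Binary equality as a pure optimization: if the bytes match, the
 * values are certainly equal; if they don't, fall back to the type's
 * catalog-defined equality operator (represented here by a plain
 * function pointer). */
typedef bool (*eq_operator)(const void *a, const void *b, size_t len);

static bool
datum_equal(const void *a, const void *b, size_t len, eq_operator type_eq)
{
    if (memcmp(a, b, len) == 0)
        return true;            /* binary match: definitely equal  */
    return type_eq(a, b, len);  /* otherwise ask the real operator */
}

/* Example operator for a toy type whose second half is padding:
 * compare only the first half of the bytes. */
static bool
ignore_padding_eq(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len / 2) == 0;
}
```

With this structure, pad-byte garbage can only cost a slower comparison, never a wrong answer; the open question Tom raises next is which operator plays the role of type_eq.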
Teodor Sigaev <teodor@sigaev.ru> writes:
>> That still puts the responsibility on the individual datatype author to
>> get it right.  The case I'm most worried about is user-written datatypes
>> that are never going to magically acquire such asserts.

> It seems to me that working with two assumptions at the same time
> (binary equality and a catalog-defined equality function) is the wrong
> way.  If we decide to use the binary-equality criterion, then why do we
> need a catalog-defined equality function at all?  If we use the
> catalog-defined one, why should we require binary equality?  Using both
> at the same time is error prone.  It's possible to say that two values
> are equal if they are binary-identical; if not, we should look up the
> catalog-defined equality operation and call it.  Binary comparison is
> only an optimization.

That only works if there is a unique function that we can say is "THE" equal operation for the datatype (with a true result guaranteeing that every operation the datatype has will produce the same results from the two values).  There is no such concept in PG at the moment, and if memory serves there are at least three built-in types for which the default btree "equality" function in fact doesn't guarantee that.  So even if we wanted to pursue that path, it wouldn't produce a fix that we could back-patch.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> The alternative seems to be to forbid uninitialized pad bytes within
>>> Datums.  That's not very pleasant to contemplate either, since it'll
>>> forever be vulnerable to sins of omission.

>> Just brainstorming here; I don't think this is a good solution, but
>> perhaps it could lead somewhere interesting...

Another thought.  Perhaps every data type should define an operator which is a true equals, i.e., it guarantees that *no* internal state that any function could expose is different between two datums.  Most data types could implement it just by calling memcmp (or Postgres could provide such a definition if it's left undefined).  That gives arrays the option of either providing such an operator or guaranteeing no padding bytes and using memcmp.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
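As a sketch of the dispatch this would imply (hypothetical names, not an existing PostgreSQL API):

```c
#include <stdbool.h>
#include <string.h>

/* Each type may supply a "true equals" guaranteeing that no observable
 * internal state differs between two datums; types that leave it NULL
 * fall back to a memcmp default, which is only valid for types that
 * never carry uninitialized padding. */
typedef bool (*identity_eq_fn)(const void *a, const void *b, size_t len);

static bool
default_identity_eq(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

static bool
datum_identical(const void *a, const void *b, size_t len,
                identity_eq_fn fn)
{
    /* Use the type's own operator if it defined one, else the default. */
    return (fn != NULL ? fn : default_identity_eq)(a, b, len);
}
```

The design question this sidesteps, per the earlier messages, is getting user-written datatypes to actually supply (or be safe under) the default.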
Gregory Stark <stark@enterprisedb.com> writes:
> I suppose if all we want to do is assert that constructors don't create
> this situation, we could have an assertion that calls the constructor a
> second time (with palloc generating garbage data) and compares the
> results with datumIsEqual.

After further reflection I think it might indeed be the case that we only have to worry about the datatype input routines, since those are what are invoked to create Consts that the user has a right to expect are equal().  I cons'd up a quick hack along the above lines and ran the regression tests with it.  The failures suggest that aside from array_in, we'll need to fix these types:

	path
	tsquery
	lquery (in contrib)
	ltree (in contrib)

That seems like a short enough list that the right fix for the back branches is just to fix these input routines (palloc0 instead of palloc should do it).  I'm not sure yet if we want some less-klugy solution for the future.

			regards, tom lane
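The mechanics of that fix are simple.  A hypothetical sketch, with calloc/malloc standing in for palloc0/palloc:

```c
#include <stdlib.h>
#include <string.h>

/* Why switching an input routine to palloc0 fixes it: the constructor
 * copies each element into an aligned slot but never writes the pad
 * bytes after it.  With a zeroing allocator (palloc0, modeled here by
 * calloc), those untouched bytes are guaranteed to be zero, so two
 * independent constructions of the same value come out bytewise
 * identical. */
static unsigned char *
build_padded(const char *elem, size_t elem_len, size_t slot_len)
{
    unsigned char *buf = calloc(1, slot_len);  /* palloc0 stand-in */

    if (buf != NULL)
        memcpy(buf, elem, elem_len);  /* pad bytes after elem stay zero */
    return buf;
}
```

Had this used a plain malloc (palloc), the bytes from elem_len up to slot_len would hold whatever the allocator left behind, reproducing the erratic equal() failures that started the thread.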