Thread: Garbage pad bytes within datums are bad news
I tracked down the problem reported here:
http://archives.postgresql.org/pgsql-admin/2008-04/msg00038.php

What it boils down to is that equal() doesn't see these two Consts as equal:

{CONST :consttype 1009 :consttypmod -1 :constlen -1 :constbyval false
 :constisnull false :constvalue 48
 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0 0 1
   0 0 0 5 49 127 127 127 0 0 0 5 50 127 127 127 0 0 0 5 51 127 127 127 ] }

{CONST :consttype 1009 :consttypmod -1 :constlen -1 :constbyval false
 :constisnull false :constvalue 48
 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0 0 1
   0 0 0 5 49 0 0 0 0 0 0 5 50 0 0 0 0 0 0 5 51 0 0 0 ] }

The datums are arrays of text, and the bytes that differ are garbage pad bytes between array entries.  Since equal() uses simple bytewise equality (cf. datumIsEqual()), it sees the constants as unequal.  The reason the behavior is a bit erratic is that the array constructor isn't bothering to initialize these bytes, so you might or might not get a failure depending on what happened to be there before.

Now, in large chunks of the system a false not-equal result doesn't cause anything worse than inefficiency, but in the particular case here you actually get an error :-(.  I'm surprised that we've not seen something like this reported before, because this has been busted since forever.

From a semantic point of view it would be nicer if equal() used a type-specific equality operator to compare Datums, but that idea runs up against the same problem we saw in connection with HOT comparison of index-column values: how do you know which equality operator to use, if a data type has more than one?  Not to mention it'd be slow.

The alternative seems to be to forbid uninitialized pad bytes within Datums.  That's not very pleasant to contemplate either, since it'll forever be vulnerable to sins of omission.

Thoughts?

			regards, tom lane
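To make the failure mode concrete, here is a minimal, hypothetical sketch (toy code, not the actual PostgreSQL implementation) of why a bytewise comparison such as datumIsEqual() trips over uninitialized padding:

```c
#include <stdbool.h>
#include <string.h>

/* Bytewise datum comparison, as equal()/datumIsEqual() effectively does
 * for pass-by-reference types: a plain memcmp over the full datum
 * length, alignment padding included. */
static bool
datum_bytes_equal(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

/* Two buffers with identical logical content (a length byte plus "abc")
 * but different garbage left in the trailing alignment padding. */
static const unsigned char padded_garbage[8] =
    {3, 'a', 'b', 'c', 0x7F, 0x7F, 0x7F, 0x7F};
static const unsigned char padded_zeroed[8] =
    {3, 'a', 'b', 'c', 0x00, 0x00, 0x00, 0x00};
```

Here datum_bytes_equal(padded_garbage, padded_zeroed, 8) reports "not equal" even though every meaningful byte matches, which is exactly the false not-equal behavior described above.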
> The alternative seems to be to forbid uninitialized pad bytes within
> Datums.  That's not very pleasant to contemplate either, since it'll
> forever be vulnerable to sins of omission.

Hmm, we could have palloc fill allocated memory with random bytes under --enable-cassert.  That would let us catch such bugs in most cases.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
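A sketch of that idea, with malloc standing in for palloc and a fixed sentinel byte instead of true randomness (the names here are hypothetical, not PostgreSQL's actual debugging machinery):

```c
#include <stdlib.h>
#include <string.h>

/* Under an assert-enabled build, scribble a recognizable garbage
 * pattern over every freshly allocated chunk, so code that forgets to
 * initialize pad bytes fails reproducibly instead of only when the
 * heap happens to contain nonzero leftovers.  0x7F is an arbitrary
 * sentinel choice. */
#define CLOBBER_BYTE 0x7F

static void *
palloc_clobbered(size_t size)
{
    void *chunk = malloc(size);            /* stand-in for real palloc */

    if (chunk != NULL)
        memset(chunk, CLOBBER_BYTE, size); /* deterministic "garbage"  */
    return chunk;
}
```

Any code path that leaves pad bytes untouched now deterministically produces 0x7F bytes there, so a bytewise comparison against a second, independently built value fails every time rather than erratically.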
Teodor Sigaev <teodor@sigaev.ru> writes:
>> The alternative seems to be to forbid uninitialized pad bytes within
>> Datums.  That's not very pleasant to contemplate either, since it'll
>> forever be vulnerable to sins of omission.

> Hmm, we could have palloc fill allocated memory with random bytes under
> --enable-cassert.  That would let us catch such bugs in most cases.

Based on the fact that this problem went undetected for so long, I'm not real convinced of that.  The difficulty is exactly that not very many system behaviors will fail obviously if nodetrees that should be seen as equal are not.

Perhaps we could add some additional tests to help exercise it?  But how do you get them applied to all datatypes?

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> The alternative seems to be to forbid uninitialized pad bytes within
> Datums.  That's not very pleasant to contemplate either, since it'll
> forever be vulnerable to sins of omission.

Just brainstorming here; I don't think this is a good solution, but perhaps it could lead somewhere interesting...

We could have the actual equality operators include an assertion that the datums are also datumIsEqual?  That isn't guaranteed to catch every case, but it would be good for complex data types like arrays.

I suppose if all we want to do is assert that constructors don't create this situation, we could have an assertion that calls the constructor a second time (with palloc generating garbage data) and compares the results with datumIsEqual.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
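A rough sketch of that first suggestion, using a toy fixed-layout datum (4 meaningful bytes followed by 4 pad bytes); every name here is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy datum layout: 4 meaningful bytes followed by 4 alignment pad
 * bytes.  logical_eq() stands in for a type's real equality operator,
 * which only examines the meaningful bytes. */
#define MEANINGFUL_LEN 4
#define TOTAL_LEN      8

static bool
logical_eq(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, MEANINGFUL_LEN) == 0;
}

/* The proposed debugging aid: whenever the real operator reports
 * "equal", the datums should also be bytewise identical; if they
 * aren't, some constructor left uninitialized padding behind. */
static bool
checked_eq(const unsigned char *a, const unsigned char *b)
{
    bool result = logical_eq(a, b);

    assert(!result || memcmp(a, b, TOTAL_LEN) == 0);
    return result;
}
```

As noted, this only fires when the operator actually gets called on two logically equal values, so it cannot catch every case; but for constructed types like arrays it would surface pad-byte garbage in ordinary testing.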
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> The alternative seems to be to forbid uninitialized pad bytes within
>> Datums.  That's not very pleasant to contemplate either, since it'll
>> forever be vulnerable to sins of omission.

> Just brainstorming here; I don't think this is a good solution, but
> perhaps it could lead somewhere interesting...

> We could have the actual equality operators include an assertion that
> the datums are also datumIsEqual?  That isn't guaranteed to catch every
> case, but it would be good for complex data types like arrays.

That still puts the responsibility on the individual datatype author to get it right.  The case I'm most worried about is user-written datatypes that are never going to magically acquire such asserts.

There is another path we could try to take, which is to fix things so that there aren't any cases where a false not-equal report will break critical behavior.  I seem to recall that the original justification for implementing equal() this way was exactly the argument that it didn't matter if you sometimes got a false not-equal report, so long as copyObject() was guaranteed to generate equal() trees (which it is).  This also seems prone to somebody breaking the assumption in future, though, even assuming that we can make it work like that today.  (I'm not too sure how to deal with the ORDER-BY/DISTINCT semantics without it.)

I guess we could have a test mode in which datumIsEqual always returned false...

			regards, tom lane
> That still puts the responsibility on the individual datatype author to
> get it right.  The case I'm most worried about is user-written datatypes
> that are never going to magically acquire such asserts.

It seems to me that working with two assumptions at the same time (binary equality and a catalog-defined equality function) is the wrong way.  If we decide to use the binary-equality criterion, then why do we need a catalog-defined equality function at all?  If we use the catalog-defined one, why should we require binary equality?  Using both at the same time is error prone.

It's possible to say that two values are equal if they are binary-identical; if not, we should look up the catalog-defined equality operation and call it.  Binary comparison is only an optimization.

-- 
Teodor Sigaev                      E-mail: teodor@sigaev.ru
                                   WWW: http://www.sigaev.ru/
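That scheme amounts to treating the binary test as a sound-but-incomplete fast path.  A hypothetical sketch, with the catalog lookup reduced to a function pointer:

```c
#include <stdbool.h>
#include <string.h>

/* Binary equality as a pure optimization: if the bytes match, the
 * values are certainly equal; if they don't, fall back to the type's
 * catalog-defined equality operator (represented here by a plain
 * function pointer). */
typedef bool (*eq_operator)(const void *a, const void *b, size_t len);

static bool
datum_equal(const void *a, const void *b, size_t len, eq_operator type_eq)
{
    if (memcmp(a, b, len) == 0)
        return true;            /* binary match: definitely equal  */
    return type_eq(a, b, len);  /* otherwise ask the real operator */
}

/* Example operator for a toy type whose second half is padding:
 * compare only the first half of the bytes. */
static bool
ignore_padding_eq(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len / 2) == 0;
}
```

With this structure, pad-byte garbage can only cost a slower comparison, never a wrong answer; the open question Tom raises next is which operator plays the role of type_eq.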
Teodor Sigaev <teodor@sigaev.ru> writes:
>> That still puts the responsibility on the individual datatype author to
>> get it right.  The case I'm most worried about is user-written datatypes
>> that are never going to magically acquire such asserts.

> It seems to me that working with two assumptions at the same time
> (binary equality and a catalog-defined equality function) is the wrong
> way.  If we decide to use the binary-equality criterion, then why do we
> need a catalog-defined equality function at all?  If we use the
> catalog-defined one, why should we require binary equality?  Using both
> at the same time is error prone.  It's possible to say that two values
> are equal if they are binary-identical; if not, we should look up the
> catalog-defined equality operation and call it.  Binary comparison is
> only an optimization.

That only works if there is a unique function that we can say is "THE" equal operation for the datatype (with a true result guaranteeing that every operation the datatype has will produce the same results from the two values).  There is no such concept in PG at the moment, and if memory serves there are at least three built-in types for which the default btree "equality" function in fact doesn't guarantee that.  So even if we wanted to pursue that path, it wouldn't produce a fix that we could back-patch.

			regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> The alternative seems to be to forbid uninitialized pad bytes within
>>> Datums.  That's not very pleasant to contemplate either, since it'll
>>> forever be vulnerable to sins of omission.

>> Just brainstorming here; I don't think this is a good solution, but
>> perhaps it could lead somewhere interesting...

Another thought.  Perhaps every data type should define an operator which is a true equals, i.e., it guarantees that *no* internal state that any function could expose is different between two datums.  Most data types could implement it just by calling memcmp (or Postgres could provide such a definition if it's left undefined).  That gives arrays the option of either providing such an operator or guaranteeing no padding bytes and using memcmp.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
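As a sketch of the dispatch this would imply (hypothetical names, not an existing PostgreSQL API):

```c
#include <stdbool.h>
#include <string.h>

/* Each type may supply a "true equals" guaranteeing that no observable
 * internal state differs between two datums; types that leave it NULL
 * fall back to a memcmp default, which is only valid for types that
 * never carry uninitialized padding. */
typedef bool (*identity_eq_fn)(const void *a, const void *b, size_t len);

static bool
default_identity_eq(const void *a, const void *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

static bool
datum_identical(const void *a, const void *b, size_t len,
                identity_eq_fn fn)
{
    /* Use the type's own operator if it defined one, else the default. */
    return (fn != NULL ? fn : default_identity_eq)(a, b, len);
}
```

The design question this sidesteps, per the earlier messages, is getting user-written datatypes to actually supply (or be safe under) the default.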
Gregory Stark <stark@enterprisedb.com> writes:
> I suppose if all we want to do is assert that constructors don't create
> this situation, we could have an assertion that calls the constructor a
> second time (with palloc generating garbage data) and compares the
> results with datumIsEqual.

After further reflection I think it might indeed be the case that we only have to worry about the datatype input routines, since those are what are invoked to create Consts that the user has a right to expect are equal().  I cons'd up a quick hack along the above lines and ran the regression tests with it.  The failures suggest that aside from array_in, we'll need to fix these types:

	path
	tsquery
	lquery (in contrib)
	ltree (in contrib)

That seems like a short enough list that the right fix for the back branches is just to fix these input routines (palloc0 instead of palloc should do it).  I'm not sure yet if we want some less-klugy solution for the future.

			regards, tom lane
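The mechanics of that fix are simple.  A hypothetical sketch, with calloc/malloc standing in for palloc0/palloc:

```c
#include <stdlib.h>
#include <string.h>

/* Why switching an input routine to palloc0 fixes it: the constructor
 * copies each element into an aligned slot but never writes the pad
 * bytes after it.  With a zeroing allocator (palloc0, modeled here by
 * calloc), those untouched bytes are guaranteed to be zero, so two
 * independent constructions of the same value come out bytewise
 * identical. */
static unsigned char *
build_padded(const char *elem, size_t elem_len, size_t slot_len)
{
    unsigned char *buf = calloc(1, slot_len);  /* palloc0 stand-in */

    if (buf != NULL)
        memcpy(buf, elem, elem_len);  /* pad bytes after elem stay zero */
    return buf;
}
```

Had this used a plain malloc (palloc), the bytes from elem_len up to slot_len would hold whatever the allocator left behind, reproducing the erratic equal() failures that started the thread.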