Thread: int24_ops and int42_ops are bogus

int24_ops and int42_ops are bogus

From

Tom Lane

Date:

18 June 2000, 20:24:12

Our documentation claims (eg in the CREATE INDEX ref page) that

: The int24_ops operator class is useful for constructing indices on int2
: data, and doing comparisons against int4 data in query
: qualifications. Similarly, int42_ops support indices on int4 data that
: is to be compared against int2 data in queries.

But as far as I can tell, it is not actually possible for these
opclasses to work as claimed, and never has been. The reason is that
there is only one set of associated operators for an opclass. To have
an opclass that works as suggested above, you would need *two* sets of
operators identified for the opclass. For example, in the case of
int24_ops, you'd need to point at both of:

1. int2 vs. int4 operators (eg, int24lt) --- the planner must see these in order to know that an "int2 < int4" WHERE
clausehas any relevance to the index.

2. int2 vs. int2 operators (eg, int2lt) --- the index access method itself needs these for internal operations on the
index,such as comparing a new datum to the ones already in the index for insertion.

Currently we only reference the first set of operators, which means that
internal operations are wrong for these opclasses. Thus, for example:

create table foo (f1 int4);
create unique index foo42i on foo (f1 int42_ops);
insert into foo values(65537);
insert into foo values(1);
ERROR: Cannot insert a duplicate key into unique index foo42i

In the case of btree operations it's barely possible that we could get
around this by using the three-way comparison support procedure (int2cmp
or int4cmp in these cases) for *all* internal comparisons in the index,
and being careful to use the amop operators --- the right way round! ---
for all comparisons to external values. The btree code is not that
careful now, and I'm not sure it can be made that careful; it's not
clear that the low-level operations can tell whether the key they are
working with is an about-to-be-inserted value (same type as the index
entries) or a comparison key (not same type as the index entries).

Even if we could make it work, it'd be horribly fragile in the face of
future code changes --- people are just too used to assuming that
"a < b" and "b > a" are equivalent ways of coding a test. And we don't
have any way of automatically checking the code, given that all these
values are Datum as far as the compiler knows.

I think we ought to assume that index manipulation deals with only
one datatype for any given index, and therefore these two opclasses
are broken by design and must be removed.

Comments?
regards, tom lane

Re: int24_ops and int42_ops are bogus

From

Bruce Momjian

Date:

18 June 2000, 20:54:53

> I think we ought to assume that index manipulation deals with only
> one datatype for any given index, and therefore these two opclasses
> are broken by design and must be removed.

Agreed.  They are weird.

--  Bruce Momjian                        |  http://www.op.net/~candle pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026

Re: int24_ops and int42_ops are bogus

From

Tom Lane

Date:

19 June 2000, 00:52:34

I wrote:
> I think we ought to assume that index manipulation deals with only
> one datatype for any given index, and therefore these two opclasses
> are broken by design and must be removed.

I have removed these two opclasses from the system.  I had a further
thought on the issue, which I just want to record in the archives
in case anyone ever comes back and wants to resurrect
int24_ops/int42_ops.

The real design problem with these two opclasses is that if you want
to have an int4 column that you might want to compare against either
int2 or int4 constants, you have to create *two* indexes to handle
the two cases.  The contents of the two indexes will be absolutely
identical, so this approach is inherently silly.  The right way to
attack it is to extend the opclass/amop information so that the
system could understand that a plain-vanilla int4 index might be
used with int4 vs int2 operators to compare against int2 constants
--- or with int4 vs int8 operators to compare against int8 constants,
etc.

It would not be real difficult to extend the opclass representation
to show these relationships, I think.  The hard part is that btree
(and probably the other index types) is sloppy about whether it is
comparing index entries or externally-supplied values and which side
of the comparison is which.  Cleaning that up would be painful and
maybe impractical --- but if it could be done it'd be nifty.

The path I think we will actually pursue, instead, is teaching the
planner to coerce constants to the same type as the compared-to
column.  For instance, given "int2var < int4constant" the planner
will try to coerce the constant to int2 so that it can apply
int2-vs-int2 operators with an int2 index.  This falls down on
cases like "int2var < 100000" because it won't be possible to
reduce the constant to int2, whereas the above-sketched idea could
still handle that case as an indexscan.  But in terms of actual
everyday usefulness, I doubt this is a serious limitation.
        regards, tom lane

Re: int24_ops and int42_ops are bogus

From

Paul Condon

Date:

19 June 2000, 23:58:08

>



>   ------------------------------------------------------------------------
>
> Subject: Re: int24_ops and int42_ops are bogus
> Date: Mon, 19 Jun 2000 00:52:28 -0400
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: pgsql-hackers@postgreSQL.org
> References: <28999.961374238@sss.pgh.pa.us>
>
> I wrote:
> > I think we ought to assume that index manipulation deals with only
> > one datatype for any given index, and therefore these two opclasses
> > are broken by design and must be removed.
>
> I have removed these two opclasses from the system.  I had a further
> thought on the issue, which I just want to record in the archives
> in case anyone ever comes back and wants to resurrect
> int24_ops/int42_ops.
>
> The real design problem with these two opclasses is that if you want
> to have an int4 column that you might want to compare against either
> int2 or int4 constants, you have to create *two* indexes to handle
> the two cases.  The contents of the two indexes will be absolutely
> identical, so this approach is inherently silly.  The right way to
> attack it is to extend the opclass/amop information so that the
> system could understand that a plain-vanilla int4 index might be
> used with int4 vs int2 operators to compare against int2 constants
> --- or with int4 vs int8 operators to compare against int8 constants,
> etc.
>
> It would not be real difficult to extend the opclass representation
> to show these relationships, I think.  The hard part is that btree
> (and probably the other index types) is sloppy about whether it is
> comparing index entries or externally-supplied values and which side
> of the comparison is which.  Cleaning that up would be painful and
> maybe impractical --- but if it could be done it'd be nifty.
>
> The path I think we will actually pursue, instead, is teaching the
> planner to coerce constants to the same type as the compared-to
> column.  For instance, given "int2var < int4constant" the planner
> will try to coerce the constant to int2 so that it can apply
> int2-vs-int2 operators with an int2 index.  This falls down on
> cases like "int2var < 100000" because it won't be possible to
> reduce the constant to int2, whereas the above-sketched idea could

But since ALL int2var values in the table are in fact less than 100000, this expression is easily optimized to TRUE.
And,I think, similar optimizations can be
 
found for other out of range values.

>
> still handle that case as an indexscan.  But in terms of actual
> everyday usefulness, I doubt this is a serious limitation.
>
>                         regards, tom lane
>