Thread: int24_ops and int42_ops are bogus
Our documentation claims (eg in the CREATE INDEX ref page) that : The int24_ops operator class is useful for constructing indices on int2 : data, and doing comparisons against int4 data in query : qualifications. Similarly, int42_ops support indices on int4 data that : is to be compared against int2 data in queries. But as far as I can tell, it is not actually possible for these opclasses to work as claimed, and never has been. The reason is that there is only one set of associated operators for an opclass. To have an opclass that works as suggested above, you would need *two* sets of operators identified for the opclass. For example, in the case of int24_ops, you'd need to point at both of: 1. int2 vs. int4 operators (eg, int24lt) --- the planner must see these in order to know that an "int2 < int4" WHERE clausehas any relevance to the index. 2. int2 vs. int2 operators (eg, int2lt) --- the index access method itself needs these for internal operations on the index,such as comparing a new datum to the ones already in the index for insertion. Currently we only reference the first set of operators, which means that internal operations are wrong for these opclasses. Thus, for example: create table foo (f1 int4); create unique index foo42i on foo (f1 int42_ops); insert into foo values(65537); insert into foo values(1); ERROR: Cannot insert a duplicate key into unique index foo42i In the case of btree operations it's barely possible that we could get around this by using the three-way comparison support procedure (int2cmp or int4cmp in these cases) for *all* internal comparisons in the index, and being careful to use the amop operators --- the right way round! --- for all comparisons to external values. The btree code is not that careful now, and I'm not sure it can be made that careful; it's not clear that the low-level operations can tell whether the key they are working with is an about-to-be-inserted value (same type as the index entries) or a comparison key (not same type as the index entries). Even if we could make it work, it'd be horribly fragile in the face of future code changes --- people are just too used to assuming that "a < b" and "b > a" are equivalent ways of coding a test. And we don't have any way of automatically checking the code, given that all these values are Datum as far as the compiler knows. I think we ought to assume that index manipulation deals with only one datatype for any given index, and therefore these two opclasses are broken by design and must be removed. Comments? regards, tom lane
> I think we ought to assume that index manipulation deals with only > one datatype for any given index, and therefore these two opclasses > are broken by design and must be removed. Agreed. They are weird. -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
I wrote: > I think we ought to assume that index manipulation deals with only > one datatype for any given index, and therefore these two opclasses > are broken by design and must be removed. I have removed these two opclasses from the system. I had a further thought on the issue, which I just want to record in the archives in case anyone ever comes back and wants to resurrect int24_ops/int42_ops. The real design problem with these two opclasses is that if you want to have an int4 column that you might want to compare against either int2 or int4 constants, you have to create *two* indexes to handle the two cases. The contents of the two indexes will be absolutely identical, so this approach is inherently silly. The right way to attack it is to extend the opclass/amop information so that the system could understand that a plain-vanilla int4 index might be used with int4 vs int2 operators to compare against int2 constants --- or with int4 vs int8 operators to compare against int8 constants, etc. It would not be real difficult to extend the opclass representation to show these relationships, I think. The hard part is that btree (and probably the other index types) is sloppy about whether it is comparing index entries or externally-supplied values and which side of the comparison is which. Cleaning that up would be painful and maybe impractical --- but if it could be done it'd be nifty. The path I think we will actually pursue, instead, is teaching the planner to coerce constants to the same type as the compared-to column. For instance, given "int2var < int4constant" the planner will try to coerce the constant to int2 so that it can apply int2-vs-int2 operators with an int2 index. This falls down on cases like "int2var < 100000" because it won't be possible to reduce the constant to int2, whereas the above-sketched idea could still handle that case as an indexscan. But in terms of actual everyday usefulness, I doubt this is a serious limitation. regards, tom lane
> > ------------------------------------------------------------------------ > > Subject: Re: int24_ops and int42_ops are bogus > Date: Mon, 19 Jun 2000 00:52:28 -0400 > From: Tom Lane <tgl@sss.pgh.pa.us> > To: pgsql-hackers@postgreSQL.org > References: <28999.961374238@sss.pgh.pa.us> > > I wrote: > > I think we ought to assume that index manipulation deals with only > > one datatype for any given index, and therefore these two opclasses > > are broken by design and must be removed. > > I have removed these two opclasses from the system. I had a further > thought on the issue, which I just want to record in the archives > in case anyone ever comes back and wants to resurrect > int24_ops/int42_ops. > > The real design problem with these two opclasses is that if you want > to have an int4 column that you might want to compare against either > int2 or int4 constants, you have to create *two* indexes to handle > the two cases. The contents of the two indexes will be absolutely > identical, so this approach is inherently silly. The right way to > attack it is to extend the opclass/amop information so that the > system could understand that a plain-vanilla int4 index might be > used with int4 vs int2 operators to compare against int2 constants > --- or with int4 vs int8 operators to compare against int8 constants, > etc. > > It would not be real difficult to extend the opclass representation > to show these relationships, I think. The hard part is that btree > (and probably the other index types) is sloppy about whether it is > comparing index entries or externally-supplied values and which side > of the comparison is which. Cleaning that up would be painful and > maybe impractical --- but if it could be done it'd be nifty. > > The path I think we will actually pursue, instead, is teaching the > planner to coerce constants to the same type as the compared-to > column. For instance, given "int2var < int4constant" the planner > will try to coerce the constant to int2 so that it can apply > int2-vs-int2 operators with an int2 index. This falls down on > cases like "int2var < 100000" because it won't be possible to > reduce the constant to int2, whereas the above-sketched idea could But since ALL int2var values in the table are in fact less than 100000, this expression is easily optimized to TRUE. And,I think, similar optimizations can be found for other out of range values. > > still handle that case as an indexscan. But in terms of actual > everyday usefulness, I doubt this is a serious limitation. > > regards, tom lane >