Re: New thoughts about indexing cross-type comparisons - Mailing list pgsql-hackers
From | Dave Smith |
---|---|
Subject | Re: New thoughts about indexing cross-type comparisons |
Date | |
Msg-id | 3F67444A.9010407@candata.com Whole thread Raw |
In response to | New thoughts about indexing cross-type comparisons (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: New thoughts about indexing cross-type comparisons
|
List | pgsql-hackers |
If this is only dealing with constants, why not just explicitly add a cast to the constant of the column type at the planner level. It would solve this problem as well ... create table test (f int2); select * from test where f=cast('1981928928921' as int2); ERROR: pg_atoi: error reading "1981928928921": Numerical result out of range select * from test where f=1981928928921; f --- (0 rows) Tom Lane wrote: > We've spent much effort trying to solve the "int8col = 42 doesn't use > an index" class of problems. AFAIR, all the recent tries have focused > on trying to get the parser to choose an index-compatible operator > initially. (In this example, that would mean promoting 42 to int8 so > that int8 = int8 would be chosen at parse time.) While I have not > completely given up hope on that approach, it's clearly fraught with > potential for unwanted side-effects. > > The other place we could try to fix it is to improve either the planner > or the index AMs themselves to cope with cross-type comparisons for > themselves. I have shied away from that thought because it seemed > unreasonably difficult, but with the parser-side solution looking harder > and harder, maybe it's time to re-evaluate. > > There actually was code in the planner at one time to substitute > index-compatible operators for non-index-compatible operators. > The algorithm was basically "if I have indexcolumn OP somevalue, where > OP isn't a member of the index opclass but also somevalue is of a > different datatype than indexcolumn, then look to see if somevalue can > be binary-coerced to the datatype of indexcolumn. If so, look for an > operator of the same name as OP and accepting the indexcolumn datatype > on both sides. If found, and it's a member of the index opclass, then > use that operator instead of the original." > > This algorithm was wrong on both practical and theoretical levels; > in the first place it's not very helpful to only be able to handle > binary-compatible transformations, and in the second place there isn't > any good guarantee that it's not changing the semantics when it replaces > the operator. For instance int4 < and oid < do not act the same. > Depending on equality of operator names was a bad idea even then, and > would be quite unworkable now in the world of schema search paths. > > However, those objections really stem from the planner not having enough > information to know when the transformation was safe to make. What if > we gave it that information? > > After some thought I think the most practical approach is to make a new > system catalog for "secondary members of index opclasses". A secondary > member is not one of the operators that the index can handle directly, > but it can be transformed into one of the primary members. The new > catalog would be called, say, pg_amsecop, and would have columns like > opclass operatorid replacementop ltransform rtransform > where the first two are the primary key. The idea is when we have a > WHERE expression "indexcol OP somevalue", and we can't find OP in the > index's operator class (ie, there's no pg_amop entry for that operator > and opclass), we next look to see if the opclass/operator combination > appears in pg_amsecop. If so, then we are allowed to replace the > "indexcol OP somevalue" expression by "ltransform(indexcol) > REPLACEMENTOP rtransform(somevalue)". ltransform and rtransform are > pg_proc OIDs of cast functions, or zero where no run-time cast is > needed. replacementop is the OID of the substitute operator, which > presumably is one of the primary members of the index opclass. Now we > have an indexable expression. The presence of the entry in pg_amsecop > represents the opclass designer's promise to us that this is a valid, > semantics-preserving transformation. > > This design might be overly general --- for example, I doubt there can > be any need to apply a cast function to the indexable column. We could > eliminate the ltransform column and probably also halve the number of > entries in the table if we insist that the indexable column be on the > left before we start looking (in other words, "somevalue OP indexcol" > must be commuted before we look for the operator in pg_amsecop, not after). > > This is not a 100% solution to our problems. I don't think we could use > it to solve the problem for int2 columns ("int2col = 42") because it'd > be unsafe to promise that an int4-to-int2 cast could be inserted into > an expression without changing the behavior. So we'd still want to look > at having small integer constants be initially typed as int2, which > leaves us with a number of unsolved issues, as noted here: > http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php > So maybe the "secondary operator" idea will help, or maybe it won't do > much for us. > > Comments? Does this spur any better ideas? > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faqs/FAQ.html >
pgsql-hackers by date: