New thoughts about indexing cross-type comparisons - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | New thoughts about indexing cross-type comparisons |
Date | |
Msg-id | 18030.1063729837@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: New thoughts about indexing cross-type comparisons
Re: New thoughts about indexing cross-type comparisons |
List | pgsql-hackers |
We've spent much effort trying to solve the "int8col = 42 doesn't use an index" class of problems. AFAIR, all the recent tries have focused on trying to get the parser to choose an index-compatible operator initially. (In this example, that would mean promoting 42 to int8 so that int8 = int8 would be chosen at parse time.) While I have not completely given up hope on that approach, it's clearly fraught with potential for unwanted side-effects. The other place we could try to fix it is to improve either the planner or the index AMs themselves to cope with cross-type comparisons for themselves. I have shied away from that thought because it seemed unreasonably difficult, but with the parser-side solution looking harder and harder, maybe it's time to re-evaluate. There actually was code in the planner at one time to substitute index-compatible operators for non-index-compatible operators. The algorithm was basically "if I have indexcolumn OP somevalue, where OP isn't a member of the index opclass but also somevalue is of a different datatype than indexcolumn, then look to see if somevalue can be binary-coerced to the datatype of indexcolumn. If so, look for an operator of the same name as OP and accepting the indexcolumn datatype on both sides. If found, and it's a member of the index opclass, then use that operator instead of the original." This algorithm was wrong on both practical and theoretical levels; in the first place it's not very helpful to only be able to handle binary-compatible transformations, and in the second place there isn't any good guarantee that it's not changing the semantics when it replaces the operator. For instance int4 < and oid < do not act the same. Depending on equality of operator names was a bad idea even then, and would be quite unworkable now in the world of schema search paths. However, those objections really stem from the planner not having enough information to know when the transformation was safe to make. What if we gave it that information? After some thought I think the most practical approach is to make a new system catalog for "secondary members of index opclasses". A secondary member is not one of the operators that the index can handle directly, but it can be transformed into one of the primary members. The new catalog would be called, say, pg_amsecop, and would have columns like opclass operatorid replacementop ltransform rtransform where the first two are the primary key. The idea is when we have a WHERE expression "indexcol OP somevalue", and we can't find OP in the index's operator class (ie, there's no pg_amop entry for that operator and opclass), we next look to see if the opclass/operator combination appears in pg_amsecop. If so, then we are allowed to replace the "indexcol OP somevalue" expression by "ltransform(indexcol) REPLACEMENTOP rtransform(somevalue)". ltransform and rtransform are pg_proc OIDs of cast functions, or zero where no run-time cast is needed. replacementop is the OID of the substitute operator, which presumably is one of the primary members of the index opclass. Now we have an indexable expression. The presence of the entry in pg_amsecop represents the opclass designer's promise to us that this is a valid, semantics-preserving transformation. This design might be overly general --- for example, I doubt there can be any need to apply a cast function to the indexable column. We could eliminate the ltransform column and probably also halve the number of entries in the table if we insist that the indexable column be on the left before we start looking (in other words, "somevalue OP indexcol" must be commuted before we look for the operator in pg_amsecop, not after). This is not a 100% solution to our problems. I don't think we could use it to solve the problem for int2 columns ("int2col = 42") because it'd be unsafe to promise that an int4-to-int2 cast could be inserted into an expression without changing the behavior. So we'd still want to look at having small integer constants be initially typed as int2, which leaves us with a number of unsolved issues, as noted here: http://archives.postgresql.org/pgsql-hackers/2002-11/msg00468.php So maybe the "secondary operator" idea will help, or maybe it won't do much for us. Comments? Does this spur any better ideas? regards, tom lane
pgsql-hackers by date: