Re: Fixing row comparison semantics - Mailing list pgsql-hackers
From | Martijn van Oosterhout |
---|---|
Subject | Re: Fixing row comparison semantics |
Date | |
Msg-id | 20051224095547.GA617@svana.org Whole thread Raw |
In response to | Fixing row comparison semantics (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Fixing row comparison semantics
Re: Fixing row comparison semantics |
List | pgsql-hackers |
On Fri, Dec 23, 2005 at 03:18:21PM -0500, Tom Lane wrote: > I've gotten interested again in the issue of row comparisons, eg > (a, b, c) >= (1, 2, 3) > We've discussed this before, the most comprehensive thread being > http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php > but nothing's gotten done. Unless someone's already working on this > I think I will take it up. <snip> Since this is related to the COLLATE stuff I'm working on I'd like to make a few comments. > 1. Determine which interpretations (btree strategy numbers) exist for > each pairwise operator. There must be at least one interpretation that > is common to all the operators, else fail (for instance, it doesn't help > if we can identify one operator as "<" and another as ">"). One thing my COLLATE patch does is distinguish between collations and operator classes. So the reverse operator class issue disappears because it's just a collation and doesn't need a operator class (although it won't break anything, see below). > 2. If there is more than one common interpretation, prefer the one that > uses the largest number of default opclasses. If there's a tie, we > could either reject the construct as ambiguous, or select one of the > possibilities arbitrarily ... any thoughts about that? In standard SQL, each node in a query has a collation. Columns use the collation they were given when the table was created, constants use the default for the type. It's a little more complicated than that, see the standard for details. Anyway, a collation identifies a btree operator class so this problem solves itself. For each pair of values you are comparing, determine the collation and look up the operator class to ensure you're using the same strategy type. There are minor details relating to reverse collations but they're minor. The only problem reverse operator classes bring here is that the system won't realise it and thus won't know that the index is usable. Unless the user specifies the collation as part of the query. > 3. A given operator could have the selected interpretation in more than > one opclass. Prefer the default opclass if any; otherwise, again we > have the choice of rejecting or making an arbitrary choice. If there's a problem, bail. The standard allows you to specify the collation on a per node basis so any ambiguities can be resolved by the user. So something like: (a COLLATE hungarian, b COLLATE posix, c COLLATE ignorecase) >= ('x','y','z') Would know exactly what to do (and if you could use an index)... Now, since COLLATE support is still in progress, I'm not sure how much any of this helps you. I'm up to modifying the scankeys but it's hard when you jave to keep rgrepping the tree to work out what is called from where... For other people reading this thread, the reason why it can't be decomposed into (a>=1 AND b>=2 AND c>=3) is because the standard treats the row as a unit, checking left to right, so: (4,0,0) < (5,0,0) (1,2,3) > (0,7,8) So it needs a new node type and needs to know which index to use. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
pgsql-hackers by date: