Re: Design notes for EquivalenceClasses - Mailing list pgsql-hackers
From | Gavin Sherry |
---|---|
Subject | Re: Design notes for EquivalenceClasses |
Date | |
Msg-id | Pine.LNX.4.58.0701181052040.25798@linuxworld.com.au Whole thread Raw |
In response to | Design notes for EquivalenceClasses (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Design notes for EquivalenceClasses
|
List | pgsql-hackers |
On Wed, 17 Jan 2007, Tom Lane wrote: > strict. However, we also allow equivalence clauses that appear below the > nullable side of an outer join to form EquivalenceClasses; for these > classes, the interpretation is that either all the values are equal, or > all (except pseudo-constants) have gone to null. (This requires a > limitation that non-constant members be strict, else they might not go > to null when the other members do.) Consider for example > > SELECT * > FROM a LEFT JOIN > (SELECT * FROM b JOIN c ON b.y = c.z WHERE b.y = 10) ss > ON a.x = ss.y > WHERE a.x = 42; > > We can form the below-outer-join EquivalenceClass {b.y c.z 10} and thereby > apply c.z = 10 while scanning c. (The reason we disallow outerjoin-delayed > clauses from forming EquivalenceClasses is exactly that we want to be able > to push any derived clauses as far down as possible.) But once above the > outer join it's no longer necessarily the case that b.y = 10, and thus we > cannot use such EquivalenceClasses to conclude that sorting is unnecessary > (see discussion of PathKeys below). In this example, notice also that > a.x = ss.y (really a.x = b.y) is not an equivalence clause because its > applicability to b is restricted by the outer join; thus we do not make > the mistake of concluding b.y = 42, even though we do have an equivalence > class for {a.x 42}. This seems to be correct. A nice improvement. > With a little further thought, it becomes apparent that nestloop joins > can also produce sorted output. For example, if we do a nestloop join > between outer relation A and inner relation B, then any pathkeys relevant > to A are still valid for the join result: we have not altered the order of > the tuples from A. Even more interesting, if there was an equivalence clause > A.X=B.Y, and A.X was a pathkey for the outer relation A, then we can assert > that B.Y is a pathkey for the join result; X was ordered before and still > is, and the joined values of Y are equal to the joined values of X, so Y > must now be ordered too. This is true even though we used neither an > explicit sort nor a mergejoin on Y. (Note: hash joins cannot be counted > on to preserve the order of their outer relation, because the executor > might decide to "batch" the join, so we always set pathkeys to NIL for > a hashjoin path.) Exception: a RIGHT or FULL join doesn't preserve the > ordering of its outer relation, because it might insert nulls at random > points in the ordering. I was thinking about this, but in relation to hash joins. A hash join cannot be guaranteed to produce output sorted according to the pathkey of the outer relation (as explained in the existing README). I wonder, however, if it might be useful for hash join to pass a hint that the output is known ordered (i.e., the join was not split into multiple batches). Thanks, Gavin
pgsql-hackers by date: