Re: [HACKERS] Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] Performance improvement for joins where outer side is unique |
Date | |
Msg-id | 21765.1489431029@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [HACKERS] Performance improvement for joins where outer side is unique (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Performance improvement for joins where outer side is unique
|
List | pgsql-hackers |
[ getting back to this patch finally... ] David Rowley <david.rowley@2ndquadrant.com> writes: > I've attached a patch which implements this, though only for > MergeJoin, else I'd imagine we'd also need to ensure all proofs used > for testing the uniqueness were also hash-able too. I added some XXX > comments in analyzejoin.c around the mergeopfamilies == NIL tests to > mention that Merge Join depends on all the unique proof quals having > mergeopfamilies. This also assumes we'll never use some subset of > mergejoin-able quals for a merge join, which could be an interesting > area in the future, as we might have some btree index on a subset of > those columns to provide pre-sorted input. In short, it's a great > optimisation, but I'm a little scared we might break it one day. Umm ... it's broken already isn't it? See the comment for generate_mergejoin_paths: * We generate mergejoins if mergejoin clauses are available. We have* two ways to generate the inner path for a mergejoin:sort the cheapest* inner path, or use an inner path that is already suitably ordered for the* merge. If we haveseveral mergeclauses, it could be that there is no inner* path (or only a very expensive one) for the full list of mergeclauses,but* better paths exist if we truncate the mergeclause list (thereby discarding* some sort key requirements). So, we consider truncations of the* mergeclause list as well as the full list. (Ideally we'd consider all*subsets of the mergeclause list, but that seems way too expensive.) There's another, more subtle, way in which it could fail in sort_inner_and_outer(): * Each possible ordering of the available mergejoin clauses will generate* a differently-sorted result path at essentiallythe same cost. We have* no basis for choosing one over another at this level of joining, but* some sort ordersmay be more useful than others for higher-level* mergejoins, so it's worth considering multiple orderings.** Actually,it's not quite true that every mergeclause ordering will* generate a different path order, because some of the clausesmay be* partially redundant (refer to the same EquivalenceClasses). Therefore,* what we do is convert the mergeclauselist to a list of canonical* pathkeys, and then consider different orderings of the pathkeys. I'm fairly sure that it's possible to end up with fewer pathkeys than there are mergeclauses in this code path. Now, you might be all right anyway given that the mergeclauses must refer to the same ECs in such a case --- maybe they're fully redundant and we can take testing the included clause as proving the omitted one(s) too. I'm not certain right now what I meant by "partially redundant" in this comment. But in any case, it's moot for the present purpose because generate_mergejoin_paths certainly breaks your assumption. regards, tom lane
pgsql-hackers by date: