Re: Should Oracle outperform PostgreSQL on a complex - Mailing list pgsql-performance
From | Mark Kirkwood |
---|---|
Subject | Re: Should Oracle outperform PostgreSQL on a complex |
Date | |
Msg-id | 43A5DF23.5010504@paradise.net.nz Whole thread Raw |
In response to | Re: Should Oracle outperform PostgreSQL on a complex (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Should Oracle outperform PostgreSQL on a complex
|
List | pgsql-performance |
Simon Riggs wrote: > On Sat, 2005-12-17 at 13:13 -0500, Tom Lane wrote: > >>Simon Riggs <simon@2ndquadrant.com> writes: >> >>>On Fri, 2005-12-16 at 23:28 -0500, Bruce Momjian wrote: >>> >>>>How are star joins different from what we do now? >> >>>Methods: >>>1. join all N small tables together in a cartesian product, then join to >>>main Large table once (rather than N times) >> >>Of course, the reason the current planner does not think of this is that >>it does not consider clauseless joins unless there is no alternative. > > > Understood > > >> The plan you currently get for >>this sort of scenario is typically a nest of hash joins: >> >> QUERY PLAN >>------------------------------------------------------------------------ >> Hash Join (cost=2.25..4652.25 rows=102400 width=16) >> Hash Cond: ("outer".f1 = "inner".f1) >> -> Hash Join (cost=1.12..3115.12 rows=102400 width=12) >> Hash Cond: ("outer".f2 = "inner".f1) >> -> Seq Scan on fact (cost=0.00..1578.00 rows=102400 width=8) >> -> Hash (cost=1.10..1.10 rows=10 width=4) >> -> Seq Scan on d2 (cost=0.00..1.10 rows=10 width=4) >> -> Hash (cost=1.10..1.10 rows=10 width=4) >> -> Seq Scan on d1 (cost=0.00..1.10 rows=10 width=4) >>(9 rows) >> >>This involves only one scan of the fact table. As each row is pulled up >>through the nest of hash joins, we hash one dimension key and join to >>one small table at each level. > > > Understood > > >>This is at worst the same amount of work >>as hashing all the keys at once and probing a single cartesian-product >>hashtable, probably less work (fewer wasted key-comparisons). And >>definitely less memory. You do have to keep your eye on the ball that >>you don't waste a lot of overhead propagating the row up through >>multiple join levels, but we got rid of most of the problem there in >>8.1 via the introduction of "virtual tuple slots". If this isn't fast >>enough yet, it'd make more sense to invest effort in further cutting the >>executor's join overhead (which of course benefits *all* plan types) >>than in trying to make the planner choose a star join. > > > That join type is used when an index-organised table is available, so > that a SeqScan of the larger table can be avoided. > > I'd say the plan would make sense if the columns of the cartesian > product match a multi-column index on the larger table that would not > ever be used unless sufficient columns are restricted in each lookup. > That way you are able to avoid the SeqScan that occurs for the multiple > nested Hash Join case. (Clearly, normal selectivity rules apply on the > use of the index in this way). > > So I think that plan type still can be effective in some circumstances. > Mind you: building an N-way index on a large table isn't such a good > idea, unless you can partition the tables and still use a join. Which is > why I've not centred on this case as being important before now. > > My understanding: Teradata and DB2 use this. > FWIW - I think DB2 uses the successive fact RID buildup (i.e method 2), unfortunately I haven't got a working copy of DB2 in front of me to test. Cheers Mark
pgsql-performance by date: