Re: Strange query optimization in 7.3.2 - Mailing list pgsql-general
From | Tom Lane |
---|---|
Subject | Re: Strange query optimization in 7.3.2 |
Date | |
Msg-id | 5249.1050385185@sss.pgh.pa.us Whole thread Raw |
In response to | Strange query optimization in 7.3.2 (Alec Mitchell <apm13@columbia.edu>) |
Responses |
Re: Strange query optimization in 7.3.2
|
List | pgsql-general |
Alec Mitchell <apm13@columbia.edu> writes: > On Monday 14 April 2003 04:08 pm, Tom Lane wrote: >> Could you see your way to sending me a dump of these tables for testing >> purposes? You could exclude the columns not used in the FROM/WHERE >> clauses, if that is needed to satisfy privacy worries. > I've attached dumps of the relevant tables (not including the stops > table (s), it is huge, and doesn't come into play until after the bad > planner estimate). Indeed, there's a problem here with nulls --- there is a case in eqjoinsel() that didn't account for nulls, and should've. I've applied the attached patch to fix it. This reduces the estimate of the tr/t/m join size from 1119 to 332, which is still large compared to the true size of 52, but it's a lot better. Without the stops table, I can't really tell if this changes the complete plan or not --- could you apply the patch and find out? BTW, the remaining error seems to be because the tr.group_num = 1 constraint skews the fraction of matching "trailer" values. I fear there's little chance of persuading the system to figure that out in the near future --- it can't be done without cross-column correlation statistics, which we do not keep. regards, tom lane *** src/backend/utils/adt/selfuncs.c.orig Sat Mar 22 20:49:13 2003 --- src/backend/utils/adt/selfuncs.c Tue Apr 15 01:13:01 2003 *************** *** 1552,1578 **** { /* * We do not have MCV lists for both sides. Estimate the join ! * selectivity as MIN(1/nd1, 1/nd2). This is plausible if we ! * assume that the values are about equally distributed: a ! * given tuple of rel1 will join to either 0 or N2/nd2 rows of ! * rel2, so total join rows are at most N1*N2/nd2 giving a ! * join selectivity of not more than 1/nd2. By the same logic ! * it is not more than 1/nd1, so MIN(1/nd1, 1/nd2) is an upper ! * bound. Using the MIN() means we estimate from the point of ! * view of the relation with smaller nd (since the larger nd ! * is determining the MIN). It is reasonable to assume that ! * most tuples in this rel will have join partners, so the ! * bound is probably reasonably tight and should be taken ! * as-is. * * XXX Can we be smarter if we have an MCV list for just one * side? It seems that if we assume equal distribution for the * other side, we end up with the same answer anyway. */ if (nd1 > nd2) ! selec = 1.0 / nd1; else ! selec = 1.0 / nd2; } if (have_mcvs1) --- 1552,1584 ---- { /* * We do not have MCV lists for both sides. Estimate the join ! * selectivity as MIN(1/nd1,1/nd2)*(1-nullfrac1)*(1-nullfrac2). ! * This is plausible if we assume that the join operator is ! * strict and the non-null values are about equally distributed: ! * a given non-null tuple of rel1 will join to either zero or ! * N2*(1-nullfrac2)/nd2 rows of rel2, so total join rows are at ! * most N1*(1-nullfrac1)*N2*(1-nullfrac2)/nd2 giving a join ! * selectivity of not more than (1-nullfrac1)*(1-nullfrac2)/nd2. ! * By the same logic it is not more than ! * (1-nullfrac1)*(1-nullfrac2)/nd1, so the expression with MIN() ! * is an upper bound. Using the MIN() means we estimate from the ! * point of view of the relation with smaller nd (since the larger ! * nd is determining the MIN). It is reasonable to assume that ! * most tuples in this rel will have join partners, so the bound ! * is probably reasonably tight and should be taken as-is. * * XXX Can we be smarter if we have an MCV list for just one * side? It seems that if we assume equal distribution for the * other side, we end up with the same answer anyway. */ + double nullfrac1 = stats1->stanullfrac; + double nullfrac2 = stats2->stanullfrac; + + selec = (1.0 - nullfrac1) * (1.0 - nullfrac2); if (nd1 > nd2) ! selec /= nd1; else ! selec /= nd2; } if (have_mcvs1)
pgsql-general by date: