From: Tom Lane
Subject: Proposed fixes for planner regressions from June to release
Msg-id: 13943.1165780664@sss.pgh.pa.us
List: pgsql-patches
Arjen van der Meijden reported here:
http://archives.postgresql.org/pgsql-performance/2006-12/msg00041.php
that 8.2rc1 was noticeably slower on his query mix than a CVS pull of
3-June-2006 had been.  I've been looking into it with him off-list, and I
believe that the problem is one of poor planning due to misestimation of
costs for queries with large IN-lists (that is, indexable
ScalarArrayOpExpr conditions with large array constants).  Specifically,
the planner tends to choose bitmapscans when it shouldn't.  This code is
all new in 8.2, so it's not exactly surprising that there might be a few
glitches.  His development snapshot coincidentally avoided the problem
because it was taken just before we tweaked the planner to favor
indexscans (including bitmap scans) more on the inside of nestloops.

The attached proposed patch brings Arjen's query mix back to being as
fast or faster than the 3-June snapshot.  There are four elements to the
patch; in order of increasing controversialness they are:

1. Fix a flat-out logic error in genericcostestimate()'s handling of the
CPU cost estimate for a scan involving a ScalarArrayOpExpr indexqual
(which results in multiple physical indexscans within the
BitmapIndexScan plan node).  numIndexTuples is per physical scan, so it
has to be multiplied by num_sa_scans to get the correct
per-plan-node-execution cost estimate.

2. Fix another logic error that occurred when a ScalarArrayOpExpr qual
applies to a lower-order index column that doesn't have equality quals
on all the index columns to its left.  In this situation the
ScalarArrayOpExpr doesn't contribute to reducing the fraction of the
index that has to be scanned, but genericcostestimate() was still
dividing numIndexTuples by num_sa_scans.  This is a case of "premature
optimization is the root of all evil" :-( ...
I was trying to avoid calculating num_sa_scans twice, but in fact it's
necessary to compute it separately in btcostestimate and
genericcostestimate, since lower-order ScalarArrayOpExprs should be
included in the count for the latter's purposes and not the former's.
So, do the correction of numIndexTuples in btcostestimate, and don't let
genericcostestimate adjust a passed-in estimate.

3. In costsize.c, charge a small amount extra per tuple retrieved by a
bitmap indexscan (I set it at 0.1 * cpu_operator_cost), on the grounds
that entering the tuple into the bitmap should cost something.  The real
reason for doing this, though, is that for the case where a nestloop
with inner indexscan expects to retrieve a single tuple from the inner
relation on each iteration, 8.2 release is computing exactly the same
cost (within roundoff error) to do the inner scan as a plain or bitmap
indexscan --- and depending on how the roundoff error goes, it not
infrequently chooses the bitmap scan.  This is obviously silly, since a
bitmap scan has more overhead than a plain indexscan and no way to win
when retrieving a single heap tuple.  There is more than one way we
could deal with this problem, but adding an explicit CPU-cost charge for
the extra overhead seems reasonable.

4. In genericcostestimate(), add a CPU-cost charge to reflect the CPU
effort involved in initiating an indexscan (for instance, the analysis
of the index keys that btree does).  To some extent this charge also
covers the cost of descending the btree.  Before 8.2 we had effectively
priced those costs in by counting an explicit disk access to the btree
metapage, which was clearly too high --- but it seems that zero is too
low, too.  I set the number at 100 * cpu_operator_cost, which may sound
high, but it seems about right on the strength of Arjen's measurements.
It's possible that we should make it vary with the number of indexquals;
anyone have an opinion about that?

Comments in general?
I'm a bit worried that these changes will bias us against indexscans too
much, but there's no sign of that in Arjen's results.

			regards, tom lane


Index: src/backend/optimizer/path/costsize.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/optimizer/path/costsize.c,v
retrieving revision 1.169
diff -c -r1.169 costsize.c
*** src/backend/optimizer/path/costsize.c	11 Nov 2006 01:14:19 -0000	1.169
--- src/backend/optimizer/path/costsize.c	10 Dec 2006 19:19:56 -0000
***************
*** 614,619 ****
--- 614,626 ----
  	{
  		*cost = ((IndexPath *) path)->indextotalcost;
  		*selec = ((IndexPath *) path)->indexselectivity;
+ 		/*
+ 		 * Charge a small amount per retrieved tuple to reflect the costs of
+ 		 * manipulating the bitmap.  This is mostly to make sure that a bitmap
+ 		 * scan doesn't look to be the same cost as an indexscan to retrieve
+ 		 * a single tuple.
+ 		 */
+ 		*cost += 0.1 * cpu_operator_cost * ((IndexPath *) path)->rows;
  	}
  	else if (IsA(path, BitmapAndPath))
  	{
Index: src/backend/utils/adt/selfuncs.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v
retrieving revision 1.214
diff -c -r1.214 selfuncs.c
*** src/backend/utils/adt/selfuncs.c	4 Oct 2006 00:29:59 -0000	1.214
--- src/backend/utils/adt/selfuncs.c	10 Dec 2006 19:19:56 -0000
***************
*** 4673,4686 ****
  	 * way in order to get the right answer for partial indexes.
  	 */
  	if (numIndexTuples <= 0.0)
  		numIndexTuples = *indexSelectivity * index->rel->tuples;
  
! 	/*
! 	 * The estimate obtained so far counts all the tuples returned by all
! 	 * scans of ScalarArrayOpExpr calls.  We want to consider the per-scan
! 	 * number, so adjust.  This is a handy place to round to integer, too.
! 	 */
! 	numIndexTuples = rint(numIndexTuples / num_sa_scans);
  
  	/*
  	 * We can bound the number of tuples by the index size in any case.  Also,
--- 4673,4690 ----
  	 * way in order to get the right answer for partial indexes.
  	 */
  	if (numIndexTuples <= 0.0)
+ 	{
  		numIndexTuples = *indexSelectivity * index->rel->tuples;
  
! 		/*
! 		 * The above calculation counts all the tuples visited across all
! 		 * scans induced by ScalarArrayOpExpr nodes.  We want to consider the
! 		 * average per-indexscan number, so adjust.  This is a handy place to
! 		 * round to integer, too.  (If caller supplied tuple estimate, it's
! 		 * responsible for handling these considerations.)
! 		 */
! 		numIndexTuples = rint(numIndexTuples / num_sa_scans);
! 	}
  
  	/*
  	 * We can bound the number of tuples by the index size in any case.  Also,
***************
*** 4786,4792 ****
  	 * evaluated once at the start of the scan to reduce them to runtime keys
  	 * to pass to the index AM (see nodeIndexscan.c).  We model the per-tuple
  	 * CPU costs as cpu_index_tuple_cost plus one cpu_operator_cost per
! 	 * indexqual operator.
  	 *
  	 * Note: this neglects the possible costs of rechecking lossy operators
  	 * and OR-clause expressions.  Detecting that that might be needed seems
--- 4790,4798 ----
  	 * evaluated once at the start of the scan to reduce them to runtime keys
  	 * to pass to the index AM (see nodeIndexscan.c).  We model the per-tuple
  	 * CPU costs as cpu_index_tuple_cost plus one cpu_operator_cost per
! 	 * indexqual operator.  Because we have numIndexTuples as a per-scan
! 	 * number, we have to multiply by num_sa_scans to get the correct result
! 	 * for ScalarArrayOpExpr cases.
  	 *
  	 * Note: this neglects the possible costs of rechecking lossy operators
  	 * and OR-clause expressions.  Detecting that that might be needed seems
***************
*** 4801,4807 ****
  		qual_arg_cost = 0;
  	*indexStartupCost = qual_arg_cost;
  	*indexTotalCost += qual_arg_cost;
! 	*indexTotalCost += numIndexTuples * (cpu_index_tuple_cost + qual_op_cost);
  
  	/*
  	 * Generic assumption about index correlation: there isn't any.
--- 4807,4826 ----
  		qual_arg_cost = 0;
  	*indexStartupCost = qual_arg_cost;
  	*indexTotalCost += qual_arg_cost;
! 	*indexTotalCost += numIndexTuples * num_sa_scans * (cpu_index_tuple_cost + qual_op_cost);
! 
! 	/*
! 	 * We also add a CPU-cost component to represent the general costs of
! 	 * starting an indexscan (such as analysis of btree index keys).  This
! 	 * is estimated at 100x cpu_operator_cost, which is a bit arbitrary but
! 	 * seems the right order of magnitude.
! 	 *
! 	 * Although this is startup cost with respect to any one scan, we add
! 	 * it to the "total" cost component because it's only very interesting
! 	 * in the many-ScalarArrayOpExpr-scan case, and there it will be paid
! 	 * over the life of the scan node.
! 	 */
! 	*indexTotalCost += num_sa_scans * 100.0 * cpu_operator_cost;
  
  	/*
  	 * Generic assumption about index correlation: there isn't any.
***************
*** 4829,4834 ****
--- 4848,4854 ----
  	int			indexcol;
  	bool		eqQualHere;
  	bool		found_saop;
+ 	double		num_sa_scans;
  	ListCell   *l;
  
  	/*
***************
*** 4852,4857 ****
--- 4872,4878 ----
  	indexcol = 0;
  	eqQualHere = false;
  	found_saop = false;
+ 	num_sa_scans = 1;
  	foreach(l, indexQuals)
  	{
  		RestrictInfo *rinfo = (RestrictInfo *) lfirst(l);
***************
*** 4928,4933 ****
--- 4949,4963 ----
  		Assert(op_strategy != 0);	/* not a member of opclass?? */
  		if (op_strategy == BTEqualStrategyNumber)
  			eqQualHere = true;
+ 		/* count up number of SA scans induced by indexBoundQuals only */
+ 		if (IsA(clause, ScalarArrayOpExpr))
+ 		{
+ 			ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ 			int			alength = estimate_array_length(lsecond(saop->args));
+ 
+ 			if (alength > 1)
+ 				num_sa_scans *= alength;
+ 		}
  		indexBoundQuals = lappend(indexBoundQuals, rinfo);
  	}
***************
*** 4949,4954 ****
--- 4979,4990 ----
  					index->rel->relid,
  					JOIN_INNER);
  		numIndexTuples = btreeSelectivity * index->rel->tuples;
+ 		/*
+ 		 * As in genericcostestimate(), we have to adjust for any
+ 		 * ScalarArrayOpExpr quals included in indexBoundQuals, and then
+ 		 * round to integer.
+ 		 */
+ 		numIndexTuples = rint(numIndexTuples / num_sa_scans);
  	}
  
  	genericcostestimate(root, index, indexQuals, outer_rel,
  						numIndexTuples,