Home > mailing lists
Fuzzy cost comparison to eliminate redundant planning work - Mailing list pgsql-hackers

From	Tom Lane
Subject	Fuzzy cost comparison to eliminate redundant planning work
Date	March 28, 2004 13:34:02
Msg-id	16695.1080495239@sss.pgh.pa.us Whole thread Raw
Responses	Re: Fuzzy cost comparison to eliminate redundant planning Re: Fuzzy cost comparison to eliminate redundant planning Re: Fuzzy cost comparison to eliminate redundant planning
List	pgsql-hackers
Tree view
I've been looking at the planner performance problem exhibited by
Eric Brown:
http://archives.postgresql.org/pgsql-performance/2004-03/msg00273.php

While a nine-way join is inherently going to take some time to plan
(if you don't constrain the search space with JOIN), it seemed to me
that this particular query was taking even longer than I'd expect.

I dug into it a little bit and found that the problem is that the
planner keeps many more paths than it really ought to.  For example,
at one random stopping point I found a list of paths that included
these three unordered paths:
  {NESTPATH   :startup_cost 0.01   :total_cost 1634.63   :pathkeys <>   }  {NESTPATH   :startup_cost 0.005  :total_cost
1642.65  :pathkeys <>   }  {NESTPATH   :startup_cost 0.00   :total_cost 1650.66   :pathkeys <>   }
 

There were also a lot of ordered paths that had similarly
trivially-different cost estimates.

The reason these are all being kept is that add_path() will keep paths
of the same ordering (same pathkeys) if they rank differently on startup
cost than they do on total cost.  Now that's a good rule in the
abstract, but a human looking at these numbers would certainly say that
it's being taken to extremes.  There isn't enough difference in these
startup costs to justify treating them as different ... especially given
the inherent inaccuracy of the estimates.

As an experiment, I made the attached patch that converts add_path's
cost comparisons to "fuzzy" comparisons --- two paths are considered to
have the same cost if it differs by less than 1% of the smaller
total_cost.  I found that this reduced the planning time of Eric's
query by about 40%, without changing the resulting plan.  On more
typical queries where the relations all have different statistics,
I doubt it would have as much effect, but I'm inclined to apply the
change anyway.

Comments?
        regards, tom lane

*** src/backend/optimizer/util/pathnode.c.orig    Tue Mar  2 11:42:20 2004
--- src/backend/optimizer/util/pathnode.c    Sun Mar 28 11:53:33 2004
***************
*** 85,90 ****
--- 85,160 ---- }  /*
+  * compare_fuzzy_path_costs
+  *      Return -1, 0, or +1 according as path1 is cheaper, the same cost,
+  *      or more expensive than path2 for the specified criterion.
+  *
+  * This differs from compare_path_costs in that we consider the costs the
+  * same if they agree to within a "fuzz factor".  This is used by add_path
+  * to avoid keeping both of a pair of paths that really have insignificantly
+  * different cost.
+  */
+ static int
+ compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
+ {
+     Cost        fuzz;
+ 
+     /*
+      * The fuzz factor is set at one percent of the smaller total_cost, but
+      * not less than 0.01 cost units (just in case total cost is zero).
+      *
+      * XXX does this percentage need to be user-configurable?
+      */
+     fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
+     fuzz = Max(fuzz, 0.01);
+ 
+     if (criterion == STARTUP_COST)
+     {
+         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
+         {
+             if (path1->startup_cost < path2->startup_cost)
+                 return -1;
+             else
+                 return +1;
+         }
+ 
+         /*
+          * If paths have the same startup cost (not at all unlikely),
+          * order them by total cost.
+          */
+         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
+         {
+             if (path1->total_cost < path2->total_cost)
+                 return -1;
+             else
+                 return +1;
+         }
+     }
+     else
+     {
+         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
+         {
+             if (path1->total_cost < path2->total_cost)
+                 return -1;
+             else
+                 return +1;
+         }
+ 
+         /*
+          * If paths have the same total cost, order them by startup cost.
+          */
+         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
+         {
+             if (path1->startup_cost < path2->startup_cost)
+                 return -1;
+             else
+                 return +1;
+         }
+     }
+     return 0;
+ }
+ 
+ /*  * compare_path_fractional_costs  *      Return -1, 0, or +1 according as path1 is cheaper, the same cost,  *
ormore expensive than path2 for fetching the specified fraction
 
***************
*** 215,243 ****         bool        remove_old = false; /* unless new proves superior */         int
costcmp;
 
!         costcmp = compare_path_costs(new_path, old_path, TOTAL_COST);          /*          * If the two paths compare
differentlyfor startup and total          * cost, then we want to keep both, and we can skip the (much          *
slower)comparison of pathkeys.    If they compare the same,          * proceed with the pathkeys comparison.  Note:
thistest relies
 
!          * on the fact that compare_path_costs will only return 0 if both
!          * costs are equal (and, therefore, there's no need to call it
!          * twice in that case).          */         if (costcmp == 0 ||
!             costcmp == compare_path_costs(new_path, old_path,
!                                           STARTUP_COST))         {             switch
(compare_pathkeys(new_path->pathkeys,old_path->pathkeys))             {                 case PATHKEYS_EQUAL:
        if (costcmp < 0)                         remove_old = true;        /* new dominates old */
else
!                         accept_new = false;        /* old equals or dominates
        * new */                     break;                 case PATHKEYS_BETTER1:                     if (costcmp <=
0)
--- 285,331 ----         bool        remove_old = false; /* unless new proves superior */         int
costcmp;
 
!         /*
!          * As of Postgres 7.5, we use fuzzy cost comparison to avoid wasting
!          * cycles keeping paths that are really not significantly different
!          * in cost.
!          */
!         costcmp = compare_fuzzy_path_costs(new_path, old_path, TOTAL_COST);          /*          * If the two paths
comparedifferently for startup and total          * cost, then we want to keep both, and we can skip the (much
*slower) comparison of pathkeys.    If they compare the same,          * proceed with the pathkeys comparison.  Note:
thistest relies
 
!          * on the fact that compare_fuzzy_path_costs will only return 0 if
!          * both costs are effectively equal (and, therefore, there's no need
!          * to call it twice in that case).          */         if (costcmp == 0 ||
!             costcmp == compare_fuzzy_path_costs(new_path, old_path,
!                                                 STARTUP_COST))         {             switch
(compare_pathkeys(new_path->pathkeys,old_path->pathkeys))             {                 case PATHKEYS_EQUAL:
        if (costcmp < 0)                         remove_old = true;        /* new dominates old */
 
+                     else if (costcmp > 0)
+                         accept_new = false;        /* old dominates new */                     else
!                     {
!                         /*
!                          * Same pathkeys, and fuzzily the same cost, so
!                          * keep just one --- but we'll do an exact cost
!                          * comparison to decide which.
!                          */
!                         if (compare_path_costs(new_path, old_path,
!                                                TOTAL_COST) < 0)
!                             remove_old = true; /* new dominates old */
!                         else
!                             accept_new = false;    /* old equals or dominates
        * new */
 
+                     }                     break;                 case PATHKEYS_BETTER1:                     if
(costcmp<= 0)
pgsql-hackers by date:
From: Diego Montenegro
Date: 28 March 2004, 13:30:48
Subject: Flush to Disk
From: Manfred Spraul
Date: 28 March 2004, 14:21:20
Subject: Re: Flush to Disk
Fuzzy cost comparison to eliminate redundant planning work - Mailing list pgsql-hackers

Previous

Next