Re: Less selective index chosen unexpectedly - Mailing list pgsql-bugs
From | David Rowley
---|---
Subject | Re: Less selective index chosen unexpectedly
Date |
Msg-id | CAApHDvp8TYccf9mN_7ZAogh2jZYeNA-8UPCQZLBhuef1z=rhkg@mail.gmail.com
In response to | Re: Less selective index chosen unexpectedly (James Coleman <jtc331@gmail.com>)
Responses | Re: Less selective index chosen unexpectedly
List | pgsql-bugs
On Thu, 20 May 2021 at 02:54, James Coleman <jtc331@gmail.com> wrote:
> What I'm interested in here is that, at least in this case (and I'm
> theorizing it's more generalizable, but that's why I wanted to get
> discussion on it) the risk of getting selectivity wrong is quite high.
> And it seems to me that a more specific index (though larger and
> costlier to descend) is generally lower risk, albeit higher cost in
> best cases for the less specific index. In this particular example it
> wouldn't even matter which permutation of column ordering you have in
> the more specific index -- it's always guaranteed to return the first
> row that's found by descending the tree (excluding vacuumable rows).

Unfortunately, we don't currently include risk in our cost model. Today's planner is fairly happy to dance blindfolded at the top of tall cliffs, and it falls off them a little too often, in cases such as the one you describe. It would be good to one day give the planner some proper lessons in safety.

I have considered that maybe we should add some sort of risk factor to our cost model. I mentioned something about it in [1], but I don't yet have a great idea of how the "risk" cost value would be calculated, and calculating it could be hard. It might make sense to bump up the risk factor for some paths, but as we go through and add a risk level to more paths, what we set the risk value to becomes much more important. It would be easy to say that the risk value should be whatever the total_cost would have to be multiplied by to reach the cost of the worst possible case. However, if the worst possible case is exceedingly unlikely, then maybe that's not a great choice.

The planner does take plenty of other risks; the situation you describe here is just one of many. Another example is a join on, say, 3 or more conditions where those conditions are correlated. We currently always assume they're independent and just multiply the individual selectivities, so we can end up thinking the join will produce far fewer rows than it actually will. Problems can then arise in subsequent joins: we may perform a subsequent join using a nested loop, thinking that only 1 row will be produced by the first join, and that is likely to turn bad if the innermost join instead produces millions of rows. Unfortunately, extended statistics don't yet help us with join selectivities.

We also take risks with LIMIT clauses, where we assume the non-filtered rows are completely evenly distributed through the result. That ends badly when the rows we want are right at the end.

David

[1] https://www.postgresql.org/message-id/CAApHDvqBoYU8aES4a0t-J15wk1wPMFJDHcyafyfHj7JqJ+u9wg@mail.gmail.com
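To make the scenario in the quoted text concrete, here is a minimal sketch of the kind of index choice being discussed; the table, column, and index names are invented for illustration and are not taken from the original report:

    -- A query that filters on one column and orders by another, with both a
    -- narrow single-column index and a more specific composite index available.
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        account_id bigint NOT NULL,
        created_at timestamptz NOT NULL
    );

    CREATE INDEX events_created_at_idx ON events (created_at);
    CREATE INDEX events_account_created_at_idx ON events (account_id, created_at);

    -- The planner may choose the smaller created_at index and walk it backwards
    -- until it happens to hit a row with account_id = 42, betting that such a
    -- row appears early in the ordering.  The composite index bounds the work
    -- no matter where the matching rows sit.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM events
    WHERE account_id = 42
    ORDER BY created_at DESC
    LIMIT 1;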
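The join-selectivity risk can be sketched the same way, again with invented table and column names. With a join on two correlated columns, the planner multiplies the per-condition selectivities as though they were independent, so the estimated row count for the join can come out far too low, and a nested loop chosen on the strength of that estimate can perform very badly:

    -- city and postcode are strongly correlated, but the planner treats the two
    -- join clauses as independent and multiplies their selectivities, which can
    -- make the join look much smaller than it really is.
    CREATE TABLE customers (id bigint PRIMARY KEY, city text, postcode text);
    CREATE TABLE orders (id bigint PRIMARY KEY, cust_city text, cust_postcode text);

    EXPLAIN
    SELECT *
    FROM customers c
    JOIN orders o
      ON o.cust_city = c.city
     AND o.cust_postcode = c.postcode;

    -- Extended statistics can correct this sort of misestimate for quals on a
    -- single table, but, as noted above, not yet for join clauses.
    CREATE STATISTICS customers_city_postcode (dependencies)
      ON city, postcode FROM customers;
    ANALYZE customers;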
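The LIMIT assumption can be shown with one more invented example: the planner assumes rows matching the filter are spread evenly through the requested ordering, so it expects an ordered scan to stop almost immediately; if the matching rows all sit at the end of that ordering, the scan reads nearly the whole index before finding one:

    -- Suppose all 'pending' rows are recent, i.e. clustered at the high end of
    -- the id ordering.  The planner assumes they are evenly distributed, so it
    -- expects a forward scan of the primary key to find one after only a few
    -- rows, when in fact it has to walk almost the entire table first.
    CREATE TABLE tasks (
        id     bigserial PRIMARY KEY,
        status text NOT NULL
    );

    EXPLAIN (ANALYZE)
    SELECT * FROM tasks
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1;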