disfavoring unparameterized nested loops - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | disfavoring unparameterized nested loops |
Date | |
Msg-id | CA+TgmoYtWXNpj6D92XxUfjT_YFmi2dWq1XXM9EY-CRcr2qmqbg@mail.gmail.com |
Responses | Re: disfavoring unparameterized nested loops; Re: disfavoring unparameterized nested loops |
List | pgsql-hackers |
From time to time, someone tells me that they've configured enable_nestloop=false in postgresql.conf, which is a pretty bad idea since there are a significant number of cases where such plans are the only reasonable way of executing some query. However, it's no great secret that PostgreSQL's optimizer sometimes produces nested loops that are very, very, very slow, generally because it has grossly underestimated the cardinality of the inner side of the nested loop. The frustration which users experience as a result of such events is understandable.

I read https://15721.courses.cs.cmu.edu/spring2020/papers/22-costmodels/p204-leis.pdf today and found out that the authors of that paper did something a bit more nuanced which, in their experiments, was very successful. It sounds like what they basically did was to disable unparameterized nested loops. They argue that such nested loops figure to gain very little as compared with a hash join, but that they might turn out to lose a lot if the cardinality estimation is inaccurate, and they present experimental results to back up those claims.

One observation that the paper makes along the way is that every system they tested is more likely to underestimate the cardinality of joins than to overestimate it, and that this tendency becomes more pronounced as the size of the join planning problem increases. On reflection, I believe this matches my experience, and it makes sense that it should be so, since it occasionally happens that the join selectivity estimate is essentially zero, and a bigger join problem is more likely to have at least one such case. On the other hand, the join selectivity estimate can never be +infinity. Hence, it's more important in general for a database system to be resilient against underestimates than to be resilient against overestimates. Being less willing to choose unparameterized nested loops is one way to move in that direction.

How to do that is not very clear. One very simple thing we could do would be to introduce enable_nestloop=parameterized or enable_parameterized_nestloop=false. That is a pretty blunt instrument, but the authors of the paper seem to have done just that with positive results, so maybe it's actually not a dumb idea. Another approach would be to introduce a large fuzz factor for such nested loops, e.g. keep them only if the cost estimate is better than the comparable hash join plan by at least a factor of N (or if no such plan is being generated for some reason). I'm not very confident that this would actually do what we want, though: in the problematic cases, a nested loop is likely to look extremely cheap, so just imagining that the cost might be higher is not very protective. Perhaps a better approach would be something like: if the estimated number of inner rows is less than 100, then re-estimate the cost of the nested loop and of the best available hash join on the assumption that there are 100 inner rows. If the nested loop still wins, keep it; if it loses under that assumption, throw it out. I think it's likely that this approach would eliminate a large number of highly risky nested loop plans, probably even with s/100/10/g, without causing many other problems (except perhaps increased planner CPU consumption ... but maybe that's not too bad).

Just to be clear, I do understand that there are cases where no Hash Join is possible, but anything we do here could be made to apply only when a hash join is in fact possible.
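To make the "assume at least 100 inner rows" idea a little more concrete, here is a toy sketch of roughly what that check could look like. This is not planner code: the cost formulas and constants below are made-up stand-ins for what costsize.c actually computes, and they exist only to show the shape of the comparison.

```c
/*
 * Toy illustration of the "re-estimate with at least 100 inner rows" idea.
 * Not planner code; the cost formulas and constants are arbitrary stand-ins
 * for the real costing logic in costsize.c.
 */
#include <stdbool.h>
#include <stdio.h>

#define MIN_INNER_ROWS 100.0    /* the 100 in "s/100/10/g" */

/* arbitrary per-row constants, loosely in the spirit of cpu_tuple_cost etc. */
static const double scan_cost = 0.01;
static const double pair_cmp_cost = 0.0025;
static const double hash_build_cost = 0.015;
static const double hash_probe_cost = 0.005;

/* Unparameterized nested loop: the inner side is rescanned for every outer row. */
static double
nestloop_cost(double outer_rows, double inner_rows)
{
    return outer_rows * inner_rows * (scan_cost + pair_cmp_cost);
}

/* Single-batch hash join: build the inner hash once, then probe with the outer. */
static double
hashjoin_cost(double outer_rows, double inner_rows)
{
    return inner_rows * (scan_cost + hash_build_cost) +
           outer_rows * (scan_cost + hash_probe_cost);
}

/*
 * Keep the unparameterized nested loop only if it beats the hash join both at
 * the planner's estimate and with the inner row count padded up to
 * MIN_INNER_ROWS, i.e. only if a badly low inner-side estimate still wouldn't
 * change the winner.
 */
static bool
keep_unparam_nestloop(double outer_rows, double est_inner_rows)
{
    double padded = est_inner_rows < MIN_INNER_ROWS ? MIN_INNER_ROWS : est_inner_rows;

    return nestloop_cost(outer_rows, est_inner_rows) <
               hashjoin_cost(outer_rows, est_inner_rows) &&
           nestloop_cost(outer_rows, padded) <
               hashjoin_cost(outer_rows, padded);
}

int
main(void)
{
    /* The scary case: a big outer side and an inner side estimated at 1 row. */
    printf("outer=100000, inner(est)=1 -> keep nested loop? %s\n",
           keep_unparam_nestloop(100000, 1) ? "yes" : "no");

    /* A case where the nested loop stays cheap even if the estimate is badly off. */
    printf("outer=1,      inner(est)=1 -> keep nested loop? %s\n",
           keep_unparam_nestloop(1, 1) ? "yes" : "no");

    return 0;
}
```

With these made-up constants, the first case gets thrown out and the second one survives; obviously the real costing is a lot more complicated than this, but that's the general shape of the safeguard.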
We could also think about making the point of comparison the best other plan of any sort rather than a hash join specifically, which feels a little more principled but might actually be worse. When a Nested Loop is a stupid idea, it's stupid precisely because the inner side is big and we could've avoided recomputing it over and over by using a Hash Join instead, not because some Merge Join based plan turns out to be better. I mean, it is possible that some Merge Join plan does turn out to be better, but that's not rage-inducing in the same way. Nobody looks at a complicated join plan that happened to use a Nested Loop and says "obviously, this is inferior to a merge join", or if they do, they're probably full of hot air. But people look at complicated join plans that happen to use a Nested Loop and say "obviously, this is inferior to a hash join" *all the time*, and assuming the inner path is unparameterized, they are very often correct.

Thoughts? I'd be particularly curious to hear about any cases anyone knows about where an unparameterized nested loop and a hash join with batches=1 are both possible and where the unparameterized nested loop is *much* cheaper than the hash join.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com