Re: Parameterized-path cost comparisons need some work - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Parameterized-path cost comparisons need some work |
Date | |
Msg-id | 27109.1334258877@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Parameterized-path cost comparisons need some work (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Parameterized-path cost comparisons need some work
|
List | pgsql-hackers |
I wrote: > So I'm back to thinking we need to look explicitly at the rowcount > comparison as well as all the existing conditions in add_path. > One annoying thing about that is that it will reduce the usefulness of > add_path_precheck, because that's called before we compute the rowcount > estimates (and indeed not having to make the rowcount estimates is one > of the major savings from the precheck). I think what we'll have to do > is assume that a difference in parameterization could result in a > difference in rowcount, and hence only a dominant path with exactly the > same parameterization can result in failing the precheck. I've been experimenting some more with this, and have observed that in the test cases I'm using, adding rowcount as an additional criterion in add_path doesn't cost much of anything: it doesn't seem to affect the runtime significantly, and it only seldom changes the keep/reject decisions. So that's good news. Unfortunately, the precheck situation is actually worse than I thought: there are plenty of cases where parameterized paths can have the exact same parameterization (that is, same sets of required outer rels) and yet have different row estimates, because one might use different join clauses than the other. All you need to be at risk is more than one join clause between the same two rels, with those clauses matching different indexes or index columns. This entirely destroys the logic of add_path_precheck as currently constituted, because it implies we can never reject a parameterized path before computing its rowcount. I said upthread that I wouldn't cry if we got rid of add_path_precheck again, but it still looks like that would cost us a noticeable hit in planning speed. I've considered three other alternatives: 1. Lobotomize add_path_precheck so it always returns true for a parameterized path. This sounds horrid, but in the test cases I'm using it seems that this only results in doing the full path construction for a very small number of additional paths. 2. Refactor so that we obtain the row estimate during the first not the second cost estimation step. This doesn't look promising; I have not actually coded and tested it, but eyeballing gprof numbers for the current code suggests it would give back a considerable percentage of the savings from having a precheck at all. 3. Rearrange plan generation so that a parameterized path always uses all join clauses available from the specified outer rels. (Any that don't work as indexquals would have to be applied as filter conditions.) If we did that, then we would be back to a situation where all paths with the same parameterization should yield the same rowcount, thus justifying letting add_path_precheck work as it does now. #3 would amount to pushing quals that would otherwise be checked at the nestloop join node down to the lowest inner-relation level where they could be checked. This is something I'd suspected would be a good idea to start with, but hadn't gotten around to implementing for non-index quals. It had not occurred to me that it might simplify cost estimation to always do that. I'm going to take a closer look at #3, but it may not be practical to try to squeeze it into 9.2; if not, I think #1 will do as a stopgap. regards, tom lane
pgsql-hackers by date: