Re: More thoughts about planner's cost estimates - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: More thoughts about planner's cost estimates |
| Date | |
| Msg-id | 14438.1149189309@sss.pgh.pa.us |
| In response to | Re: More thoughts about planner's cost estimates (Josh Berkus <josh@agliodbs.com>) |
| Responses | Re: More thoughts about planner's cost estimates |
| List | pgsql-hackers |
Josh Berkus <josh@agliodbs.com> writes:
> Yeah.  I've refrained from proposing changes because it's a
> pick-up-sticks.  If we start modifying the model, we need to fix
> *everything*, not just one item.  And then educate our users that
> they need to use the GUC variables in a different way.  Here are the
> issues I see:
> [ various deficiencies in our statistics gathering ]

Those are all valid points, but pretty much orthogonal to what I'm on
about today.  The problems I'm thinking about are seen even when the
planner's rowcount estimates are dead on (as indeed they mostly were in
Philippe's example).

> 5. random_page_cost (as previously discussed) is actually a function
> of relatively immutable hardware statistics, and as such should not
> need to exist as a GUC once the cost model is fixed.

Well, it'll still exist as a GUC for the same reasons the other ones are
GUCs.  But I think the main reasons people have needed to twiddle it are
(1) their database is completely RAM-resident (where RPC *should*
logically be 1.0), or (2) they're trying to compensate for the
overestimation of nestloop indexscans.  The changes I'm proposing should
help with (2).

> For btrees, we should be able to accurately estimate the cost of the
> index access given:
> a) the depth of the tree;

Although logically the tree descent *ought* to be a factor, I haven't
seen any evidence that we need to take it into account; in fact all the
evidence points to the conclusion that we're better off ignoring it.
My theory about why is that the upper levels of the tree stay in cache.
I have to admit that's just handwaving, but counting additional disk
hits to fetch the first index tuple is clearly not the direction the
cost model needs to go in.  Look again at the examples in Philippe's
output: an indexscan fetching 1700 tuples (about 5 leaf pages, probably)
took 32x longer than one fetching 7 tuples (presumably all on the same
leaf page).  There's no way that a model that counts an additional
couple of page fetches for tree descent is going to arrive at those
numbers.  And we see this over and over again: incredibly small times to
fetch a single index tuple, at least on the average when the index is
being hit many times in one query.

> c) whether the index and tables are already in shared_mem, or else (d)
> and (e) below:
> d) the probability that the index pages are cached in memory, which is
> composed of:
> (1) the frequency with which that index is accessed, modified by
> (2) whether the index is a fraction of available ram, or larger than
> ram
> e) the probability that the relevant table pages are cached in memory,
> determined by the same two factors.

These would all be nice things to know, but I'm afraid it's pie in the
sky.  We have no reasonable way to get those numbers.  (And if we could
get them, there would be another set of problems, namely plan
instability: the planner's choices would become very difficult to
reproduce.)

>> The big difficulty in modeling cache effects from multiple scans is
>> that we usually don't know how many times the index will be scanned.

> I think we can work around this by figuring out whether the index has
> already been scanned in the plan, something we *can* know.  So the
> first scan will be full cost and remaining scans will be fractional
> cost.

Uh, no, because what we're normally thinking about is independent probes
into the index with different keys.  For a small number of probes you
have to figure that those all hit different leaf pages and aren't going
to amortize well.
As the number of probes goes up, you can start charging fractional costs
because it's more likely you're hitting a leaf page you hit recently.
The Mackert/Lohman equations do this reasonably well --- it's possible
we can find something better, but those seem like a good place to start.
The immediate problem is to get an infrastructure in place that gives us
some numbers to apply equations to.

			regards, tom lane
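For readers who want to see what "charging fractional costs" looks like
under the Mackert/Lohman model, here is a minimal C sketch of their
approximation for the number of page fetches incurred by repeated index
probes, assuming the formula as published in "Index Scans Using a Finite
LRU Buffer: A Validated I/O Model" (ACM TODS, 1989).  The function name
`pages_fetched` and the table/buffer sizes in the demo are hypothetical
illustrations, not PostgreSQL code.

```c
#include <stdio.h>

/*
 * Sketch of the Mackert/Lohman approximation for the number of page
 * fetches performed by Ns = s * N index probes into a table of T pages
 * and N tuples, given b buffer pages.  Re-fetches of evicted pages are
 * counted, so the result can exceed T when the buffer is smaller than
 * the table.  Illustrative only.
 */
static double
pages_fetched(double T, double N, double s, double b)
{
    double Ns = s * N;          /* expected number of tuple fetches */
    double pages;

    if (T <= b)
    {
        /* Table fits in the buffer: fetches saturate at T pages. */
        pages = (2.0 * T * Ns) / (2.0 * T + Ns);
        if (pages > T)
            pages = T;
    }
    else
    {
        /* Probe count at which the finite buffer starts to thrash. */
        double lim = (2.0 * T * b) / (2.0 * T - b);

        if (Ns <= lim)
            pages = (2.0 * T * Ns) / (2.0 * T + Ns);
        else
            pages = b + (Ns - lim) * (T - b) / T;
    }
    return pages;
}

int
main(void)
{
    /* Hypothetical sizes, purely for illustration. */
    const double T = 10000.0;                /* pages in table */
    const double N = 1000000.0;              /* tuples in table */
    const double bufs[] = {2000.0, 20000.0}; /* buffer sizes to compare */

    for (int i = 0; i < 2; i++)
    {
        double b = bufs[i];

        printf("buffer pages = %.0f\n", b);
        for (double probes = 1.0; probes <= 1000000.0; probes *= 100.0)
        {
            double fetched = pages_fetched(T, N, probes / N, b);

            printf("  probes = %8.0f   page fetches = %10.1f   per probe = %.3f\n",
                   probes, fetched, fetched / probes);
        }
    }
    return 0;
}
```

The per-probe figure printed by the demo falls as the probe count rises:
sharply once the table fits in the buffer, and toward (T - b) / T when it
does not.  That is the amortization behavior discussed above.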