Re: from_collapse_limit vs. geqo_threshold - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: from_collapse_limit vs. geqo_threshold
Msg-id: 603c8f070905251800g5b86d2dav26eca7f417d15dbf@mail.gmail.com
In response to: Re: from_collapse_limit vs. geqo_threshold (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On Mon, May 25, 2009 at 6:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now I'm still not exactly happy with GEQO, but it's surely a lot better
> than it was in the fall of 2000.  So on the whole it does seem that the
> current relationships between from_collapse_limit, join_collapse_limit,
> and geqo_threshold are based on obsolete information and should be
> revisited.  I don't have any data at hand to suggest specific new
> default values, though.

For 8.4, I'd be happy to just improve the documentation.  I think this
sentence could just be deleted from the section on from_collapse_limit:

    It is usually wise to keep this less than <xref linkend="guc-geqo-threshold">.

We could put some other explanation in place of that sentence, but I'm
not exactly sure what that explanation would say.  I guess the point is
that setting from_collapse_limit < geqo_threshold may delay GEQO
planning considerably in the face of complex subqueries, because
pulling up subqueries increases the size of the FROM list (I think).
That could be good if you want your query plans to be more
deterministic, but there's no guarantee they'll be good.  Setting
from_collapse_limit > geqo_threshold is basically saying that the
standard planner will always have subqueries pulled up, so
from_collapse_limit should be based on what the pain point will be for
GEQO.

I'm not sure there's a lot of point in spelling all that out, though.
It more or less follows from the definitions of the parameters.  So,
I'd be just as happy to delete the misleading hint and call it good.
But I could go either way.

For 8.5, it sounds like we need to do some testing to determine an
appropriate set of values, but I'm not exactly sure what to test.  As a
practical matter, the correct level of effort depends a lot on how long
the query figures to run.
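The two orderings of the limits discussed above can be sketched with session settings. This is only an illustrative sketch: the GUC names are the real PostgreSQL parameters, but the numeric values are hypothetical, chosen around the 8.4-era defaults (from_collapse_limit = 8, geqo_threshold = 12).

```sql
-- Hypothetical values for illustration only.

-- Case 1: from_collapse_limit < geqo_threshold.
-- Subquery pull-up stops before the FROM list gets big enough to
-- trigger GEQO, so plans stay deterministic (standard planner), but
-- large subquery trees may be planned piecemeal.
SET from_collapse_limit = 8;
SET geqo_threshold = 12;

-- Case 2: from_collapse_limit > geqo_threshold.
-- Any query small enough for the standard planner will always have its
-- subqueries pulled up; from_collapse_limit then effectively marks the
-- point where GEQO's behavior becomes the pain point.
SET from_collapse_limit = 14;
SET geqo_threshold = 12;
```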
For OLAP queries, planning times of more than 50 ms or so start to add
noticeably to the overall runtime of the query, but if the query is
expected to run for several minutes, we'd presumably be happy to spend
several seconds planning it, which might make it feasible to use the
standard planner even for very, very big queries.

I'm not 100% convinced of the value of join_collapse_limit for anything
other than explicit control over the join order.  I have yet to meet a
PostgreSQL user who thought it was intuitive that it might matter
whether you wrote A JOIN B ON P1 JOIN C ON P2 JOIN D ON P3 [etc] or
A, B, C, D, [etc] WHERE P1, P2, P3.  I suspect there are many people
who, if they knew that the latter might optimize better than the former
in some circumstances, would simply always write it in the latter
fashion.  That makes the whole thing look a lot like a concealed
foot-gun, since whether or not it actually protects you against
exponential planning-time growth has a lot to do with how you happen to
like to write your queries (myself, I've switched styles in the last
few years).

...Robert
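The two query styles contrasted above can be written out concretely. The table and column names below are made up for illustration; the point is only the syntactic difference the planner reacts to.

```sql
-- Explicit JOIN syntax: once the number of JOIN items exceeds
-- join_collapse_limit, the planner is constrained to stay close to the
-- written join order.
SELECT *
FROM a
JOIN b ON a.id = b.a_id
JOIN c ON b.id = c.b_id
JOIN d ON c.id = d.c_id;

-- Equivalent comma-separated FROM list: the planner may consider any
-- join order, limited instead by from_collapse_limit and, for large
-- lists, geqo_threshold.
SELECT *
FROM a, b, c, d
WHERE a.id = b.a_id
  AND b.id = c.b_id
  AND c.id = d.c_id;
```

With default settings the two forms plan identically for small joins; the divergence Robert describes appears only when the number of items crosses the relevant limit.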