Re: WIP: Upper planner pathification - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: WIP: Upper planner pathification
Date |
Msg-id | CA+TgmoaphpZUX5=UoeV=09_mR6i2RgvC=Bn2z=VxpWVy+o_Jeg@mail.gmail.com
In response to | Re: WIP: Upper planner pathification (Tom Lane <tgl@sss.pgh.pa.us>)
List | pgsql-hackers
On Mon, Mar 7, 2016 at 11:09 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> The currently-committed code generates paths where nested loops and
>> hash joins get pushed beneath the Gather node, but does not generate
>> paths where merge joins have been pushed beneath the Gather node. And
>> the reason I didn't try to generate those paths is because I believe
>> they will almost always suck.
>
> That's a perfectly reasonable engineering judgment (and especially so
> for a first release). What I'd really like to see documented is how
> that conclusion is related, or not, to the rules about how path nodes
> should be decorated with parallel_safe, parallel_degree, etc annotations.
> The existing documentation is barely adequate to explain what those fields
> mean for primitive scan nodes; it's impossible for anyone but you to
> know what they are supposed to mean for joins and higher-level nodes.

It is unrelated, I think. If a path is parallel_safe, that is supposed
to mean that, in theory, the plan generated from that path could be
executed within a worker without crashing the server, giving wrong
answers, or otherwise destroying the world. However, as an
optimization, if we've already decided that the query can't ever be
parallelized at all, for example because it contains write operations,
we don't bother trying to set the parallel_safe flags correctly;
they're just all false. Generally, a path is definitely not
parallel_safe if it contains a path that is not parallel_safe; if all
of the paths under it are parallel_safe, then it is also parallel_safe
except when there's some unsafe computation added at the new level --
like an unsafe join qual between two safe relations.

If a path is parallel_aware, that means that the plan generated by that
path wants to do something different when run in parallel mode.
Presumably, the difference will be that the plan will establish some
shared state in the dynamic shared memory segment created to service
that parallel query. For example, a sequential scan can be
parallel_aware, which will allow that sequential scan to be
simultaneously executed in multiple processes and return only a subset
of the rows in each. A non-parallel_aware sequential scan can still be
used in parallel mode; for example, consider this:

Gather
-> Hash Join
  -> Parallel Seq Scan
  -> Hash
    -> Seq Scan

The outer seq scan needs to return each row only once across all
workers, but the inner seq scan needs to return every row in every
worker. Therefore, the outer seq scan is flagged parallel_aware and
displays in the EXPLAIN output as "Parallel Seq Scan", while the inner
one is not and does not.

parallel_degree is a horrible kludge whose function is to communicate
to the Gather node the number of workers for which it should budget.
Currently, every parallel plan's leftmost descendent will be a Parallel
Seq Scan, and that Parallel Seq Scan will estimate the degree of
parallelism that makes sense using a simplistic, bone-headed algorithm
based on the size of the table. That then bubbles up the plan tree to
the Gather node, which adopts the Parallel Seq Scan's suggestion. I
really hope this is going to go away eventually and be replaced by
something better.
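To restate the parallel_safe rule above as code, here is a minimal
sketch; the PathSketch type and join_path_is_parallel_safe function are
invented for illustration and are not the actual planner API:

```c
/*
 * Illustrative only: a join path may be marked parallel_safe only if
 * both input paths are parallel_safe and nothing added at this level,
 * such as a join qual, is itself parallel-unsafe.
 */
#include <stdbool.h>

typedef struct PathSketch
{
    bool        parallel_safe;      /* safe to run inside a worker? */
    bool        parallel_aware;     /* behaves differently under Gather? */
    int         parallel_degree;    /* suggested worker count */
} PathSketch;

static bool
join_path_is_parallel_safe(const PathSketch *outer_path,
                           const PathSketch *inner_path,
                           bool joinquals_parallel_safe)
{
    /* An unsafe input path makes the whole join unsafe. */
    if (!outer_path->parallel_safe || !inner_path->parallel_safe)
        return false;

    /* So does unsafe computation introduced at this level. */
    return joinquals_parallel_safe;
}
```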
Really, I think we should try to figure out the amount of
parallelizable work (CPU, and effective I/O parallelism) that is going
to be required per leftmost tuple and compare that to the amount of
non-parallelizable work (presumably, the rest of the I/O cost) and use
that to judge the optimal parallel degree. But I think that's going to
take a lot of work to get right, and it ties into some other issues,
like the fact that we estimate a scan of a 1MB table to have the same
cost per page as a scan of a 10TB table even though the former should
probably be assumed to be fully cached and the latter should probably
be assumed not to be cached at all. I think a lot more thought is
needed here than I've given it thus far, and one of the things that I'm
hoping is that people will test parallel query and actually report the
results so that we can accumulate some data on which problems are most
important to go fix and, also, what the shape of those fixes might look
like.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
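As a rough illustration of the degree-choosing idea in the message
above (comparing parallelizable to non-parallelizable work), here is a
hypothetical C sketch; suggest_parallel_degree, its cost inputs, and
the simple ratio rule are all invented for the example and are not the
planner's actual costing code:

```c
/*
 * Purely illustrative: pick a worker count by comparing the work that
 * can be spread across workers with the work that cannot. If the
 * serial part is large relative to the parallel part, extra workers
 * stop paying for themselves quickly.
 */
static int
suggest_parallel_degree(double parallelizable_cost,
                        double nonparallelizable_cost,
                        int max_workers)
{
    int         degree;

    /* If essentially everything can be parallelized, use the maximum. */
    if (nonparallelizable_cost <= 0.0)
        return max_workers;

    /*
     * Add workers only while each one still removes at least as much
     * parallelizable work as the serial work it must wait behind.
     */
    degree = (int) (parallelizable_cost / nonparallelizable_cost);

    if (degree < 1)
        degree = 1;
    if (degree > max_workers)
        degree = max_workers;

    return degree;
}
```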