Re: Size of Path nodes - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Size of Path nodes |
Date | |
Msg-id | CA+TgmoYSLHYPSQ++aQwiAA2C-stKssj-Q_69p3wDvOMTqFkUtg@mail.gmail.com |
In response to | Re: Size of Path nodes (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Size of Path nodes |
List | pgsql-hackers |
On Sat, Dec 5, 2015 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Meh.  Size of first patch is a pretty bad guide to whether a particular
> structural decision is going to be a good idea in the long run.  Do you
> need some examples of patches we've rejected because they were done in a
> way that minimized the size of the patch rather than a way that made sense
> from a system structural viewpoint?

You can provide some examples if you like, but I thought about the issue of overall system structure and settled on this design after coming to the conclusion that this design was better from that viewpoint. If it becomes clear that I'm wrong, fine: we can change it. I think I've done a pretty good job - I'm going to go out on a limb and say an excellent job - modularizing the parallelism code so that, if it turns out I've picked the wrong design for some part of it, we can replace just that part without having a huge impact on all the rest of it. But at this point I am not by any means convinced that I've got it wrong. I've been working on this for several years and have thought about it quite deeply, discussed it extensively on and off the mailing list with anyone and everyone who was willing to participate in that discussion, and read what relevant literature I could find. That doesn't guarantee that I've got it right, but if your premise here is that I haven't thought about system structure, I'm going to say that premise is wrong. I've thought about it a lot.

> I would also point out that if we'd followed this approach in the past,
> nodeIndexscan.c, nodeBitmapIndexscan.c and nodeIndexonlyscan.c would be
> one file, and it would be nigh unintelligible.  Those files nonetheless
> manage to share code where it makes sense for them to.

True, but I don't think that's a good analogy. I think the best analogy to parallelism is parameterized plans. And we DON'T have separate nodes for parameterized index scans and unparameterized index scans, nor do we have separate path nodes. We have something called an index scan, and it can be parameterized or unparameterized, and we keep track of that distinction during both planning and execution. So here.

For example, in the case of nodeSeqscan.c, almost two-thirds of the new lines of code are to add three additional methods that aren't called except in parallel-aware mode, and about half of the rest are due to a SeqScanState now having one integer field that's not part of ScanState, which requires mechanical changes in a bunch of places ("ps" becomes "ss.ps"; a rough sketch of that struct change follows below). The rest amounts to 15-20 lines of real code change.

It may be useful context - although you will already know this, if you have been reading the relevant threads - to know that up until a couple of months ago I actually had this exactly the way you are describing, with parallel sequential scan as a separate node type, mostly because I know that you've leaned in the direction of making things separate node types whenever there's any doubt about the matter. The main reason I changed my mind, aside from the massive code duplication that seemed to be resulting, was the realization that every scan type needs a parallel-aware mode, and that the parallel-aware mode needs to degenerate back to exactly the same thing as the non-parallel-aware case when no workers are available. It makes no sense to me to say that we should double the number of executor nodes, or even the number of path nodes, just because we have parallelism.
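To make the nodeSeqscan.c numbers concrete, the struct change in question looks roughly like the sketch below. This is a simplified illustration rather than the full definitions: ScanState and Size come from the PostgreSQL headers, and the comments are explanatory rather than quoted from the patch.

```c
/*
 * Before the patch, a sequential scan needed no state of its own:
 *
 *     typedef ScanState SeqScanState;
 *
 * Afterwards, SeqScanState embeds a ScanState and adds one field that is
 * only used when the scan is parallel-aware.
 */
typedef struct SeqScanState
{
	ScanState	ss;			/* generic scan state; ss.ps is the PlanState */
	Size		pscan_len;	/* size of the shared parallel heap scan descriptor */
} SeqScanState;

/*
 * The mechanical fallout is the "ps" -> "ss.ps" renaming mentioned above:
 * code that said node->ps.qual now says node->ss.ps.qual,
 * node->ss_currentRelation becomes node->ss.ss_currentRelation, and so on.
 */
```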
I'm sure we'd never actually double it, as there are some plan nodes, like Result, for which I can't foresee a parallel-aware version. But we'd add a heck of a lot of them, and in every case the parallel-aware version would be a strict superset of what the non-parallel-aware version can do.

Consider Append, for example. A regular Append runs all of the child plans one after another. A parallel-aware Append, like any other parallel-aware node, will be running in multiple copies, one per worker. Well, what it should do, in order to minimize contention, is try to spread the workers out among the subplans, rather than throwing all the workers at the first subplan, then, when that gets done, throwing them all at the second subplan, and so forth (sketched below). See the parallel append thread for further discussion of these issues.

Now, if you are in a parallel plan, you always want this behavior: there is never any reason (at least that I can see currently) to prefer the regular Append behavior, though the architecture I've chosen could support that if it turns out we need it. And if you are not in a parallel plan, the behavior of the parallel-aware version degenerates to exactly what our current Append node does anyway. So if you separate that into two nodes, a regular Append and a Parallel Append, it's just stupid. You just have one node that is a dumber version of another node, and you have code somewhere to distinguish between them when you could just as well always use the more capable one.

I believe this kind of pattern is quite general. Most - though maybe not all - parallel-aware nodes are going to do basically the same thing that they do in a regular computation, with a little bit of extra smarts specific to parallelism. If we have parallel-aware nodes that do things that are noticeably different from what the non-parallel-aware versions do, well, your bias in this matter is certainly known to me and I will respect it. But I can't stomach the idea that as soon as we add even one line of code to cater to parallelism in a node, we have to duplicate that node and all of the supporting infrastructure. I really doubt you would be happy with that design in the long term either, or even the medium term.

We have drifted a bit from the subject of this thread. Maybe that is OK, but we should make sure to finish dealing with the original issue you raised.
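To illustrate the worker-spreading behavior described above, here is a rough sketch of how each worker's copy of a parallel-aware Append might pick its next subplan. This is purely illustrative: the struct, field, and function names are invented, the real bookkeeping would live in dynamic shared memory behind a lock, and the parallel append thread should be consulted for the actual design.

```c
/*
 * Hypothetical sketch, not code from any patch.  Shared state (assumed to
 * live in dynamic shared memory) tracks how many workers are currently
 * running each subplan and which subplans are exhausted.  The caller is
 * assumed to hold whatever lock protects this structure.
 */
#define SKETCH_MAX_SUBPLANS 64

typedef struct ParallelAppendSketch
{
	int		nplans;							/* number of subplans */
	int		nworkers[SKETCH_MAX_SUBPLANS];	/* workers currently on each subplan */
	bool	finished[SKETCH_MAX_SUBPLANS];	/* subplan has returned all its tuples */
} ParallelAppendSketch;

/* Pick the unfinished subplan with the fewest workers on it; -1 if all are done. */
static int
choose_next_subplan_sketch(ParallelAppendSketch *pas)
{
	int		best = -1;
	int		i;

	for (i = 0; i < pas->nplans; i++)
	{
		if (pas->finished[i])
			continue;
		if (best < 0 || pas->nworkers[i] < pas->nworkers[best])
			best = i;
	}

	if (best >= 0)
		pas->nworkers[best]++;	/* caller decrements when it abandons the subplan */

	return best;
}
```

With zero workers there is only one participating process, so it simply runs subplan 0 to completion, then subplan 1, and so on - which is exactly the behavior of today's non-parallel Append, as argued above.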
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company