Re: a raft of parallelism-related bug fixes - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: a raft of parallelism-related bug fixes |
Date | |
Msg-id | CA+Tgmobt+N03WguRO-2UOJRB4HccALPDC8xW8h5gSY8fVM3tJA@mail.gmail.com Whole thread Raw |
In response to | Re: a raft of parallelism-related bug fixes (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: a raft of parallelism-related bug fixes
Re: a raft of parallelism-related bug fixes Re: a raft of parallelism-related bug fixes Re: a raft of parallelism-related bug fixes Re: a raft of parallelism-related bug fixes |
List | pgsql-hackers |
On Sat, Oct 17, 2015 at 9:16 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > From reading this my understanding is that there isn't a test suite included > with this commit? Right. The patches on the thread contain code that can be used for testing, but the committed code does not itself include test coverage. I welcome thoughts on how we could perform automated testing of this code. I think at least part of the answer is that I need to press on toward getting the rest of Amit's parallel sequential scan patch committed, because then this becomes a user-visible feature and I expect that to make it much easier to find whatever bugs remain. A big part of the difficulty in testing this up until now is that I've been building towards, hey, we have parallel query. Until we actually do, you need to write C code to test this, which raises the bar considerably. Now, that does not mean we shouldn't test this in other ways, and of course I have, and so have Amit and other people from the community - of late, Noah Misch and Haribabu Kommi have found several bugs through code inspection and testing, which included some of the same ones that I was busy finding and fixing using the test code attached to this thread. That's one of the reasons why I wanted to press forward with getting the fixes for those bugs committed. It's just a waste of everybody's time if we re-finding known bugs for which fixes already exist. But the question of how to test this in the buildfarm is a good one, and I don't have a complete answer. Once the rest of this goes in, which I hope will be soon, we can EXPLAIN or EXPLAIN ANALYZE or just straight up run parallel queries in the regression test suite and see that they behave as expected. But I don't expect that to provide terribly good test coverage. One idea that I think would provide *excellent* test coverage is to take the test code included on this thread and run it on the buildfarm. The idea of the code is to basically run the regression test suite with every parallel-eligible query forced to unnecessarily use parallelism. Turning that and running 'make check' found, directly or indirectly, all of these bugs. Doing that on the whole buildfarm would probably find more. However, I'm pretty sure that we don't want to switch the *entire* buildfarm to using lots of unnecessary parallelism. What we might be able to do is have some critters that people spin up for this precise purpose. Just like we currently have CLOBBER_CACHE_ALWAYS buildfarm members, we could have GRATUITOUSLY_PARALLEL buildfarm members. If Andrew is willing to add buildfarm support for that option and a few people are willing to run critters in that mode, I will be happy - more than happy, really - to put the test code into committable form, guarded by a #define, and away we go. Of course, other ideas for testing are also welcome. > I've tried to review the Gather node commit and I note that the commit > message contains a longer description of the functionality in that patch > than any comments in the patch as a whole. No design comments, no README, no > file header comments. For such a major feature that isn't acceptable - I > would reject a patch from others on that basis alone (and have done so). We > must keep the level of comments high if we are to encourage wider > participation in the project. It's good to have your perspective on how this can be improved, and I'm definitely willing to write more documentation. Any lack in that area is probably due to being too close to the subject area, having spent several years on parallelism in general, and 200+ emails on parallel sequential scan in particular. Your point about the lack of a good header file comment for execParallel.c is a good one, and I'll rectify that next week. It's worth noting, though, that the executor files in general don't contain great gobs of comments, and the executor README even has this vintage 2001 comment: XXX a great deal more documentation needs to be written here... Well, yeah. It's taken me a long time to understand how the executor actually works, and there are parts of it - particularly related to EvalPlanQual - that I still don't fully understand. So some of the lack of comments in, for example, nodeGather.c is because it copies the style of other executor nodes, like nodeSeqscan.c. It's not exactly clear to me what more to document there. You either understand what a rescan node is, in which case the code for each node's rescan method tends to be fairly self-evident, or you don't - but that clearly shouldn't be re-explained in every file. So I guess what I'm saying is I could use some advice on what kinds things would be most useful to document, and where to put that documentation. Right now, the best explanation of how parallelism works is in src/backend/access/transam/README.parallel -- but, as you rightly point out, that doesn't cover the executor bits. Should we have SGML documentation under "VII. Internals" that explains what's under the hood in the same way that we have sections for "Database Physical Storage" and "PostgreSQL Coding Conventions"? Should the stuff in the existing README.parallel be moved there? Or I could just add some words to src/backend/executor/README to cover the parallel executor stuff, if that is preferred. Advice? Also, regardless of how we document what's going on at the code level, I think we probably should have a section *somewhere* in the main SGML documentation that kind of explains the general concepts behind parallel query from a user/DBA perspective. But I don't know where to put it. Under "Server Administration"? Exactly what to explain there needs some thought, too. I'm sort of wondering if we need two chapters in the documentation on this, one that covers it from a user/DBA perspective and the other of which covers it from a hacker perspective. But then maybe the hacker stuff should just go in README files. I'm not sure. I may have to try writing some of this and see how it goes, but advice is definitely appreciated. I am happy to definitively commit to writing whatever documentation the community feels is necessary here, and I will do that certainly before end of development for 9.6 and hopefully much sooner than that. I will do that even if I don't get any specific feedback on what to write and where to put it, but the more feedback I get, the better the result will probably be. Some of the reason this hasn't been done already is because we're still getting the infrastructure into place, and we're fixing and adjusting things as we go along, so while the overall picture isn't changing much, there are bits of the design that are still in flux as we realize, oh, crap, that was a dumb idea. As we get a clearer idea what will be in 9.6, it will get easier to present the overall picture in a coherent way. > So reviewing patch 13 isn't possible without prior knowledge. The basic question for patch 13 is whether ephemeral record types can occur in executor tuples in any contexts that I haven't identified. I know that a tuple table slot can contain have a column that is of type record or record[], and those records can themselves contain attributes of type record or record[], and so on as far down as you like. I *think* that's the only case. For example, I don't believe that a TupleTableSlot can contain a *named* record type that has an anonymous record buried down inside of it somehow. But I'm not positive I'm right about that. > Hoping we'll be able to find some time on this at PGConf.eu; thanks for > coming over. Sure thing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: