Re: [HACKERS] Parallel bitmap heap scan - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: [HACKERS] Parallel bitmap heap scan
Msg-id: CAEepm=0ROZ=3OfWeKV+n+GzrhhgNbjdHAmmHygr0Sp=BrF6oNQ@mail.gmail.com
In response to: Re: [HACKERS] Parallel bitmap heap scan (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Sun, Feb 19, 2017 at 10:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> I can imagine it can get executed over and over if the plan is
>> something like below.
>>
>> NestLoopJoin
>> -> SeqScan
>> -> Gather
>>     -> Parallel Bitmap Heap Scan
>>
>> But in such a case, the inner node of the NLJ will be rescanned
>> every time, i.e. Gather will be rescanned, which in turn shuts down
>> the workers.
>
> Yeah, but it looks like ExecReScanGather gets rid of the workers, but
> reuses the existing DSM. I'm not quite sure what happens to the DSA.
> It looks like it probably just hangs around from the previous
> iteration, which means that any allocations will also hang around.

Yes, it hangs around. Being able to reuse state in a rescan is a
feature: you might be able to reuse a hash table or a bitmap. In the
Parallel Shared Hash patch, the last participant to detach from the
shared hash table's barrier gets a different return code and runs some
cleanup code. The alternative was to make the leader wait for the
workers to finish accessing the hash table so that it could do the
cleanup itself. I had it that way in an early version, but my goal is
to minimise synchronisation points, so now it's just 'last to leave
turns out the lights', with no waiting.

One practical problem that came up was the need for executor nodes to
get a chance to do that kind of cleanup before the DSM segment is
detached. In my patch series I introduced a new node API,
ExecNodeDetach, to allow for that. Andres objected that the need for
it is evidence that the existing protocol is broken and should be
fixed instead. I'm looking into that.

On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> So basically, what I want to propose is that only during
> ExecReScanBitmapHeapScan can we free all the DSA pointers, because at
> that time we can be sure that all the workers have completed their
> work and we are safe to free. (And we don't free any DSA memory at
> ExecEndBitmapHeapScan.)

I think this works. I also grappled a bit with the question of whether
it's actually worth trying to free DSA memory when you're finished
with it, eating precious CPU cycles at the end of a join, or whether
to just let the executor's DSA area get nuked at the end of parallel
execution. As you say, the rescan case needs special treatment to
avoid leaks. I described this as a potential approach in a TODO note
in my v5 patch, but currently my code just does the clean-up every
time, on the grounds that it's simple and hasn't shown up as a
performance problem yet.
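To make that concrete, here is a minimal sketch of the shape I have in
mind. It is not code from either patch: the "shared_info_dp" field is
invented for illustration, and the real patch would substitute
whatever dsa_pointer it actually keeps in the scan state. The point
is that rescan is the one moment when the leader knows the workers are
gone, so it can call dsa_free() safely.

    /*
     * Hypothetical sketch only: release per-scan DSA allocations at
     * rescan time.  By the time this runs, ExecReScanGather has
     * already shut down the workers, so nobody else can be touching
     * the shared state.
     */
    static void
    ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
    {
        dsa_area *dsa = node->ss.ps.state->es_query_dsa;

        if (dsa != NULL && DsaPointerIsValid(node->shared_info_dp))
        {
            dsa_free(dsa, node->shared_info_dp); /* free shared scan state */
            node->shared_info_dp = InvalidDsaPointer;
        }

        /* ... then reset the local scan state as usual ... */
    }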
Some hand-wavy thoughts on this topic in the context of hash joins:

The argument for cleaning up sooner rather than later would be that it
could reduce the total peak memory usage of large execution plans. Is
that a reasonable goal, and can we achieve it? I suspect the answer is
yes in theory but no in practice, and we don't even try to achieve it
in non-parallel queries as far as I know. My understanding is that
PostgreSQL's planner can generate left-deep, bushy and right-deep hash
join plans:

N nested left-deep hash joins: each hash join is on the outer side of
its parent, supplying tuples to probe the parent's hash table. Their
probe phases overlap, so all N hash tables must exist and be fully
loaded at the same time.

N nested right-deep hash joins: each hash join is on the inner side of
its parent, supplying tuples to build the parent's hash table.
Theoretically you only need two full hash tables in memory at peak,
because you finish probing each one while building its parent's hash
table and then never need the child's hash table again (unless you
need to rescan).

N nested bushy hash joins: somewhere in between.

But we don't actually take advantage of that opportunity to reduce
peak memory today. We always have N live hash tables, and we don't
free them until standard_ExecutorEnd runs ExecEndNode on the top of
the plan. Perhaps we could teach hash tables to free themselves ASAP
at the end of their probe phase unless they know a rescan is possible
(a rough sketch of that idea appears at the end of this mail). But
that just opens a whole can of worms: if we care about total peak
memory usage, should it become a planner goal that might favour
right-deep hash join plans? I guess similar questions arise for bitmap
heap scan and anything else that holds memory it doesn't technically
need any more, and parallel query doesn't really change the situation,
except perhaps that Gather nodes provide a point of scoping somewhere
in between 'eager destruction' and 'hog all the space until end of
plan', which makes things a bit better. I don't know anywhere near
enough about query planners to say whether such planning ideas are
reasonable, and I am quite aware that it's difficult terrain, and I
have other fish to fry, so I'm going to put down the can opener and
back away.
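For what it's worth, that eager-destruction idea might look something
like the fragment below, placed at the point in ExecHashJoin where the
outer (probe) input runs dry. This is a sketch under stated
assumptions, not a proposal: the EXEC_FLAG_REWIND test is my guess at
how 'no rescan possible' could be detected, and it ignores
complications such as right/full outer joins, which still need the
hash table to emit unmatched inner tuples after the probe input is
exhausted.

    /*
     * Hypothetical sketch only: free the hash table as soon as its
     * probe phase ends, instead of waiting for ExecEndHashJoin,
     * unless a rescan might need it again.
     */
    if (TupIsNull(outerTupleSlot))      /* probe input exhausted */
    {
        if (node->hj_HashTable != NULL &&
            (node->js.ps.state->es_top_eflags & EXEC_FLAG_REWIND) == 0)
        {
            ExecHashTableDestroy(node->hj_HashTable); /* give memory back now */
            node->hj_HashTable = NULL;
        }
        /* ... fall through to normal end-of-join handling ... */
    }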
--
Thomas Munro
http://www.enterprisedb.com