Re: [HACKERS] Parallel bitmap heap scan - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: [HACKERS] Parallel bitmap heap scan
Msg-id: CAEepm=0ROZ=3OfWeKV+n+GzrhhgNbjdHAmmHygr0Sp=BrF6oNQ@mail.gmail.com
In response to: Re: [HACKERS] Parallel bitmap heap scan (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Sun, Feb 19, 2017 at 10:34 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>> I can imagine it can get executed over and over if the plan is
>> something like below.
>>
>> NestLoopJoin
>> -> SeqScan
>> -> Gather
>>     -> Parallel Bitmap Heap Scan
>>
>> But in such a case, the inner node of the NLJ will be rescanned
>> every time, i.e. Gather will be rescanned, which in turn shuts down
>> the workers.
>
> Yeah, but it looks like ExecReScanGather gets rid of the workers, but
> reuses the existing DSM. I'm not quite sure what happens to the DSA.
> It looks like it probably just hangs around from the previous
> iteration, which means that any allocations will also hang around.

Yes, it hangs around. Being able to reuse state in a rescan is a
feature: you might be able to reuse a hash table or a bitmap. In the
Parallel Shared Hash patch, the last participant to detach from the
shared hash table's barrier gets a different return code and runs some
cleanup code. The alternative was to make the leader wait for the
workers to finish accessing the hash table so that it could do the
cleanup itself. I had it that way in an early version, but my goal is
to minimise synchronisation points, so now it's just 'last to leave
turns out the lights', with no waiting.

One practical problem that came up was the need for executor nodes to
get a chance to do that kind of cleanup before the DSM segment is
detached. In my patch series I introduced a new node API,
ExecNodeDetach, to allow for that. Andres objected that the need for
it is evidence that the existing protocol is broken and should be
fixed instead. I'm looking into that.

On Sun, Feb 19, 2017 at 9:59 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
> So basically, what I want to propose is that only during
> ExecReScanBitmapHeapScan can we free all the DSA pointers, because at
> that time we can be sure that all the workers have completed their
> work and we are safe to free. (And we don't free any DSA memory at
> ExecEndBitmapHeapScan.)

I think this works. I also grappled a bit with the question of whether
it's actually worth trying to free DSA memory when you're finished
with it, eating precious CPU cycles at the end of a join, or whether
to just let the executor's DSA area get nuked at the end of parallel
execution. As you say, the rescan case needs special treatment to
avoid leaks. I described this as a potential approach in a TODO note
in my v5 patch, but currently my code just does the clean-up every
time, on the grounds that it's simple and hasn't shown up as a
performance problem yet.
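To make that concrete, here is a minimal sketch of the shape I have in
mind. It is not code from either patch: the "shared_info_dp" field is
invented for illustration, and the real patch would substitute
whatever dsa_pointer it actually keeps in the scan state. The point
is that rescan is the one moment when the leader knows the workers are
gone, so it can call dsa_free() safely.

    /*
     * Hypothetical sketch only: release per-scan DSA allocations at
     * rescan time.  By the time this runs, ExecReScanGather has
     * already shut down the workers, so nobody else can be touching
     * the shared state.
     */
    static void
    ExecReScanBitmapHeapScan(BitmapHeapScanState *node)
    {
        dsa_area *dsa = node->ss.ps.state->es_query_dsa;

        if (dsa != NULL && DsaPointerIsValid(node->shared_info_dp))
        {
            dsa_free(dsa, node->shared_info_dp); /* free shared scan state */
            node->shared_info_dp = InvalidDsaPointer;
        }

        /* ... then reset the local scan state as usual ... */
    }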
Some hand-wavy thoughts on this topic in the context of hash joins:

The argument for cleaning up sooner rather than later would be that it
could reduce the total peak memory usage of large execution plans. Is
that a reasonable goal, and can we achieve it? I suspect the answer is
yes in theory but no in practice, and we don't even try to achieve it
in non-parallel queries as far as I know. My understanding is that
PostgreSQL's planner can generate left-deep, bushy and right-deep hash
join plans:

N nested left-deep hash joins: each hash join is on the outer side of
its parent, supplying tuples to probe the parent's hash table. Their
probe phases overlap, so all N hash tables must exist and be fully
loaded at the same time.

N nested right-deep hash joins: each hash join is on the inner side of
its parent, supplying tuples to build the parent's hash table.
Theoretically you only need two full hash tables in memory at peak,
because you finish probing each one while building its parent's hash
table and then never need the child's hash table again (unless you
need to rescan).

N nested bushy hash joins: somewhere in between.

But we don't actually take advantage of that opportunity to reduce
peak memory today. We always have N live hash tables, and we don't
free them until standard_ExecutorEnd runs ExecEndNode on the top of
the plan. Perhaps we could teach hash tables to free themselves ASAP
at the end of their probe phase unless they know a rescan is possible
(a rough sketch of that idea appears at the end of this mail). But
that just opens a whole can of worms: if we care about total peak
memory usage, should it become a planner goal that might favour
right-deep hash join plans? I guess similar questions arise for bitmap
heap scan and anything else that holds memory it doesn't technically
need any more, and parallel query doesn't really change the situation,
except perhaps that Gather nodes provide a point of scoping somewhere
in between 'eager destruction' and 'hog all the space until end of
plan', which makes things a bit better. I don't know anywhere near
enough about query planners to say whether such planning ideas are
reasonable, and I am quite aware that it's difficult terrain, and I
have other fish to fry, so I'm going to put down the can opener and
back away.
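For what it's worth, that eager-destruction idea might look something
like the fragment below, placed at the point in ExecHashJoin where the
outer (probe) input runs dry. This is a sketch under stated
assumptions, not a proposal: the EXEC_FLAG_REWIND test is my guess at
how 'no rescan possible' could be detected, and it ignores
complications such as right/full outer joins, which still need the
hash table to emit unmatched inner tuples after the probe input is
exhausted.

    /*
     * Hypothetical sketch only: free the hash table as soon as its
     * probe phase ends, instead of waiting for ExecEndHashJoin,
     * unless a rescan might need it again.
     */
    if (TupIsNull(outerTupleSlot))      /* probe input exhausted */
    {
        if (node->hj_HashTable != NULL &&
            (node->js.ps.state->es_top_eflags & EXEC_FLAG_REWIND) == 0)
        {
            ExecHashTableDestroy(node->hj_HashTable); /* give memory back now */
            node->hj_HashTable = NULL;
        }
        /* ... fall through to normal end-of-join handling ... */
    }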
--
Thomas Munro
http://www.enterprisedb.com