Re: [HACKERS] Small improvement to parallel query docs - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: [HACKERS] Small improvement to parallel query docs
Date	February 14, 2017 00:43:56
Msg-id	CAKJS1f_1=kJGYR-VOAiMiS=zwWLT=wr8t8X0hiQ4NYSgG37Nhg@mail.gmail.com
In response to	Re: [HACKERS] Small improvement to parallel query docs (Brad DeJong <Brad.Dejong@infor.com>)
Responses	Re: [HACKERS] Small improvement to parallel query docs
List	pgsql-hackers

On 14 February 2017 at 10:10, Brad DeJong <Brad.Dejong@infor.com> wrote:
> Robert Haas wrote:
>
>> +    <literal>COUNT(*)</>, each worker must compute subtotals which later must
>> +    be combined to produce an overall total in order to produce the final
>> +    answer.  If the query involves a <literal>GROUP BY</> clause,
>> +    separate subtotals must be computed for each group seen by each parallel
>> +    worker. Each of these subtotals must then be combined into an overall
>> +    total for each group once the parallel aggregate portion of the plan is
>> +    complete.  This means that queries which produce a low number of groups
>> +    relative to the number of input rows are often far more attractive to the
>> +    query planner, whereas queries which don't collect many rows into each
>> +    group are less attractive, due to the overhead of having to combine the
>> +    subtotals into totals, of which cannot run in parallel.
>
>> I don't think "of which cannot run in parallel" is good grammar.  I'm somewhat unsure whether the rest is an
>> improvement or not.  Other opinions?
>
> Does this read any more clearly?
>
> +    <literal>COUNT(*)</>, each worker must compute subtotals which are later
> +    combined in order to produce an overall total for the final answer.  If
> +    the query involves a <literal>GROUP BY</> clause, separate subtotals
> +    must be computed for each group seen by each parallel worker.  After the
> +    parallel aggregate portion of the plan is complete, there is a serial step
> +    where the group subtotals from all of the parallel workers are combined
> +    into an overall total for each group.  Because of the overhead of combining
> +    the subtotals into totals, plans which produce few groups relative to the
> +    number of input rows are often more attractive to the query planner
> +    than plans which produce many groups relative to the number of input rows.

Actually, looking over this again, I think it's getting into too much
detail, which is already described in the next paragraph (which I
think is very clear). I propose we just remove the whole paragraph
and cover the planning and estimated number of groups in a new
paragraph.
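
To make the trade-off concrete, here is a rough Python model of the
two-stage aggregation being discussed. It is only an illustration, not
how the executor is implemented: the function names, the round-robin
split of rows across workers and the "combine_steps" counter are all
assumptions made up for this sketch.

```python
from collections import Counter


def partial_count(rows):
    """Per-worker step: COUNT(*) per group for one worker's share of rows."""
    return Counter(rows)


def combine(partials):
    """Serial step: merge every worker's per-group subtotals into totals.

    combine_steps counts how many subtotals had to be merged, as a crude
    stand-in for the cost of the non-parallel part of the plan.
    """
    totals = Counter()
    combine_steps = 0
    for subtotal in partials:
        for group, count in subtotal.items():
            totals[group] += count
            combine_steps += 1
    return totals, combine_steps


def parallel_count(rows, workers):
    # Hypothetical round-robin distribution of input rows to workers.
    chunks = [rows[i::workers] for i in range(workers)]
    return combine([partial_count(chunk) for chunk in chunks])


few_groups = [i % 3 for i in range(1000)]   # 3 groups over 1000 rows
many_groups = list(range(1000))             # 1000 groups over 1000 rows

few_totals, few_steps = parallel_count(few_groups, workers=4)
many_totals, many_steps = parallel_count(many_groups, workers=4)
```

In this model the serial combine work grows with the number of groups
each worker sees, which is why few groups relative to the number of
input rows looks better for a parallel plan.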

I've attached a patch to this effect, which also removes the text
about why we don't support Merge Join. I felt something needed to be
written in its place, so I mentioned that identical hash tables are
created in each worker. This is perhaps not required, but the paragraph
seemed a bit empty without it.  I also noticed a mistake: "based on a
column taken from the inner table". I assume this "inner" should be
"outer", since it surely must be talking about a parameterised index
scan, in which case the parameter comes from the outer side, not the
inner.
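
As a rough illustration of the inner/outer point, a nested loop with a
parameterised inner index scan can be modelled in Python like this. The
table contents, column names and helper functions are hypothetical and
exist only for this sketch; the index is just a dictionary.

```python
from collections import defaultdict


def build_index(rows, key):
    """Stand-in for an index on the inner table: key value -> matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def nested_loop_join(outer_rows, inner_index, outer_key):
    """For each outer row, probe the inner index with a value from that row."""
    joined = []
    for outer in outer_rows:
        param = outer[outer_key]  # the parameter is taken from the OUTER row
        for inner in inner_index.get(param, []):
            joined.append({**outer, **inner})
    return joined


# Hypothetical data: orders is the outer side, customers the inner side.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 20},
    {"order_id": 3, "customer_id": 99},  # no matching customer
]
customers = [
    {"customer_id": 10, "name": "alice"},
    {"customer_id": 20, "name": "bob"},
]

result = nested_loop_join(orders, build_index(customers, "customer_id"), "customer_id")
```

The value used to probe the index is read from the current outer row on
every iteration, so the lookup column is matched against a column taken
from the outer table.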

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

parallel_doc_fixes_v2.patch