Home > mailing lists

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

From	Andrei Lepikhov
Subject	Re: POC: GROUP BY optimization
Date	February 22, 2024 04:18:42
Msg-id	890ed877-e2c0-448a-93b8-b07ccf3a2b37@postgrespro.ru Whole thread Raw
In response to	Re: POC: GROUP BY optimization (Richard Guo <guofenglinux@gmail.com>)
Responses	Re: POC: GROUP BY optimization
List	pgsql-hackers

Tree view

On 22/2/2024 09:09, Richard Guo wrote:
> 
> On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorotkov@gmail.com 
> <mailto:aekorotkov@gmail.com>> wrote:
> 
>     Hi, Richard!
> 
>      > What do you think about the revisions for the test cases?
> 
>     I've rebased your patch upthread.  Did some minor beautifications.
> 
>      > * The table 'btg' is inserted with 10000 tuples, which seems a bit
>      > expensive for a test.  I don't think we need such a big table to test
>      > what we want.
> 
>     Your patch reduces the number of rows to 1000 tuples.  I found it
>     possible to further reduce it to 100 tuples.  That also allowed me to
>     save the plan in the test case introduced by e1b7fde418.
> 
>     Please check if you're OK with the patch attached.
> 
> 
> I looked through the v2 patch and have two comments.
> 
> * The test case under "Check we don't pick aggregate path key instead of
> grouping path key" does not have EXPLAIN to show the plan.  So how can
> we ensure we do not mistakenly select the aggregate pathkeys instead of
> the grouping pathkeys?
I confirm it works correctly. I am not sure about the stability of the 
zeros number in the output on different platforms here:
        avg
--------------------
  4.0000000000000000
  5.0000000000000000
It was why I'd used the format() function before. So, may we elaborate 
on the test and restrict the output?
> 
> * I don't think the test case introduced by e1b7fde418 is still needed,
> because we already have one under "Utilize the ordering of merge join to
> avoid a full Sort operation".  This kind of test case is just to ensure
> that we are able to utilize the ordering of the subplans underneath.  So
> it should be parallel to the test cases for utilize the ordering of
> index scan and subquery scan.
I confirm, this version also checks ec_sortref initialization in the 
case when ec are contructed from WHERE clause. Generally, I like more 
one test for one issue instead of one test for all at once. But it works 
and I don't have any reason to dispute it.

Also, I'm unsure about removing the disabling of the 
max_parallel_workers_per_gather parameter. Have you discovered the 
domination of the current plan over the partial one? Do the cost 
fluctuations across platforms not trigger a parallel plan?

What's more, I suggest to address here the complaint from [1]. As I see, 
cost difference between Sort and IncrementalSort strategies in that case 
is around 0.5. To make the test more stable I propose to change it a bit 
and add a limit:
SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10;
It makes efficacy of IncrementalSort more obvious difference around 10 
cost points.

[1] 
https://www.postgresql.org/message-id/CACG=ezaYM1tr6Lmp8PRH1aeZq=rBKXEoTwgzMcLaD5MPhfW0Lg@mail.gmail.com

-- 
regards,
Andrei Lepikhov
Postgres Professional

pgsql-hackers by date:

From: Ashutosh Bapat
Date: 22 February 2024, 04:06:06
Subject: Re: RFC: Logging plan of the running query

From: Robert Haas
Date: 22 February 2024, 04:25:42
Subject: Re: DSA_ALLOC_NO_OOM doesn't work

Re: POC: GROUP BY optimization - Mailing list pgsql-hackers

Previous

Next