Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Andrei Lepikhov |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | 890ed877-e2c0-448a-93b8-b07ccf3a2b37@postgrespro.ru Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: POC: GROUP BY optimization
|
List | pgsql-hackers |
On 22/2/2024 09:09, Richard Guo wrote: > > On Wed, Feb 21, 2024 at 6:20 PM Alexander Korotkov <aekorotkov@gmail.com > <mailto:aekorotkov@gmail.com>> wrote: > > Hi, Richard! > > > What do you think about the revisions for the test cases? > > I've rebased your patch upthread. Did some minor beautifications. > > > * The table 'btg' is inserted with 10000 tuples, which seems a bit > > expensive for a test. I don't think we need such a big table to test > > what we want. > > Your patch reduces the number of rows to 1000 tuples. I found it > possible to further reduce it to 100 tuples. That also allowed me to > save the plan in the test case introduced by e1b7fde418. > > Please check if you're OK with the patch attached. > > > I looked through the v2 patch and have two comments. > > * The test case under "Check we don't pick aggregate path key instead of > grouping path key" does not have EXPLAIN to show the plan. So how can > we ensure we do not mistakenly select the aggregate pathkeys instead of > the grouping pathkeys? I confirm it works correctly. I am not sure about the stability of the zeros number in the output on different platforms here: avg -------------------- 4.0000000000000000 5.0000000000000000 It was why I'd used the format() function before. So, may we elaborate on the test and restrict the output? > > * I don't think the test case introduced by e1b7fde418 is still needed, > because we already have one under "Utilize the ordering of merge join to > avoid a full Sort operation". This kind of test case is just to ensure > that we are able to utilize the ordering of the subplans underneath. So > it should be parallel to the test cases for utilize the ordering of > index scan and subquery scan. I confirm, this version also checks ec_sortref initialization in the case when ec are contructed from WHERE clause. Generally, I like more one test for one issue instead of one test for all at once. But it works and I don't have any reason to dispute it. Also, I'm unsure about removing the disabling of the max_parallel_workers_per_gather parameter. Have you discovered the domination of the current plan over the partial one? Do the cost fluctuations across platforms not trigger a parallel plan? What's more, I suggest to address here the complaint from [1]. As I see, cost difference between Sort and IncrementalSort strategies in that case is around 0.5. To make the test more stable I propose to change it a bit and add a limit: SELECT count(*) FROM btg GROUP BY z, y, w, x LIMIT 10; It makes efficacy of IncrementalSort more obvious difference around 10 cost points. [1] https://www.postgresql.org/message-id/CACG=ezaYM1tr6Lmp8PRH1aeZq=rBKXEoTwgzMcLaD5MPhfW0Lg@mail.gmail.com -- regards, Andrei Lepikhov Postgres Professional
pgsql-hackers by date: