Re: Performance Issue on Query 18 of TPC-H Benchmark - Mailing list pgsql-bugs

From David Rowley
Subject Re: Performance Issue on Query 18 of TPC-H Benchmark
Date
Msg-id CAApHDvo7najDZTg4=KMDsBTwE0+9K8=AxAgLxQTT1_AOGkYP=A@mail.gmail.com
Whole thread Raw
In response to Re: Performance Issue on Query 18 of TPC-H Benchmark  (Ba Jinsheng <bajinsheng@u.nus.edu>)
List pgsql-bugs
On Thu, 17 Oct 2024 at 07:26, Ba Jinsheng <bajinsheng@u.nus.edu> wrote:
>
> >I believe you are not allowing the optimizer to generate a different aggregation path (Group Aggregate) because it
requiresa sort operation. So I think this is not correct.
 
>
> Yes, this is what I did. I though it is what you were asking? I have not found another way to enforce HashAggregate,
soI directly modified the code. Can you eliberate why it is incorrect?
 

Like I mentioned earlier, GroupAggregate needs its input to be
guaranteed to be sorted by the group key. Above you've hacked the
planner to produce:

->  GroupAggregate  (cost=0.00..208346.45 rows=126825 width=4) (actual
time=334.397..1549.837 rows=9 loops=3)
                                                         Group Key:
lineitem_1.l_orderkey
                                                         Filter:
(sum(lineitem_1.l_quantity) > '314'::numeric)
                                                         Rows Removed
by Filter: 1500388
                                                         ->  Seq Scan
on lineitem lineitem_1  (cost=0.00..172626.23 rows=6002623 width=9)
(actual time=0.051..438.226 rows=6001215 loops=3)

This is simply not a valid plan.  GroupAggreate relies on the rows
arriving in Group Key order as it checks if the current row is in the
same group as the previous row. When it's not, that group is classes
as complete and the HAVING clause can be evaluated.  It's possible the
plan you've ended up when happens to produce the correct results
before the lineitem table is already in l_orderkey order.  Try
updating one of the earlier rows in a way that puts the heap out of
order and you'll likely notice the "Rows Removed by Filter" change. Or
try without the HAVING clause and observe the number of groups
changing.

I strongly suggest you experiment further before proposing changes in
this area. Also, this is not a valid discussion for the pgsql-bugs
mailing list.  This list is about reporting newly discovered bugs.
It's not a place to discuss proposing new ones.

David



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault
Next
From: Andrei Lepikhov
Date:
Subject: Re: Performance Issue on Query 18 of TPC-H Benchmark