Thread: Add enable_groupagg GUC parameter to control GroupAggregate usage
Hi hackers,
When I measured the execution time of a certain query with parallel query
enabled and disabled, I found that the execution time was slower when
parallel query was enabled.
To improve the performance of the parallel query, I considered adjusting
the execution plan and attempted to switch from GroupAggregate to
HashAggregate. However, I noticed that there was no GUC parameter to
disable GroupAggregate.
Therefore, I propose adding a new GUC parameter: enable_groupagg.
Below are the results of a performance test where I disabled
GroupAggregate using enable_groupagg. In this case, the planner chose
HashAggregate instead, which improved performance by about 35 times.
# Query Execution Results (Average of 3 measurements)
- With parallel query: 39546 seconds
- With parallel query and enable_groupagg turned off: 1115 seconds
# Query and Data Used (attached to this email)
- Query: test_query.sql
- Data: create_table.sql
# The steps to run the test are as follows.
For example, on psql:
1. Create tables:
\i create_table.sql
2. Execute a query:
\i test_query.sql
3. Execute a query using the new GUC parameter:
set enable_groupagg to off;
\i test_query.sql
As a benefit to users, while there has previously been a GUC parameter
to control HashAggregate, there was no corresponding way to control
GroupAggregate. This patch addresses that, giving users more flexibility
in tuning execution plans.
I've attached a WIP patch that adds this GUC parameter. I would
appreciate any feedback, especially regarding how many test cases I
should create.
To create new test cases for enable_groupagg, I looked into existing
test cases that use enable_hashagg and found that it is used in many
places (62 places). Should I add a test case for enable_groupagg in
the same place as enable_hashagg? I think that adding a new feature
requires a minimum number of test cases, so I would appreciate your
advice.
Additionally, based on the execution plan, I suspect the slowdown in the
parallel query might be caused by misestimates related to Sort or
Gather Merge.
While resolving those misestimates would ideally improve the root issue,
I'd like to keep the focus of this thread on adding the GUC parameter.
Then, I plan to report or address the estimation problem in a separate
thread.
When I measured the execution time of a certain query with parallel query
enabled and disabled, I found that the execution time was slower when
parallel query was enabled.
To improve the performance of the parallel query, I considered adjusting
the execution plan and attempted to switch from GroupAggregate to
HashAggregate. However, I noticed that there was no GUC parameter to
disable GroupAggregate.
Therefore, I propose adding a new GUC parameter: enable_groupagg.
Below are the results of a performance test where I disabled
GroupAggregate using enable_groupagg. In this case, the planner chose
HashAggregate instead, which improved performance by about 35 times.
# Query Execution Results (Average of 3 measurements)
- With parallel query: 39546 seconds
- With parallel query and enable_groupagg turned off: 1115 seconds
# Query and Data Used (attached to this email)
- Query: test_query.sql
- Data: create_table.sql
# The steps to run the test are as follows.
For example, on psql:
1. Create tables:
\i create_table.sql
2. Execute a query:
\i test_query.sql
3. Execute a query using the new GUC parameter:
set enable_groupagg to off;
\i test_query.sql
As a benefit to users, while there has previously been a GUC parameter
to control HashAggregate, there was no corresponding way to control
GroupAggregate. This patch addresses that, giving users more flexibility
in tuning execution plans.
I've attached a WIP patch that adds this GUC parameter. I would
appreciate any feedback, especially regarding how many test cases I
should create.
To create new test cases for enable_groupagg, I looked into existing
test cases that use enable_hashagg and found that it is used in many
places (62 places). Should I add a test case for enable_groupagg in
the same place as enable_hashagg? I think that adding a new feature
requires a minimum number of test cases, so I would appreciate your
advice.
Additionally, based on the execution plan, I suspect the slowdown in the
parallel query might be caused by misestimates related to Sort or
Gather Merge.
While resolving those misestimates would ideally improve the root issue,
I'd like to keep the focus of this thread on adding the GUC parameter.
Then, I plan to report or address the estimation problem in a separate
thread.
Thanks,
Tatsuro Yamada
Tatsuro Yamada
Attachment
Hi,
# Query Execution Results (Average of 3 measurements)
- With parallel query: 39546 seconds
- With parallel query and enable_groupagg turned off: 1115 seconds
Oops, I made a mistake. The correct execution time is:
- With parallel query: 39546 ms
- With parallel query and enable_groupagg turned off: 1115 ms
Regards,
Tatsuro Yamada