Possible incorrect row estimation for Gather paths - Mailing list pgsql-hackers
From: Anthonin Bonnefoy
Subject: Possible incorrect row estimation for Gather paths
Msg-id: CAO6_Xqr9+51NxgO=XospEkUeAg-p=EjAWmtpdcZwjRgGKJ53iA@mail.gmail.com
List: pgsql-hackers
Hi,
While experimenting on an explain option to display all plan candidates (very rough prototype here [1]), I've noticed some discrepancies in some generated plans.
EXPLAIN (ALL_CANDIDATES) SELECT * FROM pgbench_accounts order by aid;
Plan 1
-> Gather Merge (cost=11108.32..22505.38 rows=100000 width=97)
Workers Planned: 1
-> Sort (cost=10108.31..10255.37 rows=58824 width=97)
Sort Key: aid
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..2228.24 rows=58824 width=97)
Plan 2
-> Gather Merge (cost=11108.32..17873.08 rows=58824 width=97)
Workers Planned: 1
-> Sort (cost=10108.31..10255.37 rows=58824 width=97)
Sort Key: aid
-> Parallel Seq Scan on pgbench_accounts (cost=0.00..2228.24 rows=58824 width=97)
The two plans are nearly identical, except that one Gather Merge has a lower estimate of 58K rows.
The first plan is created by generate_useful_gather_paths with override_rows=false, then create_gather_merge_path, which uses rel->rows as the row count (the 100K rows of pgbench_accounts).
#0: create_gather_merge_path(...) at pathnode.c:1885:30
#1: generate_useful_gather_paths(... override_rows=false) at allpaths.c:3286:11
#2: apply_scanjoin_target_to_paths(...) at planner.c:7744:3
#3: grouping_planner(...) at planner.c:1638:3
The second plan is created through create_ordered_paths, then create_gather_merge_path, and the number of rows is estimated as (worker_rows * parallel_workers). Since we only have 1 parallel worker, this yields 58K rows.
#0: create_gather_merge_path(...) at pathnode.c:1885:30
#1: create_ordered_paths(...) at planner.c:5287:5
#2: grouping_planner(...) at planner.c:1717:17
The 58K row estimate looks possibly incorrect. A worker's row count is estimated as total_rows / parallel_divisor. The parallel_divisor includes the possible leader participation and is 1.7 for 1 worker, hence the 58K rows (100K / 1.7 = 58K).
However, the gather merge will only compute 58K * 1, dropping the leader participation component.
I have a tentative patch split in two changes:
1: A related refactoring that removes an unnecessary and confusing assignment of rows in create_gather_merge_path. The value is never used and is immediately overwritten in cost_gather_merge.
2: A change to the row estimation of gather paths to use worker_rows * parallel_divisor, giving a more accurate estimate. Additionally, when creating a gather merge path in create_ordered_paths, we try to use the source rel's rows when available.
The patch triggered a small change in the hash_join regression test. Pre-patch, we had the following candidates:
Plan 4
-> Aggregate (cost=511.01..511.02 rows=1 width=8)
-> Gather (cost=167.02..511.00 rows=2 width=0)
Workers Planned: 1
-> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
Hash Cond: (r.id = s.id)
-> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
-> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
-> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
Plan 5
-> Finalize Aggregate (cost=510.92..510.93 rows=1 width=8)
-> Gather (cost=510.80..510.91 rows=1 width=8)
Workers Planned: 1
-> Partial Aggregate (cost=510.80..510.81 rows=1 width=8)
-> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
Hash Cond: (r.id = s.id)
-> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
-> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
-> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
Post patch, the plan candidates became:
Plan 4
-> Finalize Aggregate (cost=511.02..511.03 rows=1 width=8)
-> Gather (cost=510.80..511.01 rows=2 width=8)
Workers Planned: 1
-> Partial Aggregate (cost=510.80..510.81 rows=1 width=8)
-> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
Hash Cond: (r.id = s.id)
-> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
-> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
-> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
Plan 5
-> Aggregate (cost=511.01..511.02 rows=1 width=8)
-> Gather (cost=167.02..511.00 rows=2 width=0)
Workers Planned: 1
-> Parallel Hash Join (cost=167.02..510.80 rows=1 width=0)
Hash Cond: (r.id = s.id)
-> Parallel Seq Scan on simple r (cost=0.00..299.65 rows=11765 width=4)
-> Parallel Hash (cost=167.01..167.01 rows=1 width=4)
-> Parallel Seq Scan on extremely_skewed s (cost=0.00..167.01 rows=1 width=4)
The Finalize Aggregate plan's cost increased by 1 post-patch because the estimated number of rows in the Gather node went from 1 to 2 (rint(1 * 1.7) = 2). This was enough to make the plain Aggregate plan cheaper. Since the test's purpose is to check the parallel hash join, updating it to the new cheapest plan looked fine.
Regards,
Anthonin
[1]: https://github.com/bonnefoa/postgres/tree/plan-candidates