Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE - Mailing list pgsql-bugs
From | David Rowley
---|---
Subject | Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
Date |
Msg-id | CAApHDvqRVzKuauE69-CHPBysVa28pp+s1+m1H=qhok85iMYrBA@mail.gmail.com
In response to | Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE (Jan Kort <jan.kort@genetics.nl>)
Responses | Re: Query with straightforward plan changes and becomes 6520 times slower after VACUUM ANALYZE
List | pgsql-bugs
On Wed, 19 May 2021 at 01:36, Jan Kort <jan.kort@genetics.nl> wrote:
> After VACUUM ANALYZE the plan becomes inefficient again, and does not
> start with the gfo_zaken_kosten primary key; the plan starts at the
> wrong end with an index scan on 1M rows:
>
> Merge Join  (cost=1.48..1.59 rows=1 width=159) (actual time=619.374..619.376 rows=1 loops=1)
>   Merge Cond: (gfo_zaken.id = gfo_zaken_kosten.gfo_zaken_id)
>   ->  Nested Loop  (cost=0.43..96503.47 rows=1048587 width=155) (actual time=0.022..619.359 rows=9 loops=1)
>         Join Filter: (gfo_zaken.zaaktypecode_id = gfo_zaken_typecode.id)
>         Rows Removed by Join Filter: 4194316
>         ->  Index Scan using gfo_zakenp on gfo_zaken  (cost=0.43..33587.23 rows=1048587 width=8) (actual time=0.006..141.167 rows=1048587 loops=1)
>         ->  Materialize  (cost=0.00..1.06 rows=4 width=155) (actual time=0.000..0.000 rows=4 loops=1048587)
>               ->  Seq Scan on gfo_zaken_typecode  (cost=0.00..1.04 rows=4 width=155) (actual time=0.011..0.012 rows=4 loops=1)
>   ->  Sort  (cost=1.05..1.05 rows=1 width=8) (actual time=0.012..0.012 rows=1 loops=1)
>         Sort Key: gfo_zaken_kosten.gfo_zaken_id
>         Sort Method: quicksort  Memory: 25kB
>         ->  Seq Scan on gfo_zaken_kosten  (cost=0.00..1.04 rows=1 width=8) (actual time=0.008..0.008 rows=1 loops=1)
>               Filter: (id = 13)
>               Rows Removed by Filter: 2
> Planning Time: 69.151 ms
> Execution Time: 619.410 ms

It looks like something is a bit weird with the merge join costing
code. A merge join is not a very good choice in this case: of the 3
values you have in gfo_zaken_kosten, 2 are at the very start of the
sorted merge join input from the gfo_zaken table and the 3rd is right
at the end. That means the merge join must read all of gfo_zaken to
join the 3 rows in gfo_zaken_kosten.

Here's a minimal case to reproduce:

drop table if exists million, three;
create table million (id int primary key);
create table three (id int primary key, million_id int not null);
insert into million select x from generate_series(1,1000000) x;
insert into three values (1,1),(2,1),(3,1000000);
analyze million, three;
explain analyze select * from million m inner join three t on m.id = t.million_id;

Gives:

Merge Join  (cost=1.49..1.56 rows=3 width=12)

The weird thing is that when I put just two rows in the "three" table,
that Merge Join is the planner's last choice:

truncate three;
insert into three values (1,1),(3,1000000);
analyze three;
set enable_nestloop = 0;
set enable_hashjoin = 0;
set max_parallel_workers_per_gather = 0;
explain analyze select * from million m inner join three t on m.id = t.million_id;

Gives me:

Merge Join  (cost=1.46..32909.49 rows=2 width=12)

A total cost of 32909.49 is quite a bit higher than the 1.56 we got
when the table had 3 rows. You'd expect the cost could only drop if we
removed a row. The first choice is a parameterized nested loop:

Nested Loop  (cost=0.42..9.91 rows=2 width=12)

So it appears that the total cost of 1.56 for the merge join is pretty
unrealistic. I'll debug this and see if I can figure out what's going
on.

(I was a bit worried that this might have been down to the fairly new
code that uses the presence of foreign keys to help with join
selectivity estimations. It appears that's not the case.)

David
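P.S. If anyone needs a stop-gap until the costing issue is tracked
down, disabling merge joins for just the affected query should push the
planner onto the parameterized nested loop plan shown above. A minimal
sketch, assuming the repro tables from earlier in this mail; I've not
tested this against the original gfo_zaken schema:

begin;
-- scoped to this transaction only; enable_mergejoin is a standard GUC
set local enable_mergejoin = off;
explain analyze
select *
from million m
inner join three t on m.id = t.million_id;
commit;

Using set local keeps the change from leaking into the rest of the
session, so other queries still get to consider merge joins where they
genuinely win.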