Thread: BUG #16390: Regression between 12.2 and 11.6 on a recursive query : very slow and overestimation of rows
BUG #16390: Regression between 12.2 and 11.6 on a recursive query : very slow and overestimation of rows
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 16390 Logged by: JC Boggio Email address: postgresql@thefreecat.org PostgreSQL version: 12.2 Operating system: Linux/Ubuntu Description: Hello, Everything started on a big recursive query working fine with a filled database (all tables contain at least a few rows). It executes in ~120ms. But the same query, on a nearly empty DB takes ~10000ms on 12.2 and still ~100ms on 11.6 (same DB, vacuumed full and analyzed). So I started digging and simplified the query to try to understand. select version(); PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit CREATE DATABASE test; \c test create table a (id int); create table b (id int); create table c (id int); insert into c values (1); vacuum full analyze; WITH RECURSIVE tmpctr1 AS ( SELECT 1 AS id FROM a JOIN b b1 ON a.id=b1.id JOIN b b2 ON a.id=b2.id --(1) ) ,rec1 AS ( SELECT c.id, 1 AS level FROM c UNION SELECT r.id, r.level+1 FROM rec1 r LEFT JOIN tmpctr1 c ON r.id=c.id LEFT JOIN tmpctr1 c2 ON r.id=c2.id --(2) WHERE r.level<20 and (c.id is not null or c2.id is not null --(3) ) ) select * from rec1; Timing: 190ms It is "only" 84ms on PG 11.6 QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- CTE Scan on rec1 (cost=5480127.49..8057598.11 rows=128873531 width=8) (actual time=187.368..187.407 rows=1 loops=1) CTE tmpctr1 -> Merge Join (cost=539.35..7270.45 rows=414528 width=4) (actual time=0.007..0.007 rows=0 loops=1) Merge Cond: (b2.id = a.id) -> Sort (cost=179.78..186.16 rows=2550 width=4) (actual time=0.006..0.006 rows=0 loops=1) Sort Key: b2.id Sort Method: quicksort Memory: 25kB -> Seq Scan on b b2 (cost=0.00..35.50 rows=2550 width=4) (actual time=0.005..0.005 rows=0 loops=1) -> Materialize (cost=359.57..941.28 rows=32512 width=8) (never executed) -> Merge Join (cost=359.57..860.00 rows=32512 width=8) (never executed) Merge Cond: (a.id = b1.id) -> Sort (cost=179.78..186.16 rows=2550 width=4) (never executed) Sort Key: a.id -> Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (never executed) -> Sort (cost=179.78..186.16 rows=2550 width=4) (never executed) Sort Key: b1.id -> Seq Scan on b b1 (cost=0.00..35.50 rows=2550 width=4) (never executed) CTE rec1 -> Recursive Union (cost=0.00..5472857.05 rows=128873531 width=8) (actual time=187.365..187.403 rows=1 loops=1) -> Seq Scan on c (cost=0.00..1.01 rows=1 width=8) (actual time=187.360..187.363 rows=1 loops=1) -> Merge Left Join (cost=62937.61..289538.54 rows=12887353 width=8) (actual time=0.035..0.035 rows=0 loops=1) Merge Cond: (r.id = c2.id) Filter: ((c_1.id IS NOT NULL) OR (c2.id IS NOT NULL)) Rows Removed by Filter: 1 -> Sort (cost=10299.29..10314.83 rows=6218 width=12) (actual time=0.030..0.031 rows=1 loops=1) Sort Key: r.id Sort Method: quicksort Memory: 25kB -> Hash Right Join (cost=0.26..9907.48 rows=6218 width=12) (actual time=0.022..0.022 rows=1 loops=1) Hash Cond: (c_1.id = r.id) -> CTE Scan on tmpctr1 c_1 (cost=0.00..8290.56 rows=414528 width=4) (actual time=0.007..0.007 rows=0 loops=1) -> Hash (cost=0.22..0.22 rows=3 width=8) (actual time=0.005..0.005 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> WorkTable Scan on rec1 r (cost=0.00..0.22 rows=3 width=8) (actual time=0.003..0.003 rows=1 loops=1) Filter: (level < 20) -> Materialize (cost=52638.32..54710.96 rows=414528 width=4) (actual time=0.002..0.002 rows=0 loops=1) -> Sort (cost=52638.32..53674.64 rows=414528 width=4) (actual time=0.001..0.001 rows=0 loops=1) Sort Key: c2.id Sort Method: quicksort Memory: 25kB -> CTE Scan on tmpctr1 c2 (cost=0.00..8290.56 rows=414528 width=4) (actual time=0.000..0.000 rows=0 loops=1) Planning Time: 0.574 ms JIT: Functions: 36 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 9.996 ms, Inlining 12.520 ms, Optimization 116.679 ms, Emission 57.894 ms, Total 197.090 ms Execution Time: 198.803 ms (45 lignes) Temps : 200,336 ms You can see that in the query I have marked lines (1) (2) and (3). If I comment out line (1), exec time comes down to ~2ms and if I comment out lines (2) and (3) it falls to 42ms. I'm completely lost on this. Tables "a" and "b" are empty, so "tmpctr1" is empty also. But the estimated rows, costs and execution time are crazy. How can the Seq scan on "c" (1 row) take 187ms ? And how does the Seq scan on "a" and "b" both evaluate to 2550 rows ? And why commenting out line (1) makes such a difference ? Thanks for your help.
Re: BUG #16390: Regression between 12.2 and 11.6 on a recursive query : very slow and overestimation of rows
From
Tom Lane
Date:
PG Bug reporting form <noreply@postgresql.org> writes: > Everything started on a big recursive query working fine with a filled > database (all tables contain at least a few rows). It executes in ~120ms. > But the same query, on a nearly empty DB takes ~10000ms on 12.2 and still > ~100ms on 11.6 (same DB, vacuumed full and analyzed). FWIW, I get largely the same plan and runtime (~5ms) for this example on 11.x, 12.x, and HEAD, when using non-JIT-enabled builds. It looks like most of your runtime on 12.x is going into JIT compilation: > JIT: > Functions: 36 > Options: Inlining true, Optimization true, Expressions true, Deforming > true > Timing: Generation 9.996 ms, Inlining 12.520 ms, Optimization 116.679 ms, > Emission 57.894 ms, Total 197.090 ms It's not too surprising that the system thinks that compiling the query would be worth the effort, since it's overestimating the size of the recursive union so much. But this toy example doesn't really prove anything about that --- the misestimate is mostly because the planner won't believe that an empty table is empty. Which most of the time is a good safety feature, even if it goes off the rails in this particular case. In short: as is so often the case, examining the behavior on toy tables is not going to tell much about your real problem. Please see https://wiki.postgresql.org/wiki/Slow_Query_Questions for guidance about better ways to identify performance problems. And take the issue to pgsql-performance, because it's unlikely there's a bug here. regards, tom lane