Thread: BUG #16390: Regression between 12.2 and 11.6 on a recursive query : very slow and overestimation of rows

BUG #16390: Regression between 12.2 and 11.6 on a recursive query : very slow and overestimation of rows

From

PG Bug reporting form

Date:

25 April 2020, 15:01:04

The following bug has been logged on the website:

Bug reference:      16390
Logged by:          JC Boggio
Email address:      postgresql@thefreecat.org
PostgreSQL version: 12.2
Operating system:   Linux/Ubuntu
Description:

Hello,

Everything started on a big recursive query working fine with a filled
database (all tables contain at least a few rows). It executes in ~120ms.
But the same query, on a nearly empty DB takes ~10000ms on 12.2 and still
~100ms on 11.6 (same DB, vacuumed full and analyzed).

So I started digging and simplified the query to try to understand.

select version();
PostgreSQL 12.2 (Ubuntu 12.2-2.pgdg19.10+1) on x86_64-pc-linux-gnu, compiled
by gcc (Ubuntu 9.2.1-9ubuntu2) 9.2.1 20191008, 64-bit

CREATE DATABASE test;
\c test
create table a (id int);
create table b (id int);
create table c (id int);
insert into c values (1);
vacuum full analyze;

WITH RECURSIVE
tmpctr1 AS (
        SELECT 1 AS id
        FROM a
        JOIN b b1 ON a.id=b1.id
        JOIN b b2 ON a.id=b2.id  --(1)
    )
    ,rec1 AS (
        SELECT c.id, 1 AS level
        FROM c 
    
        UNION

        SELECT r.id, r.level+1
        FROM rec1 r
        LEFT JOIN tmpctr1 c ON r.id=c.id
        LEFT JOIN tmpctr1 c2 ON r.id=c2.id  --(2)
        WHERE r.level<20
        and (c.id is not null 
        or c2.id is not null  --(3)
        )
    )
select * from rec1;

Timing: 190ms

It is "only" 84ms on PG 11.6

                                                                QUERY PLAN
                                                               

---------------------------------------------------------------------------------------------------------------------------------------------
 CTE Scan on rec1  (cost=5480127.49..8057598.11 rows=128873531 width=8)
(actual time=187.368..187.407 rows=1 loops=1)
   CTE tmpctr1
     ->  Merge Join  (cost=539.35..7270.45 rows=414528 width=4) (actual
time=0.007..0.007 rows=0 loops=1)
           Merge Cond: (b2.id = a.id)
           ->  Sort  (cost=179.78..186.16 rows=2550 width=4) (actual
time=0.006..0.006 rows=0 loops=1)
                 Sort Key: b2.id
                 Sort Method: quicksort  Memory: 25kB
                 ->  Seq Scan on b b2  (cost=0.00..35.50 rows=2550 width=4)
(actual time=0.005..0.005 rows=0 loops=1)
           ->  Materialize  (cost=359.57..941.28 rows=32512 width=8) (never
executed)
                 ->  Merge Join  (cost=359.57..860.00 rows=32512 width=8)
(never executed)
                       Merge Cond: (a.id = b1.id)
                       ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
(never executed)
                             Sort Key: a.id
                             ->  Seq Scan on a  (cost=0.00..35.50 rows=2550
width=4) (never executed)
                       ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
(never executed)
                             Sort Key: b1.id
                             ->  Seq Scan on b b1  (cost=0.00..35.50
rows=2550 width=4) (never executed)
   CTE rec1
     ->  Recursive Union  (cost=0.00..5472857.05 rows=128873531 width=8)
(actual time=187.365..187.403 rows=1 loops=1)
           ->  Seq Scan on c  (cost=0.00..1.01 rows=1 width=8) (actual
time=187.360..187.363 rows=1 loops=1)
           ->  Merge Left Join  (cost=62937.61..289538.54 rows=12887353
width=8) (actual time=0.035..0.035 rows=0 loops=1)
                 Merge Cond: (r.id = c2.id)
                 Filter: ((c_1.id IS NOT NULL) OR (c2.id IS NOT NULL))
                 Rows Removed by Filter: 1
                 ->  Sort  (cost=10299.29..10314.83 rows=6218 width=12)
(actual time=0.030..0.031 rows=1 loops=1)
                       Sort Key: r.id
                       Sort Method: quicksort  Memory: 25kB
                       ->  Hash Right Join  (cost=0.26..9907.48 rows=6218
width=12) (actual time=0.022..0.022 rows=1 loops=1)
                             Hash Cond: (c_1.id = r.id)
                             ->  CTE Scan on tmpctr1 c_1
(cost=0.00..8290.56 rows=414528 width=4) (actual time=0.007..0.007 rows=0
loops=1)
                             ->  Hash  (cost=0.22..0.22 rows=3 width=8)
(actual time=0.005..0.005 rows=1 loops=1)
                                   Buckets: 1024  Batches: 1  Memory Usage:
9kB
                                   ->  WorkTable Scan on rec1 r
(cost=0.00..0.22 rows=3 width=8) (actual time=0.003..0.003 rows=1 loops=1)
                                         Filter: (level < 20)
                 ->  Materialize  (cost=52638.32..54710.96 rows=414528
width=4) (actual time=0.002..0.002 rows=0 loops=1)
                       ->  Sort  (cost=52638.32..53674.64 rows=414528
width=4) (actual time=0.001..0.001 rows=0 loops=1)
                             Sort Key: c2.id
                             Sort Method: quicksort  Memory: 25kB
                             ->  CTE Scan on tmpctr1 c2  (cost=0.00..8290.56
rows=414528 width=4) (actual time=0.000..0.000 rows=0 loops=1)
 Planning Time: 0.574 ms
 JIT:
   Functions: 36
   Options: Inlining true, Optimization true, Expressions true, Deforming
true
   Timing: Generation 9.996 ms, Inlining 12.520 ms, Optimization 116.679 ms,
Emission 57.894 ms, Total 197.090 ms
 Execution Time: 198.803 ms
(45 lignes)

Temps : 200,336 ms


You can see that in the query I have marked lines (1) (2) and (3).

If I comment out line (1), exec time comes down to ~2ms and if I comment out
lines (2) and (3) it falls to 42ms.

I'm completely lost on this. Tables "a" and "b" are empty, so "tmpctr1" is
empty also. But the estimated rows, costs and execution time are crazy.
How can the Seq scan on "c" (1 row) take 187ms ? And how does the Seq scan
on "a" and "b" both evaluate to 2550 rows ?
And why commenting out line (1) makes such a difference ?

Thanks for your help.

Re: BUG #16390: Regression between 12.2 and 11.6 on a recursive query : very slow and overestimation of rows

From

Tom Lane

Date:

25 April 2020, 21:49:40

PG Bug reporting form <noreply@postgresql.org> writes:
> Everything started on a big recursive query working fine with a filled
> database (all tables contain at least a few rows). It executes in ~120ms.
> But the same query, on a nearly empty DB takes ~10000ms on 12.2 and still
> ~100ms on 11.6 (same DB, vacuumed full and analyzed).

FWIW, I get largely the same plan and runtime (~5ms) for this example on
11.x, 12.x, and HEAD, when using non-JIT-enabled builds.  It looks like
most of your runtime on 12.x is going into JIT compilation:

>  JIT:
>    Functions: 36
>    Options: Inlining true, Optimization true, Expressions true, Deforming
> true
>    Timing: Generation 9.996 ms, Inlining 12.520 ms, Optimization 116.679 ms,
> Emission 57.894 ms, Total 197.090 ms

It's not too surprising that the system thinks that compiling the query
would be worth the effort, since it's overestimating the size of the
recursive union so much.  But this toy example doesn't really prove
anything about that --- the misestimate is mostly because the planner
won't believe that an empty table is empty.  Which most of the time is a
good safety feature, even if it goes off the rails in this particular
case.

In short: as is so often the case, examining the behavior on toy tables
is not going to tell much about your real problem.  Please see

https://wiki.postgresql.org/wiki/Slow_Query_Questions

for guidance about better ways to identify performance problems.
And take the issue to pgsql-performance, because it's unlikely
there's a bug here.

            regards, tom lane