From: Алексей Борщёв
Subject: Planner makes sub-optimal execution plan
Msg-id: CAD2_BUOH-Y4QkJiibNjD8pSin_+qCWn+vJOYfdnQdP40jzNzVw@mail.gmail.com
List: pgsql-performance
Hi, all!

The execution plan regressed during an upgrade from PG13 to PG14.
We've managed to emulate the table and reproduce the issue on PG17:

# cat /etc/*release
PRETTY_NAME="Debian GNU/Linux trixie/sid"

select version();
PostgreSQL 17.4 (Debian 17.4-1.pgdg110+2) on x86_64-pc-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit

-- DROP TABLE IF EXISTS docum CASCADE;
CREATE TABLE docum (
    id BIGSERIAL PRIMARY KEY,
    dt TIMESTAMP,
    dt_real TIMESTAMP NOT NULL
);

-- Fill the table with 12M rows (approx. 30 sec):
DO $$
DECLARE
    batch_size INT := 50000;
    total_rows INT := 12000000;  -- 12 M records
    current_row INT := 0;
    base_date TIMESTAMP := '2024-01-01 00:00:00'::timestamp;
    cutoff_date TIMESTAMP := '2025-08-14 09:44:09.033592'::timestamp;
    time_increment INTERVAL;
BEGIN
    RAISE NOTICE 'Start. Total rows %', total_rows;
    ALTER TABLE docum SET (autovacuum_enabled = false);
    -- Calculate time increment for even distribution
    -- Data for ~590 days before cutoff_date
    time_increment := (cutoff_date - base_date) / (total_rows * 0.995);
    -- Insert sorted data for high correlation
    WHILE current_row < total_rows LOOP
        INSERT INTO docum (dt, dt_real)
        SELECT
            -- dt: almost in sync with dt_real but with some noise and 5% NULL
            CASE
                WHEN random() < 0.05 THEN NULL
                ELSE base_date + (time_increment * (current_row + i)) +
                     (random() * INTERVAL '1 hour' - INTERVAL '30 minutes')  -- noise
            END,
            -- dt_real: monotonically increasing, correlation 0.999
            CASE
                -- 99.5% records before cutoff_date
                WHEN current_row < total_rows * 0.995 THEN
                    base_date + (time_increment * (current_row + i))
                -- 0.5% records after cutoff_date (~60K records)
                ELSE
                    cutoff_date + ((current_row + i - total_rows * 0.995) * INTERVAL '10 seconds')
            END
        FROM generate_series(0, LEAST(batch_size - 1, total_rows - current_row - 1)) AS i;
        current_row := current_row + batch_size;
        -- Show progress:
        IF current_row % 500000 = 0 THEN
            -- RAISE does not understand printf-style "%.1f"; round the
            -- value and append a literal percent sign instead
            RAISE NOTICE 'Processed: % rows (%)', current_row,
                round(current_row::numeric / total_rows * 100, 1) || '%';
        END IF;
    END LOOP;
    RAISE NOTICE 'Completed. Total rows inserted: %', current_row;
END $$;

-- Indexes:
CREATE INDEX docum_dt      ON docum(dt);
CREATE INDEX docum_dt_real ON docum(dt_real);

ALTER TABLE docum SET (autovacuum_enabled = true);

-- Gather statistics:
ALTER TABLE docum ALTER COLUMN dt SET STATISTICS 400;
ALTER TABLE docum ALTER COLUMN dt_real SET STATISTICS 400;
-- default_statistics_target = 400 will also work.

VACUUM ANALYZE docum;
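
-- To check that the emulated table matches the original shape
-- (correlation near 1.0 on dt_real, ~5% NULLs in dt), the gathered
-- statistics can be inspected via the standard pg_stats view:
SELECT attname, null_frac, correlation
FROM pg_stats
WHERE schemaname = 'public'
  AND tablename = 'docum'
  AND attname IN ('dt', 'dt_real');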

-- The problematic query:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT
    MIN(docum.dt) AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;
-- The plan I've got is:
Result  (cost=8.38..8.39 rows=1 width=16) (actual time=2660.034..2660.036 rows=1 loops=1)
  Output: (InitPlan 1).col1, (InitPlan 2).col1
  Buffers: shared hit=9358751 read=30994 written=1
  InitPlan 1
    ->  Limit  (cost=0.43..7.91 rows=1 width=8) (actual time=2660.006..2660.007 rows=1 loops=1)
          Output: docum.dt
          Buffers: shared hit=9358747 read=30994 written=1
          ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 rows=56222 width=8) (actual time=2660.004..2660.005 rows=1 loops=1)
                Output: docum.dt
                Index Cond: (docum.dt IS NOT NULL)
                Filter: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                Rows Removed by Filter: 11342966
                Buffers: shared hit=9358747 read=30994 written=1
  InitPlan 2
    ->  Limit  (cost=0.43..0.46 rows=1 width=8) (actual time=0.022..0.022 rows=1 loops=1)
          Output: docum_1.dt_real
          Buffers: shared hit=4
          ->  Index Only Scan Backward using docum_dt_real_2b81c58c on public.docum docum_1  (cost=0.43..1689.22 rows=59245 width=8) (actual time=0.021..0.021 rows=1 loops=1)
                Output: docum_1.dt_real
                Index Cond: (docum_1.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
                Heap Fetches: 0
                Buffers: shared hit=4
Settings: work_mem = '16MB', search_path = 'public, public, "$user"'
Planning:
  Buffers: shared hit=12
Planning Time: 0.148 ms
Execution Time: 2660.056 ms

This plan is not good: the execution time is 2660.056 ms, yet the total
plan cost is only 8.38..8.39 - that looks far too small! At this step:
 ->  Index Scan using docum_dt_7ee1d676 on public.docum  (cost=0.43..420487.43 ...)
the higher cost of 420487 looks adequate, but it gets lost somewhere and
is not reflected in the total query plan cost.
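
If I understand the costing correctly (this is my reading of the plan,
not something I have confirmed in the source), the Limit node assumes
the qualifying rows are spread evenly through the dt index, so it only
charges about 1/56222 of the child scan:

-- Back-of-the-envelope check, using the figures from the plan above:
SELECT 0.43 + (420487.43 - 0.43) / 56222;  -- ~ 7.91, the Limit total cost

In reality all qualifying rows sit at the very end of the dt order, so
the scan throws away 11342966 rows before it finds the first match.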

-- There is a possible fix for the query: just add a count(*) column to the SELECT list:
EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SUMMARY, SETTINGS, TIMING)
SELECT count(*) as cnt,
    MIN(docum.dt) AS "dt__min",
    MAX(docum.dt_real) AS "dt_real__max"
FROM docum
WHERE docum.dt_real >= '2025-08-14T09:44:09.033592'::timestamp;

Aggregate  (cost=2512.56..2512.57 rows=1 width=24) (actual time=10.179..10.180 rows=1 loops=1)
  Output: count(*), min(dt), max(dt_real)
  Buffers: shared hit=383 read=165
  ->  Index Scan using docum_dt_real_2b81c58c on public.docum  (cost=0.43..2068.22 rows=59245 width=16) (actual time=0.013..6.430 rows=59999 loops=1)
        Output: id, dt, dt_real
        Index Cond: (docum.dt_real >= '2025-08-14 09:44:09.033592'::timestamp without time zone)
        Buffers: shared hit=383 read=165
Settings: work_mem = '16MB', search_path = 'public, public, "$user"'
Planning:
  Buffers: shared hit=9 read=1
Planning Time: 0.927 ms
Execution Time: 10.199 ms

-- This plan's cost (2512) is much higher, but the execution time is way better!
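
Another workaround that seems to work here is to aggregate over a
subquery that the planner cannot pull up. The OFFSET 0 below is only
there to block subquery pull-up, so the rewrite of MIN()/MAX() into
index-order LIMIT 1 probes should not apply. This is just a sketch; I
have not verified it on our production schema:

EXPLAIN (ANALYZE, BUFFERS)
SELECT MIN(d.dt)      AS "dt__min",
       MAX(d.dt_real) AS "dt_real__max"
FROM (SELECT dt, dt_real
      FROM docum
      WHERE dt_real >= '2025-08-14T09:44:09.033592'::timestamp
      OFFSET 0) AS d;  -- OFFSET 0 acts as an optimization fence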

Did you manage to reproduce this plan?
Can I help you with more details?


