Planner issue with BitmapScan recheck on external TOAST - Mailing list pgsql-hackers

From Jim Nasby
Subject Planner issue with BitmapScan recheck on external TOAST
Date
Msg-id A5EF3D27-5202-4CB1-B26A-BC36CC973243@upgrade.com
Whole thread Raw
List pgsql-hackers
I’ve been testing use of a BRIN index on record creation date (timestamptz) on a snapshot of a production system. Note
thatafter creating the BRIN index the number of buffers being accessed jumps from 23838 to 191663. Based on what
EXPLAINis showing, I believe the issue is that the planner doesn’t understand that each additional row that goes
throughthe (repsrv_account_ids(data) && …) recheck results in fetching at least one TOAST chunk. (I’d like to know if
myassumption about TOAST is correct here; it’s the only thing I can think of to explain these block numbers from the
2ndEXPLAIN…) 

   Heap Blocks: exact=11024
   Buffers: shared hit=191663
   ->  BitmapAnd  (cost=4903.00..4903.00 rows=14930 width=0) (actual time=74.704..74.705 rows=0 loops=1)
         Buffers: shared hit=1926


Unfortunately I haven’t been able to create an independent repo of this issue, so this report is based on PG 16 (most
recentI can test in production). repsrv_account_ids() is a function that extracts a field from a JSONB document (the
datafield). create_date is timestamptz. There’s 17 other fields in the table that I’m omitting (I can share if needed,
butwould need to talk to some folks over here about it). 

Indexes:
    "task_execution_pkey" PRIMARY KEY, btree (id)
    "task_execution__create_date_brin" brin (create_date)
    "task_execution_create_date_idx" btree (create_date)

explain (analyze,buffers) SELECT 1

FROMtask_execution te
                                                                                          WHERE (te.create_date BETWEEN
'2024-7-1'::timestamptzAND '2024-9-5'::timestamptz)
                                                            and repsrv_account_ids(te.data) &&
'{3000003,4000003,5000003,3000001,5000006,8805604}'::text[]
                                                       ; 
                                                                          QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1236.73..103675.01 rows=11291 width=4) (actual time=11.356..41.246 rows=9303 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=23838
   ->  Parallel Bitmap Heap Scan on task_execution te  (cost=236.73..101545.91 rows=4705 width=4) (actual
time=6.659..34.198rows=3101 loops=3) 
         Recheck Cond: (repsrv_account_ids(data) && '{3000003,4000003,5000003,3000001,5000006,8805604}'::text[])
         Filter: ((create_date >= '2024-07-01 00:00:00+00'::timestamp with time zone) AND (create_date <= '2024-09-05
00:00:00+00'::timestampwith time zone)) 
         Rows Removed by Filter: 5638
         Heap Blocks: exact=14066
         Buffers: shared hit=23838
         ->  Bitmap Index Scan on task_execution__account_ids  (cost=0.00..233.91 rows=26469 width=0) (actual
time=7.304..7.304rows=26218 loops=1) 
               Index Cond: (repsrv_account_ids(data) && '{3000003,4000003,5000003,3000001,5000006,8805604}'::text[])
               Buffers: shared hit=32
 Planning:
   Buffers: shared hit=1
 Planning Time: 0.188 ms
 Execution Time: 41.791 ms
(17 rows)

CREATE INDEX task_execution__create_date_brin ON task_execution USING brin (create_date) WITH (pages_per_range=8);
CREATE INDEX

explain (analyze,buffers) SELECT 1

FROMtask_execution te
                                                                                          WHERE (te.create_date BETWEEN
'2024-7-1'::timestamptzAND '2024-9-5'::timestamptz)
                                                            and repsrv_account_ids(te.data) &&
'{3000003,4000003,5000003,3000001,5000006,8805604}'::text[]
                                                       ; 

QUERY PLAN 

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on task_execution te  (cost=4903.00..65471.86 rows=11291 width=4) (actual time=76.688..774.362
rows=9303loops=1) 
   Recheck Cond: ((repsrv_account_ids(data) && '{3000003,4000003,5000003,3000001,5000006,8805604}'::text[]) AND
(create_date>= '2024-07-01 00:00:00+00'::timestamp with time zone) AND (create_date <= '2024-09-05
00:00:00+00'::timestampwith time zone)) 
   Rows Removed by Index Recheck: 2895
   Heap Blocks: exact=11024
   Buffers: shared hit=191663
   ->  BitmapAnd  (cost=4903.00..4903.00 rows=14930 width=0) (actual time=74.704..74.705 rows=0 loops=1)
         Buffers: shared hit=1926
         ->  Bitmap Index Scan on task_execution__account_ids  (cost=0.00..233.91 rows=26469 width=0) (actual
time=5.103..5.103rows=26218 loops=1) 
               Index Cond: (repsrv_account_ids(data) && '{3000003,4000003,5000003,3000001,5000006,8805604}'::text[])
               Buffers: shared hit=32
         ->  Bitmap Index Scan on task_execution__create_date_brin  (cost=0.00..4663.20 rows=1461738 width=0) (actual
time=68.380..68.380rows=1999200 loops=1) 
               Index Cond: ((create_date >= '2024-07-01 00:00:00+00'::timestamp with time zone) AND (create_date <=
'2024-09-0500:00:00+00'::timestamp with time zone)) 
               Buffers: shared hit=1894
 Planning:
   Buffers: shared hit=19
 Planning Time: 1.519 ms
 Execution Time: 775.039 ms
(17 rows)

SELECT :stat_fields FROM pg_stats WHERE tablename='task_execution' AND schemaname='copy' AND attname in ('create_date',
'data');
 schemaname |   tablename    |   attname   | null_frac | avg_width | n_distinct  | correlation
------------+----------------+-------------+-----------+-----------+-------------+-------------
 copy       | task_execution | create_date |         0 |         8 | -0.72454053 |  0.57420146
 copy       | task_execution | data        |         0 |        67 |          -1 |  0.75746745
(2 rows)

SELECT relpages, reltuples FROM pg_class WHERE oid = 'copy.task_execution'::regclass;
 relpages |  reltuples
----------+--------------
   340251 | 2.601468e+06
(1 row)


pgsql-hackers by date:

Previous
From: Ranier Vilela
Date:
Subject: Re: define pg_structiszero(addr, s, r)
Next
From: Robert Haas
Date:
Subject: Re: Reduce one comparison in binaryheap's sift down