BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker |
Date | |
Msg-id | 18349-83d33dd3d0c855c3@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18349 Logged by: Alexey Ermakov Email address: alexey.ermakov@dataegret.com PostgreSQL version: 16.2 Operating system: Rocky Linux release 9.3 (Blue Onyx) Description: Hello! I found a problem with memory consumption of parallel worker when postgres 16.2 runs complex analytical query. After simplifying I was able to reproduce it with following query with just 2 joins: 2024-02-16 12:14:25.669 UTC [425424] ERROR: XX000: invalid DSA memory alloc request size 1811939328 2024-02-16 12:14:25.669 UTC [425424] LOCATION: dsa_allocate_extended, dsa.c:690 2024-02-16 12:14:25.669 UTC [425424] STATEMENT: SELECT * FROM history h LEFT JOIN coupon cpn ON cpn.recloc = h.recloc AND cpn.version = h.version LEFT JOIN free_text i2 ON i2.recloc = cpn.recloc AND i2.part = 1 WHERE h.date_operation >= '2024-02-13' and h.date_operation < '2024-02-14'; 2024-02-16 12:14:25.669 UTC [425424] DEBUG: 00000: shmem_exit(1): 5 before_shmem_exit callbacks to make 2024-02-16 12:14:25.669 UTC [425424] LOCATION: shmem_exit, ipc.c:240 Here is some memory/parallel workers- related settings: max_parallel_workers = 12 max_parallel_workers_per_gather = 2 hash_mem_multiplier = 2 work_mem = '4MB' When I increase work_mem to 16MB then query runs successfully (same plan): Gather (cost=1082271.57..8357367.15 rows=26292632 width=345) (actual time=50735.451..64000.782 rows=1325161 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=4102227 read=7004436, temp read=1808967 written=1809044 I/O Timings: shared read=37739.143, temp read=4809.969 write=9139.669 -> Parallel Hash Right Join (cost=1081271.57..5727103.95 rows=10955263 width=345) (actual time=50716.477..61271.276 rows=441720 loops=3) Hash Cond: ((i2.recloc)::text = (cpn.recloc)::text) Buffers: shared hit=4102227 read=7004436, temp read=1808967 written=1809044 I/O Timings: shared read=37739.143, temp read=4809.969 write=9139.669 -> Parallel Seq Scan on free_text i2 (cost=0.00..4160436.50 rows=74411061 width=63) (actual time=0.196..37944.130 rows=60240671 loops=3) Filter: (part = 1) Rows Removed by Filter: 160726692 Buffers: shared hit=158869 read=7004436 I/O Timings: shared read=37739.143 -> Parallel Hash (cost=1077675.12..1077675.12 rows=220524 width=282) (actual time=1406.500..1406.503 rows=209531 loops=3) Buckets: 131072 Batches: 8 Memory Usage: 30496kB Buffers: shared hit=3943254, temp written=10316 I/O Timings: temp write=58.043 -> Nested Loop Left Join (cost=1.14..1077675.12 rows=220524 width=282) (actual time=0.048..1239.847 rows=209531 loops=3) Buffers: shared hit=3943254 -> Parallel Index Scan using history_index_date on history h (cost=0.57..42000.62 rows=156296 width=90) (actual time=0.021..76.927 rows=198024 loops=3) Index Cond: ((date_operation >= '2024-02-13 00:00:00'::timestamp without time zone) AND (date_operation < '2024-02-14 00:00:00'::timestamp without time zone)) Buffers: shared hit=260503 -> Index Scan using coupon_index_rl on coupon cpn (cost=0.57..6.62 rows=1 width=192) (actual time=0.004..0.005 rows=1 loops=594071) Index Cond: ((recloc)::text = (h.recloc)::text) Filter: (version = h.version) Rows Removed by Filter: 2 Buffers: shared hit=3682751 Planning: Buffers: shared hit=72 Planning Time: 0.506 ms Execution Time: 64060.476 ms (32 rows) Interesting thing: after query fails it take some time (seconds) before psql gives control back. It turns out that postgresql written 1M+ (!) temporary files (most of them are empty) and cleaning up takes some time. With bigger work_mem it won't write so many files. Maybe memory issue somehow related with number of files. In PostgreSQL 16.2 there were fix for similar bug: [1],[2]. Could it be that there is some issue with it ? I wasn't able to make reproducible test case so far but may be given information is enough... [1] https://github.com/postgres/postgres/commit/2a67b5a60ee68892bb028587ddc6de7650822480 [2] https://www.postgresql.org/message-id/flat/16925-ec96d83529d0d629%40postgresql.org Thanks, Alexey Ermakov
pgsql-bugs by date: