BUG #17844: Memory consumption for memoize node - Mailing list pgsql-bugs
From:      PG Bug reporting form
Subject:   BUG #17844: Memory consumption for memoize node
Date:
Msg-id:    17844-d2f6f9e75a622bed@postgresql.org
Responses: Re: BUG #17844: Memory consumption for memoize node
List:      pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17844
Logged by:          Alexey Ermakov
Email address:      alexey.ermakov@dataegret.com
PostgreSQL version: 14.7
Operating system:   Ubuntu 20.04
Description:

Hello, recently I found an interesting query which consumes a lot of memory
(much more than expected given the current work_mem and hash_mem_multiplier
settings) and triggers the OOM killer. After simplifying the query, it looks
like the problem is in the part where we join a table on a condition like:

    id = (((test_json.big_json_column)::json ->> 'id'::text))::integer

and there is a Memoize node in the plan. Without the Memoize node the query
works fine.

I was able to create a reproducible test case (on a machine with the default
config and PostgreSQL 14.7):

------------------------------------------------------------------------------
create extension pgcrypto;

--create function to generate dummy data
CREATE OR REPLACE FUNCTION generate_random_string(size INT) RETURNS TEXT AS $$
DECLARE
  characters TEXT := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  bytes BYTEA := gen_random_bytes(size);
  l INT := length(characters);
  i INT := 0;
  output TEXT := '';
BEGIN
  WHILE i < size LOOP
    output := output || substr(characters, get_byte(bytes, i) % l + 1, 1);
    i := i + 1;
  END LOOP;
  RETURN output;
END;
$$ LANGUAGE plpgsql VOLATILE;

--create table with 200k rows and a text column with content like
--{"random_string":"....", "id":1}
create table test_json as
  select id,
         ('{"random_string":"' || generate_random_string(500) || '", "id":' || id || '}')::text as big_json_column
  from generate_series(1,200000) gs(id);

--duplicate rows and update statistics (to trigger memoize node)
insert into test_json select * from test_json;
analyze test_json;

--another table with similar structure and 100k rows
create table test_json2 as
  select id,
         ('{"random_string":"' || generate_random_string(500) || '", "id":' || id || '}')::json as big_json_column
  from generate_series(1,100000) gs(id);
alter table test_json2 add primary key(id);

--table sizes: test_json - 223MB, test_json2 - 56MB

--change settings to trigger plan with memoize node
set work_mem = '128MB';
set enable_hashjoin = off;
set enable_mergejoin = off;
set jit = off;

explain (analyze, buffers)
select test_json.id, test_json2.id
from test_json
left join test_json2
  on test_json2.id = ((test_json.big_json_column::json)->>'id')::int;

Nested Loop Left Join (cost=0.32..108047.22 rows=400000 width=8) (actual time=0.031..3155.261 rows=400000 loops=1)
  Buffers: shared hit=415739 read=12834
  I/O Timings: read=18.254
  ->  Seq Scan on test_json (cost=0.00..32572.00 rows=400000 width=540) (actual time=0.005..57.248 rows=400000 loops=1)
        Buffers: shared hit=15738 read=12834
        I/O Timings: read=18.254
  ->  Memoize (cost=0.32..0.34 rows=1 width=4) (actual time=0.008..0.008 rows=0 loops=400000)
        Cache Key: (((test_json.big_json_column)::json ->> 'id'::text))::integer
        Cache Mode: logical
        Hits: 200000  Misses: 200000  Evictions: 0  Overflows: 0  Memory Usage: 16797kB
        Buffers: shared hit=400001
        ->  Index Only Scan using test_json2_pkey on test_json2 (cost=0.30..0.33 rows=1 width=4) (actual time=0.001..0.001 rows=0 loops=200000)
              Index Cond: (id = (((test_json.big_json_column)::json ->> 'id'::text))::integer)
              Heap Fetches: 0
              Buffers: shared hit=400001
Planning Time: 0.069 ms
Execution Time: 3227.078 ms
------------------------------------------------------------------------------
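For comparison, the claim that the query behaves fine without the Memoize
node can be checked directly: PostgreSQL 14 added the enable_memoize planner
setting, so the same query can be re-planned without memoization in the
current session. A minimal sketch:

--disable memoization for this session only, then re-run the query
set enable_memoize = off;

explain (analyze, buffers)
select test_json.id, test_json2.id
from test_json
left join test_json2
  on test_json2.id = ((test_json.big_json_column::json)->>'id')::int;

--restore the planner default afterwards
reset enable_memoize;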
During execution I watched "avail Mem" in the top output on the test machine
to check how much memory the process consumed. It looked different each time,
usually hundreds of MB, sometimes around 1.5GB (which is even bigger than the
table size). I was able to trigger the OOM killer with this query and a
bigger test_json table with similar data.

I'm wondering:
1) Is it a known bug? Does it relate to json parsing somehow?
2) Is it possible to show such memory consumption in explain (analyze,
buffers) output for easier troubleshooting? (One way to inspect it today is
sketched below.)

--
Thanks,
Alexey Ermakov
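On question 2: one way to observe the backend's actual memory consumption
while the query runs is the memory-context instrumentation that PostgreSQL 14
ships, namely the pg_backend_memory_contexts view and the
pg_log_backend_memory_contexts() function. A sketch, where 12345 is a
placeholder for the backend pid looked up in pg_stat_activity:

--from a second session (superuser): dump the running backend's memory
--contexts to the server log
select pg_log_backend_memory_contexts(12345);

--from within a session, the same information is queryable directly
select name, parent, total_bytes
from pg_backend_memory_contexts
order by total_bytes desc
limit 10;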