Home > mailing lists

Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker - Mailing list pgsql-bugs

From	Craig Milhiser
Subject	Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Date	October 2, 2024 03:12:41
Msg-id	CA+wnhO3kT7mxbadbteppSZVcs3FRPvGL5xTY2DbrWSGohmhyVw@mail.gmail.com Whole thread Raw
In response to	Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
List	pgsql-bugs

Tree view

> Since you're building from source, you could try applying the patch
>posted by Andrei Lephikov:
> https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru

This did not work for me. I am running out of memory.

I applied the patch, make clean, make, make check, sudo make install. I am running out of the box Postgres configuration.

Memory below uses "free -m".

Before loading Postgres
total used free shared buff/cache available
Mem: 31388 669 30467 2 639 30719
Swap: 0 0 0

After loading
total used free shared buff/cache available
Mem: 31388 672 30464 14 651 30715
Swap: 0 0 0

I go into psql
set max_parallel_workers_per_gather = 0;
run the query multiple times, takes 9.5 seconds at steady state, returns 20 rows.

Memory is still available

total used free shared buff/cache available
Mem: 31388 921 22547 142 8460 30466
Swap: 0 0 0

In the same psql session, set max_parallel_workers_per_gather = 2; then run the query again. This runs for 1 minute then:

2024-10-01 18:28:45.883 UTC [2586] LOG: background worker "parallel worker" (PID 4465) was terminated by signal 9: Killed
2024-10-01 18:28:45.883 UTC [2586] DETAIL: Failed process was running: SELECT
...
2024-10-01 18:28:45.883 UTC [2586] LOG: terminating any other active server processes
2024-10-01 18:28:46.620 UTC [2586] LOG: all server processes terminated; reinitializing

I got this as close to the end as I could
total used free shared buff/cache available
Mem: 31388 31014 535 1955 2156 373
Swap: 0 0 0

Though OOM conditions often means all bets are off for behavior, I tried something different. I rebooted, started Postgres then run the query. I do not set parallel_... = 0 and run the query which populated the cache. The machine exhausts memory again but usually "hangs". I need to restart. Below is the frozen screen
total used free shared buff/cache available
Mem: 31388 31317 240 1955 2140 70
Swap: 0 0 0

I ran these sequences multiple times. I also analyzed the data again just to make sure.

I reverted the patch to make sure I am reproducing the issue. I get the same 1.8GB allocation failure with parallel. Without parallel the query takes ~10 seconds. The patch increased the single worker performance for this query for out of the box configuration by 5%.

Thanks

On Sun, Sep 29, 2024 at 9:15 PM Thomas Munro <thomas.munro@gmail.com> wrote:

On Mon, Sep 30, 2024 at 12:03 PM Craig Milhiser <craig@milhiser.com> wrote:
> I reproduced the issue on v17. I downloaded the source tarball, built it, passed tests, put my production database, analyzed and ran the query. As you expected, the same issue occurred. I have opened the incident with the AWS team as well.

Since you're building from source, you could try applying the patch
posted by Andrei Lephikov:

https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru

I suspect we may want to limit it to a smaller number than that, as
mentioned already, and I think we should also apply the same cap to
the initial estimate (Andrei's patch only caps it when it decides to
increase it, not for the initial nbatch number). I can write a patch
like that in a few days when I return from travelling, and we can aim
to get it into the November release, but I suspect Andrei's patch
might already avoid the error for your case.

pgsql-bugs by date:

From: "David G. Johnston"
Date: 01 October 2024, 22:00:29
Subject: Re: Linux OOM killer

From: Andrei Lepikhov
Date: 02 October 2024, 03:35:04
Subject: Re: BUG #18643: EXPLAIN estimated rows mismatch

Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker - Mailing list pgsql-bugs

Previous

Next