Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker - Mailing list pgsql-bugs

From: Craig Milhiser
Subject: Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Msg-id: CA+wnhO3kT7mxbadbteppSZVcs3FRPvGL5xTY2DbrWSGohmhyVw@mail.gmail.com
In response to: Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-bugs
> Since you're building from source, you could try applying the patch
> posted by Andrei Lephikov:
> https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru
This did not work for me. I am running out of memory.
I applied the patch, then ran make clean, make, make check, and sudo make install. I am running an out-of-the-box Postgres configuration.
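For reference, the build amounts to roughly the following; the source directory and patch file names are illustrative.

# Apply the patch to the v17 source tree and rebuild (names here are illustrative).
cd ~/postgresql-17.0
patch -p1 < ~/hash-batch-limit.patch   # or: git apply ~/hash-batch-limit.patch
make clean
make
make check
sudo make install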
The memory figures below are from "free -m".
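A simple loop is enough to watch memory while a query runs; the interval and log file name are arbitrary.

# Print memory usage roughly once per second and keep a copy in a log file.
while true; do date; free -m; sleep 1; done | tee free.log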
Before loading Postgres
           total      used      free    shared  buff/cache  available
Mem:       31388       669     30467         2         639      30719
Swap:          0         0         0
After loading
           total      used      free    shared  buff/cache  available
Mem:       31388       672     30464        14         651      30715
Swap:          0         0         0
I go into psql and run:
set max_parallel_workers_per_gather = 0;
Then I run the query multiple times; it takes 9.5 seconds at steady state and returns 20 rows.
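The non-parallel session is essentially the sketch below; the database name and query file are placeholders, and the query itself is not reproduced here.

# mydb and query.sql are placeholders for the real database and the 20-row query.
psql -d mydb <<'SQL'
\timing on
SET max_parallel_workers_per_gather = 0;
-- repeated several times to reach steady state (~9.5 s)
\i query.sql
SQL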
Memory is still available
           total      used      free    shared  buff/cache  available
Mem:       31388       921     22547       142        8460      30466
Swap:          0         0         0
In the same psql session, I set max_parallel_workers_per_gather = 2 and run the query again. It runs for about a minute, then:
2024-10-01 18:28:45.883 UTC [2586] LOG: background worker "parallel worker" (PID 4465) was terminated by signal 9: Killed
2024-10-01 18:28:45.883 UTC [2586] DETAIL: Failed process was running: SELECT
...
2024-10-01 18:28:45.883 UTC [2586] LOG: terminating any other active server processes
2024-10-01 18:28:46.620 UTC [2586] LOG: all server processes terminated; reinitializing
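A signal 9 like that is consistent with the kernel OOM killer, which can be confirmed with something along these lines (exact commands vary by system):

# Look for OOM-killer activity in the kernel log around the time of the kill.
sudo dmesg -T | grep -i -E 'out of memory|oom|killed process'
# Or, on systemd-based systems:
journalctl -k | grep -i -E 'out of memory|oom'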
I captured this as close to the end of the run as I could:
           total      used      free    shared  buff/cache  available
Mem:       31388     31014       535      1955        2156        373
Swap:          0         0         0
Though OOM conditions often mean all bets are off for behavior, I tried something different. I rebooted, started Postgres, and ran the query without first setting max_parallel_workers_per_gather = 0, so the parallel query is what populated the cache. The machine exhausts memory again, but this time it usually "hangs" and I need to restart it. Below is the frozen screen:
           total      used      free    shared  buff/cache  available
Mem:       31388     31317       240      1955        2140         70
Swap:          0         0         0
I ran these sequences multiple times. I also analyzed the data again just to make sure.
I reverted the patch to make sure I am still reproducing the original issue. With parallelism I get the same 1.8 GB allocation failure; without parallelism the query takes ~10 seconds. With the out-of-the-box configuration, the patch improved single-worker performance for this query by 5%.
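Reverting came down to reverse-applying the same patch and rebuilding, roughly (names illustrative as above):

# Reverse-apply the previously applied patch and rebuild.
cd ~/postgresql-17.0
patch -R -p1 < ~/hash-batch-limit.patch
make && make check && sudo make install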
Thanks
On Sun, Sep 29, 2024 at 9:15 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, Sep 30, 2024 at 12:03 PM Craig Milhiser <craig@milhiser.com> wrote:
> > I reproduced the issue on v17. I downloaded the source tarball, built it, passed tests, put my production database, analyzed and ran the query. As you expected, the same issue occurred. I have opened the incident with the AWS team as well.
>
> Since you're building from source, you could try applying the patch
> posted by Andrei Lephikov:
> https://www.postgresql.org/message-id/7d763a6d-fad7-49b6-beb0-86f99ce4a6eb%40postgrespro.ru
>
> I suspect we may want to limit it to a smaller number than that, as
> mentioned already, and I think we should also apply the same cap to
> the initial estimate (Andrei's patch only caps it when it decides to
> increase it, not for the initial nbatch number). I can write a patch
> like that in a few days when I return from travelling, and we can aim
> to get it into the November release, but I suspect Andrei's patch
> might already avoid the error for your case.