Re: BUG #18675: Postgres is not realasing memory causing OOM - Mailing list pgsql-bugs

From Maciej Jaros
Subject Re: BUG #18675: Postgres is not realasing memory causing OOM
Date
Msg-id CAOn6LZqgcnXFAdkoxRUTnH5B0g_JN0EmCNERvRREpT0_ZCvBog@mail.gmail.com
In response to Re: BUG #18675: Postgres is not realasing memory causing OOM  (Tomas Vondra <tomas@vondra.me>)
List pgsql-bugs

Thanks, Tomas. That helped me understand the problem and the comments a bit more. To answer some questions about our setup and possible causes of the issues:

  • We are not using any non-standard extensions. We use PL/pgSQL in some maintenance scripts, but that extension is built in, so I guess you didn't mean that.
  • This is mostly a default configuration (aside from memory and CPU adjustments). More specifically, all of the JIT options are in their default states (none are set).
  • For context about connections: we have 10 app servers (Tomcat, JDK 17, Hibernate 5) connecting to the database in question, each with around 20 active connections in practice (pooling, though some scheduled jobs might add a few). There is also a data warehouse with pooling that should not exceed 20 connections. So in practice we have around 250 connections, not the 500 we allow in the max_connections setting, and most of those connections are idle most of the time. So at least our max_connections is quite conservative, I think.
  • We terminate all queries running longer than 30 minutes. Typical queries finish below 1 second, Java responses are mostly limited to 20 seconds, and most queries have a LIMIT of 25 (25 items per page).
  • The application is in use from 8 am to 6-7 pm and is mostly idle at night, apart from some maintenance (including vacuum and vacuumlo). RAM availability stays flat at night, whereas I would expect it to drop at some point around 7-8 pm. RAM usage on the separate app servers does drop after hours.

So, yes, that RAM usage is strange, and that's why I reported it. It doesn't seem to be a problem unique to us, either. I found questions about memory usage on Stack Overflow, for example: How to limit the memory available for PostgreSQL server. There is a comment there that seems to describe what could be a bug (it looks like a bug to me). Maybe not a bug-bug, but definitely unwanted behavior:

Note that even if postgres logically releases memory it has allocated, it may not be returned to the operating system, depending on the malloc()/free() implementation of your execution environment. That may result in multiple PostgreSQL processes getting over the limit due to use of hash aggregation as described above, and the memory is never released to the OS even though PostgreSQL isn't actually using it either. This happens because technically malloc() may use brk() behind the scenes, and releasing memory back to the OS is only possible in some special cases.

So, that comment led me to suggest adding some kind of process. I called it a garbage collector, but maybe David is right; maybe that's not accurate. Anyway, that process, in my view, could try to actually release memory to the system to prevent the OOM killer from doing its bidding. Is that possible? I don't know; I don't know the inner workings of PG. I also don't understand why calling free would not release memory. I'm also not sure if that description of malloc/free is accurate, but it does seem to align with what I'm seeing.


Mon, 28 Oct 2024 at 20:26 Tomas Vondra <tomas@vondra.me> wrote:
On 10/28/24 19:07, Maciej Jaros wrote:
> David G. Johnston (28.10.2024 14:42):
>> On Monday, October 28, 2024, PG Bug reporting form
>> <noreply@postgresql.org> wrote:
>>
>>     The following bug has been logged on the website:
>>
>>     Bug reference:      18675
>>     Logged by:          Maciej Jaros
>>     Email address:      eccenux@gmail.com
>>     PostgreSQL version: 16.4
>>     Operating system:   Ubuntu 22.04
>>     Description:        
>>
>>
>>     or maybe
>>     PostgreSQL should include garbage collection?
>>
>>
>> Garbage collection is typically used in relation to a programming
>> language feature to make writing applications in those languages
>> easier.  Applications themselves don’t really implement garbage
>> collection.  And C, the language PostgreSQL is written in, doesn't
>> have garbage collection. To our knowledge, though, there are no
>> significant memory leaks in supported versions.
>>
>>
>>     RAMforPG = shared_buffers + (temp_buffers + work_mem) *
>>     max_connections;
>>
>>
>> The expression: work_mem * max_connections is incorrect.  See the doc
>> for work_mem for how it is used.
>>
>> There is so much more info needed to conclude there is a bug here -
>> which there probably is not.  Exploring the query and tuning the
>> system is better discussed on the -general mailing list.
>>
>> David J.
>>
>
> Could you share what would be the correct expression to calculate or at
> least estimate max RAM usage then? I've checked and haven't found
> anything in the docs. I've found that expression in user space. I know
> autovac might need to be accounted for, but as I said, we are not using it.
> How would this estimate of 20 GB grow to 50 GB?
>

Unfortunately there's no universal formula, because it depends on what
queries you run. For example, a query that needs to do 10 sorts may use
up to 10 x work_mem, and so on. Yes, this is unfortunate; we'd like to
have a per-session memory limit, but we don't have that. So the only
recommendation is to set these limits conservatively, not too close to
the available memory limit.

Also, if you really found a memory leak, these formulas are pointless. A
memory leak usually means "breaking" such limits, and we may not even
know about all the memory that gets allocated (e.g. by external libraries).

> There just seems to be no limit on RAM usage, so it does look like a
> memory leak. It just grows until there is no more RAM available and we
> restart the service. These are the same operations, the same connections
> (pooling on the Java side), and it just grows every day. It seems to be
> a memory leak. It doesn't seem to have an end.
>

The question is how you define a memory leak. All memory allocated by a
query (using "our" infrastructure) is tied to a "memory context" and
should be released at the end. It's possible for a query to allocate a
lot of memory, and perhaps not even release it right away, but it should
be released at the end of the query. I'm not going to rule out a bug that
breaks this (e.g. by allocating in a long-lived memory context), but it's
very unlikely we'd not find that pretty soon.

Also, the memory leak seems to be permanent - in your chart the memory
usage grows over a week. Presumably your queries are shorter than that,
so that's not consistent with this type of memory leak.

What I think might be more likely is that you're using something that
allocates memory by directly calling malloc() - say, an extension using
some external library, etc. These allocations are completely outside our
control, and if not freed explicitly, would be a "normal" memory leak.

This is why people were asking about JIT earlier. JIT relies on LLVM,
which allocates memory directly, and so a bug in LLVM could leak memory
like this. The first thing to do is to disable JIT and see if the
memory leak goes away. If it does, then we know it's either a bug in
LLVM, or in how we use it (e.g. we may not trigger cleanup).
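For reference, JIT can be switched off for a single session as a quick test, or cluster-wide:

```sql
-- Per session, for a quick before/after comparison:
SET jit = off;

-- Cluster-wide (superuser; takes effect on configuration reload):
ALTER SYSTEM SET jit = off;
SELECT pg_reload_conf();
```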

But all this is just a wild guess. We don't even know if you're using
some other extensions which might also leak memory, etc.
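Two quick checks for that, using the standard catalogs:

```sql
-- Extensions installed in the current database:
SELECT extname, extversion FROM pg_extension;

-- Shared libraries preloaded into every backend (these can also allocate
-- memory outside PostgreSQL's memory contexts):
SHOW shared_preload_libraries;
```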


regards

--
Tomas Vondra
