Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1 - Mailing list pgsql-general
From | Stephan Knauss |
---|---|
Subject | Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1 |
Date | |
Msg-id | 6ea52e56-b401-7716-f592-a8fdc98df667@stephans-server.de |
In response to | Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Debugging leaking memory in Postgresql 13.2/Postgis 3.1 |
List | pgsql-general |
On 30.03.2021 20:46, Tom Lane wrote:
> Stephan Knauss <pgsql@stephans-server.de> writes:
>> The wiki suggested to dump MemoryContext states for more details, but
>> something strange happens when attaching gdb. It seems that the process
>> is immediately killed and I can no longer dump such details.
> (I think the -v option is the one that matters on Linux, not -d
> as you might guess). The idea here is that the backends would
> get an actual ENOMEM failure from malloc() before reaching the
> point where the kernel's OOM-kill behavior takes over. Given
> that, they'd dump memory maps to stderr of their own accord,
> and you could maybe get some insight as to what's leaking.
> This'd also reduce the severity of the problem when it does
> happen.

Hello Tom,

the output below looks similar to the OOM output you expected. Can you give a hint how to interpret the results?

I had a backend which had already allocated a large amount of memory, so I gave "gcore -a" a try. Contrary to the advertised behavior, the process did not continue to run, but at least I got a core file. This is probably related to gcore simply calling gdb attach, which somehow triggers a SIGKILL of all backends.

At 4.2GB in size, the core file hopefully contains most of the relevant memory structures. Without a running process I still cannot call MemoryContextStats(), but I found a macro which claims to decode the memory structure post mortem:

https://www.cybertec-postgresql.com/en/checking-per-memory-context-memory-consumption/

This gave me the memory structure shown below. How should it be interpreted? It looks like the size is in bytes, as it is calculated from pointers. But the numbers look a bit small, given that I had a backend with roughly 6GB RSS memory.
I thought it might print the overall size and then indent and print the memory of the children, but the numbers indicate this is not the case, since a higher level shows a smaller size than its children:

CachedPlanSource: 67840
unnamed prepared statement: 261920

So how should I read it, and is there any indication why I have a constantly increasing memory footprint? Is there any indication where the multiple gigabytes are allocated?

root@0ec98d20bda2:/# gdb /usr/lib/postgresql/13/bin/postgres core.154218 <gdb-context
GNU gdb (Debian 8.2.1-2+b3) 8.2.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/13/bin/postgres...Reading symbols from /usr/lib/debug/.build-id/31/ae2853776500091d313e76cf679017e697884b.debug...done.
done.
warning: core file may not match specified executable file.
[New LWP 154218]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: osm gis 172.20.0.3(51894) idle'.
#0 0x00007fc01cfa07b7 in epoll_wait (epfd=4, events=0x55f403584080, maxevents=maxevents@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
30 ../sysdeps/unix/sysv/linux/epoll_wait.c: No such file or directory.
(gdb) >>>> > > >>>(gdb) (gdb) >>>> > > >>>>> > > >>(gdb) (gdb)
TopMemoryContext: 109528
dynahash: 7968
HandleParallelMessages: 7968
dynahash: 7968
dynahash: 7968
dynahash: 7968
dynahash: 24392
dynahash: 24352
RowDescriptionContext: 24352
MessageContext: 7968
dynahash: 7968
dynahash: 32544
TransactionAbortContext: 32544
dynahash: 7968
TopPortalContext: 7968
dynahash: 16160
CacheMemoryContext: 1302944
CachedPlan: 138016
CachedPlanSource: 67840
unnamed prepared statement: 261920
index info: 1824
index info: 1824
index info: 3872
index info: 1824
index info: 1824
index info: 3872
index info: 3872
index info: 3872
index info: 1824
index info: 3872
relation rules: 32544
index info: 1824
index info: 1824
index info: 1824
index info: 3872
relation rules: 24352
index info: 3872
index info: 3872
index info: 1824
index info: 3872
index info: 3872
index info: 3872
index info: 1824
index info: 3872
index info: 1824
index info: 3872
relation rules: 32544
index info: 1824
index info: 2848
index info: 1824
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 3872
index info: 1824
index info: 3872
index info: 1824
index info: 1824
relation rules: 32544
index info: 1824
index info: 2848
index info: 1824
index info: 800
index info: 1824
index info: 800
index info: 800
index info: 2848
index info: 1824
index info: 800
index info: 800
index info: 800
index info: 2848
index info: 1824
index info: 1824
--Type <RET> for more, q to quit, c to continue without paging--
index info: 2848
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 800
index info: 800
index info: 800
index info: 2848
index info: 2848
index info: 1824
index info: 1824
index info: 800
index info: 800
index info: 2848
index info: 800
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 2848
index info: 2848
index info: 2848
index info: 800
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 2848
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 2848
index info: 800
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 1824
index info: 1824
index info: 800
index info: 1824
index info: 2848
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 1824
index info: 1824
WAL record construction: 49544
dynahash: 7968
MdSmgr: 7968
dynahash: 16160
dynahash: 103896
ErrorContext: 7968
(gdb) quit
root@0ec98d20bda2:/# cat gdb-context
# Print "<name>: <bytes>" for one context: the summed size of its own
# AllocSet blocks only (child contexts are not included).
define sum_context_blocks
  set $context = $arg0
  set $block = ((AllocSet) $context)->blocks
  set $size = 0
  while ($block)
    set $size = $size + (((AllocBlock) $block)->endptr - ((char *) $block))
    set $block = ((AllocBlock) $block)->next
  end
  printf "%s: %d\n",((MemoryContext)$context)->name, $size
end

# Recursively print a context and its children; $arg0 is the indent
# depth (also used to keep per-level variables apart), $arg1 the context.
define walk_contexts
  set $parent_$arg0 = ($arg1)
  set $indent_$arg0 = ($arg0)
  set $i_$arg0 = $indent_$arg0
  while ($i_$arg0)
    printf " "
    set $i_$arg0 = $i_$arg0 - 1
  end
  sum_context_blocks $parent_$arg0
  set $child_$arg0 = ((MemoryContext) $parent_$arg0)->firstchild
  set $indent_$arg0 = $indent_$arg0 + 1
  while ($child_$arg0)
    walk_contexts $indent_$arg0 $child_$arg0
    set $child_$arg0 = ((MemoryContext) $child_$arg0)->nextchild
  end
end

walk_contexts 0 TopMemoryContext
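
Since sum_context_blocks only walks the block list of the context it is given, each printed number should cover that one context alone and not its children. To compare the output with the process RSS, the per-context sizes therefore have to be added up. The following is only a rough sketch of how that could be done in Python, assuming the macro output has been saved as text. The `summarize` helper and the regular expression are made up for illustration and are not part of the macro above, and the sample string is just a short excerpt of the listing.

```python
import re
from collections import Counter

# Matches "<name>: <bytes>" entries. Assumption: context names consist of
# letters and spaces only, as in the listing above.
ENTRY = re.compile(r"([A-Za-z][A-Za-z ]*): (\d+)")


def summarize(text):
    """Return (total_bytes, Counter of bytes per context name)."""
    per_name = Counter()
    for name, size in ENTRY.findall(text):
        per_name[name] += int(size)
    return sum(per_name.values()), per_name


if __name__ == "__main__":
    # Short excerpt of the macro output, one entry per line.
    sample = """TopMemoryContext: 109528
dynahash: 7968
CacheMemoryContext: 1302944
CachedPlan: 138016
index info: 1824
index info: 3872
"""
    total, per_name = summarize(sample)
    print("total bytes in listed contexts:", total)
    for name, size in per_name.most_common():
        print(f"{name}: {size}")
```

If the total over the full listing stays far below the RSS, that would suggest that most of the memory sits outside the contexts the macro can see, but this sketch cannot tell where.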