Thread: Out of Memory
Hello,

I'm running postgres 9.6.10 on Centos 7. Seeing the occasional out of memory error trying to run a query. In the logs I see something like this:

Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks); 319665696 used
2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname ERROR: out of memory
2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname DETAIL: Failed on request of size 2016.

If I have 142439136 free, then why am I failing on a request of size 2016? Am I misunderstanding here?

Thanks,
Greig Wise
My first two guesses are ulimit or numa. numactl --hardware will show you how many nodes your box has and whether you're exhausting any of them.
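For example, a rough sketch of how to look (on CentOS 7, numactl and numastat come from the numactl package; lscpu is a fallback if that package isn't installed):

    numactl --hardware    # node count plus per-node total/free memory
    numastat -m           # per-node meminfo-style breakdown
    lscpu | grep -i numa  # quick node count without numactl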
On Thu, Sep 20, 2018, 6:11 PM greigwise <greigwise@comcast.net> wrote:
> Hello, I'm running postgres 9.6.10 on Centos 7. Seeing the occasional out
> of memory error trying to run a query. In the logs I see something like
> this:
>
> Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
> 319665696 used
> 2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname ERROR: out of memory
> 2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname DETAIL: Failed on request
> of size 2016.
>
> If I have 142439136 free, then why am I failing on a request of size 2016?
> Am I misunderstanding here?
>
> Thanks,
> Greig Wise
ulimit -a for the postgres user shows memory unlimited. numactl --hardware gives "command not found".

Thanks again.
Greig
greigwise <greigwise@comcast.net> writes:
> Hello, I'm running postgres 9.6.10 on Centos 7. Seeing the occasional out
> of memory error trying to run a query. In the logs I see something like
> this:
> Grand total: 462104832 bytes in 795 blocks; 142439136 free (819860 chunks);
> 319665696 used
> 2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname ERROR: out of memory
> 2018-09-20 18:08:01 UTC xxxx 5ba3e1a2.7a8a dbname DETAIL: Failed on request
> of size 2016.
> If I have 142439136 free, then why am I failing on a request of size 2016?

The free space must be in contexts other than the one that last little
request wanted space in. Overall, you've got about 460MB of space
consumed in that session, so it's not *that* surprising that you got OOM.
(At least, it's unsurprising on a 32-bit machine. If the server is
64-bit I'd have thought the kernel would be a bit more liberal.)

But anyway, this looks like a mighty inefficient usage pattern at best,
and maybe a memory leak at worst. Can you create a self-contained test
case that does this?

			regards, tom lane
Well, we are 64-bit... I'll see if I can make some kind of self-contained test to repeat it.

Thanks,
Greig
Well, I've been unsuccessful so far on creating a standalone test.

I have put some scripting in place to capture some additional information on the server with the out of memory issues. I have a script which just periodically dumps the output of free -m to a text file.

So, the output of free -m immediately before and after the out of memory error looks like this:

Just before:
              total        used        free      shared  buff/cache   available
Mem:          14877        2978         132        3553       11766        7943
Swap:             0           0           0

Just after:
              total        used        free      shared  buff/cache   available
Mem:          14877        2946         649        3548       11280        7982
Swap:             0           0           0

If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Thanks,
Greig
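A minimal sketch of that kind of capture loop (the log path and interval here are only illustrative, not what was actually used); grabbing /proc/meminfo alongside free -m can help, since it also records the kernel's commit accounting:

    # crude periodic memory snapshot -- path and interval are arbitrary
    while true; do
        { date; free -m; grep -E 'CommitLimit|Committed_AS' /proc/meminfo; echo; } >> /tmp/mem-snapshots.log
        sleep 60
    done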
greigwise <greigwise@comcast.net> writes:
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Probably the postmaster is running under restrictive ulimit settings.

			regards, tom lane
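The limits that apply to the running postmaster can differ from those of a login shell (for example when the server is started by systemd), so it is worth checking the process itself, roughly like this; the data directory path is only an assumption for illustration:

    PGDATA=/var/lib/pgsql/9.6/data            # assumed location, adjust as needed
    PGPID=$(head -1 "$PGDATA/postmaster.pid") # first line of postmaster.pid is the PID
    cat /proc/$PGPID/limits                   # limits as seen by that process
    prlimit -p $PGPID                         # equivalent view via util-linux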
Tom Lane-2 wrote
> greigwise <greigwise@comcast.net> writes:
>> If I have nearly 8 GB of memory left, why am I getting out of memory
>> errors?
>
> Probably the postmaster is running under restrictive ulimit settings.
>
> 			regards, tom lane

If I login as the user which runs postmaster, I get this:

-bash-4.2$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 59341
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Max memory size is unlimited as is virtual memory... is there something else there I should be concerned with in regard to out of memory?

Thanks again.
Greig
On Tue, 25 Sep 2018 11:34:19 -0700 (MST)
greigwise <greigwise@comcast.net> wrote:

> Well, I've been unsuccessful so far on creating a standalone test.
>
> I have put some scripting in place to capture some additional information on
> the server with the out of memory issues. I have a script which just
> periodically dumps the output of free -m to a text file.
>
> So, the output of free -m immediately before and after the out of memory
> error looks like this:
>
> Just before:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2978         132        3553       11766        7943
> Swap:             0           0           0
>
> Just after:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2946         649        3548       11280        7982
> Swap:             0           0           0
>
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?

Doesn't the default NUMA setup mean that it can't actually allocate all
the available memory to a single NUMA zone (or whatever it's called)?
Or am I talking ancient Linux history with that?

-- 
Bill Moran
On Tue, Sep 25, 2018 at 2:05 PM PT <wmoran@potentialtech.com> wrote:
On Tue, 25 Sep 2018 11:34:19 -0700 (MST)
greigwise <greigwise@comcast.net> wrote:
> Well, I've been unsuccessful so far on creating a standalone test.
>
> I have put some scripting in place to capture some additional information on
> the server with the out of memory issues. I have a script which just
> periodically dumps the output of free -m to a text file.
>
> So, the output of free -m immediately before and after the out of memory
> error looks like this:
>
> Just before:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2978         132        3553       11766        7943
> Swap:             0           0           0
>
> Just after:
>               total        used        free      shared  buff/cache   available
> Mem:          14877        2946         649        3548       11280        7982
> Swap:             0           0           0
>
> If I have nearly 8 GB of memory left, why am I getting out of memory errors?
Doesn't the default NUMA setup mean that it can't actually allocate all
the available memory to a single NUMA zone (or whatever it's called)?
Or am I talking ancient Linux history with that?
--
Bill Moran
By the way, we have similar issues running in our smaller stage environment: 9.5.6 on CentOS 7.2.
It only ever happens in my stage environment, on smaller AWS t2 boxen. Memory looks fine, but Postgres says it can't get any. And my settings are really low.
Not sure if it's the same issue, but we run into it a few times a year in my stage environment. Also running postgres with ulimit set to unlimited.
Tory
There is also this:

-bash-4.2$ prlimit -p 6590
RESOURCE   DESCRIPTION                                   SOFT      HARD UNITS
AS         address space limit                      unlimited unlimited bytes
CORE       max core file size                               0 unlimited blocks
CPU        CPU time                                 unlimited unlimited seconds
DATA       max data size                            unlimited unlimited bytes
FSIZE      max file size                            unlimited unlimited blocks
LOCKS      max number of file locks held            unlimited unlimited
MEMLOCK    max locked-in-memory address space           65536     65536 bytes
MSGQUEUE   max bytes in POSIX mqueues                  819200    819200 bytes
NICE       max nice prio allowed to raise                   0         0
NOFILE     max number of open files                      1024      4096
NPROC      max number of processes                       4096     59341
RSS        max resident set size                    unlimited unlimited pages
RTPRIO     max real-time priority                           0         0
RTTIME     timeout for real-time tasks              unlimited unlimited microsecs
SIGPENDING max number of pending signals                59341     59341
STACK      max stack size                             8388608 unlimited bytes

Is it possible that the fact that my stack size is limited is what is causing my issue?

Thanks,
Greig
greigwise <greigwise@comcast.net> writes:
> Is it possible that the fact that my stack size is limited is what is
> causing my issue?

No. If you were hitting that limit you'd get a message specifically
talking about stack.

			regards, tom lane
Tom Lane-2 wrote
> greigwise <greigwise@comcast.net> writes:
>> Is it possible that the fact that my stack size is limited is what is
>> causing my issue?
>
> No. If you were hitting that limit you'd get a message specifically
> talking about stack.
>
> 			regards, tom lane

Well, darn. I'm at a loss... any suggestions of what I should do next to troubleshoot this?

Thanks.
Greig
I think I figured it out:

vm.overcommit_memory = 2
vm.overcommit_ratio = 50

Only allows me to use 50% of my RAM... ugh! I have 16 GB, so when only 8 is left, I start seeing OOM. Will increase this setting and see if it helps.

Thanks everyone for the help.
Greig
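For reference, the current settings and the resulting ceiling can be checked directly; these are standard kernel interfaces, where CommitLimit is the ceiling enforced in overcommit mode 2 and Committed_AS is what is currently committed against it:

    sysctl vm.overcommit_memory vm.overcommit_ratio
    grep -E 'CommitLimit|Committed_AS' /proc/meminfo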
All right... one more thing here. Any suggestions for how to set overcommit on a postgres db server with 16 GB of RAM and no swap? I think I want vm.overcommit_memory = 2, as I understand that prevents the OOM killer from zapping me. Is 100% the right way to go for overcommit_ratio? Is there a drawback to this?

Thanks again.
Greig
greigwise wrote:
> All right... one more thing here. Any suggestions for how to set overcommit
> on a postgres db server with 16 GB of RAM and no swap? I think I want
> vm.overcommit_memory = 2, as I understand that prevents the OOM killer from
> zapping me. Is 100% the right way to go for overcommit_ratio? Is there a
> drawback to this?

vm.overcommit_memory = 2
vm.overcommit_ratio = 100

Linux commits (swap * overcommit_ratio * RAM / 100), so without any swap
the correct value would be 100.

I don't know how safe it is to go entirely without swap.
I would add a little bit just to be sure.

Yours,
Laurenz Albe
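A minimal sketch of applying those values persistently on CentOS 7 (the file name under /etc/sysctl.d/ is just an example, not anything from this thread):

    # /etc/sysctl.d/99-postgres-overcommit.conf  -- example file name
    vm.overcommit_memory = 2
    vm.overcommit_ratio = 100

    # load it without rebooting
    sysctl --system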
## Laurenz Albe (laurenz.albe@cybertec.at):

> vm.overcommit_memory = 2
> vm.overcommit_ratio = 100
>
> Linux commits (swap * overcommit_ratio * RAM / 100),
                      ^
That should be a "+". See Documentation/sysctl/vm.txt and
Documentation/vm/overcommit-accounting.rst in your kernel tree.

> so without any swap the correct value would be 100.
>
> I don't know how safe it is to go entirely without swap.

That's completely safe. Even more: if your production machine ever needs
swap, it's already mostly dead in the water. So there's also no benefit in
having the kernel copy stuff to swap "just in case" (read: set sysctl
vm.swappiness to 0) and use the swap disk space for something more
productive.

If your memory usage ever exceeds available memory, something already went
horribly wrong, and you want to know about it right now. A good way to know
about this kind of mistake is having stuff fail hard, obviously and loudly
(i.e. Out Of Memory). Do not try to save the day by swapping: stuff is
"sometimes slow" and if you really need the swap, everything crawls to a
halt anyway.

Of course, this does not hold for personal workstations and the like, where
you might want to use the swap space for suspend-to-disk, or have some
buffer in case of runaway processes.

Regards,
Christoph

-- 
Spare Space.
Christoph Moench-Tegeder wrote:
> ## Laurenz Albe (laurenz.albe@cybertec.at):
>
> > vm.overcommit_memory = 2
> > vm.overcommit_ratio = 100
> >
> > Linux commits (swap * overcommit_ratio * RAM / 100),
>                       ^
> That should be a "+".

Yes; shame on me for careless typing, and thank you for the correction.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
> On Sep 27, 2018, at 3:45 PM, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> Christoph Moench-Tegeder wrote:
>> ## Laurenz Albe (laurenz.albe@cybertec.at):
>>
>>> vm.overcommit_memory = 2
>>> vm.overcommit_ratio = 100
>>>
>>> Linux commits (swap * overcommit_ratio * RAM / 100),
>>                       ^
>> That should be a "+".
>
> Yes; shame on me for careless typing, and thank you for the
> correction.

Are there any parentheses needed in that formula?
Rob Sargent wrote:
> > Christoph Moench-Tegeder wrote:
> > > ## Laurenz Albe (laurenz.albe@cybertec.at):
> > >
> > > > vm.overcommit_memory = 2
> > > > vm.overcommit_ratio = 100
> > > >
> > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > >                       ^
> > > That should be a "+".
> >
> > Yes; shame on me for careless typing, and thank you for the
> > correction.
>
> Are there any parentheses needed in that formula?

No. It is swap space plus a certain percentage of RAM.

I don't know how the Linux developers came up with that weird formula.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com
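As a worked example for the machine in this thread (16 GB of RAM, no swap), here is a rough sketch that computes the commit limit from /proc/meminfo; with overcommit_ratio = 50 it comes out around 8 GB, which matches the errors seen earlier, while 100 gives roughly the full 16 GB:

    # CommitLimit = SwapTotal + MemTotal * overcommit_ratio / 100
    ratio=$(sysctl -n vm.overcommit_ratio)
    mem_kb=$(awk '/^MemTotal:/  {print $2}' /proc/meminfo)
    swap_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
    echo "commit limit: $(( (swap_kb + mem_kb * ratio / 100) / 1024 )) MB"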
On 2018-09-28 07:23:59 +0200, Laurenz Albe wrote:
> Rob Sargent wrote:
> > > Christoph Moench-Tegeder wrote:
> > > > ## Laurenz Albe (laurenz.albe@cybertec.at):
> > > >
> > > > > vm.overcommit_memory = 2
> > > > > vm.overcommit_ratio = 100
> > > > >
> > > > > Linux commits (swap * overcommit_ratio * RAM / 100),
> > > >                       ^
> > > > That should be a "+".
> > >
> > > Yes; shame on me for careless typing, and thank you for the
> > > correction.
> >
> > Are there any parentheses needed in that formula?
>
> No. It is swap space plus a certain percentage of RAM.
>
> I don't know how the Linux developers came up with that
> weird formula.

I suspect they copied it from some other Unix.

Traditionally, Unix allocated all memory in the swap space. You could say
that the swap space was the "real memory" and RAM was a cache for that
(hence the rule that swap must be at least as large as RAM and should
preferably be 2 or 4 times the size of RAM). So, when Unixes started to
allow allocating more virtual memory than swap space, they were
"overcommitting".

But for Linux that doesn't make much sense, since a page lived either in
RAM /or/ in swap right from the start, so the limit was always RAM+swap,
not swap alone, and you are only overcommitting if you exceed the size of
the sum.

The overcommitment in Linux is of a different kind: Linux uses copy on
write wherever it can (e.g. when forking processes, but also when mallocing
memory), and a CoW page may or may not be written in the future. It only
needs additional space when it's actually written, so by counting the page
only once (hoping that there will be enough space if and when that page is
written) the kernel is overcommitting memory.

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@hjp.at         | management tools.
__/   | http://www.hjp.at/ |       -- Ross Anderson <https://www.edge.org/>