Re: [ADMIN]openvz and shared memory trouble - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: [ADMIN]openvz and shared memory trouble |
Date | |
Msg-id | 5339865C.9020209@aklaver.com |
In response to | Re: [ADMIN]openvz and shared memory trouble (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: [ADMIN]openvz and shared memory trouble |
List | pgsql-general |
On 03/31/2014 08:01 AM, Tom Lane wrote:
> Adrian Klaver <adrian.klaver@aklaver.com> writes:
>> On 03/31/2014 04:12 AM, Willy-Bas Loos wrote:
>>> I'm still worried that it's like Tom Lane said in another discussion:"So
>>> basically, you've got a broken kernel here: it claimed to give PG circa
>>> (135MB) of memory, but what's actually there is only about (128MB). I
>>> don't see any connection between those numbers and the shmmax/shmall
>>> settings, either --- so I think this must be some busted implementation
>>> of a VM-level limitation."
>>> (here:
>>> http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uDdkjXkrPnGcuy8erYB0tMfKe1LA@mail.gmail.com)
>>>
>>> And it makes me wonder what else may be issues that arise from that. But
>>> especially, what i can do about it.
>
> FWIW, I went back and re-read that message while perusing this thread,
> and this time it struck me that there was a significant bit of evidence
> I'd overlooked: namely, that the buffer block array is by no means the
> last thing in Postgres' shared memory segment. There are a bunch of
> other shared data structures allocated after it, some of which almost
> certainly had to have been touched by the startup subprocess. The gdb
> output makes it clear that the kernel stopped providing memory at
> 0xb6c4b000; but either it resumed doing so further on, or the whole shared
> memory segment *had* been provisioned originally, and then part of it
> got unmapped again while the startup process was running.
>
> So it's still clearly a kernel bug, but it seems less likely that it is
> triggered by some static limit on shared memory size. Perhaps instead,
> the kernel had been filling in pages for the shared segment on-demand,
> and then when it got to some limit it refused to do so anymore and allowed
> a SIGBUS to happen instead.
>
>> I do not use openvz so I do not have a test bed to try out, but this
>> page seems to be related to your problem:
>> http://openvz.org/Resource_shortage
>> or if you want more detail and a link to what looks to a replacement for
>> beancounters:
>> http://openvz.org/Setting_UBC_parameters
>
> If this software's idea of resource management is to allow SIGBUS to
> happen upon attempting to use memory that had been successfully granted,
> then it's a piece of junk that you should get rid of ASAP. (No, I
> don't like Linux's OOM-kill solution to resource overcommit either.)

At this point the memory allocation as a problem is as much conjecture
as anything else, at least to me. So what is causing SIGBUS is an open
question in my mind.

>
> regards, tom lane
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com
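
One way to test the conjecture about a resource limit is to look at the beancounters themselves. OpenVZ exposes them in /proc/user_beancounters, where the last column (failcnt) counts how many times a request for that resource was refused. The following is only a minimal sketch in Python: the function name `failed_beancounters` and the `SAMPLE` text are made up for illustration (they are not output from the system discussed in this thread), and it assumes the usual column layout of uid, resource, held, maxheld, barrier, limit, failcnt.

```python
# Hypothetical sample in the usual /proc/user_beancounters layout.
# The numbers are invented; they are NOT from the machine in this thread.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2718720    2979337   14372700   14790164          0
            privvmpages       32768      34560      32768      34560         12
            shmpages           1280       1280      21504      21504          3
"""


def failed_beancounters(text):
    """Return (uid, resource, failcnt) for every row whose failcnt is nonzero."""
    failures = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the "Version:" line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # The first row of each container starts with "<uid>:".
        if fields[0].endswith(":"):
            uid = fields[0].rstrip(":")
            fields = fields[1:]
        # Expect: resource, held, maxheld, barrier, limit, failcnt.
        if len(fields) != 6:
            continue
        resource, failcnt = fields[0], int(fields[5])
        if failcnt > 0:
            failures.append((uid, resource, failcnt))
    return failures


if __name__ == "__main__":
    for uid, resource, failcnt in failed_beancounters(SAMPLE):
        print(uid, resource, failcnt)
```

On a real container the text would come from reading /proc/user_beancounters (typically as root) instead of `SAMPLE`. A failcnt on privvmpages or shmpages that rises when the postmaster starts would point toward a beancounter limit; counters that all stay at zero would leave the cause of the SIGBUS open.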