Re: [ADMIN]openvz and shared memory trouble - Mailing list pgsql-general

From	Adrian Klaver
Subject	Re: [ADMIN]openvz and shared memory trouble
Date	March 31, 2014 14:38:39
Msg-id	53397DE7.2070903@aklaver.com
In response to	Re: [ADMIN]openvz and shared memory trouble (Willy-Bas Loos <willybas@gmail.com>)
Responses	Re: [ADMIN]openvz and shared memory trouble
List	pgsql-general

On 03/31/2014 04:12 AM, Willy-Bas Loos wrote:
>
> On Sat, Mar 29, 2014 at 6:17 PM, Adrian Klaver
> <adrian.klaver@aklaver.com> wrote:
>
>     On 03/29/2014 08:19 AM, Willy-Bas Loos wrote:
>
>         The error that shows up is a Bus error.
>         That's on the replication slave.
>         Here's the log about it:
>         2014-03-29 12:41:33 CET db: ip: us: FATAL:  could not receive
>         data from
>         WAL stream: server closed the connection unexpectedly
>                   This probably means the server terminated abnormally
>                   before or while processing the request.
>
>         cp: cannot stat
>         `/data/postgresql/9.1/main/__wal_archive/__00000001000000720000000A':
>         No
>         such file or directory
>         2014-03-29 12:41:33 CET db: ip: us: LOG:  unexpected pageaddr
>         71/E9DA0000 in log file 114, segment 10, offset 14286848
>         cp: cannot stat
>         `/data/postgresql/9.1/main/__wal_archive/__00000001000000720000000A':
>         No
>         such file or directory
>         2014-03-29 12:41:33 CET db: ip: us: LOG:  streaming replication
>         successfully connected to primary
>         2014-03-29 12:41:48 CET db: ip: us: LOG:  startup process (PID
>         17452)
>         was terminated by signal 7: Bus error
>         2014-03-29 12:41:48 CET db: ip: us: LOG:  terminating any other
>         active
>         server processes
>         2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos WARNING:
>         terminating connection because of crash of another server process
>         2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos DETAIL:  The
>         postmaster has commanded this server process to roll back the
>         current
>         transaction and exit, because another server process exited
>         abnormally
>         and possibly corrupted shared memory.
>         2014-03-29 12:41:48 CET db:wbloos ip:[local] us:wbloos HINT:  In a
>         moment you should be able to reconnect to the database and
>         repeat your
>         command.
>
>
>     Well what I am seeing are WAL log errors. One saying no file is
>     present, the other pointing at a possible file corruption.
>
> Those are normal notices, nothing to worry about.

Well, other than that they cause the standby to reconnect to the primary,
during which a crash occurs.

>
>     Shared memory problems are offered as a possible cause only. Right
>     now I would say we are seeing only half the picture. The Postgres
>     logs from the same time period for the primary server, as well as
>     the system logs for the openvz container would help fill in the
>     other half of the picture.
>
>
> Here's the log from the primary postgres server:
> 2014-03-29 12:41:29 CET db:wbloos ip:[local] us:wbloos NOTICE:  ALTER
> TABLE will create implicit sequence "test_x_seq" for serial column "test.x"
> 2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  SSL renegotiation failure
> 2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  SSL error: unexpected record
> 2014-03-29 12:41:33 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  could not send data to client: Connection reset by peer
> 2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  could not receive data from client: Connection reset by peer
> 2014-03-29 12:41:48 CET db:[unknown] ip:xxx.xxx.xxx.xxx us:replication
> LOG:  unexpected EOF on standby connection
>
> (the SSL renegotiation failure happens all the time, without the crash)
>
> And here's the syslog from the container:
> Mar 29 12:41:01 mycontainer snmpd[8819]: Connection from UDP:
> [xxx.xxx.xxx.xxx]:59090->[xxx.xxx.xxx.xxx]
> Mar 29 12:42:30 mycontainer snmpd[8819]: Connection from UDP:
> [xxx.xxx.xxx.xxx]:35949->[xxx.xxx.xxx.xxx]
>
> The log on the host doesn't say anything interesting either.
>
>     A cursory look at memory management in openvz shows it is different
>     from other virtualization software and physical machines. Whether
>     that is a problem would seem to be dependent on where you are on the
>     learning curve:)
>
> That sounds like "there is a solution to the problem, all you have to do
> is find out what it is". There doesn't seem to be a variable in the
> beancounters or anywhere else that can prevent the bus error from happening.
> There seems to be no separate way of guaranteeing shared memory.
> There's no OOM killer active either, nor is host or server running short
> of memory.

At this point I am not sure it is even clear what is causing the
error, so finding a solution would be a hit-or-miss affair at best.

>
> I'm still worried that it's like Tom Lane said in another discussion:"So
> basically, you've got a broken kernel here: it claimed to give PG circa
> (135MB) of memory, but what's actually there is only about (128MB). I
> don't see any connection between those numbers and the shmmax/shmall
> settings, either --- so I think this must be some busted implementation
> of a VM-level limitation."
> (here:
> http://www.postgresql.org/message-id/CAK3UJREBcyVBtr8D7vMfU=uDdkjXkrPnGcuy8erYB0tMfKe1LA@mail.gmail.com)
>
> And it makes me wonder what else may be issues that arise from that. But
> especially, what i can do about it.

I do not use openvz so I do not have a test bed to try out, but this
page seems to be related to your problem:

http://openvz.org/Resource_shortage

or, if you want more detail and a link to what looks to be a replacement
for beancounters:

http://openvz.org/Setting_UBC_parameters
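
As a rough illustration of the kind of check those pages describe, here
is a minimal Python sketch that scans the beancounters table for
resources whose failure counter is non-zero. It is untested on a real
container. It assumes the usual /proc/user_beancounters layout (a
"Version:" line, a header row, then the columns uid, resource, held,
maxheld, barrier, limit, failcnt); the function names and the SAMPLE
values are made up for illustration and are not taken from your system.

```python
# Sketch: list OpenVZ beancounter resources with a non-zero failcnt.
# Assumption: the table has the columns
#   uid resource held maxheld barrier limit failcnt
# and only the first row of each container carries the "uid:" prefix.

# Illustrative values only, not real output from any container.
SAMPLE = """\
Version: 2.5
       uid  resource           held    maxheld    barrier      limit    failcnt
      101:  kmemsize        2619734    2854912   11055923   11377049          0
            privvmpages       41072      47288      65536      69632          0
            shmpages          21504      21504      21504      21504          3
            physpages         30541      34200          0 2147483647          0
"""


def parse_beancounters(text):
    """Return one dict per resource row found in the table text."""
    rows = []
    uid = None
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the version line and the column header.
        if not fields or fields[0] in ("Version:", "uid"):
            continue
        # A leading "NNN:" token starts a new container section.
        if fields[0].endswith(":"):
            uid = fields[0].rstrip(":")
            fields = fields[1:]
        if len(fields) != 6:
            continue  # not a resource row in the assumed layout
        name, held, maxheld, barrier, limit, failcnt = fields
        rows.append({
            "uid": uid,
            "resource": name,
            "held": int(held),
            "maxheld": int(maxheld),
            "barrier": int(barrier),
            "limit": int(limit),
            "failcnt": int(failcnt),
        })
    return rows


def failed_resources(text):
    """Return (uid, resource, failcnt) for every row with failcnt > 0."""
    return [
        (row["uid"], row["resource"], row["failcnt"])
        for row in parse_beancounters(text)
        if row["failcnt"] > 0
    ]


if __name__ == "__main__":
    # Reading the real file normally needs root; fall back to the sample.
    try:
        with open("/proc/user_beancounters") as fh:
            table = fh.read()
    except OSError:
        table = SAMPLE
    for uid, resource, failcnt in failed_resources(table):
        print(f"container {uid}: {resource} failcnt={failcnt}")
```

If a counter such as shmpages or privvmpages shows failures around the
time of the crash, that would at least narrow down where to look.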

>
> Cheers,
>
> WBL
>
> --
> "Quality comes from focus and clarity of purpose" -- Mark Shuttleworth


--
Adrian Klaver
adrian.klaver@aklaver.com
