Thread: postgresql-server exiting abnormally after an OS upgrade

postgresql-server exiting abnormally after an OS upgrade

From

Hugo Osvaldo Barrera

Date:

12 February 2015, 22:42:58

Hi,

I've been running postgresql-server for some time on an OpenBSD box, and the
latest OS upgrade (which also upgraded from 9.3 to 9.4) seems to have broke=
n it
in some way. I tried downgrading back to 9.3.6, but the issue persists, so =
it
looks like something else is broken, and not postgresql per se.

Of course, I did the proper upgrade procedure, pg_dumpall, move everything =
out
of the way, and reinit the db.

I also tried *not* reimporting the db on my last wipeout, and keeping the
completely empty database. It crashed too upon connection attempts (which f=
ail
to authenticate because I haven't even created the users), so I guess the d=
ata
is no problem. The database is pretty small, and very few concurrent users =
(the
*whole* backup is ~400kB).

Here's the error which repeats ad infinitum:

    LOG:  database system was shut down at 2015-02-12 14:41:29 GMT
    LOG:  database system is ready to accept connections
    LOG:  autovacuum launcher started
    LOG:  server process (PID 21205) was terminated by signal 6: Abort trap
    LOG:  terminating any other active server processes
    WARNING:  terminating connection because of crash of another server pro=
cess
    DETAIL:  The postmaster has commanded this server process to roll back =
the current transaction and exit, because another server process exited abn=
ormally and possibly corrupted shared memory.
    HINT:  In a moment you should be able to reconnect to the database and =
repeat your command.
    LOG:  all server processes terminated; reinitializing
    LOG:  database system was interrupted; last known up at 2015-02-12 14:4=
2:04 GMT
    LOG:  database system was not properly shut down; automatic recovery in=
 progress
    LOG:  record with zero length at 0/1690A90
    LOG:  redo is not required
    LOG:  database system is ready to accept connections
    LOG:  autovacuum launcher started

I've searchd all over, and found little or nothing of help. Nothing seemed =
to
change the situation.

I believe that these settings *are* relevant, so:

    kern.seminfo.semmni=3D256
    kern.seminfo.semmns=3D512
    kern.seminfo.semmnu=3D256
    kern.seminfo.semmap=3D256

    kern.shminfo.shmmax=3D134217728
    kern.shminfo.shmseg=3D256

I also tried setting shared_buffers =3D 16MB, which, while small, discards
several potential configuration issues.

Note that before upgrading to the latest OS version, pg worked fine. Also
relevant, other users of the same release have confirmed not having this is=
sue
and pg working fine for them.

Any pointer?

Thanks,

--=20
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

Re: postgresql-server exiting abnormally after an OS upgrade

From

Heikki Linnakangas

Date:

13 February 2015, 13:29:55

On 02/13/2015 12:42 AM, Hugo Osvaldo Barrera wrote:
> Here's the error which repeats ad infinitum:
>
>      LOG:  database system was shut down at 2015-02-12 14:41:29 GMT
>      LOG:  database system is ready to accept connections
>      LOG:  autovacuum launcher started
>      LOG:  server process (PID 21205) was terminated by signal 6: Abort trap
>      LOG:  terminating any other active server processes
>      WARNING:  terminating connection because of crash of another server process
>      DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because
anotherserver process exited abnormally and possibly corrupted shared memory. 
>      HINT:  In a moment you should be able to reconnect to the database and repeat your command.

Can you get a core dump and a stack trace from it?

- Heikki

Re: server exiting abnormally after an OS upgrade

From

Hugo Osvaldo Barrera

Date:

13 February 2015, 14:03:36

On 2015-02-13 15:29, Heikki Linnakangas wrote:
> On 02/13/2015 12:42 AM, Hugo Osvaldo Barrera wrote:
> >Here's the error which repeats ad infinitum:
> >
> >     LOG:  database system was shut down at 2015-02-12 14:41:29 GMT
> >     LOG:  database system is ready to accept connections
> >     LOG:  autovacuum launcher started
> >     LOG:  server process (PID 21205) was terminated by signal 6: Abort =
trap
> >     LOG:  terminating any other active server processes
> >     WARNING:  terminating connection because of crash of another server=
 process
> >     DETAIL:  The postmaster has commanded this server process to roll b=
ack the current transaction and exit, because another server process exited=
 abnormally and possibly corrupted shared memory.
> >     HINT:  In a moment you should be able to reconnect to the database =
and repeat your command.
>=20
> Can you get a core dump and a stack trace from it?
>=20
> - Heikki
>=20
>=20
>=20
> --=20
> Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-bugs

I'm slightly unsure how to do this - the docs give some references on how to
select a single process and attempt to attach gdb to it, but they seem to be
too short lived for me to do anything (<1s between errors), so I can't real=
ly
work with that.

Do you have any suggestions for this?

I'll try to enable core dumps in the meantime and get back to you on this,
thank,

--=20
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

Re: postgresql-server exiting abnormally after an OS upgrade

From

Hugo Osvaldo Barrera

Date:

16 February 2015, 20:21:39

On 2015-02-13 15:29, Heikki Linnakangas wrote:
> On 02/13/2015 12:42 AM, Hugo Osvaldo Barrera wrote:
> >Here's the error which repeats ad infinitum:
> >
> >     LOG:  database system was shut down at 2015-02-12 14:41:29 GMT
> >     LOG:  database system is ready to accept connections
> >     LOG:  autovacuum launcher started
> >     LOG:  server process (PID 21205) was terminated by signal 6: Abort =
trap
> >     LOG:  terminating any other active server processes
> >     WARNING:  terminating connection because of crash of another server=
 process
> >     DETAIL:  The postmaster has commanded this server process to roll b=
ack the current transaction and exit, because another server process exited=
 abnormally and possibly corrupted shared memory.
> >     HINT:  In a moment you should be able to reconnect to the database =
and repeat your command.
>=20
> Can you get a core dump and a stack trace from it?
>=20
> - Heikki
>=20

It certainly took me a bit, but, with some help, I managed to get one:

  (gdb) bt
  #0  0x0000110a2815b92a in kill () at <stdin>:2
  #1  0x0000110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
  #2  0x0000110a2816a238 in memcpy (dst0=3D0xfb8d4, src0=3D0x6, length=3D0)=
 at /usr/src/lib/libc/string/memcpy.c:65
  #3  0x000011080cf8d1b1 in check_ip (raddr=3D0x110a899f7918, addr=3D0x110a=
899f9058, mask=3D0x110a899f9158) at hba.c:704
  #4  0x000011080cf90a04 in check_hba (port=3D0x110a899f7800) at hba.c:1718
  #5  0x000011080cf91d34 in hba_getauthmethod (port=3D0x110a899f7800) at hb=
a.c:2256
  #6  0x000011080cf88eb3 in ClientAuthentication (port=3D0x110a899f7800) at=
 auth.c:307
  #7  0x000011080d1edf5d in PerformAuthentication (port=3D0x110a899f7800) a=
t postinit.c:223
  #8  0x000011080d1eeae7 in InitPostgres (in_dbname=3D0x110af4508c00 "virts=
tart-dev", dboid=3D0,=20
      username=3D0x110af4508be0 "virtstart-dev", out_dbname=3D0x0) at posti=
nit.c:688
  #9  0x000011080d0a3eb1 in PostgresMain (argc=3D1, argv=3D0x110af4508c20, =
dbname=3D0x110af4508c00 "virtstart-dev",=20
      username=3D0x110af4508be0 "virtstart-dev") at postgres.c:3749
  #10 0x000011080d033537 in BackendRun (port=3DCould not find the frame bas=
e for "BackendRun".
  ) at postmaster.c:4155
  #11 0x000011080d032be8 in BackendStartup (port=3D0x110a899f7800) at postm=
aster.c:3829
  #12 0x000011080d02f2d0 in ServerLoop () at postmaster.c:1597
  #13 0x000011080d02e968 in PostmasterMain (argc=3D3, argv=3D0x7f7ffffd9658=
) at postmaster.c:1244
  #14 0x000011080cf96dc8 in main (argc=3DCould not find the frame base for =
"main".
  ) at main.c:228
  Current language:  auto; currently asm

I stopped pg once it crashed and then bt'd the core. Does this help at all?=
 Is
there something else I can provide?

Thanks,

--=20
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

Re: postgresql-server exiting abnormally after an OS upgrade

From

Tom Lane

Date:

16 February 2015, 20:55:48

Hugo Osvaldo Barrera <hugo@barrera.io> writes:
> On 2015-02-13 15:29, Heikki Linnakangas wrote:
>> Can you get a core dump and a stack trace from it?

> It certainly took me a bit, but, with some help, I managed to get one:

>   (gdb) bt
>   #0  0x0000110a2815b92a in kill () at <stdin>:2
>   #1  0x0000110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
>   #2  0x0000110a2816a238 in memcpy (dst0=0xfb8d4, src0=0x6, length=0) at /usr/src/lib/libc/string/memcpy.c:65
>   #3  0x000011080cf8d1b1 in check_ip (raddr=0x110a899f7918, addr=0x110a899f9058, mask=0x110a899f9158) at hba.c:704
>   #4  0x000011080cf90a04 in check_hba (port=0x110a899f7800) at hba.c:1718

Huh.  You didn't say whether this is 9.3 or 9.4, but it doesn't matter
because that code is the same in both; the crash is happening here:

static bool
check_ip(SockAddr *raddr, struct sockaddr * addr, struct sockaddr * mask)
{
    ...
        /*
         * If we're connected on IPv6 but the file specifies an IPv4 address
         * to match against, promote the latter to an IPv6 address before
         * trying to match the client's address.
         */
        struct sockaddr_storage addrcopy,
                    maskcopy;

        memcpy(&addrcopy, &addr, sizeof(addrcopy));
--->    memcpy(&maskcopy, &mask, sizeof(maskcopy));

Looking at this, the code seems wrong on its face: the memcpy sources
should be "addr" and "mask", not "&addr" and "&mask".  Apparently, no one
has ever used or tested this case since it was put in :-(.  I find that
a bit astonishing given the steady creep of IPv6 support, but it's hard
to see how the code could possibly work correctly as-is.

Probably the reason why this was triggered by an OS upgrade is that your
system is now choosing to make the DB connection over IPv6 not IPv4.
Or maybe it was doing that right along but you now have a memcpy that is
doing more checking of its arguments than before (this is evidently an
argument sanity check and not a straight SIGSEGV).

Anyway you could work around it for now by being sure to list IPv6
equivalent addresses before IPv4 addresses in your pg_hba.conf.
We'll fix it properly in the next releases.

            regards, tom lane

Re: postgresql-server exiting abnormally after an OS upgrade

From

Hugo Osvaldo Barrera

Date:

16 February 2015, 21:31:29

On 2015-02-16 15:55, Tom Lane wrote:
> Hugo Osvaldo Barrera <hugo@barrera.io> writes:
> > On 2015-02-13 15:29, Heikki Linnakangas wrote:
> >> Can you get a core dump and a stack trace from it?
>=20
> > It certainly took me a bit, but, with some help, I managed to get one:
>=20
> >   (gdb) bt
> >   #0  0x0000110a2815b92a in kill () at <stdin>:2
> >   #1  0x0000110a28195119 in abort () at /usr/src/lib/libc/stdlib/abort.=
c:53
> >   #2  0x0000110a2816a238 in memcpy (dst0=3D0xfb8d4, src0=3D0x6, length=
=3D0) at /usr/src/lib/libc/string/memcpy.c:65
> >   #3  0x000011080cf8d1b1 in check_ip (raddr=3D0x110a899f7918, addr=3D0x=
110a899f9058, mask=3D0x110a899f9158) at hba.c:704
> >   #4  0x000011080cf90a04 in check_hba (port=3D0x110a899f7800) at hba.c:=
1718
>=20
> Huh.  You didn't say whether this is 9.3 or 9.4, but it doesn't matter
> because that code is the same in both; the crash is happening here:
>=20

It's 9.4, the issue appeared after an OS upgrade which was accompanied by an
upgrade to 9.3 I *did* try to downgrade back to 9.3 at that point, but got =
the
same issue.

> static bool
> check_ip(SockAddr *raddr, struct sockaddr * addr, struct sockaddr * mask)
> {
>     ...
>         /*
>          * If we're connected on IPv6 but the file specifies an IPv4 addr=
ess
>          * to match against, promote the latter to an IPv6 address before
>          * trying to match the client's address.
>          */
>         struct sockaddr_storage addrcopy,
>                     maskcopy;
>=20
>         memcpy(&addrcopy, &addr, sizeof(addrcopy));
> --->    memcpy(&maskcopy, &mask, sizeof(maskcopy));
>=20
> Looking at this, the code seems wrong on its face: the memcpy sources
> should be "addr" and "mask", not "&addr" and "&mask".  Apparently, no one
> has ever used or tested this case since it was put in :-(.  I find that
> a bit astonishing given the steady creep of IPv6 support, but it's hard
> to see how the code could possibly work correctly as-is.
>=20
> Probably the reason why this was triggered by an OS upgrade is that your
> system is now choosing to make the DB connection over IPv6 not IPv4.
> Or maybe it was doing that right along but you now have a memcpy that is
> doing more checking of its arguments than before (this is evidently an
> argument sanity check and not a straight SIGSEGV).
>=20

My host has been configured to use IPv6 by default for some time now, but it
*is* possible that there has been some OS-level change during this upgrade =
that
affected this in some way.

> Anyway you could work around it for now by being sure to list IPv6
> equivalent addresses before IPv4 addresses in your pg_hba.conf.
> We'll fix it properly in the next releases.
>=20

Do you mean IPv6 *before* IPv4? Wouldn't that add a preference to IPv6?
I currently have the stock out-of-the-box pg_hba:

  # TYPE  DATABASE        USER            ADDRESS                 METHOD
  # "local" is for Unix domain socket connections only
  local   all             all                                     md5
  # IPv4 local connections:
  host    all             all             127.0.0.1/32            md5
  # IPv6 local connections:
  host    all             all             ::1/128                 md5

(I deleted commented out lines for readability).


>             regards, tom lane
>=20

Thanks,

--=20
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

Re: postgresql-server exiting abnormally after an OS upgrade

From

Tom Lane

Date:

17 February 2015, 00:38:29

Hugo Osvaldo Barrera <hugo@barrera.io> writes:
> On 2015-02-16 15:55, Tom Lane wrote:
>> Probably the reason why this was triggered by an OS upgrade is that your
>> system is now choosing to make the DB connection over IPv6 not IPv4.
>> Or maybe it was doing that right along but you now have a memcpy that is
>> doing more checking of its arguments than before (this is evidently an
>> argument sanity check and not a straight SIGSEGV).

> My host has been configured to use IPv6 by default for some time now, but it
> *is* possible that there has been some OS-level change during this upgrade that
> affected this in some way.

On further thought the answer almost certainly is that you now have a libc
version in which memcpy() complains strenuously (ie, aborts) if given
overlapping source and target areas.  Such "helpfulness" is the new fad
among libc maintainers :-(

>> Anyway you could work around it for now by being sure to list IPv6
>> equivalent addresses before IPv4 addresses in your pg_hba.conf.
>> We'll fix it properly in the next releases.

> Do you mean IPv6 *before* IPv4? Wouldn't that add a preference to IPv6?

No, not particularly, considering the cases are disjoint.  An incoming
connection is one or the other.

> I currently have the stock out-of-the-box pg_hba:

>   # TYPE  DATABASE        USER            ADDRESS                 METHOD
>   # "local" is for Unix domain socket connections only
>   local   all             all                                     md5
>   # IPv4 local connections:
>   host    all             all             127.0.0.1/32            md5
>   # IPv6 local connections:
>   host    all             all             ::1/128                 md5

The problem comes when an incoming IPv6 connection is compared to that
127.0.0.1 line.  If you're using only IPv6 anyway, you could just remove
that line.  Or you could do something like this:

host    all             all             ::1/128                 md5
host    all             all             ::0/0                   reject
host    all             all             127.0.0.1/32            md5

so that all IPv6 connections are either accepted or rejected before
reaching the 127.0.0.1 line.

            regards, tom lane

Fwd: Re: postgresql-server exiting abnormally after upgrade to -snapshot

From

Hugo Osvaldo Barrera

Date:

17 February 2015, 14:38:46

Hi,

This comes as a follow-up on a recent issue I mentioned and was discussed on
this list. (See thread [1]).

The included patch fixes the issue and postgres has been working fine for me
with it. See this thread[2] for details.

Cheers,

----- Forwarded message from J=E9r=E9mie Courr=E8ges-Anglas <jca@wxcvbn.org=
> -----

> Date: Mon, 16 Feb 2015 23:21:05 +0100
> From: J=E9r=E9mie Courr=E8ges-Anglas <jca@wxcvbn.org>
> Subject: Re: postgresql-server exiting abnormally after upgrade to -snaps=
hot
> To: Hugo Osvaldo Barrera <hugo@barrera.io>
> Cc: misc@openbsd.org
> Mail-Followup-To: Hugo Osvaldo Barrera <hugo@barrera.io>, misc@openbsd.org
>=20
> jca@wxcvbn.org (J=E9r=E9mie Courr=E8ges-Anglas) writes:
>=20
> > Please try the diff below.  It fixes the "backwards memcpy" problem
> > easily noticeable with psql -h ::1.
>=20
> Updated diff. Thanks to Stuart for reminding me that netmasks sa_len
> values can be much surprising.
>=20
> $OpenBSD$
> --- src/backend/libpq/hba.c.orig    Mon Feb 16 21:53:21 2015
> +++ src/backend/libpq/hba.c    Mon Feb 16 23:08:38 2015
> @@ -700,8 +700,13 @@ check_ip(SockAddr *raddr, struct sockaddr * addr, st=
ru
>          struct sockaddr_storage addrcopy,
>                      maskcopy;
> =20
> -        memcpy(&addrcopy, &addr, sizeof(addrcopy));
> -        memcpy(&maskcopy, &mask, sizeof(maskcopy));
> +        memcpy(&addrcopy, addr, sizeof(struct sockaddr_in));
> +        /*
> +         * On some OSes, if mask is obtained from eg. getifaddrs(3), sa_len
> +         * can vary wildly. We already know that addr->sa_family =3D=3D AF_INE=
T,
> +         * so just use sizeof(struct sockaddr_in).
> +         */
> +        memcpy(&maskcopy, mask, sizeof(struct sockaddr_in));
>          pg_promote_v4_to_v6_addr(&addrcopy);
>          pg_promote_v4_to_v6_mask(&maskcopy);
> =20
>=20
>=20
> --=20
> jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7=
EE
>=20

----- End forwarded message -----

[1]: http://www.postgresql.org/message-id/20150216202138.GB20474@athena.bar=
rera.io
[2]: https://marc.info/?t=3D142367714200001&r=3D1&w=3D2

--=20
Hugo Osvaldo Barrera
A: Because we read from top to bottom, left to right.
Q: Why should I start my reply below the quoted text?

Re: Fwd: Re: postgresql-server exiting abnormally after upgrade to -snapshot

From

Tom Lane

Date:

17 February 2015, 15:06:18

Hugo Osvaldo Barrera <hugo@barrera.io> writes:
> The included patch fixes the issue and postgres has been working fine for me
> with it. See this thread[2] for details.

Hm, I was wondering whether copying sizeof(struct sockaddr_storage) might
not be a risky choice.  However, the conclusion on the -hackers thread

http://www.postgresql.org/message-id/flat/23855.1424136475@sss.pgh.pa.us

seems to be that we're going to eliminate the entire issue by just ripping
out this code section, lock stock n barrel.

Thanks for confirming that this fixes it for you, though --- at least
we now have a certain diagnosis of the problem.

            regards, tom lane