BUG #13818: PostgreSQL crashes after cronjob runs as "postgres" - Mailing list pgsql-bugs

From	ryan@runscope.com
Subject	BUG #13818: PostgreSQL crashes after cronjob runs as "postgres"
Date	December 15, 2015 19:35:50
Msg-id	20151214224729.2624.99840@wrigleys.postgresql.org
Responses	Re: BUG #13818: PostgreSQL crashes after cronjob runs as "postgres"
List	pgsql-bugs

The following bug has been logged on the website:

Bug reference:      13818
Logged by:          Ryan Park
Email address:      ryan@runscope.com
PostgreSQL version: 9.4.5
Operating system:   CentOS 7.2
Description:

We are running PostgreSQL 9.4.5 on CentOS 7, using the RPM from
postgresql.org. We perform automatic OS updates, so our server was just
upgraded from CentOS 7.1 to CentOS 7.2. Since the upgrade we've been seeing
PostgreSQL crash after a cronjob is run as the "postgres" user.

More details: we have a cronjob that runs every 3 minutes. This cronjob runs
as the OS user "postgres". It makes a very simple database update using
psql, like this:
    */3 * * * * /usr/pgsql-9.4/bin/psql -U <username> <dbname> -c 'UPDATE patches SET name = name WHERE patch = 1'

Here's the syslog output from /var/log/messages when the cronjob runs:

    Dec 14 18:24:01 prod118 systemd: Created slice user-1012.slice.
    Dec 14 18:24:01 prod118 systemd: Starting user-1012.slice.
    Dec 14 18:24:01 prod118 systemd: Started Session 15015 of user postgres.
    Dec 14 18:24:01 prod118 systemd: Starting Session 15015 of user postgres.
    Dec 14 18:24:01 prod118 systemd: Removed slice user-1012.slice.
    Dec 14 18:24:01 prod118 systemd: Stopping user-1012.slice.
    Dec 14 18:24:01 prod118 postgres[52611]: [2584-1] PANIC: semop(id=57311261) failed: Invalid argument
    Dec 14 18:24:01 prod118 postgres[52611]: [2584-2] STATEMENT:  COMMIT
    Dec 14 18:24:01 prod118 postgres[52584]: [2584-1] PANIC: semop(id=57376799) failed: Invalid argument
    Dec 14 18:24:01 prod118 postgres[52584]: [2584-2] STATEMENT:  COMMIT
    Dec 14 18:24:01 prod118 postgres[52630]: [2584-1] PANIC: semop(id=57311261) failed: Invalid argument
    Dec 14 18:24:01 prod118 postgres[52630]: [2584-2] STATEMENT:  COMMIT
    Dec 14 18:24:01 prod118 postgres[72998]: [2600-1] LOG:  server process (PID 52630) was terminated by signal 6: Aborted
    Dec 14 18:24:01 prod118 postgres[72998]: [2600-2] DETAIL:  Failed process was running: COMMIT
    Dec 14 18:24:01 prod118 postgres[72998]: [2601-1] LOG:  terminating any other active server processes
    Dec 14 18:24:01 prod118 postgres[52711]: [2585-1] WARNING:  terminating connection because of crash of another server process
    Dec 14 18:24:01 prod118 postgres[52711]: [2585-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    Dec 14 18:24:01 prod118 postgres[52711]: [2585-3] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    Dec 14 18:24:01 prod118 postgres[52715]: [2585-1] WARNING:  terminating connection because of crash of another server process
    Dec 14 18:24:01 prod118 postgres[52714]: [2585-1] WARNING:  terminating connection because of crash of another server process
    Dec 14 18:24:01 prod118 postgres[52715]: [2585-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    Dec 14 18:24:01 prod118 postgres[52714]: [2585-2] DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    Dec 14 18:24:01 prod118 postgres[52715]: [2585-3] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    Dec 14 18:24:01 prod118 postgres[52714]: [2585-3] HINT:  In a moment you should be able to reconnect to the database and repeat your command.
    ... and so on.

It looks like systemd created a "slice" for the cronjob. A "slice" is a
partition of system resources using cgroups, kinda like a Docker container.
When the cronjob finished, systemd tore down the slice. My hunch is that
when the slice was stopped, it introduced some corruption in shared memory
or another shared resource. After the slice was stopped, we immediately saw
3 semop failures in PostgreSQL, which caused the entire Postgres server to
crash. The initial postgres process (/usr/pgsql-9.4/bin/postgres) kept
running, but all the subprocesses were restarted -- including the
checkpointer, writer, etc. This happened over and over, every time the
cronjob ran.
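
To check that the semop failures really do follow each slice teardown, and not
just this one time, the ordering in the log can be extracted mechanically. The
following is only a rough sketch, not something we ran as part of this report.
It assumes each syslog entry is on a single line (as in /var/log/messages, not
wrapped as in this email), and the function name is made up for illustration.
It shows ordering only; it does not prove that the teardown causes the PANICs.

```python
import re

# Patterns modeled on the syslog lines quoted in this report.
SLICE_REMOVED = re.compile(r"systemd: Removed slice (user-\d+\.slice)\.")
SEMOP_FAILED = re.compile(
    r"postgres\[(\d+)\]: .*PANIC:\s+semop\(id=(\d+)\) failed"
)


def semop_panics_after_slice_removal(lines):
    """Group semop PANICs under the most recent "Removed slice" line.

    Returns a list of (slice_name, [(pid, semid), ...]) tuples, one per
    "Removed slice" line, in log order. PANICs seen before any slice
    removal are ignored.
    """
    results = []
    for line in lines:
        removed = SLICE_REMOVED.search(line)
        if removed:
            results.append((removed.group(1), []))
            continue
        failed = SEMOP_FAILED.search(line)
        if failed and results:
            pid, semid = int(failed.group(1)), int(failed.group(2))
            results[-1][1].append((pid, semid))
    return results
```

Feeding it the lines of /var/log/messages (for example via
open("/var/log/messages").read().splitlines()) would give one entry per slice
removal, each with the backend PIDs and semaphore ids that failed afterwards.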

We tried restarting PostgreSQL and restarting the whole server, but neither
of those helped.

As I mentioned, our cronjob ran as the "postgres" user. When I switched the
cronjob to run from a different user account, everything worked fine and
Postgres stopped crashing. I'm guessing that there are fewer shared
resources between 2 different users, so nothing gets corrupt when the other
user's slice is stopped.
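
One way to test this guess would be to compare the System V semaphore sets
owned by "postgres" before and after the cronjob runs. semop(2) documents
EINVAL for a semaphore set that does not exist (among other causes), so a
semaphore id that disappears between the two snapshots would fit the PANICs
above. This is a sketch under assumptions, not something we have run: it
assumes the column layout printed by the util-linux "ipcs -s" command (key,
semid, owner, perms, nsems), and the function names are made up for
illustration.

```python
import subprocess


def parse_semids(ipcs_output, owner):
    """Return the set of semaphore-set ids owned by `owner`.

    Expects text in the layout of util-linux `ipcs -s`:
        key  semid  owner  perms  nsems
    Header and separator lines are skipped because their first field
    does not start with "0x".
    """
    semids = set()
    for line in ipcs_output.splitlines():
        fields = line.split()
        if len(fields) == 5 and fields[0].startswith("0x") and fields[2] == owner:
            semids.add(int(fields[1]))
    return semids


def current_semids(owner="postgres"):
    """Run `ipcs -s` and return the semaphore ids currently owned by `owner`."""
    result = subprocess.run(
        ["ipcs", "-s"], capture_output=True, text=True, check=True
    )
    return parse_semids(result.stdout, owner)


def missing_after(before, after):
    """Return the sorted semaphore ids present in `before` but not in `after`."""
    return sorted(before - after)
```

Taking one snapshot with current_semids() just before the cronjob fires and
another just after, then passing both to missing_after(), would show whether
ids such as 57311261 or 57376799 from the log vanish when the slice is removed.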

Exact sequence of steps from startup:
    1. Create a database called "<dbname>" with a "patches" table. This is
the schema:
        CREATE TABLE IF NOT EXISTS patches(
            patch integer NOT NULL,
            name character varying(50),
            successful boolean NOT NULL,
            applied_by character varying(1000),
            applied_at timestamp without time zone NOT NULL,
            CONSTRAINT patches_pkey PRIMARY KEY (patch)
        ) WITH (OIDS=FALSE);
    2. Make sure the user "<username>" has permission to write to that
table.
    3. Create a cronjob like the one described above. It should be in the
"postgres" user's crontab.
        */3 * * * * /usr/pgsql-9.4/bin/psql -U <username> <dbname> -c 'UPDATE patches SET name = name WHERE patch = 1'

What we got: PostgreSQL crashed with the logs listed above.

What we expected: PostgreSQL would not crash.

PostgreSQL version: Both server and client are version 9.4.5. We are using
these RPMs:
    postgresql94-9.4.5-1PGDG.rhel7.x86_64
    postgresql94-contrib-9.4.5-1PGDG.rhel7.x86_64
    postgresql94-server-9.4.5-1PGDG.rhel7.x86_64
    postgresql94-libs-9.4.5-1PGDG.rhel7.x86_64

Platform information: CentOS 7.2 on x86_64. The issue happened on both Linux
kernel 3.10.0-229.20.1.el7.x86_64 and on kernel 3.10.0-327.3.1.el7.x86_64.
The version of systemd is 219. This is running on Amazon EC2, on an
i2.4xlarge instance.

Thank you for your help.

Ryan Park
Runscope Operations
ryan@runscope.com
