Thread: [BUGS] BUG #14588: Postgres process blocked on semop
The following bug has been logged on the website: Bug reference: 14588 Logged by: Chenhong Liu Email address: diabloneo@gmail.com PostgreSQL version: 9.2.16 Operating system: CentOS 7.2 1511 Description: We run a distributed application which storing data in pgsql. This application is designed to do some route works during 00:00 to 6:00 every day. I found that, postgres processes are all blocked on semop function, which means they are waiting for sysv semaphores. I'm not sure if this is a bug or we misconfigured pgsql and OS. Techinque information: 30 hosts running the application. 3 hosts running the postgresql servers, one master and two host-stanby servers. pgsql config: max_connections = 1000 shared_buffers = 512MB OS kernel.sem 250 32000 32 128 During the daytime, there are about 300 postgres processes running, work as expected. On 00:00 each day, there will be about 200 more postgresql processes created, now totally about 500. And from that time, thoese working processes are all blocked on semop function. I check this using pstack command, and also check the output of ipcs -s [-i]. After this has happened, all new connection will stay in state authentication. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
I add some log I collected for this problem. My them be helpful.
The attachements are two logs and one picture.
sds_20170311_1124.bug.log
It's log I collected every 5 minutes, it contains postgres processes list, ipcs -s output and ipcs -s -u output. I delete most of the log, just keep necessary pieces.
ipcs.log
It's the output of every semaphores set used by postgresql displayed with ipcs -s -i command, you can find out many process were waited for a semapthore. The number of lines which ncount == 1 is 469
The pstack screenshot
It shows two process's stack, one is pid 186397 who were in authentication, and the other is 188832 who where in INSERT.
Forgot to mention, I can only recovery from this situation by restart postgresql service. If I killed any process blocked with kill -9 command, the service will restart.
On Sun, Mar 12, 2017 at 11:57 AM <diabloneo@gmail.com> wrote:
The following bug has been logged on the website:
Bug reference: 14588
Logged by: Chenhong Liu
Email address: diabloneo@gmail.com
PostgreSQL version: 9.2.16
Operating system: CentOS 7.2 1511
Description:
We run a distributed application which storing data in pgsql. This
application is designed to do some route works during 00:00 to 6:00 every
day. I found that, postgres processes are all blocked on semop function,
which means they are waiting for sysv semaphores. I'm not sure if this is a
bug or we misconfigured pgsql and OS.
Techinque information:
30 hosts running the application.
3 hosts running the postgresql servers, one master and two host-stanby
servers.
pgsql config:
max_connections = 1000
shared_buffers = 512MB
OS kernel.sem
250 32000 32 128
During the daytime, there are about 300 postgres processes running, work as
expected. On 00:00 each day, there will be about 200 more postgresql
processes created, now totally about 500. And from that time, thoese working
processes are all blocked on semop function. I check this using pstack
command, and also check the output of ipcs -s [-i]. After this has happened,
all new connection will stay in state authentication.
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
Attachment
diabloneo@gmail.com writes: > During the daytime, there are about 300 postgres processes running, work as > expected. On 00:00 each day, there will be about 200 more postgresql > processes created, now totally about 500. And from that time, thoese working > processes are all blocked on semop function. I check this using pstack > command, and also check the output of ipcs -s [-i]. After this has happened, > all new connection will stay in state authentication. And your authentication setup is ...? regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Content of my pg_hba.conf
diabloneo
On Sun, Mar 12, 2017 at 12:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
diabloneo@gmail.com writes:
> During the daytime, there are about 300 postgres processes running, work as
> expected. On 00:00 each day, there will be about 200 more postgresql
> processes created, now totally about 500. And from that time, thoese working
> processes are all blocked on semop function. I check this using pstack
> command, and also check the output of ipcs -s [-i]. After this has happened,
> all new connection will stay in state authentication.
And your authentication setup is ...?
regards, tom lane
Neo Liu <diabloneo@gmail.com> writes: > Content of my pg_hba.conf > local all all peer > host all all 0.0.0.0/0 md5 > host replication demon_replicator 0.0.0.0/0 md5 Which of those would the stuck processes be using? Also, can you attach to a few of the stuck processes and get stack traces? https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Hi, On 2017-03-12 04:30:52 +0000, Neo Liu wrote: > *The pstack screenshot* > It shows two process's stack, one is pid 186397 who were in authentication, > and the other is 188832 who where in INSERT. This suggest you're having quite massive contention around ProcArrayLock. You should consider updating to 9.6. Several releases since 9.2 considerably improved scalability around this (especially 9.6). Regards, Andres -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Mon, Mar 13, 2017 at 5:03 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-03-12 04:30:52 +0000, Neo Liu wrote:
> *The pstack screenshot*
> It shows two process's stack, one is pid 186397 who were in authentication,
> and the other is 188832 who where in INSERT.
This suggest you're having quite massive contention around
ProcArrayLock. You should consider updating to 9.6. Several releases
since 9.2 considerably improved scalability around this (especially
9.6).
Regards,
Andres
Thanks, Andres
I think upgrading to newest version is a good way, but I can't perform it on a production system before we doing sufficient testing.
Currently, I want to know if this is a bug of pgsql 9.2.16, and how can I avoid this situation.
Thanks, diabloneo
On Mon, Mar 13, 2017 at 12:30 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Which of those would the stuck processes be using?
Also, can you attach to a few of the stuck processes and get stack traces?
https://wiki.postgresql.org/wiki/Generating_a_stack_trace_of_a_PostgreSQL_backend
regards, tom lane
Hi, Tome
I can't get stack traces for you now. We already add a cron job to restart the pgsql server every morning to solve the problem temporarily. It's a production system in client's environment, I can't login into the system now.
I hope the pstack output in early message can help you.
Thanks, diabloneo