Many queries stuck for waiting MultiXactOffsetControlLock on PostgreSQL 9.6.1 replica - Mailing list pgsql-bugs
From | Rui An |
---|---|
Subject | Many queries stuck for waiting MultiXactOffsetControlLock on PostgreSQL 9.6.1 replica |
Date | |
Msg-id | CABjTVMR39ci6LUVE2yWhpcufPee4nO3cu32sij3Yh8viksCztQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Many queries stuck for waiting MultiXactOffsetControlLock on PostgreSQL 9.6.1 replica
Re: Many queries stuck for waiting MultiXactOffsetControlLock on PostgreSQL 9.6.1 replica |
List | pgsql-bugs |
Hi, I have recently encountered the problem, as title, on some of my PostgreSQL 9.6.1 hot standby replicas when serving readonly queries.
The problem makes a lot of PostgreSQL processes stuck on waiting for MultiXactOffsetControlLock and eventually runs out of the max_connections quota.
For example:
mydb=# select count(*) from pg_stat_activity where state = 'active' and wait_event = 'MultiXactOffsetControlLock';
count
------
956
(1 row)
I have tried use pg_terminate_backend to kill the stuck prcoesses, but no help. They still stuck on waiting for MultiXactOffsetControlLock even if the pg_terminate_backend(pid) returns true.
Currently, the only thing I can do to resolve the problem is to restart my replicas.
Can someone help me find out how could the problem occurred and is there any other way to resolve the problem?
I have also recorded some strace and perf top output on stuck processes:
strace:
strace: Process 555 attached
select(0, NULL, NULL, NULL, {0, 78} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
semop(60981276, [{0, -1, 0}, 1]) = 0
semop(60981276, [{0, -1, 0}, 1]) = 0
semop(61734963, [{14, -1, 0}, 1]) = 0
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
semop(61505580, [{12, -1, 0}, 1]) = 0
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
semop(60981276, [{0, -1, 0}, 1]) = 0
semop(63471720, [{9, -1, 0}, 1]) = 0
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
semop(60981276, [{0, -1, 0}, 1]) = 0
semop(63471720, [{9, -1, 0}, 1]) = 0
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
select(0, NULL, NULL, NULL, {0, 1000} = 0 (Timeout)
semop(60981276, [{0, -1, 0}, 1]) = 0
perf top:
Samples: 208 of event 'cpu-clock', Event count (approx.): 36045672
Overhead Shared Object Symbol
30.74% postgres [.] perform_spin_delay
13.70% [kernel] [k] finish_task_switch
13.66% [kernel] [k] 0x00007fff81838f35
8.59% postgres [.] 0x0000000000319107
4.02% postgres [.] 0x0000000000318f04
2.47% postgres [.] pg_usleep
2.10% libc-2.23.so [.] __select
1.68% postgres [.] PGSemaphoreLock
1.68% postgres [.] 0x0000000000318eb2
1.54% [kernel] [k] pvclock_clocksource_read
1.53% postgres [.] GetMultiXactIdMembers
1.41% postgres [.] 0x0000000000318ea4
1.31% postgres [.] LWLockAcquire
1.30% postgres [.] 0x0000000000318e17
1.23% [kernel] [k] do_select
1.07% postgres [.] finish_spin_delay
1.06% [kernel] [k] poll_freewait
1.05% libc-2.23.so [.] semop
0.96% [kernel] [k] poll_schedule_timeout
0.72% [kernel] [k] SYSC_semtimedop
0.69% [kernel] [k] core_sys_select
0.69% [kernel] [k] pid_vnr
0.61% [kernel] [k] dev_gro_receive
0.61% libc-2.23.so [.] random
0.61% postgres [.] 0x0000000000318e3d
0.53% [kernel] [k] hrtimer_active
0.53% [kernel] [k] ipc_obtain_object_check
0.46% postgres [.] 0x0000000000318f00
0.46% postgres [.] 0x0000000000319100
0.46% postgres [.] 0x0000000000319103
0.41% postgres [.] LWLockRelease
0.41% postgres [.] 0x0000000000318f85
0.36% [kernel] [k] run_timer_softirq
0.36% [kernel] [k] sys_semop
pgsql-bugs by date: