Lock pileup stuck processes - Mailing list pgsql-bugs

From Josh Berkus
Subject Lock pileup stuck processes
Date
Msg-id 570ECF60.5040200@agliodbs.com
List pgsql-bugs
Folks,

This is a "hard to reproduce" bug, so I am submitting it to this list in
order to accumulate evidence for eventual debugging once there are
enough reports to figure something out.  Since I've now seen this on two
different user applications, I think it relates to some kind of
persistent issue either in Postgres or in the OS.

Summary: in some cases, "lock pileups" fail to resolve completely, and
one or more orphan backends are left in permanent lock-waiting state.

Versions observed: 9.2.14, 9.2.15, 9.3.5

Platforms: RHEL6, Fedora

Observations:

1. A long-running transaction grabs one or more row locks.

2. Various queries, especially SELECT FOR UPDATE queries, pile up behind
these locks.

3. At peak, 30 or more backends are waiting for locks in a dependency
chain.  System load is high.

4. Original transaction ends.

5. Over 10 minutes, most of the waiting backends complete their work and
release their locks.

6. One to three backends never come out of active/waiting state,
remaining that way indefinitely.
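
To make the "dependency chain" and "orphan" terms above concrete, here is
a minimal sketch in Python.  It is only an illustrative model, not a
diagnosis of the bug: the `blocked_by` mapping, the pids, and both
function names are hypothetical, and in practice the "who blocks whom"
edges would have to be derived by hand from pg_locks.

```python
from typing import Dict, List, Set


def chain_to_root(pid: int, blocked_by: Dict[int, int]) -> List[int]:
    """Follow blocker links starting at pid.

    blocked_by maps a waiting backend's pid to the pid it waits on
    (hypothetical input; assumed to be derived from pg_locks).
    Returns the pids visited, ending at the first backend that is not
    itself waiting, or stopping early if a cycle is detected.
    """
    chain = [pid]
    seen = {pid}
    while chain[-1] in blocked_by:
        nxt = blocked_by[chain[-1]]
        if nxt in seen:
            # A cycle would be a deadlock, which is a different problem.
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain


def find_orphans(blocked_by: Dict[int, int], live_pids: Set[int]) -> List[int]:
    """Return waiting pids whose chain ends at a backend that is gone.

    Such a backend is still recorded as waiting, but nothing that is
    still running holds the lock it is waiting for.
    """
    orphans = []
    for pid in sorted(blocked_by):
        root = chain_to_root(pid, blocked_by)[-1]
        if root not in live_pids:
            orphans.append(pid)
    return orphans
```

For example, with `blocked_by = {101: 100, 102: 101, 103: 102}` the chain
for backend 103 runs through 102 and 101 back to the original
transaction 100.  If 100, 101 and 102 have all finished but 103 is still
recorded as waiting on 102, `find_orphans` reports 103.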

My attempts to reproduce this issue under synthetic circumstances have
not been successful.  strace of the stuck backends shows no activity.
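
For anyone who wants to check for the same symptom, a sketch of how the
stuck backends could be picked out of a pg_stat_activity snapshot
follows.  The query uses the `waiting` and `state` columns as they exist
in the 9.2 and 9.3 releases listed above.  The 10-minute threshold, the
row format (one dict per row) and the function name are assumptions made
for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Query for the 9.2/9.3 pg_stat_activity layout, which has a boolean
# "waiting" column.  Run it with any client and pass the rows below.
WAITING_SQL = """
SELECT pid, state, waiting, query_start, query
  FROM pg_stat_activity
 WHERE waiting
 ORDER BY query_start
"""


def stuck_backends(
    rows: List[Dict],
    now: datetime,
    min_wait: timedelta = timedelta(minutes=10),  # assumed threshold
) -> List[int]:
    """Return pids that are active, waiting, and older than min_wait.

    rows is assumed to be a list of dicts with the keys "pid", "state",
    "waiting" and "query_start", one per pg_stat_activity row.
    """
    return [
        row["pid"]
        for row in rows
        if row["state"] == "active"
        and row["waiting"]
        and now - row["query_start"] >= min_wait
    ]
```

Backends that keep showing up in this list long after the original
transaction has ended would match observation 6 above.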

--
Josh Berkus
Red Hat OSAS
(any opinions are my own)
