Re: BUG #13660: serializable snapshotting hangs - Mailing list pgsql-bugs
From | Chris Pacejo
---|---
Subject | Re: BUG #13660: serializable snapshotting hangs
Date |
Msg-id | CAC8iE5isTnJ_8KHv6HyWAqW=2q37wpHTiTA3EBLhWzgLVTBJGw@mail.gmail.com
In response to | Re: BUG #13660: serializable snapshotting hangs (Kevin Grittner <kgrittn@ymail.com>)
Responses | Re: BUG #13660: serializable snapshotting hangs
List | pgsql-bugs
I get that, but I had no other open connections to that database.
There were many connections to *other* databases on that same
*server*, but that shouldn't affect taking a snapshot on an otherwise
unused database, correct?

Also note that once this happened to one database, it happened to
*all* databases on the same server simultaneously. It's as if the
read-write transaction check is performed server-wide, not
database-wide as I would expect.

On Tue, Oct 6, 2015 at 5:38 PM, Kevin Grittner <kgrittn@ymail.com> wrote:
> On Thursday, October 1, 2015 9:19 PM, "cpacejo@clearskydata.com"
> <cpacejo@clearskydata.com> wrote:
>
>> PostgreSQL version: 9.4.4
>
>> After running fine for weeks, we now find that serializable snapshots
>> hang:
>>
>> our_db=> START TRANSACTION ISOLATION LEVEL SERIALIZABLE, READ ONLY,
>> DEFERRABLE;
>> START TRANSACTION
>> our_db=> SELECT pg_export_snapshot();
>> (...hangs indefinitely...)
>
>> Killing all backends (i.e. including those accessing other databases)
>> unblocked serializable snapshotting.
>>
>> Is this expected behavior?
>
> What should be happening is that when the serializable read only
> deferrable transaction executes the first statement that needs a
> snapshot, it grabs a snapshot and looks for concurrent read write
> transactions. If there are none, the snapshot is safe and it goes
> ahead with execution; otherwise it goes into a loop where it delays
> (to avoid sucking 100% CPU for a core) by calling
> ProcWaitForSignal(). Each time that function returns it checks
> whether any of the concurrent read write transactions developed a
> read-write conflict out (i.e., it read data which was modified by a
> transaction concurrent to *it*) to a transaction which committed
> before this read only transaction got its snapshot. If so, this
> snapshot is unsafe and we need to discard it, get a new snapshot,
> and go back into the loop. If all read write transactions
> concurrent to the read only transaction complete without developing
> such a conflict our snapshot is safe and we can proceed to run
> without taking predicate locks or risking a serialization failure.
>
> Now, that's pretty technical, and you definitely don't need to
> follow all of the above to safely use deferrable transactions. The
> short version is that your initial statement that needs a snapshot
> cannot begin to execute *at least* until any concurrent
> transactions which are not READ ONLY complete. One connection
> stuck "idle in transaction" can hold it up indefinitely. A
> prepared transaction also counts as a transaction which has not yet
> completed.
>
> So if you had any long-running READ WRITE transactions (including
> one stuck "idle in transaction" or prepared but not committed), it
> is expected behavior. If not, we should try to gather more
> information to figure out what's going on.
>
> One other possibility is that your workload has changed such that
> there are now so many read write transactions reading data modified
> by other transactions that it is just not finding a safe snapshot
> very quickly. Even when running benchmarks at saturation levels on
> a 16 core machine we rarely saw a delay of more than six seconds, but
> there is no upper bound on how long it might take. In such a case
> you would have a few options, but let's not get ahead of ourselves
> -- the first thing is to check for "idle in transaction" or
> prepared transaction issues.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
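For reference, a minimal sketch of the two checks Kevin suggests at the end, assuming only the standard pg_stat_activity and pg_prepared_xacts system views (column names as of 9.4). The queries look across every database on the server, since the hang described above was observed on all databases at once:

```sql
-- Sketch: sessions stuck "idle in transaction" anywhere on the server.
SELECT pid, datname, usename, state, xact_start
FROM pg_stat_activity
WHERE state IN ('idle in transaction', 'idle in transaction (aborted)')
ORDER BY xact_start;

-- Sketch: prepared transactions, which also count as not yet completed.
SELECT gid, database, owner, prepared
FROM pg_prepared_xacts
ORDER BY prepared;
```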