Thread: Questionable coding in proc.c & lock.c
I spent some time looking around for possible causes of the recently reported deadlock conditions. I couldn't find any smoking gun, but I found a couple of things that look fishy: 1. I notice that ProcReleaseAll does not act quite the same way that ProcRelease does. Look in particular at the logic for handling wakeupNeeded in each one. There is some useless code in ProcRelease, but the bottom line is that it *will* call ProcLockWakeup() unless it finds there are no remaining holders (and deletes the lock instead). OTOH, ProcReleaseAll will only call ProcLockWakeup() if this test succeeds for some lockmode: if (!wakeupNeeded && xidLook->holders[i] > 0 && lockMethodTable->ctl->conflictTab[i] & lock->waitMask) wakeupNeeded = true; I spent some time trying to see a way that this might fail to wake up a waiter who needs to be woken up. I don't see one, but either this code is broken or ProcRelease is doing unnecessary work. 2. The loop in ProcLockWakeup() has been broken since the day it was written. It's trying to scan through the list of waiting processes and awaken all that can legitimately be awoken. However, as soon as it finds one that cannot be woken, it sets last_locktype = proc->token and then continues the loop *without advancing proc*. All subsequent iterations will compare proc->token == last_locktype, find they are equal, and then continue --- again without advancing proc. It's not an infinite loop because there's a list-length counter, but none of the proc entries beyond the first unawakenable one will be examined. The net effect is that the behavior is the same as it was in 6.3 or so, when the loop was just "while (LockResolveConflicts(...) succeeds) do remove-and-awaken-first-entry". So the question is, is there ever a case where it is necessary to wake processes that are in the list beyond one that can't be woken? I haven't been able to conjure up an example, but I am suspicious. This error cannot explain any new hangs seen in 7.0, since the effective behavior of ProcLockWakeup() hasn't changed since Postgre95. But perhaps someone made a change elsewhere relying on the assumption that ProcLockWakeup() would wake up any available waiter. Comments anyone? regards, tom lane
> -----Original Message----- > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On > Behalf Of Tom Lane > > 2. The loop in ProcLockWakeup() has been broken since the day it was > written. It's trying to scan through the list of waiting processes > and awaken all that can legitimately be awoken. However, as soon as > it finds one that cannot be woken, it sets last_locktype = proc->token > and then continues the loop *without advancing proc*. All subsequent > iterations will compare proc->token == last_locktype, find they are I pointed out it once before 6.5 and it was one of the cause of lock freezing then. I've known it has not been changed but neglected to tell it, sorry. However the freezing was resolved by Vadim's change and I've thought it doesn't cause freezing. Hmm,I don't remember well now how the freezing occured. The following is my posting about the freezing. Hope that helps, Regards. Hiroshi Inoue Inoue@tpf.co.jp > There is no row-level locks: all locks over tables are > table-level ones, btree & hash use page-level locks, but > never do page->table level lock escalation. > > However, I'm not sure that proposed changes will help in the next case: > > session-1 => begin; > session-1 => insert into tt values (1); --RowExclusiveLock > > session-2 => begin; > session-2 => insert into tt values (2); --RowExclusiveLock > > session-3 => begin; > session-3 => lock table tt; --AccessExclusiveLock > (conflicts with 1 & 2) > ^ > session-1 => lock table tt in share mode; --ShareLock > (conflicts with 2 & 3) > ^ > This is deadlock situation and must be handled by > DeadLockCheck(). > It's really a deadlock ? Certainly end/abort of session-2 doesn't wakeup session-1/session3. I think it's due to the following code in ProcLockWakeup(). while ((queue_size--) && (proc)) { /* * This proc will conflict as the previous one did, don't even * try. */ if (proc->token == last_locktype) continue; /* * This proc conflicts with locks held by others, ignored. */ if (LockResolveConflicts(lockmethod, lock, proc->token, proc->xid, (XIDLookupEnt * ) NULL) != STATUS_OK) { last_locktype = proc->token; continue; } Once LockResolveConflicts() doesn't return STATUS_OK,proc is not changed and only queue_size-- is executed(never try to wakeup other procs). After inserting the code such asproc = (PROC *) MAKE_PTR(proc->links.prev); before continue statements,ProcLockWakeup() triggerd by end/abort of session-2 could try to wakeup session-1.
> -----Original Message----- > From: pgsql-hackers-owner@hub.org [mailto:pgsql-hackers-owner@hub.org]On > Behalf Of Tom Lane > > I spent some time looking around for possible causes of the recently > reported deadlock conditions. I couldn't find any smoking gun, but > I found a couple of things that look fishy: > Oops,I've forgotten another freezing issue reported by Alfred Perlstein. We know the cause(db access in abort transaction state) of it. Seems xact.c has been pretty changed after I examined it 2 months ago. Have you already fixed it ? If not,I would examine it again and fix the bug. OK ? Here's a reproducible example. Session-1# begin;BEGIN=# lock t;LOCK TABLE Session-2=# begin;BEGIN=# lock t; [blocked] ^CCancel request sentERROR: Query cancel requested while waiting lockreindex=#select * from t;[blocked] Session-1=# commit;COMMIT Session-2ERROR: LockRelation: LockAcquire failed=# abort;ROLLBACK=# lock t;[blocked] Regards. Hiroshi Inoue
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Oops,I've forgotten another freezing issue reported by Alfred Perlstein. > We know the cause(db access in abort transaction state) of it. > Seems xact.c has been pretty changed after I examined it 2 months > ago. Have you already fixed it ? That example seems to work better than before, but the generic problem is still there: we should avoid running the planner analysis phase and the rewriter when we are in abort state. We also need to do something about holding locks on relations clear through from parsing till execution. It occurs to me that there is a problem closely related to the abort problem here: what if there are transaction boundaries in the query string? Suppose the query string is begin; select * from foo; end; select * from bar; Currently, even if the parser did grab a lock on bar, it'd get dropped during execution of the "end". I think maybe what needs to be done to fix all this is to restructure postgres.c's interface to the parser/rewriter. What we want is to run just the yacc grammar initially to produce a list of raw parse trees (which is enough to detect begin/commit/rollback, no?) Then postgres.c walks down that list, and for each element, if it is commit/rollback OR we are not in abort state, do parse analysis, rewrite, planning, and execution. (Thomas, any comments here?) regards, tom lane
> I think maybe what needs to be done to fix all this is to restructure > postgres.c's interface to the parser/rewriter. What we want is to > run just the yacc grammar initially to produce a list of raw parse > trees (which is enough to detect begin/commit/rollback, no?) Then > postgres.c walks down that list, and for each element, if it is > commit/rollback OR we are not in abort state, do parse analysis, > rewrite, planning, and execution. (Thomas, any comments here?) Sure, why not (restructure postgres.c that is)? I was just thinking about how to implement "autocommit" and was considering doing a hack in analyze.c which just plops a "BEGIN" in front of the existing query. But restructuring a bit higher up will let us make this a real feature, not a hack (I hope ;) btw, even gram.y does touch some of the heap cache (for pg_type) to look for type existance; don't know if that will be a problem but maybe that needs to be rethought also... - Thomas
> > I think maybe what needs to be done to fix all this is to restructure > > postgres.c's interface to the parser/rewriter. What we want is to > > run just the yacc grammar initially to produce a list of raw parse > > trees (which is enough to detect begin/commit/rollback, no?) Then > > postgres.c walks down that list, and for each element, if it is > > commit/rollback OR we are not in abort state, do parse analysis, > > rewrite, planning, and execution. (Thomas, any comments here?) > > Sure, why not (restructure postgres.c that is)? I was just thinking > about how to implement "autocommit" and was considering doing a hack in > analyze.c which just plops a "BEGIN" in front of the existing query. But Man, that is something I would do. :-) -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > btw, even gram.y does touch some of the heap cache (for pg_type) to look > for type existance; don't know if that will be a problem but maybe that > needs to be rethought also... We'd need to postpone that processing till parse analysis, else we still have the underlying problem. Fortunately we are not parsing C ;-) so it seems to me it shouldn't be necessary to do any table lookups during initial parsing... I assume you are looking at the 'setof' processing? Offhand it seems to me that this code is broken anyway: use of a relation type should refer to the tuple type, but should *not* imply SETOF, at least IMHO. regards, tom lane
> I assume you are looking at the 'setof' processing? Offhand it seems to > me that this code is broken anyway: use of a relation type should refer > to the tuple type, but should *not* imply SETOF, at least IMHO. No, there is another routine (not remembering the name right now) which is involved, I *think* from within gram.y, which barfs when called with "opaque" as an argument (among other things). Can look up more info later (I'm away this weekend). - Thomas