Re: Reducing overhead of frequent table locks - Mailing list pgsql-hackers
From:            Noah Misch
Subject:         Re: Reducing overhead of frequent table locks
Msg-id:          20110524090734.GA18831@tornado.gateway.2wire.net
In response to:  Re: Reducing overhead of frequent table locks (Robert Haas <robertmhaas@gmail.com>)
Responses:       Re: Reducing overhead of frequent table locks
List:            pgsql-hackers
On Mon, May 23, 2011 at 09:15:27PM -0400, Robert Haas wrote:
> On Fri, May 13, 2011 at 4:16 PM, Noah Misch <noah@leadboat.com> wrote:
> >         if (level >= ShareUpdateExclusiveLock)
> >                 ++strong_lock_counts[my_strong_lock_count_partition]
> >                 sfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 1)
> >                         /* marker 1 */
> >                         import_all_local_locks
> >                 normal_LockAcquireEx
> >         else if (level <= RowExclusiveLock)
> >                 lfence
> >                 if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> >                         /* marker 2 */
> >                         local_only
> >                         /* marker 3 */
> >                 else
> >                         normal_LockAcquireEx
> >         else
> >                 normal_LockAcquireEx
> >
> > At marker 1, we need to block until no code is running between markers two
> > and three.  You could do that with a per-backend lock (LW_SHARED by the
> > strong locker, LW_EXCLUSIVE by the backend).  That would probably still be
> > a win over the current situation, but it would be nice to have something
> > even cheaper.
>
> Barring some brilliant idea, or anyway for a first cut, it seems to me
> that we can adjust the above pseudocode by assuming the use of a
> LWLock.  In addition, two other adjustments: first, the first line
> should test level > ShareUpdateExclusiveLock, rather than >=, per
> previous discussion.  Second, import_all_local_locks needn't really
> move everything; just those locks with a matching locktag.  Thus:
>
> !       if (level > ShareUpdateExclusiveLock)
> !               ++strong_lock_counts[my_strong_lock_count_partition]
> !               sfence
> !               for each backend
> !                       take per-backend lwlock for target backend
> !                       transfer fast-path entries with matching locktag
> !                       release per-backend lwlock for target backend
> !               normal_LockAcquireEx
> !       else if (level <= RowExclusiveLock)
> !               lfence
> !               if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                       take per-backend lwlock for own backend
> !                       fast-path lock acquisition
> !                       release per-backend lwlock for own backend
> !               else
> !                       normal_LockAcquireEx
> !       else
> !               normal_LockAcquireEx

This drops the part about only transferring fast-path entries once when a
strong_lock_counts cell transitions from zero to one.  Granted, that itself
requires some yet-undiscussed locking.  For that matter, we can't have
multiple strong lockers completing transfers on the same cell in parallel.
Perhaps add a FastPathTransferLock, or an array of per-cell locks, that each
strong locker holds for that entire "if" body and while decrementing the
strong_lock_counts cell at lock release.

As far as the level of detail of this pseudocode goes, there's no need to
hold the per-backend LWLock while transferring the fast-path entries.  You
just need to hold it sometime between bumping strong_lock_counts and
transferring the backend's locks.  This ensures that, for example, the
backend is not sleeping in the middle of a fast-path lock acquisition for
the whole duration of this code.

> Now, a small fly in the ointment is that we haven't got, with
> PostgreSQL, a portable library of memory primitives.  So there isn't
> an obvious way of doing that sfence/lfence business.

I was thinking that, if the final implementation could benefit from memory
barrier interfaces, we should create those interfaces now.  Start with only
a platform-independent dummy implementation that runs a lock/unlock cycle
on a spinlock residing in backend-local memory.  I'm 75% sure that would be
sufficient on all architectures for which we support spinlocks.  It may
turn out that we can't benefit from such interfaces at this time ...
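For concreteness, a dummy barrier along those lines might look something
like the sketch below; the names are illustrative only, not a proposed API.

    /*
     * Sketch of the fallback barrier described above: a lock/unlock cycle
     * on a spinlock residing in backend-local memory, so it is never
     * contended.  It relies only on the ordering guarantees S_LOCK/S_UNLOCK
     * must already provide for shared spinlocks to be correct.
     * Illustrative names only.
     */
    #include "postgres.h"
    #include "storage/spin.h"

    static slock_t local_barrier_lock;
    static bool local_barrier_lock_initialized = false;

    static void
    dummy_memory_barrier(void)
    {
        if (!local_barrier_lock_initialized)
        {
            SpinLockInit(&local_barrier_lock);
            local_barrier_lock_initialized = true;
        }

        SpinLockAcquire(&local_barrier_lock);
        SpinLockRelease(&local_barrier_lock);
    }

Whether the acquire/release pair really acts as a full barrier on every
platform where our spinlocks work is exactly the 75% question above; a real
patch would presumably initialize the spinlock once at backend start rather
than on first use.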
> Now, it seems to
> me that in the "strong lock" case, the sfence isn't really needed
> anyway, because we're about to start acquiring and releasing an lwlock
> for every backend, and that had better act as a full memory barrier
> anyhow, or we're doomed.  The "weak lock" case is more interesting,
> because we need the fence before we've taken any LWLock.

Agreed.

> But perhaps it'd be sufficient to just acquire the per-backend lwlock
> before checking strong_lock_counts[].  If, as we hope, we get back a
> zero, then we do the fast-path lock acquisition, release the lwlock,
> and away we go.  If we get back any other value, then we've wasted an
> lwlock acquisition cycle.  Or actually maybe not: it seems to me that
> in that case we'd better transfer all of our fast-path entries into
> the main hash table before trying to acquire any lock the slow way, at
> least if we don't want the deadlock detector to have to know about the
> fast-path.  So then we get this:
>
> !       if (level > ShareUpdateExclusiveLock)
> !               ++strong_lock_counts[my_strong_lock_count_partition]
> !               for each backend
> !                       take per-backend lwlock for target backend
> !                       transfer fastpath entries with matching locktag
> !                       release per-backend lwlock for target backend
> !       else if (level <= RowExclusiveLock)
> !               take per-backend lwlock for own backend
> !               if (strong_lock_counts[my_strong_lock_count_partition] == 0)
> !                       fast-path lock acquisition
> !                       done = true
> !               else
> !                       transfer all fastpath entries
> !               release per-backend lwlock for own backend
> !       if (!done)
> !               normal_LockAcquireEx

Could you elaborate on the last part (the need for "else transfer all
fastpath entries") and, specifically, how it aids deadlock avoidance?  I
didn't think this change would have any impact on deadlocks, because all
relevant locks will be in the global lock table before any call to
normal_LockAcquireEx.

To validate the locking at this level of detail, I think we need to sketch
the unlock protocol, too.  On each strong lock release, we'll decrement the
strong_lock_counts cell.  No particular interlock with fast-path lockers
should be needed; a stray AccessShareLock needlessly making it into the
global lock table is no problem.  As mentioned above, we _will_ need an
interlock with lock transfer operations.  How will transferred fast-path
locks get removed from the global lock table?  Presumably, the original
fast-path locker should do so at transaction end; anything else would
contort the life cycle.  Then add a way for the backend to know which locks
had been transferred as well as an interlock against concurrent transfer
operations.  Maybe that's all.
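For concreteness, a rough C sketch of that release-side protocol might look
like the following; every name in it (strong_lock_counts,
FastPathTransferLock, MyBackendLock, FastPathEntry) is a placeholder for
this discussion rather than an existing symbol.

    /*
     * Rough sketch of the unlock protocol discussed above.  All names are
     * placeholders for the purposes of this thread, not existing
     * PostgreSQL symbols.
     */
    #include "postgres.h"
    #include "storage/lock.h"
    #include "storage/lwlock.h"

    /* Assumed shared/global state, set up elsewhere at startup. */
    extern volatile int strong_lock_counts[];   /* one cell per partition */
    extern LWLockId FastPathTransferLock[];     /* one lock per cell */
    extern LWLockId MyBackendLock;              /* this backend's fast-path lwlock */

    /* One backend-local fast-path lock slot. */
    typedef struct FastPathEntry
    {
        LOCKTAG     locktag;
        LOCKMODE    lockmode;
        bool        in_use;
        bool        transferred;    /* moved into the main lock table? */
    } FastPathEntry;

    /*
     * Strong-lock release: decrement the strong_lock_counts cell while
     * holding the same per-cell lock a strong locker holds while
     * transferring fast-path entries, so transfers and decrements never
     * interleave.
     */
    static void
    StrongLockRelease(int partition)
    {
        LWLockAcquire(FastPathTransferLock[partition], LW_EXCLUSIVE);
        Assert(strong_lock_counts[partition] > 0);
        strong_lock_counts[partition]--;
        LWLockRelease(FastPathTransferLock[partition]);
    }

    /*
     * Fast-path release at transaction end: the original locker always
     * cleans up its own entry.  If a strong locker transferred the entry
     * to the main lock table in the meantime, release it there as well.
     */
    static void
    FastPathLockRelease(FastPathEntry *entry)
    {
        LOCKTAG     locktag = entry->locktag;
        LOCKMODE    lockmode = entry->lockmode;
        bool        transferred;

        /* The per-backend lwlock interlocks against a concurrent transfer. */
        LWLockAcquire(MyBackendLock, LW_EXCLUSIVE);
        transferred = entry->transferred;
        entry->in_use = false;      /* forget the local slot */
        LWLockRelease(MyBackendLock);

        if (transferred)
            (void) LockRelease(&locktag, lockmode, false);
    }

The point of holding FastPathTransferLock across the decrement is that a
strong locker scanning backends under that same lock never sees the count
change mid-transfer; the per-backend lwlock, in turn, is what tells the
original locker whether its entry was already moved to the main table.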
> That seems like it ought to work, at least assuming the position of
> your fencing instructions was correct in the first place.  But there's
> one big problem to worry about: what happens if the lock transfer
> fails due to shared memory exhaustion?  It's not so bad in the "weak
> lock" case; it'll feel just like the already-existing case where you
> try to push another lock into the shared-memory hash table and there's
> no room.  Essentially you've been living on borrowed time anyway.  On
> the other hand, the "strong lock" case is a real problem, because a
> large number of granted fast-path locks can effectively DOS any strong
> locker, even one that wouldn't have conflicted with them.  That's
> clearly not going to fly, but it's not clear to me what the best way
> is to patch around it.

To put it another way: the current system is fair; the chance of hitting lock
exhaustion is independent of lock level.  The new system would be unfair;
lock exhaustion is much more likely to appear for a > ShareUpdateExclusiveLock
acquisition, through no fault of that transaction.  I agree this isn't ideal,
but it doesn't look to me like an unacceptable weakness.  Making lock slots
first-come, first-served is inherently unfair; we're not at all set up to
justly arbitrate between mutually-hostile lockers competing for slots.  The
overall situation will get better, not worse, for the admin who wishes to
defend against hostile unprivileged users attempting a lock table DOS.

Thanks,
nm