Re: patch: avoid heavyweight locking on hash metapage - Mailing list pgsql-hackers
From: Jeff Janes
Subject: Re: patch: avoid heavyweight locking on hash metapage
Msg-id: CAMkU=1zsjKXLian2OCqX0wt6+VUMuSTcyNDBKB_ZVaFj0SA6xg@mail.gmail.com
In response to: patch: avoid heavyweight locking on hash metapage (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Wed, May 30, 2012 at 3:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:

> I developed the attached patch to avoid taking a heavyweight lock on
> the metapage of a hash index.  Instead, an exclusive buffer content
> lock is viewed as sufficient permission to modify the metapage, and a
> shared buffer content lock is used when such modifications need to be
> prevented.  For the most part this is a trivial change, because we
> were already taking these locks: we were just taking the heavyweight
> locks in addition.  The only sticking point is that, when we're
> searching or inserting, we previously locked the bucket before
> releasing the heavyweight metapage lock, which is unworkable when
> holding only a buffer content lock because (1) we might deadlock and
> (2) buffer content locks can't be held for long periods of time even
> when there's no deadlock risk.  To fix this, I implemented a simple
> loop-and-retry system: we release the metapage content lock, acquire
> the heavyweight lock on the target bucket, and then reacquire the
> metapage content lock and check that the bucket mapping has not
> changed.  Normally it hasn't, and we're done.  But if by chance it
> has, we simply unlock the metapage, release the heavyweight lock we
> acquired previously, lock the new bucket, and loop around again.  Even
> in the worst case we cannot loop very many times here, since we don't
> split the same bucket again until we've split all the other buckets,
> and 2^N gets big pretty fast.

Do we need the retry flag (it applies in two places)?  If oldblkno is
still InvalidBlockNumber, then it can't equal blkno.  I think an extra
variable might be clearer than a magic value, but we already have the
magic value, so do we want both a flag variable and a magic value?
(A sketch of the flag-free version follows at the end of this message.)

+       if (retry)
+       {
+               if (oldblkno == blkno)
+                       break;
+               _hash_droplock(rel, oldblkno, HASH_SHARE);
+       }

In the README, the pseudocode probably needs to mention re-locking the
metapage inside the loop.

Also, "page" is used to mean either the disk page or the shared buffer
that currently holds it, depending on context, which is confusing.
Maybe we should say "lock the meta page buffer" explicitly.  Of course
this gripe precedes your patch and applies to other parts of the code
as well, but since we mingle lightweight locks (on buffers) and heavy
locks (on the names of disk pages), it might make sense to be more
meticulous here.

"exclusive-lock page 0 (assert the right to begin a split)" is no
longer true, nor is "release X-lock on page 0".

Also in the README, the section "To prevent deadlock we enforce these
coding rules:" needs updating, since those rules are being changed.
But should we be changing them at all?

In _hash_expandtable, the claim "But since we are only trylocking here
it should be OK" doesn't completely satisfy me.  Even a conditional
heavyweight lock acquisition still takes a transient, non-conditional
LWLock on the lock-manager partition.  Unless there is a global rule
that no one may take a buffer content lock while holding a lock-manager
partition lock, this seems dangerous.  Could it be redone to follow the
pattern of heavy locking with no content lock held, then reacquiring
the buffer content lock to verify that we locked the correct things
(also sketched below)?  I don't know whether it would be better to loop
or to just give up if the metapage changed underneath us.

Cheers,

Jeff
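
A minimal sketch of the flag-free loop suggested above, with oldblkno's
InvalidBlockNumber state standing in for the retry flag.  The helper
names (_hash_getlock, _hash_droplock, _hash_chgbufaccess,
_hash_hashkey2bucket, BUCKET_TO_BLKNO) are as in src/backend/access/hash,
and the surrounding variables (rel, metabuf, metap, hashkey) are assumed
to be set up as in _hash_first; this illustrates the idea, not the
patch's actual text:

    BlockNumber oldblkno = InvalidBlockNumber;
    BlockNumber blkno;
    Bucket      bucket;

    for (;;)
    {
        /* Compute the target bucket and its block number from the metapage. */
        bucket = _hash_hashkey2bucket(hashkey,
                                      metap->hashm_maxbucket,
                                      metap->hashm_highmask,
                                      metap->hashm_lowmask);
        blkno = BUCKET_TO_BLKNO(metap, bucket);

        /* Release the metapage buffer content lock, keeping the pin. */
        _hash_chgbufaccess(rel, metabuf, HASH_READ, HASH_NOLOCK);

        /*
         * On a repeat iteration oldblkno is valid: if the bucket mapping is
         * unchanged we already hold the right heavyweight lock; otherwise
         * drop the stale lock before taking the new one.
         */
        if (oldblkno != InvalidBlockNumber)
        {
            if (oldblkno == blkno)
                break;
            _hash_droplock(rel, oldblkno, HASH_SHARE);
        }

        /* Heavyweight-lock the target bucket with no content lock held... */
        _hash_getlock(rel, blkno, HASH_SHARE);

        /* ...then re-lock the metapage buffer and recheck the mapping. */
        _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_READ);
        oldblkno = blkno;
    }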
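
On the README point, one way the reader-side pseudocode might spell out
the buffer re-lock inside the loop (the wording here is a suggestion,
not the patch's actual README text):

    pin the meta page and take its buffer content lock in shared mode
    loop:
        compute the bucket number for the target hash key
        release the meta page buffer content lock
        if we already hold the heavyweight lock on the correct bucket page:
            exit the loop
        release any heavyweight bucket lock held from a previous iteration
        take the heavyweight lock on the target bucket page
        re-take the meta page buffer content lock in shared mode
        (recheck: a split may have remapped the bucket while we waited)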
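
And a sketch of the alternative _hash_expandtable pattern asked about
above: heavy-lock with no content lock held, then re-lock the metapage
buffer and recheck.  Variable names (new_bucket, old_bucket,
start_oblkno, metabuf) follow _hash_expandtable; the recheck condition
and the give-up path are assumptions about how the check might look:

    /* We hold an exclusive content lock on the metapage buffer here. */
    new_bucket = metap->hashm_maxbucket + 1;
    old_bucket = (new_bucket & metap->hashm_lowmask);
    start_oblkno = BUCKET_TO_BLKNO(metap, old_bucket);

    /* Release the content lock (keeping the pin) before heavy locking. */
    _hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);

    /*
     * Unconditional heavyweight acquire.  No buffer content lock is held,
     * so the transient lock-manager partition LWLock taken inside cannot
     * deadlock against anyone who holds a content lock we later want.
     */
    _hash_getlock(rel, start_oblkno, HASH_EXCLUSIVE);

    /* Re-lock the metapage buffer and check we locked the right bucket. */
    _hash_chgbufaccess(rel, metabuf, HASH_NOLOCK, HASH_WRITE);
    if (metap->hashm_maxbucket + 1 != new_bucket)
    {
        /* A concurrent split got there first: give up (or loop and retry). */
        _hash_chgbufaccess(rel, metabuf, HASH_WRITE, HASH_NOLOCK);
        _hash_droplock(rel, start_oblkno, HASH_EXCLUSIVE);
        return;
    }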