Re: Reducing ClogControlLock contention - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Reducing ClogControlLock contention |
Date | |
Msg-id | 20150822141448.GH8552@awork2.anarazel.de Whole thread Raw |
In response to | Reducing ClogControlLock contention (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: Reducing ClogControlLock contention
Re: Reducing ClogControlLock contention |
List | pgsql-hackers |
Hi, Amit pinged me about my opinion of this patch. I don't really have time/energy for an in-depth review right now, but since I'd looked enough to see some troublesome points, I thought I'd write those. On 2015-06-30 08:02:25 +0100, Simon Riggs wrote: > Proposal for improving this is to acquire the ClogControlLock in Shared > mode, if possible. I find that rather scary. That requires for all read and write paths in clog.c and slru.c to only ever read memory locations once. Otherwise those reads may not get the same results. That'd mean we'd need to add indirections via volatile to all reads/writes. It also requires that we never read in 4 byte quantities. > This is safe because people checking visibility of an xid must always run > TransactionIdIsInProgress() first to avoid race conditions, which will > always return true for the transaction we are currently committing. As a > result, we never get concurrent access to the same bits in clog, which > would require a barrier. I don't think that's really sufficient. There's a bunch of callers do lookups without such checks, e.g. in heapam.c. It might be safe due to other circumstances, but at the very least this is a significant but sublte API revision. > Two concurrent writers might access the same word concurrently, so we > protect against that with a new CommitLock. We could partition that by > pageno also, if needed. To me it seems better to make this more integrated with slru.c. Change the locking so that the control lock (relevant for page mappings et al) is different from the locks for reading/writing data. > * If we're doing an async commit (ie, lsn is valid), then we must wait > * for any active write on the page slot to complete. Otherwise our > * update could reach disk in that write, which will not do since we > @@ -273,7 +290,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids, > * write-busy, since we don't care if the update reaches disk sooner than > * we think. > */ > - slotno = SimpleLruReadPage(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid); > + if (!InRecovery) > + LWLockAcquire(CommitLock, LW_EXCLUSIVE); > + slotno = SimpleLruReadPage_ReadOnly(ClogCtl, pageno, XLogRecPtrIsInvalid(lsn), xid); > > /* > * Set the main transaction id, if any. > @@ -312,6 +331,9 @@ TransactionIdSetPageStatus(TransactionId xid, int nsubxids, > ClogCtl->shared->page_dirty[slotno] = true; > > LWLockRelease(CLogControlLock); > + > + if (!InRecovery) > + LWLockRelease(CommitLock); > } TransactionIdSetPageStatus() calls TransactionIdSetStatusBit(), which writes an 8 byte variable (the lsn). That's not safe. Maybe write_ok = XLogRecPtrIsInvalid(lsn) is trying to address that? If so, I don't see how. A page is returned with only the shared lock held if it's in VALID state before. Even if that were changed, this'd be a mightily subtle thing to do without a very fat comment. > @@ -447,7 +447,8 @@ SimpleLruReadPage(SlruCtl ctl, int pageno, bool write_ok, > * It is unspecified whether the lock will be shared or exclusive. > */ > int > -SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid) > +SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, bool write_ok, > + TransactionId xid) > { > SlruShared shared = ctl->shared; > int slotno; > @@ -460,7 +461,9 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid) > { > if (shared->page_number[slotno] == pageno && > shared->page_status[slotno] != SLRU_PAGE_EMPTY && > - shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS) > + shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS && > + (write_ok || > + shared->page_status[slotno] != SLRU_PAGE_WRITE_IN_PROGRESS)) > { > /* See comments for SlruRecentlyUsed macro */ > SlruRecentlyUsed(shared, slotno); > @@ -472,7 +475,7 @@ SimpleLruReadPage_ReadOnly(SlruCtl ctl, int pageno, TransactionId xid) > LWLockRelease(shared->ControlLock); > LWLockAcquire(shared->ControlLock, LW_EXCLUSIVE); > > - return SimpleLruReadPage(ctl, pageno, true, xid); > + return SimpleLruReadPage(ctl, pageno, write_ok, xid); > } This function's name would definitely need to be changed, and it'll need to be documented when/how it's safe to use write_ok = true. Same with slru.c's header. A patch like this will need far more changes. Every read and write from a page has to be guaranteed to only be done once, otherwise you can get 'effectively torn' values. That is, you can't just do static void TransactionIdSetStatusBit(TransactionId xid, XidStatus status, XLogRecPtr lsn, int slotno) ...char *byteptr;char byteval; .../* note this assumes exclusive access to the clog page */byteval = *byteptr;byteval &= ~(((1 << CLOG_BITS_PER_XACT) -1) << bshift);byteval |= (status << bshift);*byteptr = byteval; ... the compiler is entirely free to "optimize" away the byteval variable and do all these masking operations on memory! It can intermittenly write temporary values to byteval, because e.g. the register pressure is too high. To actually rely on single-copy-atomicity you have to enforce that these reads/writes can only happen. Leavout out some possible macro indirection that'd have to look like byteval = (volatile char *) byteptr; ... *(volatile char *) byteptr= byteval; some for TransactionIdGetStatus(), without the write side obviously. and stuff likeif (!XLogRecPtrIsInvalid(lsn)){ int lsnindex = GetLSNIndex(slotno, xid); if (ClogCtl->shared->group_lsn[lsnindex] < lsn) ClogCtl->shared->group_lsn[lsnindex] = lsn;} just aren't possible at all unless we drop a bunch of platforms. In essence I think this patch allows us to roughly benchmark whether and how benefitial fixing the contention around clog is, but it seems pretty far from something that can actually be applied. Greetings, Andres Freund
pgsql-hackers by date: