Re: [Patch] Optimize dropping of relation buffers using dlist - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [Patch] Optimize dropping of relation buffers using dlist |
Date | |
Msg-id | 20201118181342.h65kc3eqluzwfrgn@alap3.anarazel.de Whole thread Raw |
In response to | Re: [Patch] Optimize dropping of relation buffers using dlist (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: [Patch] Optimize dropping of relation buffers using dlist
RE: [Patch] Optimize dropping of relation buffers using dlist |
List | pgsql-hackers |
Hi, On 2020-11-18 17:34:31 +0530, Amit Kapila wrote: > Yeah, that won't be a bad idea especially because the patch being > discussed in the thread you referred is still in an exploratory phase. > I haven't tested or done a detailed review but I feel there shouldn't > be many problems if we agree on the approach. > > Thomas/others, do you have objections to proceeding here? It shouldn't > be a big problem to change the code in this area even if we get the > shared relation size stuff in. I'm doubtful the patches as is are a good idea / address the correctness concerns to a sufficient degree. One important part of that is that the patch includes pretty much zero explanations about why it is safe what it is doing. Something having being discussed deep in this thread won't help us in a few months, not to speak of a few years. The commit message says: > While recovery, we can get a reliable cached value of nblocks for > supplied relation's fork, and it's safe because there are no other > processes but the startup process that changes the relation size > during recovery. and the code only applies the optimized scan only when cached: + /* + * Look up the buffers in the hashtable and drop them if the block size + * is already cached and the total blocks to be invalidated is below the + * full scan threshold. Otherwise, give up the optimization. + */ + if (cached && nBlocksToInvalidate < BUF_DROP_FULL_SCAN_THRESHOLD) This seems quite narrow to me. There's plenty cases where there's no cached relation size in the startup process, restricting the availability of this optimization as written. Where do we even use DropRelFileNodeBuffers() in recovery? The most common path is DropRelationFiles()->smgrdounlinkall()->DropRelFileNodesAllBuffers(), which 3/4 doesn't address and 4/4 doesn't mention. 4/4 seems to address DropRelationFiles(), but only talks about TRUNCATE? I'm also worried about the cases where this could cause buffers left in the buffer pool, without a crosscheck like Thomas' patch would allow to add. Obviously other processes can dirty buffers in hot_standby, so any leftover buffer could have bad consequences. I also don't get why 4/4 would be a good idea on its own. It uses BUF_DROP_FULL_SCAN_THRESHOLD to guard FindAndDropRelFileNodeBuffers() on a per relation basis. But since DropRelFileNodesAllBuffers() can be used for many relations at once, this could end up doing BUF_DROP_FULL_SCAN_THRESHOLD - 1 lookups a lot of times, once for each of nnodes relations? Also, how is 4/4 safe - this is outside of recovery too? Smaller comment: +static void +FindAndDropRelFileNodeBuffers(RelFileNode rnode, ForkNumber *forkNum, int nforks, + BlockNumber *nForkBlocks, BlockNumber *firstDelBlock) ... + /* Check that it is in the buffer pool. If not, do nothing. */ + LWLockAcquire(bufPartitionLock, LW_SHARED); + buf_id = BufTableLookup(&bufTag, bufHash); ... + bufHdr = GetBufferDescriptor(buf_id); + + buf_state = LockBufHdr(bufHdr); + + if (RelFileNodeEquals(bufHdr->tag.rnode, rnode) && + bufHdr->tag.forkNum == forkNum[i] && + bufHdr->tag.blockNum >= firstDelBlock[i]) + InvalidateBuffer(bufHdr); /* releases spinlock */ + else + UnlockBufHdr(bufHdr, buf_state); a I'm a bit confused about the check here. We hold a buffer partition lock, and have done a lookup in the mapping table. Why are we then rechecking the relfilenode/fork/blocknum? And why are we doing so holding the buffer header lock, which is essentially a spinlock, so should only ever be held for very short portions? This looks like it's copying logic from DropRelFileNodeBuffers() etc, but there the situation is different: We haven't done a buffer mapping lookup, and we don't hold a partition lock! Greetings, Andres Freund
pgsql-hackers by date: