Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL |
Date | |
Msg-id | 20140215180840.GK19470@alap3.anarazel.de Whole thread Raw |
In response to | Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: 9.2.1 & index-only scans : abnormal heap fetches
after VACUUM FULL
|
List | pgsql-hackers |
On 2014-02-15 12:50:14 -0500, Bruce Momjian wrote: > On Sat, Feb 15, 2014 at 04:16:40PM +0100, Andres Freund wrote: > > Hi, > > > > > *************** end_heap_rewrite(RewriteState state) > > > *** 281,286 **** > > > --- 284,290 ---- > > > true); > > > RelationOpenSmgr(state->rs_new_rel); > > > > > > + update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno); > > > PageSetChecksumInplace(state->rs_buffer, state->rs_blockno); > > > > > > smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno, > > > *************** raw_heap_insert(RewriteState state, Heap > > > *** 633,638 **** > > > --- 637,643 ---- > > > */ > > > RelationOpenSmgr(state->rs_new_rel); > > > > > > + update_page_vm(state->rs_new_rel, page, state->rs_blockno); > > > PageSetChecksumInplace(page, state->rs_blockno); > > > > > > smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, > > > > I think those two cases should be combined. > > Uh, what I did was to pair the new update_page_vm with the existing > PageSetChecksumInplace calls, figuring if we needed a checksum before we > wrote the page we would need a update_page_vm too. Is that incorrect? > It is that _last_ page write in the second instance. What I mean is that there should be a new function handling the writeout of a page. > > > diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c > > > new file mode 100644 > > > index 899ffac..3e7546b > > > *** a/src/backend/access/heap/visibilitymap.c > > > --- b/src/backend/access/heap/visibilitymap.c > > > *************** visibilitymap_set(Relation rel, BlockNum > > > *** 257,263 **** > > > #endif > > > > > > Assert(InRecovery || XLogRecPtrIsInvalid(recptr)); > > > - Assert(InRecovery || BufferIsValid(heapBuf)); > > > > > > /* Check that we have the right heap page pinned, if present */ > > > if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk) > > > --- 257,262 ---- > > > *************** visibilitymap_set(Relation rel, BlockNum > > > *** 278,284 **** > > > map[mapByte] |= (1 << mapBit); > > > MarkBufferDirty(vmBuf); > > > > > > ! if (RelationNeedsWAL(rel)) > > > { > > > if (XLogRecPtrIsInvalid(recptr)) > > > { > > > --- 277,283 ---- > > > map[mapByte] |= (1 << mapBit); > > > MarkBufferDirty(vmBuf); > > > > > > ! if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf)) > > > { > > > if (XLogRecPtrIsInvalid(recptr)) > > > { > > > > At the very least this needs to revise visibilitymap_set's comment about > > the requirement of passing a buffer unless !InRecovery. > > Oh, good point, comment block updated. > > > Also, how is this going to work with replication if you're explicitly > > not WAL logging this? > > Uh, I assumed that using the existing functions would handle this. If > not, I don't know the answer. Err. Read the block above again, where you added the BufferIsValid(heapBuf) check. That's exactly the part doing the WAL logging. > > > *** a/src/backend/utils/time/tqual.c > > > --- b/src/backend/utils/time/tqual.c > > > *************** static inline void > > > *** 108,113 **** > > > --- 108,117 ---- > > > SetHintBits(HeapTupleHeader tuple, Buffer buffer, > > > uint16 infomask, TransactionId xid) > > > { > > > + /* we might not have a buffer if we are doing raw_heap_insert() */ > > > + if (!BufferIsValid(buffer)) > > > + return; > > > + > > > if (TransactionIdIsValid(xid)) > > > { > > > /* NB: xid must be known committed here! */ > > > > This looks seriously wrong to me. > > > > I think the whole idea of doing this in private memory is bad. The > > visibility routines aren't written for that, and the above is just one > > instance of that, and I am far from convinced it's the only one. So you > > either need to work out how to rely on the visibility checking done in > > cluster.c, or you need to modify rewriteheap.c to write via > > shared_buffers. > > I don't think we can change rewriteheap.c to use shared buffers --- that > code was written to do things in private memory for performance reasons > and I don't think setting hit bits justifies changing that. I don't think that's necessarily contradictory. You'd need to use a ringbuffer strategy for IO, like e.g. VACUUM does. But I admit it's not a insignificant amount of work, and it might need some adjustements in some places. > Can you be more specific about the cluster.c idea? I see > copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a > 'buf' just like the code I am working in. Yes, it does. But in contrast to your patch it does so on the *origin* buffer. Which is in shared memory. The idea would be to modify rewrite_heap_tuple() to get a new parameter informing it it about the tuple's visibility. That'd allow you to avoid doing any additional visibility checks. Unfortunately that would still not fix the problem that visibilitymap_set requires a real buffer to be passed in. If you're going that route, you'll need to make a bit more sweeping changes. You'd need to add new blockno parameter and a BufferIsValid() check to the right places in log_heap_visible(). Pretty ugly if you ask me... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: