Thread: Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
Amit Kapila wrote:
> On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:

>> Surely VACUUM FULL should rebuild the visibility map, and make
>> tuples in the new relation all-visible, no?

Certainly it seems odd to me that VACUUM FULL leaves the table in a
less well-maintained state in terms of visibility than a "normal"
vacuum.  VACUUM FULL should not need to be followed by another VACUUM.

> I think it cannot be made all-visible.

I don't think all tuples in the relation are necessarily visible to
all transactions, but the ones which are should probably be flagged
that way.

> How about if any transaction in SSI mode is started before Vacuum
> Full, should it see all tuples.

There are no differences between the visibility rules for serializable
transactions (SSI) and repeatable read transactions.  It should be
based on whether any snapshots exist which can still see the tuple.

-Kevin
On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote:
> Amit Kapila wrote:
> > On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:
>
> >> Surely VACUUM FULL should rebuild the visibility map, and make
> >> tuples in the new relation all-visible, no?
>
> Certainly it seems odd to me that VACUUM FULL leaves the table
> in a less well-maintained state in terms of visibility than a
> "normal" vacuum.  VACUUM FULL should not need to be followed by
> another VACUUM.
>
> > I think it cannot be made all-visible.
>
> I don't think all tuples in the relation are necessarily visible to
> all transactions, but the ones which are should probably be flagged
> that way.

I have developed the attached proof-of-concept patch to fix the problem
of having no visibility map after CLUSTER or VACUUM FULL.  I tested with
these queries:

	CREATE TABLE test(x INT PRIMARY KEY);
	INSERT INTO test VALUES (1);
	VACUUM FULL test;    -- or CLUSTER
	SELECT relfilenode FROM pg_class WHERE relname = 'test';
	 relfilenode
	-------------
	       16399

Then 'ls -l data/base/16384/16399*' to see the *_vm file.  I am not sure
how to test that the vm contents are valid.

This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior
does not do writes through the shared buffer cache, as outlined in this
C comment block:

	 * We can't use the normal heap_insert function to insert into the new
	 * heap, because heap_insert overwrites the visibility information.
	 * We use a special-purpose raw_heap_insert function instead, which
	 * is optimized for bulk inserting a lot of tuples, knowing that we have
	 * exclusive access to the heap.  raw_heap_insert builds new pages in
	 * local storage.  When a page is full, or at the end of the process,
	 * we insert it to WAL as a single record and then write it to disk
	 * directly through smgr.  Note, however, that any data sent to the new
	 * heap's TOAST table will go through the normal bufmgr.
I originally tried to do this higher up in the stack, but ran into
problems because I couldn't access the new heap page, so I had to do it
at the non-shared-buffer page level.  I reused the lazy vacuum routines.

I need to know whether this is the right approach, and what things are
wrong or missing.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
On Wed, Nov 27, 2013 at 4:33 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote:
>> Amit Kapila wrote:
>> > On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:
>>
>> >> Surely VACUUM FULL should rebuild the visibility map, and make
>> >> tuples in the new relation all-visible, no?
>>
>> Certainly it seems odd to me that VACUUM FULL leaves the table
>> in a less well-maintained state in terms of visibility than a
>> "normal" vacuum.  VACUUM FULL should not need to be followed by
>> another VACUUM.
>>
>> > I think it cannot be made all-visible.
>>
>> I don't think all tuples in the relation are necessarily visible to
>> all transactions, but the ones which are should probably be flagged
>> that way.
>
> I have developed the attached proof-of-concept patch to fix the problem
> of having no visibility map after CLUSTER or VACUUM FULL.  I tested with
> these queries:
>
> 	CREATE TABLE test(x INT PRIMARY KEY);
> 	INSERT INTO test VALUES (1);
> 	VACUUM FULL test;    -- or CLUSTER
> 	SELECT relfilenode FROM pg_class WHERE relname = 'test';
> 	 relfilenode
> 	-------------
> 	       16399
>
> Then 'ls -l data/base/16384/16399*' to see the *_vm file.  I am not sure
> how to test that the vm contents are valid.
>
> This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior
> does not do writes through the shared buffer cache, as outlined in this
> C comment block:
>
> 	 * We can't use the normal heap_insert function to insert into the new
> 	 * heap, because heap_insert overwrites the visibility information.
> 	 * We use a special-purpose raw_heap_insert function instead, which
> 	 * is optimized for bulk inserting a lot of tuples, knowing that we have
> 	 * exclusive access to the heap.  raw_heap_insert builds new pages in
> 	 * local storage.  When a page is full, or at the end of the process,
> 	 * we insert it to WAL as a single record and then write it to disk
> 	 * directly through smgr.  Note, however, that any data sent to the new
> 	 * heap's TOAST table will go through the normal bufmgr.
>
> I originally tried to do this higher up in the stack but ran into
> problems because I couldn't access the new heap page, so I had to do it
> at the non-shared-buffer page level.  I reused the lazy vacuum routines.
>
> I need to know whether this is the right approach, and what things
> are wrong or missing.

The fact that you've needed to modify SetHintBits() to make this work
is pretty nasty.  I'm not exactly sure what to do about that, but it
doesn't seem good.

This patch also has the desirable but surprising consequence that hint
bits will be set as a side effect of update_page_vm(), which means
that you'd better do that BEFORE checksumming the page.

I wonder if we ought to mark each page as all-visible in
raw_heap_insert() when we first initialize it, and then clear the flag
when we come across a tuple that isn't all-visible.  We could try to
set hint bits on the tuple before placing it on the page, too, though
I'm not sure of the details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 28, 2013 at 05:17:07PM -0500, Robert Haas wrote:
> > I need to know this is the right approach, and need to know what things
> > are wrong or missing.
>
> The fact that you've needed to modify SetHintBits() to make this work
> is pretty nasty.  I'm not exactly sure what to do about that, but it
> doesn't seem good.

Hey, if that's the worst of the problems, I will be very happy.  There
is a sense that because we are not using the shared buffer cache and not
WAL logging, we have to skip some stuff.

> This patch also has the desirable but surprising consequence that hint
> bits will be set as a side effect of update_page_vm(), which means
> that you'd better do that BEFORE checksumming the page.

Yes, I already saw that and fixed it in my git tree.

> I wonder if we ought to mark each page as all-visible in
> raw_heap_insert() when we first initialize it, and then clear the flag
> when we come across a tuple that isn't all-visible.  We could try to
> set hint bits on the tuple before placing it on the page, too, though
> I'm not sure of the details.

I went with the per-page approach because I wanted to re-use the lazy
vacuum function.  Is there some other code that does this already?  I am
trying to avoid yet another set of routines that would need to be
maintained or could be buggy.  This hint bit setting is tricky.

And thanks much for the review!

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
> > I wonder if we ought to mark each page as all-visible in
> > raw_heap_insert() when we first initialize it, and then clear the flag
> > when we come across a tuple that isn't all-visible.  We could try to
> > set hint bits on the tuple before placing it on the page, too, though
> > I'm not sure of the details.
>
> I went with the per-page approach because I wanted to re-use the lazy
> vacuum function.  Is there some other code that does this already?  I am
> trying to avoid yet another set of routines that would need to be
> maintained or could be buggy.  This hint bit setting is tricky.
>
> And thanks much for the review!

So, should I put this in the next commit fest?  I still have an unknown
about the buffer number to use here:

	!	/* XXX use 0 or real offset? */
	!	ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
	!				   BufferGetBlockNumber(buf) : 0, offnum);

Is everyone else OK with this approach?  Updated patch attached.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
>> > I wonder if we ought to mark each page as all-visible in
>> > raw_heap_insert() when we first initialize it, and then clear the flag
>> > when we come across a tuple that isn't all-visible.  We could try to
>> > set hint bits on the tuple before placing it on the page, too, though
>> > I'm not sure of the details.
>>
>> I went with the per-page approach because I wanted to re-use the lazy
>> vacuum function.  Is there some other code that does this already?  I am
>> trying to avoid yet another set of routines that would need to be
>> maintained or could be buggy.  This hint bit setting is tricky.
>>
>> And thanks much for the review!
>
> So, should I put this in the next commit fest?

+1.

> I still have an unknown
> about the buffer number to use here:
>
> 	!	/* XXX use 0 or real offset? */
> 	!	ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> 	!				   BufferGetBlockNumber(buf) : 0, offnum);
>
> Is everyone else OK with this approach?  Updated patch attached.

I started looking at this a little more the other day but got bogged
down in other things before I got through it all.  I think we're going
to want to revise this so that it doesn't go through the functions that
currently assume they're always dealing with a shared buffer, but I
haven't figured out exactly what the most elegant way of doing that is
yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 3, 2013 at 02:01:52PM -0500, Robert Haas wrote:
> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
> >> > I wonder if we ought to mark each page as all-visible in
> >> > raw_heap_insert() when we first initialize it, and then clear the flag
> >> > when we come across a tuple that isn't all-visible.  We could try to
> >> > set hint bits on the tuple before placing it on the page, too, though
> >> > I'm not sure of the details.
> >>
> >> I went with the per-page approach because I wanted to re-use the lazy
> >> vacuum function.  Is there some other code that does this already?  I am
> >> trying to avoid yet another set of routines that would need to be
> >> maintained or could be buggy.  This hint bit setting is tricky.
> >>
> >> And thanks much for the review!
> >
> > So, should I put this in the next commit fest?
>
> +1.

OK, I added it to the January commit fest.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
On 28 November 2013 22:17, Robert Haas <robertmhaas@gmail.com> wrote:
> The fact that you've needed to modify SetHintBits() to make this work
> is pretty nasty.  I'm not exactly sure what to do about that, but it
> doesn't seem good.

That makes me feel bad also.  I think we should be looking for some
special-case routines rather than piggybacking there.

Wonderful to see you personally and directly addressing this concern;
I am very supportive of this solution for 9.4.

I wonder about a user option to let these routines wait until they are
the oldest snapshot, so they can then set every page as all-visible
without even bothering to check individual items.  Or maybe a user
option to set them all-visible even without actually being the oldest,
as is possible with COPY FREEZE.

A full lock on the table is a big thing, so it would be a shame to waste
the opportunity to set everything visible just because some long-running
task is still executing somewhere (but clearly not here; otherwise we
wouldn't have the lock).

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> Is everyone else OK with this approach?  Updated patch attached.
>

Hi,

I started to look at this patch and I found that it fails an assertion
as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
on unrelated relations.  For example, in an assert-enabled build with
the regression database, run:

	VACUUM customer;
	[... insert here whatever commands you like or nothing at all ...]
	VACUUM FULL tenk1;

TRAP: FailedAssertion("!(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)) || (ExceptionalCondition("!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)", ("FailedAssertion"), "visibilitymap.c", 260), 0)))), (heapBuf) != 0 ))", File: "visibilitymap.c", Line: 260)
LOG:  server process (PID 25842) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: vacuum FULL customer;
LOG:  terminating any other active server processes

trace:

(gdb) bt
#0  0x00007f9a3d00d475 in *__GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f9a3d0106f0 in *__GI_abort () at abort.c:92
#2  0x0000000000777597 in ExceptionalCondition (
    conditionName=conditionName@entry=0x7cd3b8 "!(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)) || (ExceptionalCondition(\"!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)\", (\"Fai"...,
    errorType=errorType@entry=0x7b0730 "FailedAssertion",
    fileName=fileName@entry=0x7cd105 "visibilitymap.c",
    lineNumber=lineNumber@entry=260) at assert.c:54
#3  0x00000000004a7d99 in visibilitymap_set (rel=rel@entry=0x7f9a3da56a00,
    heapBlk=heapBlk@entry=0, heapBuf=heapBuf@entry=0, recptr=recptr@entry=0,
    vmBuf=220, cutoff_xid=2) at visibilitymap.c:260
#4  0x00000000004a33e5 in update_page_vm (relation=0x7f9a3da56a00,
    page=page@entry=0x1868b18 "", blkno=0) at rewriteheap.c:702
#5  0x00000000004a3668 in raw_heap_insert (state=state@entry=0x1849e98,
    tup=tup@entry=0x184f208) at rewriteheap.c:641
#6  0x00000000004a3b8b in rewrite_heap_tuple (state=state@entry=0x1849e98,
    old_tuple=old_tuple@entry=0x1852a50, new_tuple=new_tuple@entry=0x184f208)
    at rewriteheap.c:433
#7  0x000000000055c373 in reform_and_rewrite_tuple (tuple=tuple@entry=0x1852a50,
    oldTupDesc=oldTupDesc@entry=0x7f9a3da4d350,
    newTupDesc=newTupDesc@entry=0x7f9a3da599a8,
    values=values@entry=0x1852f40, isnull=isnull@entry=0x184f920 "",
    newRelHasOids=1 '\001', rwstate=rwstate@entry=0x1849e98) at cluster.c:1670

--
Jaime Casanova                    www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566             Cell: +593 987171157
On Fri, Jan 24, 2014 at 04:52:55PM -0500, Jaime Casanova wrote:
> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > Is everyone else OK with this approach?  Updated patch attached.
> >
>
> Hi,
>
> I started to look at this patch and I found that it fails an assertion
> as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
> on unrelated relations.  For example, in an assert-enabled build with
> the regression database, run:
>
> 	VACUUM customer;
> 	[... insert here whatever commands you like or nothing at all ...]
> 	VACUUM FULL customer;

Wow, thanks for the testing.  You are right that I had two bugs, both in
visibilitymap_set().  First, I made setting heapBuf optional, but forgot
to remove the Assert check now that it was optional.  Second, I passed
InvalidBlockNumber instead of InvalidBuffer when calling
visibilitymap_set().  Both are fixed in the attached patch.  Thanks for
the report.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
Hi,

> *************** end_heap_rewrite(RewriteState state)
> *** 281,286 ****
> --- 284,290 ----
>   							true);
>   		RelationOpenSmgr(state->rs_new_rel);
>
> + 		update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
>   		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>
>   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
> *************** raw_heap_insert(RewriteState state, Heap
> *** 633,638 ****
> --- 637,643 ----
>   		 */
>   		RelationOpenSmgr(state->rs_new_rel);
>
> + 		update_page_vm(state->rs_new_rel, page, state->rs_blockno);
>   		PageSetChecksumInplace(page, state->rs_blockno);
>
>   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,

I think those two cases should be combined.

> + static void
> + update_page_vm(Relation relation, Page page, BlockNumber blkno)
> + {
> + 	Buffer		vmbuffer = InvalidBuffer;
> + 	TransactionId visibility_cutoff_xid;
> +
> + 	visibilitymap_pin(relation, blkno, &vmbuffer);
> + 	Assert(BufferIsValid(vmbuffer));
> +
> + 	if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
> + 		heap_page_is_all_visible(relation, InvalidBuffer, page,
> + 								 &visibility_cutoff_xid))
> + 	{
> + 		PageSetAllVisible(page);
> + 		visibilitymap_set(relation, blkno, InvalidBuffer,
> + 						  InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
> + 	}
> + 	ReleaseBuffer(vmbuffer);
> + }

How could visibilitymap_test() be true here?
> diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> new file mode 100644
> index 899ffac..3e7546b
> *** a/src/backend/access/heap/visibilitymap.c
> --- b/src/backend/access/heap/visibilitymap.c
> *************** visibilitymap_set(Relation rel, BlockNum
> *** 257,263 ****
>   #endif
>
>   	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
> - 	Assert(InRecovery || BufferIsValid(heapBuf));
>
>   	/* Check that we have the right heap page pinned, if present */
>   	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
> --- 257,262 ----
> *************** visibilitymap_set(Relation rel, BlockNum
> *** 278,284 ****
>   	map[mapByte] |= (1 << mapBit);
>   	MarkBufferDirty(vmBuf);
>
> ! 	if (RelationNeedsWAL(rel))
>   	{
>   		if (XLogRecPtrIsInvalid(recptr))
>   		{
> --- 277,283 ----
>   	map[mapByte] |= (1 << mapBit);
>   	MarkBufferDirty(vmBuf);
>
> ! 	if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
>   	{
>   		if (XLogRecPtrIsInvalid(recptr))
>   		{

At the very least this needs to revise visibilitymap_set's comment about
the requirement of passing a buffer unless !InRecovery.

Also, how is this going to work with replication if you're explicitly
not WAL logging this?

> *************** vac_cmp_itemptr(const void *left, const
> *** 1730,1743 ****
>    * transactions.  Also return the visibility_cutoff_xid which is the highest
>    * xmin amongst the visible tuples.
>    */
> ! static bool
> ! heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid)
>   {
> - 	Page		page = BufferGetPage(buf);
>   	OffsetNumber offnum,
>   				maxoff;
>   	bool		all_visible = true;
>
>   	*visibility_cutoff_xid = InvalidTransactionId;
>
>   	/*
> --- 1728,1744 ----
>    * transactions.  Also return the visibility_cutoff_xid which is the highest
>    * xmin amongst the visible tuples.
>    */
> ! bool
> ! heap_page_is_all_visible(Relation rel, Buffer buf, Page page,
> ! 						 TransactionId *visibility_cutoff_xid)
>   {
>   	OffsetNumber offnum,
>   				maxoff;
>   	bool		all_visible = true;
>
> + 	if (BufferIsValid(buf))
> + 		page = BufferGetPage(buf);
> +
>   	*visibility_cutoff_xid = InvalidTransactionId;
>
>   	/*
> *************** heap_page_is_all_visible(Relation rel, B
> *** 1758,1764 ****
>   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
>   			continue;
>
> ! 		ItemPointerSet(&(tuple.t_self), BufferGetBlockNumber(buf), offnum);
>
>   		/*
>   		 * Dead line pointers can have index pointers pointing to them.  So
> --- 1759,1767 ----
>   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
>   			continue;
>
> ! 		/* XXX use 0 or real offset? */
> ! 		ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> ! 					   BufferGetBlockNumber(buf) : 0, offnum);

How about passing in the page and block number from the outside and not
dealing with a buffer in here at all?

>   		/*
>   		 * Dead line pointers can have index pointers pointing to them.  So
> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> new file mode 100644
> index f626755..b37ecc4
> *** a/src/backend/utils/time/tqual.c
> --- b/src/backend/utils/time/tqual.c
> *************** static inline void
> *** 108,113 ****
> --- 108,117 ----
>   SetHintBits(HeapTupleHeader tuple, Buffer buffer,
>   			uint16 infomask, TransactionId xid)
>   {
> + 	/* we might not have a buffer if we are doing raw_heap_insert() */
> + 	if (!BufferIsValid(buffer))
> + 		return;
> +
>   	if (TransactionIdIsValid(xid))
>   	{
>   		/* NB: xid must be known committed here! */

This looks seriously wrong to me.

I think the whole idea of doing this in private memory is bad.  The
visibility routines aren't written for that, and the above is just one
instance of that, and I am far from convinced it's the only one.  So you
either need to work out how to rely on the visibility checking done in
cluster.c, or you need to modify rewriteheap.c to write via
shared_buffers.

Greetings,

Andres Freund
On Sat, Feb 15, 2014 at 04:16:40PM +0100, Andres Freund wrote:
> Hi,
>
> > *************** end_heap_rewrite(RewriteState state)
> > *** 281,286 ****
> > --- 284,290 ----
> >   							true);
> >   		RelationOpenSmgr(state->rs_new_rel);
> >
> > + 		update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
> >   		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> >
> >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
> > *************** raw_heap_insert(RewriteState state, Heap
> > *** 633,638 ****
> > --- 637,643 ----
> >   		 */
> >   		RelationOpenSmgr(state->rs_new_rel);
> >
> > + 		update_page_vm(state->rs_new_rel, page, state->rs_blockno);
> >   		PageSetChecksumInplace(page, state->rs_blockno);
> >
> >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
>
> I think those two cases should be combined.

Uh, what I did was to pair the new update_page_vm with the existing
PageSetChecksumInplace calls, figuring that if we needed a checksum
before we wrote the page we would need an update_page_vm too.  Is that
incorrect?  It is the _last_ page write in the second instance.

> > + static void
> > + update_page_vm(Relation relation, Page page, BlockNumber blkno)
> > + {
> > + 	Buffer		vmbuffer = InvalidBuffer;
> > + 	TransactionId visibility_cutoff_xid;
> > +
> > + 	visibilitymap_pin(relation, blkno, &vmbuffer);
> > + 	Assert(BufferIsValid(vmbuffer));
> > +
> > + 	if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
> > + 		heap_page_is_all_visible(relation, InvalidBuffer, page,
> > + 								 &visibility_cutoff_xid))
> > + 	{
> > + 		PageSetAllVisible(page);
> > + 		visibilitymap_set(relation, blkno, InvalidBuffer,
> > + 						  InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
> > + 	}
> > + 	ReleaseBuffer(vmbuffer);
> > + }
>
> How could visibilitymap_test() be true here?

Oh, you are right that I can only hit that once per page; test removed.
> > diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> > new file mode 100644
> > index 899ffac..3e7546b
> > *** a/src/backend/access/heap/visibilitymap.c
> > --- b/src/backend/access/heap/visibilitymap.c
> > *************** visibilitymap_set(Relation rel, BlockNum
> > *** 257,263 ****
> >   #endif
> >
> >   	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
> > - 	Assert(InRecovery || BufferIsValid(heapBuf));
> >
> >   	/* Check that we have the right heap page pinned, if present */
> >   	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
> > --- 257,262 ----
> > *************** visibilitymap_set(Relation rel, BlockNum
> > *** 278,284 ****
> >   	map[mapByte] |= (1 << mapBit);
> >   	MarkBufferDirty(vmBuf);
> >
> > ! 	if (RelationNeedsWAL(rel))
> >   	{
> >   		if (XLogRecPtrIsInvalid(recptr))
> >   		{
> > --- 277,283 ----
> >   	map[mapByte] |= (1 << mapBit);
> >   	MarkBufferDirty(vmBuf);
> >
> > ! 	if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
> >   	{
> >   		if (XLogRecPtrIsInvalid(recptr))
> >   		{
>
> At the very least this needs to revise visibilitymap_set's comment about
> the requirement of passing a buffer unless !InRecovery.

Oh, good point; comment block updated.

> Also, how is this going to work with replication if you're explicitly
> not WAL logging this?

Uh, I assumed that using the existing functions would handle this.  If
not, I don't know the answer.

> > *************** vac_cmp_itemptr(const void *left, const
> > *** 1730,1743 ****
> >    * transactions.  Also return the visibility_cutoff_xid which is the highest
> >    * xmin amongst the visible tuples.
> >    */
> > ! static bool
> > ! heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid)
> >   {
> > - 	Page		page = BufferGetPage(buf);
> >   	OffsetNumber offnum,
> >   				maxoff;
> >   	bool		all_visible = true;
> >
> >   	*visibility_cutoff_xid = InvalidTransactionId;
> >
> >   	/*
> > --- 1728,1744 ----
> >    * transactions.  Also return the visibility_cutoff_xid which is the highest
> >    * xmin amongst the visible tuples.
> >    */
> > ! bool
> > ! heap_page_is_all_visible(Relation rel, Buffer buf, Page page,
> > ! 						 TransactionId *visibility_cutoff_xid)
> >   {
> >   	OffsetNumber offnum,
> >   				maxoff;
> >   	bool		all_visible = true;
> >
> > + 	if (BufferIsValid(buf))
> > + 		page = BufferGetPage(buf);
> > +
> >   	*visibility_cutoff_xid = InvalidTransactionId;
> >
> >   	/*
> > *************** heap_page_is_all_visible(Relation rel, B
> > *** 1758,1764 ****
> >   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
> >   			continue;
> >
> > ! 		ItemPointerSet(&(tuple.t_self), BufferGetBlockNumber(buf), offnum);
> >
> >   		/*
> >   		 * Dead line pointers can have index pointers pointing to them.  So
> > --- 1759,1767 ----
> >   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
> >   			continue;
> >
> > ! 		/* XXX use 0 or real offset? */
> > ! 		ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> > ! 					   BufferGetBlockNumber(buf) : 0, offnum);
>
> How about passing in the page and block number from the outside and not
> dealing with a buffer in here at all?

I would love to do that, but HeapTupleSatisfiesVacuum() requires a
buffer, though you can (with my patch) optionally not supply one,
meaning that if I passed in just the block number I would still need to
generate a buffer pointer.

> >   		/*
> >   		 * Dead line pointers can have index pointers pointing to them.  So
> > diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> > new file mode 100644
> > index f626755..b37ecc4
> > *** a/src/backend/utils/time/tqual.c
> > --- b/src/backend/utils/time/tqual.c
> > *************** static inline void
> > *** 108,113 ****
> > --- 108,117 ----
> >   SetHintBits(HeapTupleHeader tuple, Buffer buffer,
> >   			uint16 infomask, TransactionId xid)
> >   {
> > + 	/* we might not have a buffer if we are doing raw_heap_insert() */
> > + 	if (!BufferIsValid(buffer))
> > + 		return;
> > +
> >   	if (TransactionIdIsValid(xid))
> >   	{
> >   		/* NB: xid must be known committed here! */
>
> This looks seriously wrong to me.
>
> I think the whole idea of doing this in private memory is bad.  The
> visibility routines aren't written for that, and the above is just one
> instance of that, and I am far from convinced it's the only one.  So you
> either need to work out how to rely on the visibility checking done in
> cluster.c, or you need to modify rewriteheap.c to write via
> shared_buffers.

I don't think we can change rewriteheap.c to use shared buffers --- that
code was written to do things in private memory for performance reasons,
and I don't think setting hint bits justifies changing that.

Can you be more specific about the cluster.c idea?  I see
copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a
'buf' just like the code I am working in.

Based on Robert's feedback a few months ago I suspected I would need
help completing this patch --- now I am sure.  Updated patch attached.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
On 2014-02-15 12:50:14 -0500, Bruce Momjian wrote:
> On Sat, Feb 15, 2014 at 04:16:40PM +0100, Andres Freund wrote:
> > Hi,
> >
> > > *************** end_heap_rewrite(RewriteState state)
> > > *** 281,286 ****
> > > --- 284,290 ----
> > >   							true);
> > >   		RelationOpenSmgr(state->rs_new_rel);
> > >
> > > + 		update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
> > >   		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> > >
> > >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
> > > *************** raw_heap_insert(RewriteState state, Heap
> > > *** 633,638 ****
> > > --- 637,643 ----
> > >   		 */
> > >   		RelationOpenSmgr(state->rs_new_rel);
> > >
> > > + 		update_page_vm(state->rs_new_rel, page, state->rs_blockno);
> > >   		PageSetChecksumInplace(page, state->rs_blockno);
> > >
> > >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
> >
> > I think those two cases should be combined.
>
> Uh, what I did was to pair the new update_page_vm with the existing
> PageSetChecksumInplace calls, figuring that if we needed a checksum
> before we wrote the page we would need an update_page_vm too.  Is that
> incorrect?  It is the _last_ page write in the second instance.

What I mean is that there should be a new function handling the
writeout of a page.
> > > diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> > > new file mode 100644
> > > index 899ffac..3e7546b
> > > *** a/src/backend/access/heap/visibilitymap.c
> > > --- b/src/backend/access/heap/visibilitymap.c
> > > *************** visibilitymap_set(Relation rel, BlockNum
> > > *** 257,263 ****
> > >   #endif
> > >
> > >   	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
> > > - 	Assert(InRecovery || BufferIsValid(heapBuf));
> > >
> > >   	/* Check that we have the right heap page pinned, if present */
> > >   	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
> > > --- 257,262 ----
> > > *************** visibilitymap_set(Relation rel, BlockNum
> > > *** 278,284 ****
> > >   	map[mapByte] |= (1 << mapBit);
> > >   	MarkBufferDirty(vmBuf);
> > >
> > > ! 	if (RelationNeedsWAL(rel))
> > >   	{
> > >   		if (XLogRecPtrIsInvalid(recptr))
> > >   		{
> > > --- 277,283 ----
> > >   	map[mapByte] |= (1 << mapBit);
> > >   	MarkBufferDirty(vmBuf);
> > >
> > > ! 	if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
> > >   	{
> > >   		if (XLogRecPtrIsInvalid(recptr))
> > >   		{
> >
> > At the very least this needs to revise visibilitymap_set's comment about
> > the requirement of passing a buffer unless !InRecovery.
>
> Oh, good point; comment block updated.
>
> > Also, how is this going to work with replication if you're explicitly
> > not WAL logging this?
>
> Uh, I assumed that using the existing functions would handle this.  If
> not, I don't know the answer.

Err.  Read the block above again, where you added the
BufferIsValid(heapBuf) check.  That's exactly the part doing the WAL
logging.
> > > *** a/src/backend/utils/time/tqual.c
> > > --- b/src/backend/utils/time/tqual.c
> > > *************** static inline void
> > > *** 108,113 ****
> > > --- 108,117 ----
> > >   SetHintBits(HeapTupleHeader tuple, Buffer buffer,
> > >   			uint16 infomask, TransactionId xid)
> > >   {
> > > + 	/* we might not have a buffer if we are doing raw_heap_insert() */
> > > + 	if (!BufferIsValid(buffer))
> > > + 		return;
> > > +
> > >   	if (TransactionIdIsValid(xid))
> > >   	{
> > >   		/* NB: xid must be known committed here! */
> >
> > This looks seriously wrong to me.
> >
> > I think the whole idea of doing this in private memory is bad.  The
> > visibility routines aren't written for that, and the above is just one
> > instance of that, and I am far from convinced it's the only one.  So you
> > either need to work out how to rely on the visibility checking done in
> > cluster.c, or you need to modify rewriteheap.c to write via
> > shared_buffers.
>
> I don't think we can change rewriteheap.c to use shared buffers --- that
> code was written to do things in private memory for performance reasons,
> and I don't think setting hint bits justifies changing that.

I don't think that's necessarily contradictory.  You'd need to use a
ringbuffer strategy for the IO, like e.g. VACUUM does.  But I admit it's
not an insignificant amount of work, and it might need some adjustments
in some places.

> Can you be more specific about the cluster.c idea?  I see
> copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a
> 'buf' just like the code I am working in.

Yes, it does.  But in contrast to your patch it does so on the *origin*
buffer, which is in shared memory.  The idea would be to modify
rewrite_heap_tuple() to get a new parameter informing it about the
tuple's visibility.  That'd allow you to avoid doing any additional
visibility checks.

Unfortunately, that would still not fix the problem that
visibilitymap_set() requires a real buffer to be passed in.
If you're going that route, you'll need to make somewhat more sweeping
changes. You'd need to add a new blockno parameter and a
BufferIsValid() check to the right places in log_heap_visible(). Pretty
ugly if you ask me...

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Feb 15, 2014 at 07:08:40PM +0100, Andres Freund wrote:
> > Can you be more specific about the cluster.c idea?  I see
> > copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a
> > 'buf' just like the code I am working in.
>
> Yes, it does. But in contrast to your patch it does so on the *origin*
> buffer. Which is in shared memory.
> The idea would be to modify rewrite_heap_tuple() to get a new parameter
> informing it about the tuple's visibility. That'd allow you to avoid
> doing any additional visibility checks.
>
> Unfortunately, that would still not fix the problem that
> visibilitymap_set requires a real buffer to be passed in. If you're
> going that route, you'll need to make somewhat more sweeping changes.
> You'd need to add a new blockno parameter and a BufferIsValid() check
> to the right places in log_heap_visible(). Pretty ugly if you ask me...

Thank you for the thorough review.  Unless someone else can complete
this, I think it should be marked as returned with feedback.  I don't
think I am going to learn enough to complete this during the
commitfest.

Since learning of the limitations in setting visibility-map bits for
index-only scans in October, I have been unable to improve
VACUUM/CLUSTER, and unable to improve autovacuum --- a double failure.
I guess there is always PG 9.5.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
On 2014-02-15 21:34:15 -0500, Bruce Momjian wrote:
> Thank you for the thorough review.  Unless someone else can complete
> this, I think it should be marked as returned with feedback.  I don't
> think I am going to learn enough to complete this during the
> commitfest.

Agreed. Marked it as such.

> I guess there is always PG 9.5.

I sure hope so ;)

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 1/24/14, 3:52 PM, Jaime Casanova wrote:
> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> Is everyone else OK with this approach?  Updated patch attached.
>>
> Hi,
>
> I started to look at this patch and I found that it fails an assertion
> as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
> on unrelated relations.  For example, in an assert-enabled build with
> the regression database, run:
>
> VACUUM customer;
> [... insert here whatever commands you like, or nothing at all ...]
> VACUUM FULL tenk1;

Is anyone else confused/concerned that regression testing didn't pick
this up?

The vacuum.sql test does not test lazy vacuum at all, and I can't seem
to find any other tests that test lazy vacuum either...

-- 
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
From: Heikki Linnakangas
On 02/16/2014 10:19 PM, Jim Nasby wrote:
> On 1/24/14, 3:52 PM, Jaime Casanova wrote:
>> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>>
>>> Is everyone else OK with this approach?  Updated patch attached.
>>>
>> Hi,
>>
>> I started to look at this patch and I found that it fails an assertion
>> as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
>> on unrelated relations.  For example, in an assert-enabled build with
>> the regression database, run:
>>
>> VACUUM customer;
>> [... insert here whatever commands you like, or nothing at all ...]
>> VACUUM FULL tenk1;
>
> Is anyone else confused/concerned that regression testing didn't pick
> this up?

I wouldn't expect that to be explicitly tested - it's pretty unexpected
that they work independently but not when run one after another. But
it's a bit surprising that we don't happen to do that combination in
any of the tests by pure chance.

> The vacuum.sql test does not test lazy vacuum at all, and I can't seem
> to find any other tests that test lazy vacuum either...
There are several lazy vacuums in the regression suite:

sql/alter_table.sql:vacuum analyze atacc1(a);
sql/alter_table.sql:vacuum analyze atacc1("........pg.dropped.1........");
sql/hs_standby_disallowed.sql:VACUUM hs2;
sql/indirect_toast.sql:VACUUM FREEZE toasttest;
sql/indirect_toast.sql:VACUUM FREEZE toasttest;
sql/matview.sql:VACUUM ANALYZE hogeview;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_add;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_sub;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_div;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_mul;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_sqrt;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_ln;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_log10;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_power_10_ln;
sql/numeric.sql:VACUUM ANALYZE num_exp_add;
sql/numeric.sql:VACUUM ANALYZE num_exp_sub;
sql/numeric.sql:VACUUM ANALYZE num_exp_div;
sql/numeric.sql:VACUUM ANALYZE num_exp_mul;
sql/numeric.sql:VACUUM ANALYZE num_exp_sqrt;
sql/numeric.sql:VACUUM ANALYZE num_exp_ln;
sql/numeric.sql:VACUUM ANALYZE num_exp_log10;
sql/numeric.sql:VACUUM ANALYZE num_exp_power_10_ln;
sql/sanity_check.sql:VACUUM;
sql/without_oid.sql:VACUUM ANALYZE wi;
sql/without_oid.sql:VACUUM ANALYZE wo;

Most of those commands are there to analyze, rather than vacuum, but
lazy vacuum is definitely exercised by the regression tests. I agree
it's quite surprising that vacuum.sql doesn't actually perform any lazy
vacuums.

- Heikki
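[Editor's note: a regression-test addition covering the lazy-VACUUM-then-VACUUM-FULL
sequence discussed above might look like the sketch below. The table name
vactst and the exact statements are illustrative only, not taken from the
actual vacuum.sql; the point is simply to run a plain (lazy) VACUUM followed
by the VACUUM FULL that reportedly tripped the assertion.]

```sql
-- Hypothetical additions to src/test/regress/sql/vacuum.sql:
-- create some dead tuples, lazy-vacuum them away, then rewrite the
-- table with VACUUM FULL in the same session.
CREATE TABLE vactst (i INT);
INSERT INTO vactst SELECT generate_series(1, 300);
DELETE FROM vactst WHERE i % 2 = 0;  -- leave dead tuples for VACUUM
VACUUM vactst;                        -- lazy vacuum
VACUUM FULL vactst;                   -- rewrite after the lazy vacuum
SELECT count(*) FROM vactst;          -- contents must survive the rewrite
DROP TABLE vactst;
```

In an assert-enabled build this sequence would exercise exactly the
combination Jaime reported, which none of the existing tests happen to hit.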