Thread: Re: 9.2.1 & index-only scans : abnormal heap fetches after VACUUM FULL
Amit Kapila wrote:
> On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:

>> Surely VACUUM FULL should rebuild the visibility map, and make
>> tuples in the new relation all-visible, no?

Certainly it seems odd to me that VACUUM FULL leaves the table in a
less well-maintained state in terms of visibility than a "normal"
vacuum.  VACUUM FULL should not need to be followed by another VACUUM.

> I think it cannot be made all-visible.

I don't think all tuples in the relation are necessarily visible to
all transactions, but the ones which are should probably be flagged
that way.

> How about if any transaction in SSI mode is started before Vacuum
> Full, should it see all tuples.

There are no differences between the visibility rules for serializable
transactions (SSI) and repeatable read transactions.  It should be
based on whether any snapshots exist which can still see the tuple.

-Kevin
On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote:
> Amit Kapila wrote:
> > On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:
>
> >> Surely VACUUM FULL should rebuild the visibility map, and make
> >> tuples in the new relation all-visible, no?
>
> Certainly it seems odd to me that VACUUM FULL leaves the table
> in a less well-maintained state in terms of visibility than a
> "normal" vacuum.  VACUUM FULL should not need to be followed by
> another VACUUM.
>
> > I think it cannot be made all-visible.
>
> I don't think all tuples in the relation are necessarily visible to
> all transactions, but the ones which are should probably be flagged
> that way.

I have developed the attached proof-of-concept patch to fix the problem
of having no visibility map after CLUSTER or VACUUM FULL.  I tested with
these queries:

	CREATE TABLE test(x INT PRIMARY KEY);
	INSERT INTO test VALUES (1);
	VACUUM FULL test;    -- or CLUSTER
	SELECT relfilenode FROM pg_class WHERE relname = 'test';
	 relfilenode
	-------------
	       16399

Then 'ls -l data/base/16384/16399*' to see the *_vm file.  I am not sure
how to test that the vm contents are valid.

This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior
does not do writes through the shared buffer cache, as outlined in this
C comment block:

	 * We can't use the normal heap_insert function to insert into the new
	 * heap, because heap_insert overwrites the visibility information.
	 * We use a special-purpose raw_heap_insert function instead, which
	 * is optimized for bulk inserting a lot of tuples, knowing that we have
	 * exclusive access to the heap.  raw_heap_insert builds new pages in
	 * local storage.  When a page is full, or at the end of the process,
	 * we insert it to WAL as a single record and then write it to disk
	 * directly through smgr.  Note, however, that any data sent to the new
	 * heap's TOAST table will go through the normal bufmgr.
I originally tried to do this higher up in the stack, but ran into
problems because I couldn't access the new heap page, so I had to do it
at the non-shared-buffer page level.  I reused the lazy vacuum routines.

I need to know whether this is the right approach, and what things are
wrong or missing.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
On Wed, Nov 27, 2013 at 4:33 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Sat, Jan 12, 2013 at 02:14:03PM -0500, Kevin Grittner wrote:
>> Amit Kapila wrote:
>> > On Thursday, January 10, 2013 6:09 AM Josh Berkus wrote:
>>
>> >> Surely VACUUM FULL should rebuild the visibility map, and make
>> >> tuples in the new relation all-visible, no?
>>
>> Certainly it seems odd to me that VACUUM FULL leaves the table
>> in a less well-maintained state in terms of visibility than a
>> "normal" vacuum.  VACUUM FULL should not need to be followed by
>> another VACUUM.
>>
>> > I think it cannot be made all-visible.
>>
>> I don't think all tuples in the relation are necessarily visible to
>> all transactions, but the ones which are should probably be flagged
>> that way.
>
> I have developed the attached proof-of-concept patch to fix the problem
> of having no visibility map after CLUSTER or VACUUM FULL.  I tested with
> these queries:
>
> 	CREATE TABLE test(x INT PRIMARY KEY);
> 	INSERT INTO test VALUES (1);
> 	VACUUM FULL test;    -- or CLUSTER
> 	SELECT relfilenode FROM pg_class WHERE relname = 'test';
> 	 relfilenode
> 	-------------
> 	       16399
>
> Then 'ls -l data/base/16384/16399*' to see the *_vm file.  I am not sure
> how to test that the vm contents are valid.
>
> This patch is fairly tricky because our CLUSTER/VACUUM FULL behavior
> does not do writes through the shared buffer cache, as outlined in this
> C comment block:
>
> 	 * We can't use the normal heap_insert function to insert into the new
> 	 * heap, because heap_insert overwrites the visibility information.
> 	 * We use a special-purpose raw_heap_insert function instead, which
> 	 * is optimized for bulk inserting a lot of tuples, knowing that we have
> 	 * exclusive access to the heap.  raw_heap_insert builds new pages in
> 	 * local storage.  When a page is full, or at the end of the process,
> 	 * we insert it to WAL as a single record and then write it to disk
> 	 * directly through smgr.  Note, however, that any data sent to the new
> 	 * heap's TOAST table will go through the normal bufmgr.
>
> I originally tried to do this higher up in the stack but ran into
> problems because I couldn't access the new heap page, so I had to do it
> at the non-shared-buffer page level.  I reused the lazy vacuum routines.
>
> I need to know whether this is the right approach, and what things
> are wrong or missing.

The fact that you've needed to modify SetHintBits() to make this work
is pretty nasty.  I'm not exactly sure what to do about that, but it
doesn't seem good.

This patch also has the desirable but surprising consequence that hint
bits will be set as a side effect of update_page_vm(), which means
that you'd better do that BEFORE checksumming the page.

I wonder if we ought to mark each page as all-visible in
raw_heap_insert() when we first initialize it, and then clear the flag
when we come across a tuple that isn't all-visible.  We could try to
set hint bits on the tuple before placing it on the page, too, though
I'm not sure of the details.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 28, 2013 at 05:17:07PM -0500, Robert Haas wrote:
> > I need to know this is the right approach, and need to know what things
> > are wrong or missing.
>
> The fact that you've needed to modify SetHintBits() to make this work
> is pretty nasty.  I'm not exactly sure what to do about that, but it
> doesn't seem good.

Hey, if that's the worst of the problems, I will be very happy.  There
is a sense that because we are not using the shared buffer cache and not
WAL logging, we have to skip some stuff.

> This patch also has the desirable but surprising consequence that hint
> bits will be set as a side effect of update_page_vm(), which means
> that you'd better do that BEFORE checksumming the page.

Yes, I already saw that and fixed it in my git tree.

> I wonder if we ought to mark each page as all-visible in
> raw_heap_insert() when we first initialize it, and then clear the flag
> when we come across a tuple that isn't all-visible.  We could try to
> set hint bits on the tuple before placing it on the page, too, though
> I'm not sure of the details.

I went with the per-page approach because I wanted to re-use the lazy
vacuum function.  Is there some other code that does this already?  I am
trying to avoid yet another set of routines that would need to be
maintained or could be buggy.  This hint bit setting is tricky.

And thanks much for the review!

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
> > I wonder if we ought to mark each page as all-visible in
> > raw_heap_insert() when we first initialize it, and then clear the flag
> > when we come across a tuple that isn't all-visible.  We could try to
> > set hint bits on the tuple before placing it on the page, too, though
> > I'm not sure of the details.
>
> I went with the per-page approach because I wanted to re-use the lazy
> vacuum function.  Is there some other code that does this already?  I am
> trying to avoid yet another set of routines that would need to be
> maintained or could be buggy.  This hint bit setting is tricky.
>
> And thanks much for the review!

So, should I put this in the next commit fest?  I still have an unknown
about the buffer number to use here:

	!	/* XXX use 0 or real offset? */
	!	ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
	!				   BufferGetBlockNumber(buf) : 0, offnum);

Is everyone else OK with this approach?  Updated patch attached.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
>> > I wonder if we ought to mark each page as all-visible in
>> > raw_heap_insert() when we first initialize it, and then clear the flag
>> > when we come across a tuple that isn't all-visible.  We could try to
>> > set hint bits on the tuple before placing it on the page, too, though
>> > I'm not sure of the details.
>>
>> I went with the per-page approach because I wanted to re-use the lazy
>> vacuum function.  Is there some other code that does this already?  I am
>> trying to avoid yet another set of routines that would need to be
>> maintained or could be buggy.  This hint bit setting is tricky.
>>
>> And thanks much for the review!
>
> So, should I put this in the next commit fest?

+1.

> I still have an unknown
> about the buffer number to use here:
>
> 	!	/* XXX use 0 or real offset? */
> 	!	ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> 	!				   BufferGetBlockNumber(buf) : 0, offnum);
>
> Is everyone else OK with this approach?  Updated patch attached.

I started looking at this a little more the other day but got bogged
down in other things before I got through it all.  I think we're going
to want to revise this so that it doesn't go through the functions that
currently assume they're always dealing with a shared buffer, but I
haven't figured out exactly what the most elegant way of doing that is
yet.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Dec 3, 2013 at 02:01:52PM -0500, Robert Haas wrote:
> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Thu, Nov 28, 2013 at 05:38:05PM -0500, Bruce Momjian wrote:
> >> > I wonder if we ought to mark each page as all-visible in
> >> > raw_heap_insert() when we first initialize it, and then clear the flag
> >> > when we come across a tuple that isn't all-visible.  We could try to
> >> > set hint bits on the tuple before placing it on the page, too, though
> >> > I'm not sure of the details.
> >>
> >> I went with the per-page approach because I wanted to re-use the lazy
> >> vacuum function.  Is there some other code that does this already?  I am
> >> trying to avoid yet another set of routines that would need to be
> >> maintained or could be buggy.  This hint bit setting is tricky.
> >>
> >> And thanks much for the review!
> >
> > So, should I put this in the next commit fest?
>
> +1.

OK, I added it to the January commit fest.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
On 28 November 2013 22:17, Robert Haas <robertmhaas@gmail.com> wrote:
> The fact that you've needed to modify SetHintBits() to make this work
> is pretty nasty.  I'm not exactly sure what to do about that, but it
> doesn't seem good.

That makes me feel bad also.  I think we should be looking for some
special-case routines rather than piggybacking there.

Wonderful to see you personally and directly addressing this concern;
I am very supportive of this solution for 9.4.

I wonder about a user option to let these routines wait until they are
the oldest snapshot, so they can then set every page as all-visible
without even bothering to check individual items.  Or maybe a user
option to set them all-visible even without actually being the oldest,
as is possible with COPY FREEZE.

A full lock on the table is a big thing, so it would be a shame to waste
the opportunity to set everything visible just because some long-running
task is still executing somewhere (but clearly not here; otherwise we
wouldn't have the lock).

--
Simon Riggs                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
>
> Is everyone else OK with this approach?  Updated patch attached.
>

Hi,

I started to look at this patch and I found that it fails an assertion
as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
on unrelated relations.  For example, in an assert-enabled build with
the regression database, run:

	VACUUM customer;
	[... insert here whatever commands you like or nothing at all ...]
	VACUUM FULL tenk1;

TRAP: FailedAssertion("!(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)) || (ExceptionalCondition("!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)", ("FailedAssertion"), "visibilitymap.c", 260), 0)))), (heapBuf) != 0 ))", File: "visibilitymap.c", Line: 260)
LOG:  server process (PID 25842) was terminated by signal 6: Aborted
DETAIL:  Failed process was running: vacuum FULL customer;
LOG:  terminating any other active server processes

trace:

(gdb) bt
#0  0x00007f9a3d00d475 in *__GI_raise (sig=<optimized out>)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f9a3d0106f0 in *__GI_abort () at abort.c:92
#2  0x0000000000777597 in ExceptionalCondition (
    conditionName=conditionName@entry=0x7cd3b8 "!(InRecovery || ( ((void) ((bool) ((! assert_enabled) || ! (!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)) || (ExceptionalCondition(\"!((heapBuf) <= NBuffers && (heapBuf) >= -NLocBuffer)\", (\"Fai"...,
    errorType=errorType@entry=0x7b0730 "FailedAssertion",
    fileName=fileName@entry=0x7cd105 "visibilitymap.c",
    lineNumber=lineNumber@entry=260) at assert.c:54
#3  0x00000000004a7d99 in visibilitymap_set (rel=rel@entry=0x7f9a3da56a00,
    heapBlk=heapBlk@entry=0, heapBuf=heapBuf@entry=0, recptr=recptr@entry=0,
    vmBuf=220, cutoff_xid=2) at visibilitymap.c:260
#4  0x00000000004a33e5 in update_page_vm (relation=0x7f9a3da56a00,
    page=page@entry=0x1868b18 "", blkno=0) at rewriteheap.c:702
#5  0x00000000004a3668 in raw_heap_insert (state=state@entry=0x1849e98,
    tup=tup@entry=0x184f208) at rewriteheap.c:641
#6  0x00000000004a3b8b in rewrite_heap_tuple (state=state@entry=0x1849e98,
    old_tuple=old_tuple@entry=0x1852a50, new_tuple=new_tuple@entry=0x184f208)
    at rewriteheap.c:433
#7  0x000000000055c373 in reform_and_rewrite_tuple (tuple=tuple@entry=0x1852a50,
    oldTupDesc=oldTupDesc@entry=0x7f9a3da4d350,
    newTupDesc=newTupDesc@entry=0x7f9a3da599a8,
    values=values@entry=0x1852f40, isnull=isnull@entry=0x184f920 "",
    newRelHasOids=1 '\001', rwstate=rwstate@entry=0x1849e98) at cluster.c:1670

--
Jaime Casanova                    www.2ndQuadrant.com
Professional PostgreSQL: 24x7 support and training
Phone: +593 4 5107566             Cell: +593 987171157
On Fri, Jan 24, 2014 at 04:52:55PM -0500, Jaime Casanova wrote:
> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >
> > Is everyone else OK with this approach?  Updated patch attached.
> >
>
> Hi,
>
> I started to look at this patch and I found that it fails an assertion
> as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
> on unrelated relations.  For example, in an assert-enabled build with
> the regression database, run:
>
> 	VACUUM customer;
> 	[... insert here whatever commands you like or nothing at all ...]
> 	VACUUM FULL customer;

Wow, thanks for the testing.  You are right that I had two bugs, both in
visibilitymap_set().  First, I made setting heapBuf optional, but forgot
to remove the Assert check now that it was optional.  Second, I passed
InvalidBlockNumber instead of InvalidBuffer when calling
visibilitymap_set().  Both are fixed in the attached patch.  Thanks for
the report.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
Hi,

> *************** end_heap_rewrite(RewriteState state)
> *** 281,286 ****
> --- 284,290 ----
>   							true);
>   		RelationOpenSmgr(state->rs_new_rel);
>
> + 		update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
>   		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>
>   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
> *************** raw_heap_insert(RewriteState state, Heap
> *** 633,638 ****
> --- 637,643 ----
>   		 */
>   		RelationOpenSmgr(state->rs_new_rel);
>
> + 		update_page_vm(state->rs_new_rel, page, state->rs_blockno);
>   		PageSetChecksumInplace(page, state->rs_blockno);
>
>   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,

I think those two cases should be combined.

> + static void
> + update_page_vm(Relation relation, Page page, BlockNumber blkno)
> + {
> + 	Buffer		vmbuffer = InvalidBuffer;
> + 	TransactionId visibility_cutoff_xid;
> +
> + 	visibilitymap_pin(relation, blkno, &vmbuffer);
> + 	Assert(BufferIsValid(vmbuffer));
> +
> + 	if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
> + 		heap_page_is_all_visible(relation, InvalidBuffer, page,
> + 								 &visibility_cutoff_xid))
> + 	{
> + 		PageSetAllVisible(page);
> + 		visibilitymap_set(relation, blkno, InvalidBuffer,
> + 						  InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
> + 	}
> + 	ReleaseBuffer(vmbuffer);
> + }

How could visibilitymap_test() be true here?
> diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> new file mode 100644
> index 899ffac..3e7546b
> *** a/src/backend/access/heap/visibilitymap.c
> --- b/src/backend/access/heap/visibilitymap.c
> *************** visibilitymap_set(Relation rel, BlockNum
> *** 257,263 ****
>   #endif
>
>   	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
> - 	Assert(InRecovery || BufferIsValid(heapBuf));
>
>   	/* Check that we have the right heap page pinned, if present */
>   	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
> --- 257,262 ----
> *************** visibilitymap_set(Relation rel, BlockNum
> *** 278,284 ****
>   	map[mapByte] |= (1 << mapBit);
>   	MarkBufferDirty(vmBuf);
>
> ! 	if (RelationNeedsWAL(rel))
>   	{
>   		if (XLogRecPtrIsInvalid(recptr))
>   		{
> --- 277,283 ----
>   	map[mapByte] |= (1 << mapBit);
>   	MarkBufferDirty(vmBuf);
>
> ! 	if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
>   	{
>   		if (XLogRecPtrIsInvalid(recptr))
>   		{

At the very least this needs to revise visibilitymap_set's comment about
the requirement of passing a buffer unless !InRecovery.

Also, how is this going to work with replication if you're explicitly
not WAL logging this?

> *************** vac_cmp_itemptr(const void *left, const
> *** 1730,1743 ****
>    * transactions.  Also return the visibility_cutoff_xid which is the highest
>    * xmin amongst the visible tuples.
>    */
> ! static bool
> ! heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid)
>   {
> - 	Page		page = BufferGetPage(buf);
>   	OffsetNumber offnum,
>   				maxoff;
>   	bool		all_visible = true;
>
>   	*visibility_cutoff_xid = InvalidTransactionId;
>
>   	/*
> --- 1728,1744 ----
>    * transactions.  Also return the visibility_cutoff_xid which is the highest
>    * xmin amongst the visible tuples.
>    */
> ! bool
> ! heap_page_is_all_visible(Relation rel, Buffer buf, Page page,
> ! 						 TransactionId *visibility_cutoff_xid)
>   {
>   	OffsetNumber offnum,
>   				maxoff;
>   	bool		all_visible = true;
>
> + 	if (BufferIsValid(buf))
> + 		page = BufferGetPage(buf);
> +
>   	*visibility_cutoff_xid = InvalidTransactionId;
>
>   	/*
> *************** heap_page_is_all_visible(Relation rel, B
> *** 1758,1764 ****
>   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
>   			continue;
>
> ! 		ItemPointerSet(&(tuple.t_self), BufferGetBlockNumber(buf), offnum);
>
>   		/*
>   		 * Dead line pointers can have index pointers pointing to them.  So
> --- 1759,1767 ----
>   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
>   			continue;
>
> ! 		/* XXX use 0 or real offset? */
> ! 		ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> ! 					   BufferGetBlockNumber(buf) : 0, offnum);

How about passing in the page and block number from the outside and not
dealing with a buffer in here at all?

>   		/*
>   		 * Dead line pointers can have index pointers pointing to them.  So
> diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> new file mode 100644
> index f626755..b37ecc4
> *** a/src/backend/utils/time/tqual.c
> --- b/src/backend/utils/time/tqual.c
> *************** static inline void
> *** 108,113 ****
> --- 108,117 ----
>   SetHintBits(HeapTupleHeader tuple, Buffer buffer,
>   			uint16 infomask, TransactionId xid)
>   {
> + 	/* we might not have a buffer if we are doing raw_heap_insert() */
> + 	if (!BufferIsValid(buffer))
> + 		return;
> +
>   	if (TransactionIdIsValid(xid))
>   	{
>   		/* NB: xid must be known committed here! */

This looks seriously wrong to me.

I think the whole idea of doing this in private memory is bad.  The
visibility routines aren't written for that, and the above is just one
instance of that, and I am far from convinced it's the only one.  So you
either need to work out how to rely on the visibility checking done in
cluster.c, or you need to modify rewriteheap.c to write via
shared_buffers.

Greetings,

Andres Freund
On Sat, Feb 15, 2014 at 04:16:40PM +0100, Andres Freund wrote:
> Hi,
>
> > *************** end_heap_rewrite(RewriteState state)
> > *** 281,286 ****
> > --- 284,290 ----
> >   							true);
> >   		RelationOpenSmgr(state->rs_new_rel);
> >
> > + 		update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
> >   		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> >
> >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
> > *************** raw_heap_insert(RewriteState state, Heap
> > *** 633,638 ****
> > --- 637,643 ----
> >   		 */
> >   		RelationOpenSmgr(state->rs_new_rel);
> >
> > + 		update_page_vm(state->rs_new_rel, page, state->rs_blockno);
> >   		PageSetChecksumInplace(page, state->rs_blockno);
> >
> >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
>
> I think those two cases should be combined.

Uh, what I did was to pair the new update_page_vm with the existing
PageSetChecksumInplace calls, figuring that if we needed a checksum
before we wrote the page we would need an update_page_vm too.  Is that
incorrect?  It is the _last_ page write in the second instance.

> > + static void
> > + update_page_vm(Relation relation, Page page, BlockNumber blkno)
> > + {
> > + 	Buffer		vmbuffer = InvalidBuffer;
> > + 	TransactionId visibility_cutoff_xid;
> > +
> > + 	visibilitymap_pin(relation, blkno, &vmbuffer);
> > + 	Assert(BufferIsValid(vmbuffer));
> > +
> > + 	if (!visibilitymap_test(relation, blkno, &vmbuffer) &&
> > + 		heap_page_is_all_visible(relation, InvalidBuffer, page,
> > + 								 &visibility_cutoff_xid))
> > + 	{
> > + 		PageSetAllVisible(page);
> > + 		visibilitymap_set(relation, blkno, InvalidBuffer,
> > + 						  InvalidXLogRecPtr, vmbuffer, visibility_cutoff_xid);
> > + 	}
> > + 	ReleaseBuffer(vmbuffer);
> > + }
>
> How could visibilitymap_test() be true here?

Oh, you are right that I can only hit that once per page; test removed.
> > diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> > new file mode 100644
> > index 899ffac..3e7546b
> > *** a/src/backend/access/heap/visibilitymap.c
> > --- b/src/backend/access/heap/visibilitymap.c
> > *************** visibilitymap_set(Relation rel, BlockNum
> > *** 257,263 ****
> >   #endif
> >
> >   	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
> > - 	Assert(InRecovery || BufferIsValid(heapBuf));
> >
> >   	/* Check that we have the right heap page pinned, if present */
> >   	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
> > --- 257,262 ----
> > *************** visibilitymap_set(Relation rel, BlockNum
> > *** 278,284 ****
> >   	map[mapByte] |= (1 << mapBit);
> >   	MarkBufferDirty(vmBuf);
> >
> > ! 	if (RelationNeedsWAL(rel))
> >   	{
> >   		if (XLogRecPtrIsInvalid(recptr))
> >   		{
> > --- 277,283 ----
> >   	map[mapByte] |= (1 << mapBit);
> >   	MarkBufferDirty(vmBuf);
> >
> > ! 	if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
> >   	{
> >   		if (XLogRecPtrIsInvalid(recptr))
> >   		{
>
> At the very least this needs to revise visibilitymap_set's comment about
> the requirement of passing a buffer unless !InRecovery.

Oh, good point; comment block updated.

> Also, how is this going to work with replication if you're explicitly
> not WAL logging this?

Uh, I assumed that using the existing functions would handle this.  If
not, I don't know the answer.

> > *************** vac_cmp_itemptr(const void *left, const
> > *** 1730,1743 ****
> >    * transactions.  Also return the visibility_cutoff_xid which is the highest
> >    * xmin amongst the visible tuples.
> >    */
> > ! static bool
> > ! heap_page_is_all_visible(Relation rel, Buffer buf, TransactionId *visibility_cutoff_xid)
> >   {
> > - 	Page		page = BufferGetPage(buf);
> >   	OffsetNumber offnum,
> >   				maxoff;
> >   	bool		all_visible = true;
> >
> >   	*visibility_cutoff_xid = InvalidTransactionId;
> >
> >   	/*
> > --- 1728,1744 ----
> >    * transactions.  Also return the visibility_cutoff_xid which is the highest
> >    * xmin amongst the visible tuples.
> >    */
> > ! bool
> > ! heap_page_is_all_visible(Relation rel, Buffer buf, Page page,
> > ! 						 TransactionId *visibility_cutoff_xid)
> >   {
> >   	OffsetNumber offnum,
> >   				maxoff;
> >   	bool		all_visible = true;
> >
> > + 	if (BufferIsValid(buf))
> > + 		page = BufferGetPage(buf);
> > +
> >   	*visibility_cutoff_xid = InvalidTransactionId;
> >
> >   	/*
> > *************** heap_page_is_all_visible(Relation rel, B
> > *** 1758,1764 ****
> >   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
> >   			continue;
> >
> > ! 		ItemPointerSet(&(tuple.t_self), BufferGetBlockNumber(buf), offnum);
> >
> >   		/*
> >   		 * Dead line pointers can have index pointers pointing to them.  So
> > --- 1759,1767 ----
> >   		if (!ItemIdIsUsed(itemid) || ItemIdIsRedirected(itemid))
> >   			continue;
> >
> > ! 		/* XXX use 0 or real offset? */
> > ! 		ItemPointerSet(&(tuple.t_self), BufferIsValid(buf) ?
> > ! 					   BufferGetBlockNumber(buf) : 0, offnum);
>
> How about passing in the page and block number from the outside and not
> dealing with a buffer in here at all?

I would love to do that, but HeapTupleSatisfiesVacuum() requires a
buffer, though you can (with my patch) optionally not supply one,
meaning that if I passed in just the block number I would still need to
generate a buffer pointer.

> >   		/*
> >   		 * Dead line pointers can have index pointers pointing to them.  So
> > diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
> > new file mode 100644
> > index f626755..b37ecc4
> > *** a/src/backend/utils/time/tqual.c
> > --- b/src/backend/utils/time/tqual.c
> > *************** static inline void
> > *** 108,113 ****
> > --- 108,117 ----
> >   SetHintBits(HeapTupleHeader tuple, Buffer buffer,
> >   			uint16 infomask, TransactionId xid)
> >   {
> > + 	/* we might not have a buffer if we are doing raw_heap_insert() */
> > + 	if (!BufferIsValid(buffer))
> > + 		return;
> > +
> >   	if (TransactionIdIsValid(xid))
> >   	{
> >   		/* NB: xid must be known committed here! */
>
> This looks seriously wrong to me.
>
> I think the whole idea of doing this in private memory is bad.  The
> visibility routines aren't written for that, and the above is just one
> instance of that, and I am far from convinced it's the only one.  So you
> either need to work out how to rely on the visibility checking done in
> cluster.c, or you need to modify rewriteheap.c to write via
> shared_buffers.

I don't think we can change rewriteheap.c to use shared buffers --- that
code was written to do things in private memory for performance reasons,
and I don't think setting hint bits justifies changing that.

Can you be more specific about the cluster.c idea?  I see
copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a
'buf' just like the code I am working in.

Based on Robert's feedback a few months ago I suspected I would need
help completing this patch --- now I am sure.  Updated patch attached.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com
+ Everyone has their own god. +
Attachment
On 2014-02-15 12:50:14 -0500, Bruce Momjian wrote:
> On Sat, Feb 15, 2014 at 04:16:40PM +0100, Andres Freund wrote:
> > Hi,
> >
> > > *************** end_heap_rewrite(RewriteState state)
> > > *** 281,286 ****
> > > --- 284,290 ----
> > >   							true);
> > >   		RelationOpenSmgr(state->rs_new_rel);
> > >
> > > + 		update_page_vm(state->rs_new_rel, state->rs_buffer, state->rs_blockno);
> > >   		PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
> > >
> > >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM, state->rs_blockno,
> > > *************** raw_heap_insert(RewriteState state, Heap
> > > *** 633,638 ****
> > > --- 637,643 ----
> > >   		 */
> > >   		RelationOpenSmgr(state->rs_new_rel);
> > >
> > > + 		update_page_vm(state->rs_new_rel, page, state->rs_blockno);
> > >   		PageSetChecksumInplace(page, state->rs_blockno);
> > >
> > >   		smgrextend(state->rs_new_rel->rd_smgr, MAIN_FORKNUM,
> >
> > I think those two cases should be combined.
>
> Uh, what I did was to pair the new update_page_vm with the existing
> PageSetChecksumInplace calls, figuring that if we needed a checksum
> before we wrote the page we would need an update_page_vm too.  Is that
> incorrect?  It is the _last_ page write in the second instance.

What I mean is that there should be a new function handling the
writeout of a page.
> > > diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c
> > > new file mode 100644
> > > index 899ffac..3e7546b
> > > *** a/src/backend/access/heap/visibilitymap.c
> > > --- b/src/backend/access/heap/visibilitymap.c
> > > *************** visibilitymap_set(Relation rel, BlockNum
> > > *** 257,263 ****
> > >   #endif
> > >
> > >   	Assert(InRecovery || XLogRecPtrIsInvalid(recptr));
> > > - 	Assert(InRecovery || BufferIsValid(heapBuf));
> > >
> > >   	/* Check that we have the right heap page pinned, if present */
> > >   	if (BufferIsValid(heapBuf) && BufferGetBlockNumber(heapBuf) != heapBlk)
> > > --- 257,262 ----
> > > *************** visibilitymap_set(Relation rel, BlockNum
> > > *** 278,284 ****
> > >   	map[mapByte] |= (1 << mapBit);
> > >   	MarkBufferDirty(vmBuf);
> > >
> > > ! 	if (RelationNeedsWAL(rel))
> > >   	{
> > >   		if (XLogRecPtrIsInvalid(recptr))
> > >   		{
> > > --- 277,283 ----
> > >   	map[mapByte] |= (1 << mapBit);
> > >   	MarkBufferDirty(vmBuf);
> > >
> > > ! 	if (RelationNeedsWAL(rel) && BufferIsValid(heapBuf))
> > >   	{
> > >   		if (XLogRecPtrIsInvalid(recptr))
> > >   		{
> >
> > At the very least this needs to revise visibilitymap_set's comment about
> > the requirement of passing a buffer unless !InRecovery.
>
> Oh, good point; comment block updated.
>
> > Also, how is this going to work with replication if you're explicitly
> > not WAL logging this?
>
> Uh, I assumed that using the existing functions would handle this.  If
> not, I don't know the answer.

Err.  Read the block above again, where you added the
BufferIsValid(heapBuf) check.  That's exactly the part doing the WAL
logging.
> > > *** a/src/backend/utils/time/tqual.c
> > > --- b/src/backend/utils/time/tqual.c
> > > *************** static inline void
> > > *** 108,113 ****
> > > --- 108,117 ----
> > >   SetHintBits(HeapTupleHeader tuple, Buffer buffer,
> > >   			uint16 infomask, TransactionId xid)
> > >   {
> > > + 	/* we might not have a buffer if we are doing raw_heap_insert() */
> > > + 	if (!BufferIsValid(buffer))
> > > + 		return;
> > > +
> > >   	if (TransactionIdIsValid(xid))
> > >   	{
> > >   		/* NB: xid must be known committed here! */
> >
> > This looks seriously wrong to me.
> >
> > I think the whole idea of doing this in private memory is bad.  The
> > visibility routines aren't written for that, and the above is just one
> > instance of that, and I am far from convinced it's the only one.  So you
> > either need to work out how to rely on the visibility checking done in
> > cluster.c, or you need to modify rewriteheap.c to write via
> > shared_buffers.
>
> I don't think we can change rewriteheap.c to use shared buffers --- that
> code was written to do things in private memory for performance reasons,
> and I don't think setting hint bits justifies changing that.

I don't think that's necessarily contradictory.  You'd need to use a
ringbuffer strategy for the IO, like e.g. VACUUM does.  But I admit it's
not an insignificant amount of work, and it might need some adjustments
in some places.

> Can you be more specific about the cluster.c idea?  I see
> copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a
> 'buf' just like the code I am working in.

Yes, it does.  But in contrast to your patch it does so on the *origin*
buffer, which is in shared memory.  The idea would be to modify
rewrite_heap_tuple() to get a new parameter informing it about the
tuple's visibility.  That'd allow you to avoid doing any additional
visibility checks.

Unfortunately, that would still not fix the problem that
visibilitymap_set() requires a real buffer to be passed in.
If you're going that route, you'll need to make somewhat more sweeping
changes. You'd need to add a new blockno parameter and a
BufferIsValid() check to the right places in log_heap_visible(). Pretty
ugly if you ask me...

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Feb 15, 2014 at 07:08:40PM +0100, Andres Freund wrote:
> > Can you be more specific about the cluster.c idea?  I see
> > copy_heap_data() in cluster.c calling HeapTupleSatisfiesVacuum() with a
> > 'buf' just like the code I am working in.
>
> Yes, it does. But in contrast to your patch it does so on the *origin*
> buffer. Which is in shared memory.
> The idea would be to modify rewrite_heap_tuple() to get a new parameter
> informing it about the tuple's visibility. That'd allow you to avoid
> doing any additional visibility checks.
>
> Unfortunately, that would still not fix the problem that
> visibilitymap_set requires a real buffer to be passed in. If you're
> going that route, you'll need to make somewhat more sweeping changes.
> You'd need to add a new blockno parameter and a BufferIsValid() check
> to the right places in log_heap_visible(). Pretty ugly if you ask me...

Thank you for the thorough review.  Unless someone else can complete
this, I think it should be marked as returned with feedback.  I don't
think I am going to learn enough to complete this during the
commitfest.

Since learning of the limitations in setting visibility-map bits for
index-only scans in October, I have been unable to improve
VACUUM/CLUSTER, and unable to improve autovacuum --- a double failure.
I guess there is always PG 9.5.

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
On 2014-02-15 21:34:15 -0500, Bruce Momjian wrote:
> Thank you for the thorough review.  Unless someone else can complete
> this, I think it should be marked as returned with feedback.  I don't
> think I am going to learn enough to complete this during the
> commitfest.

Agreed. Marked it as such.

> I guess there is always PG 9.5.

I sure hope so ;)

Greetings,

Andres Freund

-- 
 Andres Freund	                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 1/24/14, 3:52 PM, Jaime Casanova wrote:
> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>
>> Is everyone else OK with this approach?  Updated patch attached.
>>
> Hi,
>
> I started to look at this patch and I found that it fails an assertion
> as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
> on unrelated relations.  For example, in an assert-enabled build with
> the regression database, run:
>
> VACUUM customer;
> [... insert here whatever commands you like, or nothing at all ...]
> VACUUM FULL tenk1;

Is anyone else confused/concerned that regression testing didn't pick
this up?

The vacuum.sql test does not test lazy vacuum at all, and I can't seem
to find any other tests that test lazy vacuum either...

-- 
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
From: Heikki Linnakangas
On 02/16/2014 10:19 PM, Jim Nasby wrote:
> On 1/24/14, 3:52 PM, Jaime Casanova wrote:
>> On Tue, Dec 3, 2013 at 11:25 AM, Bruce Momjian <bruce@momjian.us> wrote:
>>>
>>> Is everyone else OK with this approach?  Updated patch attached.
>>>
>> Hi,
>>
>> I started to look at this patch and I found that it fails an assertion
>> as soon as you run a VACUUM FULL after a lazy VACUUM, even if those are
>> on unrelated relations.  For example, in an assert-enabled build with
>> the regression database, run:
>>
>> VACUUM customer;
>> [... insert here whatever commands you like, or nothing at all ...]
>> VACUUM FULL tenk1;
>
> Is anyone else confused/concerned that regression testing didn't pick
> this up?

I wouldn't expect that to be explicitly tested - it's pretty unexpected
that they work independently but not when run one after another. But
it's a bit surprising that we don't happen to do that combination in
any of the tests by pure chance.

> The vacuum.sql test does not test lazy vacuum at all, and I can't seem
> to find any other tests that test lazy vacuum either...
There are several lazy vacuums in the regression suite:

sql/alter_table.sql:vacuum analyze atacc1(a);
sql/alter_table.sql:vacuum analyze atacc1("........pg.dropped.1........");
sql/hs_standby_disallowed.sql:VACUUM hs2;
sql/indirect_toast.sql:VACUUM FREEZE toasttest;
sql/indirect_toast.sql:VACUUM FREEZE toasttest;
sql/matview.sql:VACUUM ANALYZE hogeview;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_add;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_sub;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_div;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_mul;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_sqrt;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_ln;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_log10;
sql/numeric_big.sql:VACUUM ANALYZE num_exp_power_10_ln;
sql/numeric.sql:VACUUM ANALYZE num_exp_add;
sql/numeric.sql:VACUUM ANALYZE num_exp_sub;
sql/numeric.sql:VACUUM ANALYZE num_exp_div;
sql/numeric.sql:VACUUM ANALYZE num_exp_mul;
sql/numeric.sql:VACUUM ANALYZE num_exp_sqrt;
sql/numeric.sql:VACUUM ANALYZE num_exp_ln;
sql/numeric.sql:VACUUM ANALYZE num_exp_log10;
sql/numeric.sql:VACUUM ANALYZE num_exp_power_10_ln;
sql/sanity_check.sql:VACUUM;
sql/without_oid.sql:VACUUM ANALYZE wi;
sql/without_oid.sql:VACUUM ANALYZE wo;

Most of those commands are there to analyze, rather than vacuum, but
lazy vacuum is definitely exercised by the regression tests. I agree
it's quite surprising that vacuum.sql doesn't actually perform any lazy
vacuums.

- Heikki
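[Editor's note: a regression-test addition covering the lazy-VACUUM-then-VACUUM-FULL
sequence discussed above might look like the sketch below. The table name
vactst and the exact statements are illustrative only, not taken from the
actual vacuum.sql; the point is simply to run a plain (lazy) VACUUM followed
by the VACUUM FULL that reportedly tripped the assertion.]

```sql
-- Hypothetical additions to src/test/regress/sql/vacuum.sql:
-- create some dead tuples, lazy-vacuum them away, then rewrite the
-- table with VACUUM FULL in the same session.
CREATE TABLE vactst (i INT);
INSERT INTO vactst SELECT generate_series(1, 300);
DELETE FROM vactst WHERE i % 2 = 0;  -- leave dead tuples for VACUUM
VACUUM vactst;                        -- lazy vacuum
VACUUM FULL vactst;                   -- rewrite after the lazy vacuum
SELECT count(*) FROM vactst;          -- contents must survive the rewrite
DROP TABLE vactst;
```

In an assert-enabled build this sequence would exercise exactly the
combination Jaime reported, which none of the existing tests happen to hit.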