Thread: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Thomas Munro

Date:

30 April 2018, 05:59:31

Hi,

In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
as "reads" when in fact they are not reads at all:

1.  Relation extension, which in fact writes a zero-filled block.
2.  The RBM_ZERO_* modes, which provoke neither read nor write.

Here's a suggested fix.

I noticed this because I have some patches in development that hit
these cases a bit more and the numbers didn't match my expectation.  I
suppose someone might want a separate counter for zero-filled buffers
(they're still buffer interactions and cache misses) but it seems like
it's probably below the kind of thing we're interested in counting
(though in passing, I notice recently that some kernels keep some free
pages pre-zeroed so they can supply them faster, which is curious).

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

0001-Don-t-count-zero-filled-buffers-as-read-in-EXPLAIN.patch

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Andres Freund

Date:

30 April 2018, 06:13:18

On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:
> Hi,
> 
> In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
> as "reads" when in fact they are not reads at all:
> 
> 1.  Relation extension, which in fact writes a zero-filled block.
> 2.  The RBM_ZERO_* modes, which provoke neither read nor write.

Just for understanding: 2) can never happen for buffers showing up in
EXPLAIN, right?

I'm not saying you shouldn't fix the accounting...

Greetings,

Andres Freund

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Thomas Munro

Date:

30 April 2018, 06:37:35

On Mon, Apr 30, 2018 at 3:13 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:
>> In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
>> as "reads" when in fact they are not reads at all:
>>
>> 1.  Relation extension, which in fact writes a zero-filled block.
>> 2.  The RBM_ZERO_* modes, which provoke neither read nor write.
>
> Just for understanding: 2) can never happen for buffers showing up in
> EXPLAIN, right?
>
> I'm not saying you shouldn't fix the accounting...

Maybe the hash AM can reach that in _hash_getinitbuf() while adding
overflow pages to bucket chains?  Admittedly case 2 is obscure and
rare if not unreachable and probably no one would care too much about
that in practice (whereas case 1 can be seen by simply inserting stuff
into a new empty table).  Other patches I'm working on for later
proposal do it more often (think accessing known-empty pages in a kind
of preallocated extent), and it occurred to me that it's clearly a bug
on principle, hence this patch.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Haribabu Kommi

Date:

11 July 2018, 15:46:44

On Mon, Apr 30, 2018 at 1:38 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Mon, Apr 30, 2018 at 3:13 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:
>> In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
>> as "reads" when in fact they are not reads at all:
>>
>> 1. Relation extension, which in fact writes a zero-filled block.
>> 2. The RBM_ZERO_* modes, which provoke neither read nor write.
>
> Just for understanding: 2) can never happen for buffers showing up in
> EXPLAIN, right?
>
> I'm not saying you shouldn't fix the accounting...

Maybe the hash AM can reach that in _hash_getinitbuf() while adding
overflow pages to bucket chains? Admittedly case 2 is obscure and
rare if not unreachable and probably no one would care too much about
that in practice (whereas case 1 can be seen by simply inserting stuff
into a new empty table). Other patches I'm working on for later
proposal do it more often (think accessing known-empty pages in a kind
of preallocated extent), and it occurred to me that it's clearly a bug
on principle, hence this patch.

I checked the patch and I agree with the change 1). And regarding change 2)

whether it is zeroing the contents of the page or not, it does read?

because if it exists in the buffer pool, we are counting them as hits irrespective

of the mode? Am I missing something?

Regards,

Haribabu Kommi

Fujitsu Australia

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Thomas Munro

Date:

12 July 2018, 01:31:46

On Thu, Jul 12, 2018 at 12:46 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>> > On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:
>> >> In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
>> >> as "reads" when in fact they are not reads at all:
>> >>
>> >> 1.  Relation extension, which in fact writes a zero-filled block.
>> >> 2.  The RBM_ZERO_* modes, which provoke neither read nor write.
>
> I checked the patch and I agree with the change 1). And regarding change 2)
> whether it is  zeroing the contents of the page or not, it does read?
> because if it exists in the buffer pool, we are counting them as hits
> irrespective
> of the mode? Am I missing something?

Further down in the function you can see that there is no read()
system call for the RBM_ZERO_* modes:

                if (mode == RBM_ZERO_AND_LOCK || mode ==
RBM_ZERO_AND_CLEANUP_LOCK)
                        MemSet((char *) bufBlock, 0, BLCKSZ);
                else
                {
                        ...
                        smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
                        ...
                }

I suppose someone might argue that even when it's not a hit and it's
not a read, we might still want to count this buffer interaction in
some other way.  Perhaps there should be a separate counter?  It may
technically be a kind of cache miss, but it's nowhere near as
expensive as a synchronous system call like read() so I didn't propose
that.

Some more on my motivation:  In our zheap prototype, when the system
is working well and we have enough space, we constantly allocate
zeroed buffer pages at the insert point (= head) of an undo log and
drop pages at the discard point (= tail) in the background;
effectively a few pages just go round and round via the freelist and
no read() or write() syscalls happen.  That's something I'm very happy
about and it's one of our claimed advantages over the traditional heap
(which tends to read and dirty more pages), but EXPLAIN (BUFFERS)
hides this virtuous behaviour when comparing with the traditional
heap: it falsely and slanderously reports that zheap is reading undo
pages when it is not.  Of course I don't intent to litigate zheap
design in this thread, I just I figured that since this accounting is
wrong on principle and affects current PostgreSQL too (at least in
theory) I would propose this little patch independently.  It's subtle
enough that I wouldn't bother to back-patch it though.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Haribabu Kommi

Date:

12 July 2018, 03:19:29

On Thu, Jul 12, 2018 at 8:32 AM Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Thu, Jul 12, 2018 at 12:46 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
>> > On 2018-04-30 14:59:31 +1200, Thomas Munro wrote:
>> >> In EXPLAIN (BUFFERS), there are two kinds of cache misses that show up
>> >> as "reads" when in fact they are not reads at all:
>> >>
>> >> 1. Relation extension, which in fact writes a zero-filled block.
>> >> 2. The RBM_ZERO_* modes, which provoke neither read nor write.
>
> I checked the patch and I agree with the change 1). And regarding change 2)
> whether it is zeroing the contents of the page or not, it does read?
> because if it exists in the buffer pool, we are counting them as hits
> irrespective
> of the mode? Am I missing something?

Further down in the function you can see that there is no read()
system call for the RBM_ZERO_* modes:

if (mode == RBM_ZERO_AND_LOCK || mode ==
RBM_ZERO_AND_CLEANUP_LOCK)
MemSet((char *) bufBlock, 0, BLCKSZ);
else
{
...
smgrread(smgr, forkNum, blockNum, (char *) bufBlock);
...
}

Thanks for the details. I got your point. But we need to include

RBM_ZERO_ON_ERROR case read operations, excluding others

are fine.

I suppose someone might argue that even when it's not a hit and it's
not a read, we might still want to count this buffer interaction in
some other way. Perhaps there should be a separate counter? It may
technically be a kind of cache miss, but it's nowhere near as
expensive as a synchronous system call like read() so I didn't propose
that.

Yes, I agree that we may need a new counter that counts the buffers that

are just allocated (no read or no write). But currently, may be the counter

value is very less, so people are not interested.

Some more on my motivation: In our zheap prototype, when the system
is working well and we have enough space, we constantly allocate
zeroed buffer pages at the insert point (= head) of an undo log and
drop pages at the discard point (= tail) in the background;
effectively a few pages just go round and round via the freelist and
no read() or write() syscalls happen. That's something I'm very happy
about and it's one of our claimed advantages over the traditional heap
(which tends to read and dirty more pages), but EXPLAIN (BUFFERS)
hides this virtuous behaviour when comparing with the traditional
heap: it falsely and slanderously reports that zheap is reading undo
pages when it is not. Of course I don't intent to litigate zheap
design in this thread, I just I figured that since this accounting is
wrong on principle and affects current PostgreSQL too (at least in
theory) I would propose this little patch independently. It's subtle
enough that I wouldn't bother to back-patch it though.

OK. May be it is better to implement the buffer allocate counter along with

zheap to provide better buffer results?

Regards,

Haribabu Kommi

Fujitsu Australia

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

David Rowley

Date:

12 July 2018, 04:28:37

On 12 July 2018 at 12:19, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Yes, I agree that we may need a new counter that counts the buffers that
> are just allocated (no read or no write). But currently, may be the counter
> value is very less, so people are not interested.

The existing counters don't show up in EXPLAIN (ANALYZE, BUFFERS) when
they're zero. The new counter would surely work the same way, so the
users wouldn't be burdened by additional explain output it when
they're not affected by it.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Kyotaro HORIGUCHI

Date:

15 November 2018, 06:31:02

At Thu, 12 Jul 2018 10:31:46 +1200, Thomas Munro <thomas.munro@enterprisedb.com> wrote in
<CAEepm=2yGnw6dJ5wx8tuSRmcpDZ5uziiH_qQ0ptdha8mMLSVqw@mail.gmail.com>
> I suppose someone might argue that even when it's not a hit and it's
> not a read, we might still want to count this buffer interaction in
> some other way.  Perhaps there should be a separate counter?  It may
> technically be a kind of cache miss, but it's nowhere near as
> expensive as a synchronous system call like read() so I didn't propose
> that.

At Thu, 12 Jul 2018 10:19:29 +1000, Haribabu Kommi <kommi.haribabu@gmail.com> wrote in
<CAJrrPGduEpxwtu9VFxT21DNK=WRP=LUjK4GjPfm+4+PCjpcAxA@mail.gmail.com>
> Thanks for the details. I got your point. But we need to include
> RBM_ZERO_ON_ERROR case read operations, excluding others
> are fine.

FWIW I agree to Haribabu's comment that the case of RBM_ZERO_*
other than ON_ERROR is a hit. It seems to show zheap's disk I /O
reduction by itself in certain extent.

At Thu, 12 Jul 2018 13:28:37 +1200, David Rowley <david.rowley@2ndquadrant.com> wrote in
<CAKJS1f_kem=OzVpFADzC2=VpzcXuSQ+HCW=Hp67GAXrnQ-EVTw@mail.gmail.com>
> On 12 July 2018 at 12:19, Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> > Yes, I agree that we may need a new counter that counts the buffers that
> > are just allocated (no read or no write). But currently, may be the counter
> > value is very less, so people are not interested.

> The existing counters don't show up in EXPLAIN (ANALYZE, BUFFERS) when
> they're zero. The new counter would surely work the same way, so the
> users wouldn't be burdened by additional explain output it when
> they're not affected by it.

I don't object strongly neither to the new counter but I'm not
sure it is enough distinctive from hits, in the view that "a hit
is where we found a buffer for the requested page".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Alvaro Herrera

Date:

17 November 2018, 14:15:54

Is this patch committable now?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Kyotaro HORIGUCHI

Date:

19 November 2018, 04:24:08

At Sat, 17 Nov 2018 11:15:54 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
<20181117141554.4dkx2u4j6md3bqdh@alvherre.pgsql>
> Is this patch committable now?

I don't think so. We should make a decision on a point.

I was a bit confused (sorry) but IIUIC Haribabu suggested that
the RBM_ZERO_ON_ERROR case should be included to read (not just
ignored). I'm on it.  I think that RPM_ZERO_* other than ON_ERROR
cases could be a kind of hit but I don't insist on it.

So, I think we should decide on at least the ON_ERROR case before
this becomes commttable.

The another counter could be another issue. I don't object to add
the counter but I'm not sure what is the suitable name.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Thomas Munro

Date:

23 November 2018, 07:39:13

On Mon, Nov 19, 2018 at 5:24 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Sat, 17 Nov 2018 11:15:54 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in
<20181117141554.4dkx2u4j6md3bqdh@alvherre.pgsql>
> > Is this patch committable now?
>
> I don't think so. We should make a decision on a point.
>
> I was a bit confused (sorry) but IIUIC Haribabu suggested that
> the RBM_ZERO_ON_ERROR case should be included to read (not just
> ignored). I'm on it.  I think that RPM_ZERO_* other than ON_ERROR
> cases could be a kind of hit but I don't insist on it.
>
> So, I think we should decide on at least the ON_ERROR case before
> this becomes commttable.

Agreed, RBM_ZERO_ON_ERROR needs to be counted as a read.  Here's a new
version like that.

> The another counter could be another issue. I don't object to add
> the counter but I'm not sure what is the suitable name.

I think that might be interesting information in the future, but let's
consider that for a later patch.

-- 
Thomas Munro
http://www.enterprisedb.com

Attachment

0001-Don-t-count-zero-filled-buffers-as-read-in-EXPLAI-v2.patch

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Haribabu Kommi

Date:

27 November 2018, 01:53:05

On Fri, Nov 23, 2018 at 6:39 PM Thomas Munro <thomas.munro@enterprisedb.com> wrote:

On Mon, Nov 19, 2018 at 5:24 PM Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Sat, 17 Nov 2018 11:15:54 -0300, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in <20181117141554.4dkx2u4j6md3bqdh@alvherre.pgsql>
> > Is this patch committable now?
>
> I don't think so. We should make a decision on a point.
>
> I was a bit confused (sorry) but IIUIC Haribabu suggested that
> the RBM_ZERO_ON_ERROR case should be included to read (not just
> ignored). I'm on it. I think that RPM_ZERO_* other than ON_ERROR
> cases could be a kind of hit but I don't insist on it.
>
> So, I think we should decide on at least the ON_ERROR case before
> this becomes commttable.

Agreed, RBM_ZERO_ON_ERROR needs to be counted as a read. Here's a new
version like that.

> The another counter could be another issue. I don't object to add
> the counter but I'm not sure what is the suitable name.

I think that might be interesting information in the future, but let's
consider that for a later patch.

Thanks for the updated patch. It looks good.

I marked it in the commitfest as ready for committer.

Regards,

Haribabu Kommi

Fujitsu Australia

Re: Accounting of zero-filled buffers in EXPLAIN (BUFFERS)

From

Thomas Munro

Date:

27 November 2018, 23:03:03

On Tue, Nov 27, 2018 at 2:53 PM Haribabu Kommi <kommi.haribabu@gmail.com> wrote:
> Thanks for the updated patch. It looks good.
> I marked it in the commitfest as ready for committer.

Pushed.  Thanks for the reviews!

-- 
Thomas Munro
http://www.enterprisedb.com