Thread: wal stats questions

wal stats questions

From

Andres Freund

Date:

24 March 2021, 23:22:24

Hi,

I got a few questions about the wal stats while working on the shmem
stats patch:

1) What is the motivation to have both prevWalUsage and pgWalUsage,
   instead of just accumulating the stats since the last submission?
   There doesn't seem to be any comment explaining it? Computing
   differences to previous values, and copying to prev*, isn't free. I
   assume this is out of a fear that the stats could get reset before
   they're used for instrument.c purposes - but who knows?

2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
   former being used by wal writer, the latter by most other processes?
   There again don't seem to be comments explaining this.

3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
   just to figure out if there's been any changes isn't all that
   cheap. This is regularly exercised in read-only workloads too. Seems
   adding a boolean WalStats.have_pending = true or such would be
   better.

4) For plain backends pgstat_report_wal() is called by
   pgstat_report_stat() - but it is not checked as part of the "Don't
   expend a clock check if nothing to do" check at the top.  It's
   probably rare to have pending wal stats without also passing one of
   the other conditions, but ...

Generally the various patches seems to to have a lot of the boilerplate
style comments (like "Prepare and send the message"), but very little in
the way of explaining the design.

Greetings,

Andres Freund

Re: wal stats questions

From

Masahiro Ikeda

Date:

25 March 2021, 01:51:56

On 2021/03/25 8:22, Andres Freund wrote:
> Hi,
> 
> I got a few questions about the wal stats while working on the shmem
> stats patch:

Thanks for your reviews.

> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>    instead of just accumulating the stats since the last submission?
>    There doesn't seem to be any comment explaining it? Computing
>    differences to previous values, and copying to prev*, isn't free. I
>    assume this is out of a fear that the stats could get reset before
>    they're used for instrument.c purposes - but who knows?

Is your point that it's better to call pgWalUsage = 0; after sending the
stats? My understanding is as same as your assumption. For example,
pg_stat_statements.c use pgWalUsage and calculate the diff.

But, because the stats may be sent after the transaction is finished, it
doesn't seem to lead wrong stats if pgWalUsage = 0 is called. So, I agree your
suggestion.

If the extension wants to know the walusage diff across two transactions,
it may lead to wrong stats, but I think it won't happen.

> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>    former being used by wal writer, the latter by most other processes?
>    There again don't seem to be comments explaining this.

To control the transmission interval for the wal writer because it may send
the stats too frequency, and to omit to calculate the generated wal stats
because it doesn't generate wal records. But, now I think it's better to merge
them.

> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>    just to figure out if there's been any changes isn't all that
>    cheap. This is regularly exercised in read-only workloads too. Seems
>    adding a boolean WalStats.have_pending = true or such would be
>    better.

I understood that for backends, this may leads performance degradation and
this problem is not only for the WalStats but also SLRUStats.

I wondered the degradation is so big because pgstat_report_stat() checks if at
least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the
diff. If my understanding is correct, to get timestamp is more expensive.

> 4) For plain backends pgstat_report_wal() is called by
>    pgstat_report_stat() - but it is not checked as part of the "Don't
>    expend a clock check if nothing to do" check at the top.  It's
>    probably rare to have pending wal stats without also passing one of
>    the other conditions, but ...

(I'm not confidence my understanding of your comment is right.)
You mean that we need to expend a clock check in pgstat_report_wal()?
I think it's unnecessary because pgstat_report_stat() is already checked it.

> Generally the various patches seems to to have a lot of the boilerplate
> style comments (like "Prepare and send the message"), but very little in
> the way of explaining the design.

Sorry for that. I'll be careful.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Re: wal stats questions

From

Andres Freund

Date:

25 March 2021, 04:07:26

Hi,

On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote:
> On 2021/03/25 8:22, Andres Freund wrote:
> > 1) What is the motivation to have both prevWalUsage and pgWalUsage,
> >    instead of just accumulating the stats since the last submission?
> >    There doesn't seem to be any comment explaining it? Computing
> >    differences to previous values, and copying to prev*, isn't free. I
> >    assume this is out of a fear that the stats could get reset before
> >    they're used for instrument.c purposes - but who knows?
>
> Is your point that it's better to call pgWalUsage = 0; after sending the
> stats?

Yes. At least there should be a comment explaining why it's done the way
it is.



> > 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
> >    just to figure out if there's been any changes isn't all that
> >    cheap. This is regularly exercised in read-only workloads too. Seems
> >    adding a boolean WalStats.have_pending = true or such would be
> >    better.
>
> I understood that for backends, this may leads performance degradation and
> this problem is not only for the WalStats but also SLRUStats.
>
> I wondered the degradation is so big because pgstat_report_stat() checks if at
> least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the
> diff. If my understanding is correct, to get timestamp is more expensive.

Getting a timestamp is expensive, yes. But I think we need to check for
the no-pending-wal-stats *before* the clock check. See the below:


> > 4) For plain backends pgstat_report_wal() is called by
> >    pgstat_report_stat() - but it is not checked as part of the "Don't
> >    expend a clock check if nothing to do" check at the top.  It's
> >    probably rare to have pending wal stats without also passing one of
> >    the other conditions, but ...
>
> (I'm not confidence my understanding of your comment is right.)
> You mean that we need to expend a clock check in pgstat_report_wal()?
> I think it's unnecessary because pgstat_report_stat() is already checked it.

No, I mean that right now we might can erroneously return early
pgstat_report_wal() for normal backends in some workloads:

void
pgstat_report_stat(bool disconnect)
...
    /* Don't expend a clock check if nothing to do */
    if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
        pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
        !have_function_stats && !disconnect)
        return;

might return if there only is pending WAL activity. This needs to check
whether there are pending WAL stats. Which in turn means that the check
should be fast.

Greetings,

Andres Freund

Re: wal stats questions

From

Kyotaro Horiguchi

Date:

25 March 2021, 07:37:10

At Wed, 24 Mar 2021 21:07:26 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote:
> > On 2021/03/25 8:22, Andres Freund wrote:
> > > 1) What is the motivation to have both prevWalUsage and pgWalUsage,
> > >    instead of just accumulating the stats since the last submission?
> > >    There doesn't seem to be any comment explaining it? Computing
> > >    differences to previous values, and copying to prev*, isn't free. I
> > >    assume this is out of a fear that the stats could get reset before
> > >    they're used for instrument.c purposes - but who knows?
> >
> > Is your point that it's better to call pgWalUsage = 0; after sending the
> > stats?
> 
> Yes. At least there should be a comment explaining why it's done the way
> it is.

pgWalUsage was used without resetting and we (I) thought that that
behavior should be preserved.  On second thought, as Andres suggested,
we can just reset pgWalUsage at sending since AFAICS no one takes
difference crossing pgstat_report_stat() calls.

> > > 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
> > >    just to figure out if there's been any changes isn't all that
> > >    cheap. This is regularly exercised in read-only workloads too. Seems
> > >    adding a boolean WalStats.have_pending = true or such would be
> > >    better.
> >
> > I understood that for backends, this may leads performance degradation and
> > this problem is not only for the WalStats but also SLRUStats.
> >
> > I wondered the degradation is so big because pgstat_report_stat() checks if at
> > least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the
> > diff. If my understanding is correct, to get timestamp is more expensive.
> 
> Getting a timestamp is expensive, yes. But I think we need to check for
> the no-pending-wal-stats *before* the clock check. See the below:
> 
> 
> > > 4) For plain backends pgstat_report_wal() is called by
> > >    pgstat_report_stat() - but it is not checked as part of the "Don't
> > >    expend a clock check if nothing to do" check at the top.  It's
> > >    probably rare to have pending wal stats without also passing one of
> > >    the other conditions, but ...
> >
> > (I'm not confidence my understanding of your comment is right.)
> > You mean that we need to expend a clock check in pgstat_report_wal()?
> > I think it's unnecessary because pgstat_report_stat() is already checked it.
> 
> No, I mean that right now we might can erroneously return early
> pgstat_report_wal() for normal backends in some workloads:
> 
> void
> pgstat_report_stat(bool disconnect)
> ...
>     /* Don't expend a clock check if nothing to do */
>     if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>         pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>         !have_function_stats && !disconnect)
>         return;
> 
> might return if there only is pending WAL activity. This needs to check
> whether there are pending WAL stats. Which in turn means that the check
> should be fast.

Agreed that the condition is wrong.  On the other hand, the counters
are incremented in XLogInsertRecord() and I think we don't want add
instructions there.

If any wal activity has been recorded, wal_records is always positive
thus we can check for wal activity just by "pgWalUsage.wal_records >
0, which should be fast enough.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: wal stats questions

From

Fujii Masao

Date:

25 March 2021, 10:01:23


On 2021/03/25 16:37, Kyotaro Horiguchi wrote:
> At Wed, 24 Mar 2021 21:07:26 -0700, Andres Freund <andres@anarazel.de> wrote in
>> Hi,
>>
>> On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote:
>>> On 2021/03/25 8:22, Andres Freund wrote:
>>>> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>>>>     instead of just accumulating the stats since the last submission?
>>>>     There doesn't seem to be any comment explaining it? Computing
>>>>     differences to previous values, and copying to prev*, isn't free. I
>>>>     assume this is out of a fear that the stats could get reset before
>>>>     they're used for instrument.c purposes - but who knows?
>>>
>>> Is your point that it's better to call pgWalUsage = 0; after sending the
>>> stats?
>>
>> Yes. At least there should be a comment explaining why it's done the way
>> it is.
> 
> pgWalUsage was used without resetting and we (I) thought that that
> behavior should be preserved.  On second thought, as Andres suggested,
> we can just reset pgWalUsage at sending since AFAICS no one takes
> difference crossing pgstat_report_stat() calls.

Yes, I agree that we can do that since there seems no such code for now.
Also if we do that, we can check, for example "pgWalUsage.wal_records > 0"
as you suggested, to easily determine whether there is pending WAL stats or not.
Anyway I agree it's better to add comments about the design more.


>>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>>>>     just to figure out if there's been any changes isn't all that
>>>>     cheap. This is regularly exercised in read-only workloads too. Seems
>>>>     adding a boolean WalStats.have_pending = true or such would be
>>>>     better.
>>>
>>> I understood that for backends, this may leads performance degradation and
>>> this problem is not only for the WalStats but also SLRUStats.
>>>
>>> I wondered the degradation is so big because pgstat_report_stat() checks if at
>>> least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the
>>> diff. If my understanding is correct, to get timestamp is more expensive.
>>
>> Getting a timestamp is expensive, yes. But I think we need to check for
>> the no-pending-wal-stats *before* the clock check. See the below:
>>
>>
>>>> 4) For plain backends pgstat_report_wal() is called by
>>>>     pgstat_report_stat() - but it is not checked as part of the "Don't
>>>>     expend a clock check if nothing to do" check at the top.  It's
>>>>     probably rare to have pending wal stats without also passing one of
>>>>     the other conditions, but ...
>>>
>>> (I'm not confidence my understanding of your comment is right.)
>>> You mean that we need to expend a clock check in pgstat_report_wal()?
>>> I think it's unnecessary because pgstat_report_stat() is already checked it.
>>
>> No, I mean that right now we might can erroneously return early
>> pgstat_report_wal() for normal backends in some workloads:
>>
>> void
>> pgstat_report_stat(bool disconnect)
>> ...
>>     /* Don't expend a clock check if nothing to do */
>>     if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>>         pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>>         !have_function_stats && !disconnect)
>>         return;
>>
>> might return if there only is pending WAL activity. This needs to check
>> whether there are pending WAL stats. Which in turn means that the check
>> should be fast.
> 
> Agreed that the condition is wrong.  On the other hand, the counters
> are incremented in XLogInsertRecord() and I think we don't want add
> instructions there.

Basically yes. We should avoid that especially while WALInsertLock is being hold.
But it's not so harmful to do that after the lock is released?

> If any wal activity has been recorded, wal_records is always positive
> thus we can check for wal activity just by "pgWalUsage.wal_records >
> 0, which should be fast enough.

Maybe there is the case where a backend generates no WAL records,
but just writes them because it needs to do write-ahead-logging
when flush the table data? If yes, "pgWalUsage.wal_records > 0" is not enough.
Probably other fields also need to be checked.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Kyotaro Horiguchi

Date:

26 March 2021, 01:08:28

At Thu, 25 Mar 2021 19:01:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> On 2021/03/25 16:37, Kyotaro Horiguchi wrote:
> > pgWalUsage was used without resetting and we (I) thought that that
> > behavior should be preserved.  On second thought, as Andres suggested,
> > we can just reset pgWalUsage at sending since AFAICS no one takes
> > difference crossing pgstat_report_stat() calls.
> 
> Yes, I agree that we can do that since there seems no such code for
> now.
> Also if we do that, we can check, for example "pgWalUsage.wal_records
> > 0"
> as you suggested, to easily determine whether there is pending WAL
> stats or not.
> Anyway I agree it's better to add comments about the design more.
...
> > If any wal activity has been recorded, wal_records is always positive
> > thus we can check for wal activity just by "pgWalUsage.wal_records >
> > 0, which should be fast enough.
> 
> Maybe there is the case where a backend generates no WAL records,
> but just writes them because it needs to do write-ahead-logging
> when flush the table data? If yes, "pgWalUsage.wal_records > 0" is not
> enough.
> Probably other fields also need to be checked.

(I noticed I made the discussion above unconsciously premising
pgWalUsage reset.)

I may be misunderstanding or missing something, but the only place
where pgWalUsage counters are increased is XLogInsertRecrod.  That is,
pgWalUsage counts wal insertions, not writes nor flushes.  AFAICS
pgWalUsage.wal_records is always incremented when other counters are
increased.  Looking from another side, we should refrain from adding
counters that incrases at a different time than
pgWalUsage.wal_recrods. (That should be written as a comment there.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: wal stats questions

From

Fujii Masao

Date:

26 March 2021, 01:32:23


On 2021/03/26 10:08, Kyotaro Horiguchi wrote:
> At Thu, 25 Mar 2021 19:01:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in
>> On 2021/03/25 16:37, Kyotaro Horiguchi wrote:
>>> pgWalUsage was used without resetting and we (I) thought that that
>>> behavior should be preserved.  On second thought, as Andres suggested,
>>> we can just reset pgWalUsage at sending since AFAICS no one takes
>>> difference crossing pgstat_report_stat() calls.
>>
>> Yes, I agree that we can do that since there seems no such code for
>> now.
>> Also if we do that, we can check, for example "pgWalUsage.wal_records
>>> 0"
>> as you suggested, to easily determine whether there is pending WAL
>> stats or not.
>> Anyway I agree it's better to add comments about the design more.
> ...
>>> If any wal activity has been recorded, wal_records is always positive
>>> thus we can check for wal activity just by "pgWalUsage.wal_records >
>>> 0, which should be fast enough.
>>
>> Maybe there is the case where a backend generates no WAL records,
>> but just writes them because it needs to do write-ahead-logging
>> when flush the table data? If yes, "pgWalUsage.wal_records > 0" is not
>> enough.
>> Probably other fields also need to be checked.
> 
> (I noticed I made the discussion above unconsciously premising
> pgWalUsage reset.)
> 
> I may be misunderstanding or missing something, but the only place
> where pgWalUsage counters are increased is XLogInsertRecrod.  That is,
> pgWalUsage counts wal insertions, not writes nor flushes.  AFAICS

Yes. And WalStats instead of pgWalUsage includes the stats about
not only WAL insertions, but also writes and flushes.
pgstat_send_wal() checks WalStats to determine whether there are
pending WAL stats to send to the stats collector or not. That is,
the counters of not only WAL insertions but also writes and flushes
should be checked to determine whether there are pending stats or not, I think..

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Kyotaro Horiguchi

Date:

26 March 2021, 03:47:49

At Fri, 26 Mar 2021 10:32:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> > I may be misunderstanding or missing something, but the only place
> > where pgWalUsage counters are increased is XLogInsertRecrod.  That is,
> > pgWalUsage counts wal insertions, not writes nor flushes.  AFAICS
> 
> Yes. And WalStats instead of pgWalUsage includes the stats about
> not only WAL insertions, but also writes and flushes.

Ugh! I was missing a very large blob.. Ok, we need additional check
for non-pgWalUsage part. Sorry.

> pgstat_send_wal() checks WalStats to determine whether there are
> pending WAL stats to send to the stats collector or not. That is,
> the counters of not only WAL insertions but also writes and flushes
> should be checked to determine whether there are pending stats or not,
> I think..

I think we may have an additional flag to notify about io-stat part,
in constrast to wal-insertion part . Anyway we do additional
INSTR_TIME_SET_CURRENT when track_wal_io_timinge.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: wal stats questions

From

Masahiro Ikeda

Date:

26 March 2021, 07:20:04

Thanks for many your suggestions!
I made the patch to handle the issues.

> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>    instead of just accumulating the stats since the last submission?
>    There doesn't seem to be any comment explaining it? Computing
>    differences to previous values, and copying to prev*, isn't free. I
>    assume this is out of a fear that the stats could get reset before
>    they're used for instrument.c purposes - but who knows?

I removed the unnecessary code copying pgWalUsage and just reset the
pgWalUsage after reporting the stats in pgstat_report_wal().


> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>    former being used by wal writer, the latter by most other processes?
>    There again don't seem to be comments explaining this.

I added the comments why two functions are separated.
(But is it better to merge them?)


> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>    just to figure out if there's been any changes isn't all that
>    cheap. This is regularly exercised in read-only workloads too. Seems
>    adding a boolean WalStats.have_pending = true or such would be
>    better.
> 4) For plain backends pgstat_report_wal() is called by
>    pgstat_report_stat() - but it is not checked as part of the "Don't
>    expend a clock check if nothing to do" check at the top.  It's
>    probably rare to have pending wal stats without also passing one of
>    the other conditions, but ...

I added the logic to check if the stats counters are updated or not in
pgstat_report_stat() using not only generated wal record but also write/sync
counters, and it can skip to call reporting function.

Although I added the condition which the write/sync counters are updated or
not, I haven't understood the following case yet...Sorry. I checked related
code and tested to insert large object, but I couldn't. I'll investigate more
deeply, but if you already know the function which leads the following case,
please let me know.

> Maybe there is the case where a backend generates no WAL records,
> but just writes them because it needs to do write-ahead-logging
> when flush the table data?

> Ugh! I was missing a very large blob.. Ok, we need additional check
> for non-pgWalUsage part. Sorry.


Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v1-0001-performance-improvements-of-reporting-wal-stats.patch

Re: wal stats questions

From

Andres Freund

Date:

26 March 2021, 17:07:45

Hi,

On 2021-03-25 16:37:10 +0900, Kyotaro Horiguchi wrote:
> On the other hand, the counters are incremented in XLogInsertRecord()
> and I think we don't want add instructions there.

I don't really buy this. Setting a boolean to true, in a cacheline
you're already touching, isn't that much compared to all the other stuff
in there. The branch to check if wal stats timing etc is enabled is much
more expensive.  I think we should just set a boolean to true and leave
it at that.

Greetings,

Andres Freund

Re: wal stats questions

From

Kyotaro Horiguchi

Date:

29 March 2021, 02:09:00

At Fri, 26 Mar 2021 10:07:45 -0700, Andres Freund <andres@anarazel.de> wrote in 
> Hi,
> 
> On 2021-03-25 16:37:10 +0900, Kyotaro Horiguchi wrote:
> > On the other hand, the counters are incremented in XLogInsertRecord()
> > and I think we don't want add instructions there.
> 
> I don't really buy this. Setting a boolean to true, in a cacheline
> you're already touching, isn't that much compared to all the other stuff
> in there. The branch to check if wal stats timing etc is enabled is much
> more expensive.  I think we should just set a boolean to true and leave
> it at that.

Hmm. Yes, I agree to you in that opinion.  I (remember I) was told not
to add even a cycle to the hot path as far as we can avoid when I
tried something like that.

So I'm happy to +1 for that if it is the consensus here, since it is
cleaner.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: wal stats questions

From

Kyotaro Horiguchi

Date:

29 March 2021, 02:11:00

At Mon, 29 Mar 2021 11:09:00 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Fri, 26 Mar 2021 10:07:45 -0700, Andres Freund <andres@anarazel.de> wrote in 
> > Hi,
> > 
> > On 2021-03-25 16:37:10 +0900, Kyotaro Horiguchi wrote:
> > > On the other hand, the counters are incremented in XLogInsertRecord()
> > > and I think we don't want add instructions there.
> > 
> > I don't really buy this. Setting a boolean to true, in a cacheline
> > you're already touching, isn't that much compared to all the other stuff
> > in there. The branch to check if wal stats timing etc is enabled is much
> > more expensive.  I think we should just set a boolean to true and leave
> > it at that.
> 
> Hmm. Yes, I agree to you in that opinion.  I (remember I) was told not

It might sound differently.. To be precise, "I had the same opinion
with you".

> to add even a cycle to the hot path as far as we can avoid when I
> tried something like that.
> 
> So I'm happy to +1 for that if it is the consensus here, since it is
> cleaner.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: wal stats questions

From

Masahiro Ikeda

Date:

30 March 2021, 00:41:24

I update the patch since there were my misunderstanding points.

On 2021/03/26 16:20, Masahiro Ikeda wrote:
> Thanks for many your suggestions!
> I made the patch to handle the issues.
> 
>> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>>    instead of just accumulating the stats since the last submission?
>>    There doesn't seem to be any comment explaining it? Computing
>>    differences to previous values, and copying to prev*, isn't free. I
>>    assume this is out of a fear that the stats could get reset before
>>    they're used for instrument.c purposes - but who knows?
> 
> I removed the unnecessary code copying pgWalUsage and just reset the
> pgWalUsage after reporting the stats in pgstat_report_wal().

I didn't change this.

>> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>>    former being used by wal writer, the latter by most other processes?
>>    There again don't seem to be comments explaining this.
> 
> I added the comments why two functions are separated.
> (But is it better to merge them?)

I updated the comments for following reasons.

>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>>    just to figure out if there's been any changes isn't all that
>>    cheap. This is regularly exercised in read-only workloads too. Seems
>>    adding a boolean WalStats.have_pending = true or such would be
>>    better.
>> 4) For plain backends pgstat_report_wal() is called by
>>    pgstat_report_stat() - but it is not checked as part of the "Don't
>>    expend a clock check if nothing to do" check at the top.  It's
>>    probably rare to have pending wal stats without also passing one of
>>    the other conditions, but ...
> 
> I added the logic to check if the stats counters are updated or not in
> pgstat_report_stat() using not only generated wal record but also write/sync
> counters, and it can skip to call reporting function.

I removed the checking code whether the wal stats counters were updated or not
in pgstat_report_stat() since I couldn't understand the valid case yet.
pgstat_report_stat() is called by backends when the transaction is finished.
This means that the condition seems always pass.

I thought to implement if the WAL stats counters were not updated, skip to
send all statistics including the counters for databases and so on. But I
think it's not good because it may take more time to be reflected the
generated stats by read-only transaction.

> Although I added the condition which the write/sync counters are updated or
> not, I haven't understood the following case yet...Sorry. I checked related
> code and tested to insert large object, but I couldn't. I'll investigate more
> deeply, but if you already know the function which leads the following case,
> please let me know.

I understood the above case (Fujii-san, thanks for your advice in person).
When to flush buffers, for example, to select buffer replacement victim,
it requires a WAL flush if the buffer is dirty. So, to check the WAL stats
counters are updated or not, I check the number of generated wal record and
the counter of syncing in pgstat_report_wal().

The reason why not to check the counter of writing is that if to write is
happened, to sync is happened too in the above case. I added the comments in
the patch.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v2-0001-performance-improvements-of-reporting-wal-stats.patch

Re: wal stats questions

From

Kyotaro Horiguchi

Date:

30 March 2021, 08:28:43

At Tue, 30 Mar 2021 09:41:24 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
> I update the patch since there were my misunderstanding points.
> 
> On 2021/03/26 16:20, Masahiro Ikeda wrote:
> > Thanks for many your suggestions!
> > I made the patch to handle the issues.
> > 
> >> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
> >>    instead of just accumulating the stats since the last submission?
> >>    There doesn't seem to be any comment explaining it? Computing
> >>    differences to previous values, and copying to prev*, isn't free. I
> >>    assume this is out of a fear that the stats could get reset before
> >>    they're used for instrument.c purposes - but who knows?
> > 
> > I removed the unnecessary code copying pgWalUsage and just reset the
> > pgWalUsage after reporting the stats in pgstat_report_wal().
> 
> I didn't change this.
> 
> >> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
> >>    former being used by wal writer, the latter by most other processes?
> >>    There again don't seem to be comments explaining this.
> > 
> > I added the comments why two functions are separated.
> > (But is it better to merge them?)
> 
> I updated the comments for following reasons.
> 
> 
> >> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
> >>    just to figure out if there's been any changes isn't all that
> >>    cheap. This is regularly exercised in read-only workloads too. Seems
> >>    adding a boolean WalStats.have_pending = true or such would be
> >>    better.
> >> 4) For plain backends pgstat_report_wal() is called by
> >>    pgstat_report_stat() - but it is not checked as part of the "Don't
> >>    expend a clock check if nothing to do" check at the top.  It's
> >>    probably rare to have pending wal stats without also passing one of
> >>    the other conditions, but ...
> > 
> > I added the logic to check if the stats counters are updated or not in
> > pgstat_report_stat() using not only generated wal record but also write/sync
> > counters, and it can skip to call reporting function.
> 
> I removed the checking code whether the wal stats counters were updated or not
> in pgstat_report_stat() since I couldn't understand the valid case yet.
> pgstat_report_stat() is called by backends when the transaction is finished.
> This means that the condition seems always pass.

Doesn't the same holds for all other counters?  If you are saying that
"wal counters should be zero if all other stats counters are zero", we
need an assertion to check that and a comment to explain that.

I personally find it safer to add the WAL-stats condition to the
fast-return check, rather than addin such assertion.

pgstat_send_wal() is called mainly from pgstat_report_wal() which
consumes pgWalStats counters and WalWriterMain() which
doesn't. Checking on pgWalStats counters isn't so complex that we need
to avoid that in wal writer, thus *I* think pgstat_send_wal() and
pgstat_report_wal() can be consolidated.  Even if you have a strong
opinion that wal writer should call a separate function, the name
should be something other than pgstat_send_wal() since it ignores
pgWalUsage counters, which are supposed to be included in "WAL stats".


> I thought to implement if the WAL stats counters were not updated, skip to
> send all statistics including the counters for databases and so on. But I
> think it's not good because it may take more time to be reflected the
> generated stats by read-only transaction.

Ur, yep.

> > Although I added the condition which the write/sync counters are updated or
> > not, I haven't understood the following case yet...Sorry. I checked related
> > code and tested to insert large object, but I couldn't. I'll investigate more
> > deeply, but if you already know the function which leads the following case,
> > please let me know.
> 
> I understood the above case (Fujii-san, thanks for your advice in person).
> When to flush buffers, for example, to select buffer replacement victim,
> it requires a WAL flush if the buffer is dirty. So, to check the WAL stats
> counters are updated or not, I check the number of generated wal record and
> the counter of syncing in pgstat_report_wal().
> 
> The reason why not to check the counter of writing is that if to write is
> happened, to sync is happened too in the above case. I added the comments in
> the patch.

Mmm..  Although I couldn't read you clearly, The fact that a flush may
happen without a write means the reverse at the same time, a write may
happen without a flush.  For asynchronous commits, WAL-write happens
alone unaccompanied by a flush.  And the corresponding flush would
happen later without writes.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: wal stats questions

From

Masahiro Ikeda

Date:

30 March 2021, 11:37:36

On 2021/03/30 17:28, Kyotaro Horiguchi wrote:
> At Tue, 30 Mar 2021 09:41:24 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in 
>> I update the patch since there were my misunderstanding points.
>>
>> On 2021/03/26 16:20, Masahiro Ikeda wrote:
>>> Thanks for many your suggestions!
>>> I made the patch to handle the issues.
>>>
>>>> 1) What is the motivation to have both prevWalUsage and pgWalUsage,
>>>>    instead of just accumulating the stats since the last submission?
>>>>    There doesn't seem to be any comment explaining it? Computing
>>>>    differences to previous values, and copying to prev*, isn't free. I
>>>>    assume this is out of a fear that the stats could get reset before
>>>>    they're used for instrument.c purposes - but who knows?
>>>
>>> I removed the unnecessary code copying pgWalUsage and just reset the
>>> pgWalUsage after reporting the stats in pgstat_report_wal().
>>
>> I didn't change this.
>>
>>>> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the
>>>>    former being used by wal writer, the latter by most other processes?
>>>>    There again don't seem to be comments explaining this.
>>>
>>> I added the comments why two functions are separated.
>>> (But is it better to merge them?)
>>
>> I updated the comments for following reasons.
>>
>>
>>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>>>>    just to figure out if there's been any changes isn't all that
>>>>    cheap. This is regularly exercised in read-only workloads too. Seems
>>>>    adding a boolean WalStats.have_pending = true or such would be
>>>>    better.
>>>> 4) For plain backends pgstat_report_wal() is called by
>>>>    pgstat_report_stat() - but it is not checked as part of the "Don't
>>>>    expend a clock check if nothing to do" check at the top.  It's
>>>>    probably rare to have pending wal stats without also passing one of
>>>>    the other conditions, but ...
>>>
>>> I added the logic to check if the stats counters are updated or not in
>>> pgstat_report_stat() using not only generated wal record but also write/sync
>>> counters, and it can skip to call reporting function.
>>
>> I removed the checking code whether the wal stats counters were updated or not
>> in pgstat_report_stat() since I couldn't understand the valid case yet.
>> pgstat_report_stat() is called by backends when the transaction is finished.
>> This means that the condition seems always pass.
> 
> Doesn't the same holds for all other counters?  If you are saying that
> "wal counters should be zero if all other stats counters are zero", we
> need an assertion to check that and a comment to explain that.
> 
> I personally find it safer to add the WAL-stats condition to the
> fast-return check, rather than addin such assertion.
Thanks for your comments.

OK, I added the condition to the fast-return check. I noticed that I
misunderstood that the purpose is to avoid expanding a clock check using WAL
stats counters. But, the purpose is to make the conditions stricter, right?


> pgstat_send_wal() is called mainly from pgstat_report_wal() which
> consumes pgWalStats counters and WalWriterMain() which
> doesn't. Checking on pgWalStats counters isn't so complex that we need
> to avoid that in wal writer, thus *I* think pgstat_send_wal() and
> pgstat_report_wal() can be consolidated.  Even if you have a strong
> opinion that wal writer should call a separate function, the name
> should be something other than pgstat_send_wal() since it ignores
> pgWalUsage counters, which are supposed to be included in "WAL stats".

OK, I consolidated them.



>> I thought to implement if the WAL stats counters were not updated, skip to
>> send all statistics including the counters for databases and so on. But I
>> think it's not good because it may take more time to be reflected the
>> generated stats by read-only transaction.
> 
> Ur, yep.
> 
>>> Although I added the condition which the write/sync counters are updated or
>>> not, I haven't understood the following case yet...Sorry. I checked related
>>> code and tested to insert large object, but I couldn't. I'll investigate more
>>> deeply, but if you already know the function which leads the following case,
>>> please let me know.
>>
>> I understood the above case (Fujii-san, thanks for your advice in person).
>> When to flush buffers, for example, to select buffer replacement victim,
>> it requires a WAL flush if the buffer is dirty. So, to check the WAL stats
>> counters are updated or not, I check the number of generated wal record and
>> the counter of syncing in pgstat_report_wal().
>>
>> The reason why not to check the counter of writing is that if to write is
>> happened, to sync is happened too in the above case. I added the comments in
>> the patch.
> 
> Mmm..  Although I couldn't read you clearly, The fact that a flush may
> happen without a write means the reverse at the same time, a write may
> happen without a flush.  For asynchronous commits, WAL-write happens
> alone unaccompanied by a flush.  And the corresponding flush would
> happen later without writes.

Sorry, I didn't explain it enough.

For processes which may generate WAL records like backends, I thought it's
enough to check (1)the number of generated WAL records and (2)the counters of
syncing(flushing) the WAL. This is checked in pgstat_report_wal(). Sorry for
that I didn't mention (1) in the above thread.

If a backend execute a write transaction, some WAL records must be generated.
So, it's ok to check (1) only regardless of whether asynchronous commit is
enabled or not.

OHOT, if a backend execute a read-only transaction, WAL records won't be
generated (although HOT makes a wal records exactly...). But, WAL-write and
flush may happen when to flush buffers via XLogFlush(). In this case, if
WAL-write happened, flush must be happen later. But, if my understanding is
correct, there is no a case to flush doesn't happen, but to write happen.
So, I thought (2) is needed and it's enough to check the counter of
syncing(flushing).


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v3-0001-performance-improvements-of-reporting-wal-stats.patch

Re: wal stats questions

From

Fujii Masao

Date:

13 April 2021, 00:33:12

On 2021/03/30 20:37, Masahiro Ikeda wrote:
> OK, I added the condition to the fast-return check. I noticed that I
> misunderstood that the purpose is to avoid expanding a clock check using WAL
> stats counters. But, the purpose is to make the conditions stricter, right?

Yes. Currently if the following condition is false even when the WAL counters
are updated, nothing is sent to the stats collector. But with your patch,
in this case the WAL stats are sent.

    if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
        pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
        !have_function_stats && !disconnect)

Thanks for the patch! It now fails to be applied to the master cleanly.
So could you rebase the patch?

pgstat_initialize() should initialize pgWalUsage with zero?

+    /*
+     * set the counters related to generated WAL data.
+     */
+    WalStats.m_wal_records = pgWalUsage.wal_records;
+    WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
+    WalStats.m_wal_bytes = pgWalUsage.wal_bytes;

This should be skipped if pgWalUsage.wal_records is zero?

+ * Be careful that the counters are cleared after reporting them to
+ * the stats collector although you can use WalUsageAccumDiff()
+ * to computing differences to previous values. For backends,
+ * the counters may be reset after a transaction is finished and
+ * pgstat_send_wal() is invoked, so you can compute the difference
+ * in the same transaction only.

On the second thought, I'm afraid that this can be likely to be a foot-gun
in the future. So I'm now wondering if the current approach (i.e., calculate
the diff between the current and previous pgWalUsage and don't reset it
to zero) is better. Thought? Since the similar data structure pgBufferUsage
doesn't have this kind of restriction, I'm afraid that the developer can
easily miss that only pgWalUsage has the restriction.

But currently the diff is calculated (i.e., WalUsageAccumDiff() is called)
even when the WAL counters are not updated. Then if that calculated diff is
zero, we skip sending the WAL stats. This logic should be improved. That is,
probably we should be able to check whether the WAL counters are updated
or not, without calculating the diff, because the calculation is not free.
We can implement this by introducing new flag variable that we discussed
upthread. This flag is set to true whenever the WAL counters are incremented
and used to determine whether the WAL stats need to be sent.

If we do this, another issue is that the data types for wal_records and wal_fpi
in pgWalUsage are long. Which may lead to overflow? If yes, it's should be
replaced with uint64.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Masahiro Ikeda

Date:

16 April 2021, 01:27:11


On 2021/04/13 9:33, Fujii Masao wrote:
> 
> 
> On 2021/03/30 20:37, Masahiro Ikeda wrote:
>> OK, I added the condition to the fast-return check. I noticed that I
>> misunderstood that the purpose is to avoid expanding a clock check using WAL
>> stats counters. But, the purpose is to make the conditions stricter, right?
> 
> Yes. Currently if the following condition is false even when the WAL counters
> are updated, nothing is sent to the stats collector. But with your patch,
> in this case the WAL stats are sent.
> 
>     if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>         pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>         !have_function_stats && !disconnect)
> 
> Thanks for the patch! It now fails to be applied to the master cleanly.
> So could you rebase the patch?

Thanks for your comments!
I rebased it.


> pgstat_initialize() should initialize pgWalUsage with zero?

Thanks. But, I didn't handle it because I undo the logic to calculate the diff
as you mentioned below.


> +    /*
> +     * set the counters related to generated WAL data.
> +     */
> +    WalStats.m_wal_records = pgWalUsage.wal_records;
> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi;
> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes;
> 
> This should be skipped if pgWalUsage.wal_records is zero?

Yes, fixed it.


> + * Be careful that the counters are cleared after reporting them to
> + * the stats collector although you can use WalUsageAccumDiff()
> + * to computing differences to previous values. For backends,
> + * the counters may be reset after a transaction is finished and
> + * pgstat_send_wal() is invoked, so you can compute the difference
> + * in the same transaction only.
> 
> On the second thought, I'm afraid that this can be likely to be a foot-gun
> in the future. So I'm now wondering if the current approach (i.e., calculate
> the diff between the current and previous pgWalUsage and don't reset it
> to zero) is better. Thought? Since the similar data structure pgBufferUsage
> doesn't have this kind of restriction, I'm afraid that the developer can
> easily miss that only pgWalUsage has the restriction.
> 
> But currently the diff is calculated (i.e., WalUsageAccumDiff() is called)
> even when the WAL counters are not updated. Then if that calculated diff is
> zero, we skip sending the WAL stats. This logic should be improved. That is,
> probably we should be able to check whether the WAL counters are updated
> or not, without calculating the diff, because the calculation is not free.
> We can implement this by introducing new flag variable that we discussed
> upthread. This flag is set to true whenever the WAL counters are incremented
> and used to determine whether the WAL stats need to be sent.

Sound reasonable. I agreed that the restriction has a risk to lead mistakes.
I made the patch introducing a new flag.

- v4-0001-performance-improvements-of-reporting-wal-stats.patch


I think introducing a new flag is not necessary because we can know if the WAL
stats are updated or not using the counters of the generated wal record, wal
writes and wal syncs. It's fast compared to get timestamp. The attached patch
is to check if the counters are updated or not using them.

-
v4-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch


> If we do this, another issue is that the data types for wal_records and wal_fpi
> in pgWalUsage are long. Which may lead to overflow? If yes, it's should be
> replaced with uint64.

Yes, I fixed. BufferUsage's counters have same issue, so I fixed them too.

BTW, is it better to change PgStat_Counter from int64 to uint64 because there
aren't any counters with negative number?

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

Re: wal stats questions

From

torikoshia

Date:

21 April 2021, 06:08:48

On 2021-04-16 10:27, Masahiro Ikeda wrote:
> On 2021/04/13 9:33, Fujii Masao wrote:
>> 
>> 
>> On 2021/03/30 20:37, Masahiro Ikeda wrote:
>>> OK, I added the condition to the fast-return check. I noticed that I
>>> misunderstood that the purpose is to avoid expanding a clock check 
>>> using WAL
>>> stats counters. But, the purpose is to make the conditions stricter, 
>>> right?
>> 
>> Yes. Currently if the following condition is false even when the WAL 
>> counters
>> are updated, nothing is sent to the stats collector. But with your 
>> patch,
>> in this case the WAL stats are sent.
>> 
>>     if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>>         pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>>         !have_function_stats && !disconnect)
>> 
>> Thanks for the patch! It now fails to be applied to the master 
>> cleanly.
>> So could you rebase the patch?
> 
> Thanks for your comments!
> I rebased it.

Thanks for working on this!

I have some minor comments on 
performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch.


177 @@ -3094,20 +3066,33 @@ pgstat_report_wal(void)
178   * Return true if the message is sent, and false otherwise.

Since you changed the return value to void, it seems the description is
not necessary anymore.

208 +        * generate wal records. 'wal_writes' and 'wal_sync' are 
zero means the

It may be better to change 'wal_writes' to 'wal_write' since single
quotation seems to mean variable name.

234 +        * set the counters related to generated WAL data if the 
counters are


set -> Set?


Regards,

Re: wal stats questions

From

Masahiro Ikeda

Date:

21 April 2021, 09:31:24


On 2021/04/21 15:08, torikoshia wrote:
> On 2021-04-16 10:27, Masahiro Ikeda wrote:
>> On 2021/04/13 9:33, Fujii Masao wrote:
>>> 
>>> 
>>> On 2021/03/30 20:37, Masahiro Ikeda wrote:
>>>> OK, I added the condition to the fast-return check. I noticed that I 
>>>> misunderstood that the purpose is to avoid expanding a clock check
>>>> using WAL stats counters. But, the purpose is to make the conditions
>>>> stricter, right?
>>> 
>>> Yes. Currently if the following condition is false even when the WAL
>>> counters are updated, nothing is sent to the stats collector. But with
>>> your patch, in this case the WAL stats are sent.
>>> 
>>> if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && 
>>> pgStatXactCommit == 0 && pgStatXactRollback == 0 && 
>>> !have_function_stats && !disconnect)
>>> 
>>> Thanks for the patch! It now fails to be applied to the master
>>> cleanly. So could you rebase the patch?
>> 
>> Thanks for your comments! I rebased it.
> 
> Thanks for working on this!

Hi, thanks for your comments!


> I have some minor comments on 
> performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch.
>
> 
> 
> 
> 177 @@ -3094,20 +3066,33 @@ pgstat_report_wal(void) 178   * Return true if
> the message is sent, and false otherwise.
> 
> Since you changed the return value to void, it seems the description is not
> necessary anymore.

Right, I fixed it.


> 208 +        * generate wal records. 'wal_writes' and 'wal_sync' are zero 
> means the
> 
> It may be better to change 'wal_writes' to 'wal_write' since single 
> quotation seems to mean variable name.

Agreed.


> 234 +        * set the counters related to generated WAL data if the
> counters are
> 
> 
> set -> Set?

Yes, I fixed.

> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative
number?

Although this is not related to torikoshi-san's comment, the above my
understanding is not right. Some counters like delta_live_tuples,
delta_dead_tuples and changed_tuples can be negative.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v5-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch

Re: wal stats questions

From

Fujii Masao

Date:

22 April 2021, 13:42:14


On 2021/04/21 18:31, Masahiro Ikeda wrote:
>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative
number?

On second thought, it's ok even if the counters like wal_records get overflowed.
Because they are always used to calculate the diff between the previous and
current counters. Even when the current counters are overflowed and
the previous ones are not, WalUsageAccumDiff() seems to return the right
diff of them. If this understanding is right, I'd withdraw my comment and
it's ok to use "long" type for those counters. Thought?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

"Andres Freund"

Date:

22 April 2021, 15:36:59

Hi

On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote:
> 
> 
> On 2021/04/21 18:31, Masahiro Ikeda wrote:
> >> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative
number?
> 
> On second thought, it's ok even if the counters like wal_records get overflowed.
> Because they are always used to calculate the diff between the previous and
> current counters. Even when the current counters are overflowed and
> the previous ones are not, WalUsageAccumDiff() seems to return the right
> diff of them. If this understanding is right, I'd withdraw my comment and
> it's ok to use "long" type for those counters. Thought?

Why long? It's of a platform dependent size, which doesn't seem useful here?

Andres

Re: wal stats questions

From

Masahiro Ikeda

Date:

23 April 2021, 00:26:17

On 2021/04/23 0:36, Andres Freund wrote:
> Hi
>
> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote:
>> On 2021/04/21 18:31, Masahiro Ikeda wrote:
>>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative
number?
>> On second thought, it's ok even if the counters like wal_records get overflowed.
>> Because they are always used to calculate the diff between the previous and
>> current counters. Even when the current counters are overflowed and
>> the previous ones are not, WalUsageAccumDiff() seems to return the right
>> diff of them. If this understanding is right, I'd withdraw my comment and
>> it's ok to use "long" type for those counters. Thought?
>
> Why long? It's of a platform dependent size, which doesn't seem useful here?

I think it's ok to unify uint64. Although it's better to use small size for
cache, the idea works well for only some platform which use 4bytes for "long".

(Although I'm not familiar with the standardization...)
It seems that we need to adopt unsinged long if use "long", which may be 4bytes.

I though WalUsageAccumDiff() seems to return the right value too. But, I
researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer
never overflow(2.6.5 Types 9th section) although it doesn't define overflow of
signed integer type.

If my understanding is right, the definition only guarantee
WalUsageAccumDiff() returns the right value only if the type is unsigned
integer. So, it's safe to change "long" to "unsigned long".

Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

Re: wal stats questions

From

Fujii Masao

Date:

23 April 2021, 00:51:40


On 2021/04/23 9:26, Masahiro Ikeda wrote:
> 
> 
> On 2021/04/23 0:36, Andres Freund wrote:
>> Hi
>>
>> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote:
>>> On 2021/04/21 18:31, Masahiro Ikeda wrote:
>>>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative
number?
>>> On second thought, it's ok even if the counters like wal_records get overflowed.
>>> Because they are always used to calculate the diff between the previous and
>>> current counters. Even when the current counters are overflowed and
>>> the previous ones are not, WalUsageAccumDiff() seems to return the right
>>> diff of them. If this understanding is right, I'd withdraw my comment and
>>> it's ok to use "long" type for those counters. Thought?
>>
>> Why long? It's of a platform dependent size, which doesn't seem useful here?

I'm not sure why "long" was chosen for the counters in BufferUsage.
And then I guess that maybe we didn't change that because using "long"
for them caused no actual issue in practice.


> I think it's ok to unify uint64. Although it's better to use small size for
> cache, the idea works well for only some platform which use 4bytes for "long".
> 
> 
> (Although I'm not familiar with the standardization...)
> It seems that we need to adopt unsinged long if use "long", which may be 4bytes.
> 
> I though WalUsageAccumDiff() seems to return the right value too. But, I
> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer
> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of
> signed integer type.
> 
> If my understanding is right, the definition only guarantee
> WalUsageAccumDiff() returns the right value only if the type is unsigned
> integer. So, it's safe to change "long" to "unsigned long".

Yes, we can change the counters so they use uint64. But if we do that,
ISTM that we need to change the code more than your patch does.
For example, even with the patch, pg_stat_statements uses Int64GetDatumFast()
to report the counter like shared_blks_hit, but this should be changed?
For example, "%ld" should be changed to "%llu" at the following code in
vacuumlazy.c? I think that we should check all codes that use the counters
whose types are changed to uint64.

            appendStringInfo(&buf,
                             _("WAL usage: %ld records, %ld full page images, %llu bytes"),
                             walusage.wal_records,
                             walusage.wal_fpi,
                             (unsigned long long) walusage.wal_bytes);

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Andres Freund

Date:

23 April 2021, 01:25:53

Hi,

On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote:
> On 2021/04/23 0:36, Andres Freund wrote:
> > On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote:
> >> On 2021/04/21 18:31, Masahiro Ikeda wrote:
> >>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative
number?
> >> On second thought, it's ok even if the counters like wal_records get overflowed.
> >> Because they are always used to calculate the diff between the previous and
> >> current counters. Even when the current counters are overflowed and
> >> the previous ones are not, WalUsageAccumDiff() seems to return the right
> >> diff of them. If this understanding is right, I'd withdraw my comment and
> >> it's ok to use "long" type for those counters. Thought?
> > 
> > Why long? It's of a platform dependent size, which doesn't seem useful here?
> 
> I think it's ok to unify uint64. Although it's better to use small size for
> cache, the idea works well for only some platform which use 4bytes for "long".
> 
> 
> (Although I'm not familiar with the standardization...)
> It seems that we need to adopt unsinged long if use "long", which may be 4bytes.
> 
> I though WalUsageAccumDiff() seems to return the right value too. But, I
> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer
> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of
> signed integer type.
> 
> If my understanding is right, the definition only guarantee
> WalUsageAccumDiff() returns the right value only if the type is unsigned
> integer. So, it's safe to change "long" to "unsigned long".

I think this should just use 64bit counters. Emitting more than 4
billion records in one transaction isn't actually particularly rare. And
obviously WalUsageAccumDiff() can't do anything about that, once
overflowed it overflowed.

Greetings,

Andres Freund

Re: wal stats questions

From

Fujii Masao

Date:

23 April 2021, 07:30:27


On 2021/04/23 10:25, Andres Freund wrote:
> Hi,
> 
> On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote:
>> On 2021/04/23 0:36, Andres Freund wrote:
>>> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote:
>>>> On 2021/04/21 18:31, Masahiro Ikeda wrote:
>>>>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative
number?
>>>> On second thought, it's ok even if the counters like wal_records get overflowed.
>>>> Because they are always used to calculate the diff between the previous and
>>>> current counters. Even when the current counters are overflowed and
>>>> the previous ones are not, WalUsageAccumDiff() seems to return the right
>>>> diff of them. If this understanding is right, I'd withdraw my comment and
>>>> it's ok to use "long" type for those counters. Thought?
>>>
>>> Why long? It's of a platform dependent size, which doesn't seem useful here?
>>
>> I think it's ok to unify uint64. Although it's better to use small size for
>> cache, the idea works well for only some platform which use 4bytes for "long".
>>
>>
>> (Although I'm not familiar with the standardization...)
>> It seems that we need to adopt unsinged long if use "long", which may be 4bytes.
>>
>> I though WalUsageAccumDiff() seems to return the right value too. But, I
>> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer
>> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of
>> signed integer type.
>>
>> If my understanding is right, the definition only guarantee
>> WalUsageAccumDiff() returns the right value only if the type is unsigned
>> integer. So, it's safe to change "long" to "unsigned long".
> 
> I think this should just use 64bit counters. Emitting more than 4
> billion records in one transaction isn't actually particularly rare. And

Right. I agree to change the types of the counters to int64.

I think that it's better to make this change as a separate patch from
the changes for pg_stat_wal view.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Masahiro Ikeda

Date:

26 April 2021, 01:11:39


On 2021/04/23 16:30, Fujii Masao wrote:
> 
> 
> On 2021/04/23 10:25, Andres Freund wrote:
>> Hi,
>>
>> On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote:
>>> On 2021/04/23 0:36, Andres Freund wrote:
>>>> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote:
>>>>> On 2021/04/21 18:31, Masahiro Ikeda wrote:
>>>>>>> BTW, is it better to change PgStat_Counter from int64 to uint64
>>>>>>> because> there aren't any counters with negative number?
>>>>> On second thought, it's ok even if the counters like wal_records get
>>>>> overflowed.
>>>>> Because they are always used to calculate the diff between the previous and
>>>>> current counters. Even when the current counters are overflowed and
>>>>> the previous ones are not, WalUsageAccumDiff() seems to return the right
>>>>> diff of them. If this understanding is right, I'd withdraw my comment and
>>>>> it's ok to use "long" type for those counters. Thought?
>>>>
>>>> Why long? It's of a platform dependent size, which doesn't seem useful here?
>>>
>>> I think it's ok to unify uint64. Although it's better to use small size for
>>> cache, the idea works well for only some platform which use 4bytes for "long".
>>>
>>>
>>> (Although I'm not familiar with the standardization...)
>>> It seems that we need to adopt unsinged long if use "long", which may be
>>> 4bytes.
>>>
>>> I though WalUsageAccumDiff() seems to return the right value too. But, I
>>> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer
>>> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of
>>> signed integer type.
>>>
>>> If my understanding is right, the definition only guarantee
>>> WalUsageAccumDiff() returns the right value only if the type is unsigned
>>> integer. So, it's safe to change "long" to "unsigned long".
>>
>> I think this should just use 64bit counters. Emitting more than 4
>> billion records in one transaction isn't actually particularly rare. And
> 
> Right. I agree to change the types of the counters to int64.
> 
> I think that it's better to make this change as a separate patch from
> the changes for pg_stat_wal view.

Thanks for your comments.
OK, I separate two patches.

First patch has only the changes for pg_stat_wal view.
("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")

Second one has the changes for the type of the BufferUsage's and WalUsage's
members. I change the type from long to int64. Is it better to make new thread?
("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch")

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

Re: wal stats questions

From

Fujii Masao

Date:

27 April 2021, 12:56:20


On 2021/04/26 10:11, Masahiro Ikeda wrote:
> 
> 
> On 2021/04/23 16:30, Fujii Masao wrote:
>>
>>
>> On 2021/04/23 10:25, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote:
>>>> On 2021/04/23 0:36, Andres Freund wrote:
>>>>> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote:
>>>>>> On 2021/04/21 18:31, Masahiro Ikeda wrote:
>>>>>>>> BTW, is it better to change PgStat_Counter from int64 to uint64
>>>>>>>> because> there aren't any counters with negative number?
>>>>>> On second thought, it's ok even if the counters like wal_records get
>>>>>> overflowed.
>>>>>> Because they are always used to calculate the diff between the previous and
>>>>>> current counters. Even when the current counters are overflowed and
>>>>>> the previous ones are not, WalUsageAccumDiff() seems to return the right
>>>>>> diff of them. If this understanding is right, I'd withdraw my comment and
>>>>>> it's ok to use "long" type for those counters. Thought?
>>>>>
>>>>> Why long? It's of a platform dependent size, which doesn't seem useful here?
>>>>
>>>> I think it's ok to unify uint64. Although it's better to use small size for
>>>> cache, the idea works well for only some platform which use 4bytes for "long".
>>>>
>>>>
>>>> (Although I'm not familiar with the standardization...)
>>>> It seems that we need to adopt unsinged long if use "long", which may be
>>>> 4bytes.
>>>>
>>>> I though WalUsageAccumDiff() seems to return the right value too. But, I
>>>> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer
>>>> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of
>>>> signed integer type.
>>>>
>>>> If my understanding is right, the definition only guarantee
>>>> WalUsageAccumDiff() returns the right value only if the type is unsigned
>>>> integer. So, it's safe to change "long" to "unsigned long".
>>>
>>> I think this should just use 64bit counters. Emitting more than 4
>>> billion records in one transaction isn't actually particularly rare. And
>>
>> Right. I agree to change the types of the counters to int64.
>>
>> I think that it's better to make this change as a separate patch from
>> the changes for pg_stat_wal view.
> 
> Thanks for your comments.
> OK, I separate two patches.

Thanks!


> 
> First patch has only the changes for pg_stat_wal view.
> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")

+        pgWalUsage.wal_records == prevWalUsage.wal_records &&
+        walStats.wal_write == 0 && walStats.wal_sync == 0 &&

WalStats.m_wal_write should be checked here instead of walStats.wal_write?

Is there really the case where the number of sync is larger than zero when
the number of writes is zero? If not, it's enough to check only the number
of writes?

+     * wal records weren't generated. So, the counters of 'wal_fpi',
+     * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.

It's better to add the assertion check that confirms
m_wal_buffers_full == 0 whenever wal_records is larger than zero?

> 
> Second one has the changes for the type of the BufferUsage's and WalUsage's
> members. I change the type from long to int64. Is it better to make new thread?
> ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch")

Will review the patch later. I'm ok to discuss that in this thread.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Masahiro Ikeda

Date:

28 April 2021, 00:10:21


On 2021/04/27 21:56, Fujii Masao wrote:
> 
> 
> On 2021/04/26 10:11, Masahiro Ikeda wrote:
>>
>> First patch has only the changes for pg_stat_wal view.
>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")
>>
> 
> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
> +        walStats.wal_write == 0 && walStats.wal_sync == 0 &&
> > WalStats.m_wal_write should be checked here instead of walStats.wal_write?

Thanks! Yes, I'll fix it.


> Is there really the case where the number of sync is larger than zero when
> the number of writes is zero? If not, it's enough to check only the number
> of writes?

I thought that there is the case if "wal_sync_method" is fdatasync, fsync or
fsync_writethrough. The example case is following.

(1) backend-1 writes the wal data because wal buffer has no space. But, it
doesn't sync the wal data.
(2) backend-2 reads data pages. In the execution, it need to write and sync
the wal because dirty pages is selected as victim pages. backend-2 need to
only sync the wal data because the wal data were already written by backend-1,
but they weren't synced.

I'm ok to change it since it's rare case.


> +     * wal records weren't generated. So, the counters of 'wal_fpi',
> +     * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.
> 
> It's better to add the assertion check that confirms
> m_wal_buffers_full == 0 whenever wal_records is larger than zero?

Sorry, I couldn't understand yet. I thought that m_wal_buffers_full can be
larger than 0 if wal_records > 0.

Do you suggest that the following assertion is needed?

-       if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
-               return false;
+       if (pgWalUsage.wal_records == prevWalUsage.wal_records &&
+               WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0)
+       {
+               Assert(pgWalUsage.wal_fpi == 0 && pgWalUsage.wal_bytes &&
+                               WalStats.m_wal_buffers_full == 0 &&
WalStats.m_wal_write_time == 0 &&
+                               WalStats.m_wal_sync_time == 0);
+               return;
+       }


>> Second one has the changes for the type of the BufferUsage's and WalUsage's
>> members. I change the type from long to int64. Is it better to make new thread?
>> ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch")
> 
> Will review the patch later. I'm ok to discuss that in this thread.

Thanks!


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Re: wal stats questions

From

Fujii Masao

Date:

11 May 2021, 07:44:49


On 2021/04/28 9:10, Masahiro Ikeda wrote:
> 
> 
> On 2021/04/27 21:56, Fujii Masao wrote:
>>
>>
>> On 2021/04/26 10:11, Masahiro Ikeda wrote:
>>>
>>> First patch has only the changes for pg_stat_wal view.
>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")
>>>
>>
>> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
>> +        walStats.wal_write == 0 && walStats.wal_sync == 0 &&
>>> WalStats.m_wal_write should be checked here instead of walStats.wal_write?
> 
> Thanks! Yes, I'll fix it.

Thanks!


> 
> 
>> Is there really the case where the number of sync is larger than zero when
>> the number of writes is zero? If not, it's enough to check only the number
>> of writes?
> 
> I thought that there is the case if "wal_sync_method" is fdatasync, fsync or
> fsync_writethrough. The example case is following.
> 
> (1) backend-1 writes the wal data because wal buffer has no space. But, it
> doesn't sync the wal data.
> (2) backend-2 reads data pages. In the execution, it need to write and sync
> the wal because dirty pages is selected as victim pages. backend-2 need to
> only sync the wal data because the wal data were already written by backend-1,
> but they weren't synced.

You're right. So let's leave the check of "m_wal_sync == 0".


> 
> I'm ok to change it since it's rare case.
> 
> 
>> +     * wal records weren't generated. So, the counters of 'wal_fpi',
>> +     * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.
>>
>> It's better to add the assertion check that confirms
>> m_wal_buffers_full == 0 whenever wal_records is larger than zero?
> 
> Sorry, I couldn't understand yet. I thought that m_wal_buffers_full can be
> larger than 0 if wal_records > 0.
> 
> Do you suggest that the following assertion is needed?
> 
> -       if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
> -               return false;
> +       if (pgWalUsage.wal_records == prevWalUsage.wal_records &&
> +               WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0)
> +       {
> +               Assert(pgWalUsage.wal_fpi == 0 && pgWalUsage.wal_bytes &&
> +                               WalStats.m_wal_buffers_full == 0 &&
> WalStats.m_wal_write_time == 0 &&
> +                               WalStats.m_wal_sync_time == 0);
> +               return;
> +       }

I was thinking to add the "Assert(WalStats.m_wal_buffers_full)" as a safe-guard
because only m_wal_buffers_full is incremented in different places where
wal_records, m_wal_write and m_wal_sync are incremented.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Fujii Masao

Date:

11 May 2021, 08:25:58


On 2021/04/28 9:10, Masahiro Ikeda wrote:
>>> Second one has the changes for the type of the BufferUsage's and WalUsage's
>>> members. I change the type from long to int64. Is it better to make new thread?
>>> ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch")
>>
>> Will review the patch later. I'm ok to discuss that in this thread.
> 
> Thanks!

0002 patch looks good to me.
I think we can commit this at first. Barring any objection, I will do that.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Masahiro Ikeda

Date:

11 May 2021, 09:46:21


On 2021/05/11 16:44, Fujii Masao wrote:
> 
> 
> On 2021/04/28 9:10, Masahiro Ikeda wrote:
>>
>>
>> On 2021/04/27 21:56, Fujii Masao wrote:
>>>
>>>
>>> On 2021/04/26 10:11, Masahiro Ikeda wrote:
>>>>
>>>> First patch has only the changes for pg_stat_wal view.
>>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")
>>>>
>>>>
>>>
>>> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
>>> +        walStats.wal_write == 0 && walStats.wal_sync == 0 &&
>>>> WalStats.m_wal_write should be checked here instead of walStats.wal_write?
>>
>> Thanks! Yes, I'll fix it.
> 
> Thanks!

Thanks for your comments!
I fixed them.

>>> Is there really the case where the number of sync is larger than zero when
>>> the number of writes is zero? If not, it's enough to check only the number
>>> of writes?
>>
>> I thought that there is the case if "wal_sync_method" is fdatasync, fsync or
>> fsync_writethrough. The example case is following.
>>
>> (1) backend-1 writes the wal data because wal buffer has no space. But, it
>> doesn't sync the wal data.
>> (2) backend-2 reads data pages. In the execution, it need to write and sync
>> the wal because dirty pages is selected as victim pages. backend-2 need to
>> only sync the wal data because the wal data were already written by backend-1,
>> but they weren't synced.
> 
> You're right. So let's leave the check of "m_wal_sync == 0".

OK.

>>> +     * wal records weren't generated. So, the counters of 'wal_fpi',
>>> +     * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.
>>>
>>> It's better to add the assertion check that confirms
>>> m_wal_buffers_full == 0 whenever wal_records is larger than zero?
>>
>> Sorry, I couldn't understand yet. I thought that m_wal_buffers_full can be
>> larger than 0 if wal_records > 0.
>>
>> Do you suggest that the following assertion is needed?
>>
>> -       if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>> -               return false;
>> +       if (pgWalUsage.wal_records == prevWalUsage.wal_records &&
>> +               WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0)
>> +       {
>> +               Assert(pgWalUsage.wal_fpi == 0 && pgWalUsage.wal_bytes &&
>> +                               WalStats.m_wal_buffers_full == 0 &&
>> WalStats.m_wal_write_time == 0 &&
>> +                               WalStats.m_wal_sync_time == 0);
>> +               return;
>> +       }
> 
> I was thinking to add the "Assert(WalStats.m_wal_buffers_full)" as a safe-guard
> because only m_wal_buffers_full is incremented in different places where
> wal_records, m_wal_write and m_wal_sync are incremented.

Understood. I added the assertion for m_wal_buffers_full only.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v7-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch

Re: wal stats questions

From

Fujii Masao

Date:

12 May 2021, 10:19:34


On 2021/05/11 18:46, Masahiro Ikeda wrote:
> 
> 
> On 2021/05/11 16:44, Fujii Masao wrote:
>>
>>
>> On 2021/04/28 9:10, Masahiro Ikeda wrote:
>>>
>>>
>>> On 2021/04/27 21:56, Fujii Masao wrote:
>>>>
>>>>
>>>> On 2021/04/26 10:11, Masahiro Ikeda wrote:
>>>>>
>>>>> First patch has only the changes for pg_stat_wal view.
>>>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")
>>>>>
>>>>>
>>>>
>>>> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
>>>> +        walStats.wal_write == 0 && walStats.wal_sync == 0 &&
>>>>> WalStats.m_wal_write should be checked here instead of walStats.wal_write?
>>>
>>> Thanks! Yes, I'll fix it.
>>
>> Thanks!
> 
> Thanks for your comments!
> I fixed them.

Thanks for updating the patch!

      if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
          pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
+        pgWalUsage.wal_records == prevWalUsage.wal_records &&
+        WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0 &&

I'm just wondering if the above WAL activity counters need to be checked.
Maybe it's not necessary because "pgStatXactCommit == 0 && pgStatXactRollback == 0"
is checked? IOW, is there really the case where WAL activity counters are updated
even when both pgStatXactCommit and pgStatXactRollback are zero?


+    if (pgWalUsage.wal_records != prevWalUsage.wal_records)
+    {
+        WalUsage    walusage;
+
+        /*
+         * Calculate how much WAL usage counters were increased by substracting
+         * the previous counters from the current ones. Fill the results in
+         * WAL stats message.
+         */
+        MemSet(&walusage, 0, sizeof(WalUsage));
+        WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);

Isn't it better to move the code "prevWalUsage = pgWalUsage" into here?
Because it's necessary only when pgWalUsage.wal_records != prevWalUsage.wal_records.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Masahiro Ikeda

Date:

13 May 2021, 00:05:37


On 2021/05/12 19:19, Fujii Masao wrote:
> 
> 
> On 2021/05/11 18:46, Masahiro Ikeda wrote:
>>
>>
>> On 2021/05/11 16:44, Fujii Masao wrote:
>>>
>>>
>>> On 2021/04/28 9:10, Masahiro Ikeda wrote:
>>>>
>>>>
>>>> On 2021/04/27 21:56, Fujii Masao wrote:
>>>>>
>>>>>
>>>>> On 2021/04/26 10:11, Masahiro Ikeda wrote:
>>>>>>
>>>>>> First patch has only the changes for pg_stat_wal view.
>>>>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
>>>>> +        walStats.wal_write == 0 && walStats.wal_sync == 0 &&
>>>>>> WalStats.m_wal_write should be checked here instead of walStats.wal_write?
>>>>
>>>> Thanks! Yes, I'll fix it.
>>>
>>> Thanks!
>>
>> Thanks for your comments!
>> I fixed them.
> 
> Thanks for updating the patch!
> 
>      if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>          pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
> +        WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0 &&
> 
> I'm just wondering if the above WAL activity counters need to be checked.
> Maybe it's not necessary because "pgStatXactCommit == 0 && pgStatXactRollback
> == 0"
> is checked? IOW, is there really the case where WAL activity counters are updated
> even when both pgStatXactCommit and pgStatXactRollback are zero?

Thanks for checking.

I haven't found the case yet. But, I added the condition because there is a
discussion that it's safer.

(It seems the mail thread chain is broken, Sorry...)
I copy the discussion at the time below.

https://www.postgresql.org/message-id/20210330.172843.267174731834281075.horikyota.ntt%40gmail.com
>>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0)
>>>>    just to figure out if there's been any changes isn't all that
>>>>    cheap. This is regularly exercised in read-only workloads too. Seems
>>>>    adding a boolean WalStats.have_pending = true or such would be
>>>>    better.
>>>> 4) For plain backends pgstat_report_wal() is called by
>>>>    pgstat_report_stat() - but it is not checked as part of the "Don't
>>>>    expend a clock check if nothing to do" check at the top.  It's
>>>>    probably rare to have pending wal stats without also passing one of
>>>>    the other conditions, but ...
>>>
>>> I added the logic to check if the stats counters are updated or not in
>>> pgstat_report_stat() using not only generated wal record but also write/sync
>>> counters, and it can skip to call reporting function.
>>
>> I removed the checking code whether the wal stats counters were updated or not
>> in pgstat_report_stat() since I couldn't understand the valid case yet.
>> pgstat_report_stat() is called by backends when the transaction is finished.
>> This means that the condition seems always pass.
>
> Doesn't the same holds for all other counters?  If you are saying that
> "wal counters should be zero if all other stats counters are zero", we
> need an assertion to check that and a comment to explain that.
>
> I personally find it safer to add the WAL-stats condition to the
> fast-return check, rather than addin such assertion.


> +    if (pgWalUsage.wal_records != prevWalUsage.wal_records)
> +    {
> +        WalUsage    walusage;
> +
> +        /*
> +         * Calculate how much WAL usage counters were increased by substracting
> +         * the previous counters from the current ones. Fill the results in
> +         * WAL stats message.
> +         */
> +        MemSet(&walusage, 0, sizeof(WalUsage));
> +        WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
> 
> Isn't it better to move the code "prevWalUsage = pgWalUsage" into here?
> Because it's necessary only when pgWalUsage.wal_records !=
> prevWalUsage.wal_records.

Yes, I fixed it.


Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v8-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch

Re: wal stats questions

From

torikoshia

Date:

17 May 2021, 07:07:31

On 2021-05-13 09:05, Masahiro Ikeda wrote:
> On 2021/05/12 19:19, Fujii Masao wrote:
>> 
>> 
>> On 2021/05/11 18:46, Masahiro Ikeda wrote:
>>> 
>>> 
>>> On 2021/05/11 16:44, Fujii Masao wrote:
>>>> 
>>>> 
>>>> On 2021/04/28 9:10, Masahiro Ikeda wrote:
>>>>> 
>>>>> 
>>>>> On 2021/04/27 21:56, Fujii Masao wrote:
>>>>>> 
>>>>>> 
>>>>>> On 2021/04/26 10:11, Masahiro Ikeda wrote:
>>>>>>> 
>>>>>>> First patch has only the changes for pg_stat_wal view.
>>>>>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch")
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
>>>>>> +        walStats.wal_write == 0 && walStats.wal_sync == 0 &&
>>>>>>> WalStats.m_wal_write should be checked here instead of 
>>>>>>> walStats.wal_write?
>>>>> 
>>>>> Thanks! Yes, I'll fix it.
>>>> 
>>>> Thanks!
>>> 
>>> Thanks for your comments!
>>> I fixed them.
>> 
>> Thanks for updating the patch!
>> 
>>      if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) &&
>>          pgStatXactCommit == 0 && pgStatXactRollback == 0 &&
>> +        pgWalUsage.wal_records == prevWalUsage.wal_records &&
>> +        WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0 &&
>> 
>> I'm just wondering if the above WAL activity counters need to be 
>> checked.
>> Maybe it's not necessary because "pgStatXactCommit == 0 && 
>> pgStatXactRollback
>> == 0"
>> is checked? IOW, is there really the case where WAL activity counters 
>> are updated
>> even when both pgStatXactCommit and pgStatXactRollback are zero?
> 
> Thanks for checking.
> 
> I haven't found the case yet. But, I added the condition because there 
> is a
> discussion that it's safer.
> 
> (It seems the mail thread chain is broken, Sorry...)
> I copy the discussion at the time below.
> 
> https://www.postgresql.org/message-id/20210330.172843.267174731834281075.horikyota.ntt%40gmail.com
>>>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) 
>>>>> == 0)
>>>>>    just to figure out if there's been any changes isn't all that
>>>>>    cheap. This is regularly exercised in read-only workloads too. 
>>>>> Seems
>>>>>    adding a boolean WalStats.have_pending = true or such would be
>>>>>    better.
>>>>> 4) For plain backends pgstat_report_wal() is called by
>>>>>    pgstat_report_stat() - but it is not checked as part of the 
>>>>> "Don't
>>>>>    expend a clock check if nothing to do" check at the top.  It's
>>>>>    probably rare to have pending wal stats without also passing one 
>>>>> of
>>>>>    the other conditions, but ...
>>>> 
>>>> I added the logic to check if the stats counters are updated or not 
>>>> in
>>>> pgstat_report_stat() using not only generated wal record but also 
>>>> write/sync
>>>> counters, and it can skip to call reporting function.
>>> 
>>> I removed the checking code whether the wal stats counters were 
>>> updated or not
>>> in pgstat_report_stat() since I couldn't understand the valid case 
>>> yet.
>>> pgstat_report_stat() is called by backends when the transaction is 
>>> finished.
>>> This means that the condition seems always pass.
>> 
>> Doesn't the same holds for all other counters?  If you are saying that
>> "wal counters should be zero if all other stats counters are zero", we
>> need an assertion to check that and a comment to explain that.
>> 
>> I personally find it safer to add the WAL-stats condition to the
>> fast-return check, rather than addin such assertion.
> 
> 
>> +    if (pgWalUsage.wal_records != prevWalUsage.wal_records)
>> +    {
>> +        WalUsage    walusage;
>> +
>> +        /*
>> +         * Calculate how much WAL usage counters were increased by 
>> substracting
>> +         * the previous counters from the current ones. Fill the 
>> results in
>> +         * WAL stats message.
>> +         */
>> +        MemSet(&walusage, 0, sizeof(WalUsage));
>> +        WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
>> 
>> Isn't it better to move the code "prevWalUsage = pgWalUsage" into 
>> here?
>> Because it's necessary only when pgWalUsage.wal_records !=
>> prevWalUsage.wal_records.
> 
> Yes, I fixed it.
> 
> 
> Regards,

Thanks for updating the patch!

> +     * is executed, wal records aren't nomally generated (although HOT 
> makes

nomally -> normally?

> +     * It's not enough to check the number of generated wal records, for
> +     * example the walwriter may write/sync the WAL although it doesn't

You use both 'wal' and 'WAL' in the comments, but are there any 
intension?

Regards,

Re: wal stats questions

From

Masahiro Ikeda

Date:

17 May 2021, 07:39:51

Thanks for your comments!

>> +     * is executed, wal records aren't nomally generated (although HOT makes
> 
> nomally -> normally?

Yes, fixed.

>> +     * It's not enough to check the number of generated wal records, for
>> +     * example the walwriter may write/sync the WAL although it doesn't
> 
> You use both 'wal' and 'WAL' in the comments, but are there any intension?

No, I changed to use 'WAL' only. Although some other comments have 'wal' and
'WAL', it seems that 'WAL' is often used.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v9-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch

Re: wal stats questions

From

Fujii Masao

Date:

17 May 2021, 13:34:33


On 2021/05/17 16:39, Masahiro Ikeda wrote:
> 
> Thanks for your comments!
> 
>>> +     * is executed, wal records aren't nomally generated (although HOT makes
>>
>> nomally -> normally?
> 
> Yes, fixed.
> 
>>> +     * It's not enough to check the number of generated wal records, for
>>> +     * example the walwriter may write/sync the WAL although it doesn't
>>
>> You use both 'wal' and 'WAL' in the comments, but are there any intension?
> 
> No, I changed to use 'WAL' only. Although some other comments have 'wal' and
> 'WAL', it seems that 'WAL' is often used.

Thanks for updating the patch!

+ * Buffer and generated WAL usage counters.
+ *
+ * The counters are accumulated values. There are infrastructures
+ * to add the values and calculate the difference within a specific period.

Is it really worth adding these comments here?

+     * Note: regarding to WAL statistics counters, it's not enough to check
+     * only whether the WAL record is generated or not. If a read transaction
+     * is executed, WAL records aren't normally generated (although HOT makes
+     * WAL records). But, just writes and syncs the WAL data may happen when
+     * to flush buffers.

Aren't the following comments better?

------------------------------
To determine whether any WAL activity has occurred since last time, not only the number of generated WAL records but
alsothe numbers of WAL writes and syncs need to be checked. Because even transaction that generates no WAL records can
writeor sync WAL data when flushing the data pages.
 
------------------------------

-     * This function can be called even if nothing at all has happened. In
-     * this case, avoid sending a completely empty message to the stats
-     * collector.

I think that it's better to leave this comment as it is.

+     * First, to check the WAL stats counters were updated.
+     *
+     * Even if the 'force' is true, we don't need to send the stats if the
+     * counters were not updated.
+     *
+     * We can know whether the counters were updated or not to check only
+     * three counters. So, for performance, we don't allocate another memory
+     * spaces and check the all stats like pgstat_send_slru().

Is it really worth adding these comments here?

+     * The current 'wal_records' is the same as the previous one means that
+     * WAL records weren't generated. So, the counters of 'wal_fpi',
+     * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.
+     *
+     * It's not enough to check the number of generated WAL records, for
+     * example the walwriter may write/sync the WAL although it doesn't
+     * generate WAL records. 'm_wal_write' and 'm_wal_sync' are zero means the
+     * counters of time spent are zero too.

Aren't the following comments better?

------------------------
Check wal_records counter to determine whether any WAL activity has happened since last time. Note that other WalUsage
countersdon't need to be checked because they are incremented always together with wal_records counter.
 

m_wal_buffers_full also doesn't need to be checked because it's incremented only when at least one WAL record is
generated(i.e., wal_records counter is incremented). But for safely, we assert that m_wal_buffers_full is always zero
whenno WAL record is generated
 

This function can be called by a process like walwriter that normally generates no WAL records. To determine whether
anyWAL activity has happened at that process since the last time, the numbers of WAL writes and syncs are also
checked.
------------------------

+ * The accumulated counters for buffer usage.
+ *
+ * The reason the counters are accumulated values is to avoid unexpected
+ * reset because many callers may use them.

Aren't the following comments better?

------------------------
These counters keep being incremented infinitely, i.e., must never be reset to zero, so that we can calculate how much
thecounters are incremented in an arbitrary period.
 
------------------------

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Masahiro Ikeda

Date:

18 May 2021, 00:57:18


On 2021/05/17 22:34, Fujii Masao wrote:
> 
> 
> On 2021/05/17 16:39, Masahiro Ikeda wrote:
>>
>> Thanks for your comments!
>>
>>>> +     * is executed, wal records aren't nomally generated (although HOT makes
>>>
>>> nomally -> normally?
>>
>> Yes, fixed.
>>
>>>> +     * It's not enough to check the number of generated wal records, for
>>>> +     * example the walwriter may write/sync the WAL although it doesn't
>>>
>>> You use both 'wal' and 'WAL' in the comments, but are there any intension?
>>
>> No, I changed to use 'WAL' only. Although some other comments have 'wal' and
>> 'WAL', it seems that 'WAL' is often used.
> 
> Thanks for updating the patch!

Thanks a lot of comments!

> + * Buffer and generated WAL usage counters.
> + *
> + * The counters are accumulated values. There are infrastructures
> + * to add the values and calculate the difference within a specific period.
> 
> Is it really worth adding these comments here?

BufferUsage has almost same comments. So, I removed it.

> +     * Note: regarding to WAL statistics counters, it's not enough to check
> +     * only whether the WAL record is generated or not. If a read transaction
> +     * is executed, WAL records aren't normally generated (although HOT makes
> +     * WAL records). But, just writes and syncs the WAL data may happen when
> +     * to flush buffers.
> 
> Aren't the following comments better?
> 
> ------------------------------
> To determine whether any WAL activity has occurred since last time, not only
> the number of generated WAL records but also the numbers of WAL writes and
> syncs need to be checked. Because even transaction that generates no WAL
> records can write or sync WAL data when flushing the data pages.
> ------------------------------

Thanks. Yes, I fixed it.

> -     * This function can be called even if nothing at all has happened. In
> -     * this case, avoid sending a completely empty message to the stats
> -     * collector.
> 
> I think that it's better to leave this comment as it is.

OK. I leave it.

> +     * First, to check the WAL stats counters were updated.
> +     *
> +     * Even if the 'force' is true, we don't need to send the stats if the
> +     * counters were not updated.
> +     *
> +     * We can know whether the counters were updated or not to check only
> +     * three counters. So, for performance, we don't allocate another memory
> +     * spaces and check the all stats like pgstat_send_slru().
> 
> Is it really worth adding these comments here?

I removed them because the following comments are enough.

* This function can be called even if nothing at all has happened. In
* this case, avoid sending a completely empty message to the stats
* collector.

> +     * The current 'wal_records' is the same as the previous one means that
> +     * WAL records weren't generated. So, the counters of 'wal_fpi',
> +     * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.
> +     *
> +     * It's not enough to check the number of generated WAL records, for
> +     * example the walwriter may write/sync the WAL although it doesn't
> +     * generate WAL records. 'm_wal_write' and 'm_wal_sync' are zero means the
> +     * counters of time spent are zero too.
> 
> Aren't the following comments better?
> 
> ------------------------
> Check wal_records counter to determine whether any WAL activity has happened
> since last time. Note that other WalUsage counters don't need to be checked
> because they are incremented always together with wal_records counter.
> 
> m_wal_buffers_full also doesn't need to be checked because it's incremented
> only when at least one WAL record is generated (i.e., wal_records counter is
> incremented). But for safely, we assert that m_wal_buffers_full is always zero
> when no WAL record is generated
> 
> This function can be called by a process like walwriter that normally
> generates no WAL records. To determine whether any WAL activity has happened
> at that process since the last time, the numbers of WAL writes and syncs are
> also checked.
> ------------------------

Yes, I modified them.

> + * The accumulated counters for buffer usage.
> + *
> + * The reason the counters are accumulated values is to avoid unexpected
> + * reset because many callers may use them.
> 
> Aren't the following comments better?
> 
> ------------------------
> These counters keep being incremented infinitely, i.e., must never be reset to
> zero, so that we can calculate how much the counters are incremented in an
> arbitrary period.
> ------------------------

Yes, thanks.
I fixed it.

Regards,
-- 
Masahiro Ikeda
NTT DATA CORPORATION

Attachment

v10-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch

Re: wal stats questions

From

Fujii Masao

Date:

19 May 2021, 02:40:52


On 2021/05/18 9:57, Masahiro Ikeda wrote:
> 
> 
> On 2021/05/17 22:34, Fujii Masao wrote:
>>
>>
>> On 2021/05/17 16:39, Masahiro Ikeda wrote:
>>>
>>> Thanks for your comments!
>>>
>>>>> +     * is executed, wal records aren't nomally generated (although HOT makes
>>>>
>>>> nomally -> normally?
>>>
>>> Yes, fixed.
>>>
>>>>> +     * It's not enough to check the number of generated wal records, for
>>>>> +     * example the walwriter may write/sync the WAL although it doesn't
>>>>
>>>> You use both 'wal' and 'WAL' in the comments, but are there any intension?
>>>
>>> No, I changed to use 'WAL' only. Although some other comments have 'wal' and
>>> 'WAL', it seems that 'WAL' is often used.
>>
>> Thanks for updating the patch!
> 
> Thanks a lot of comments!
> 
>> + * Buffer and generated WAL usage counters.
>> + *
>> + * The counters are accumulated values. There are infrastructures
>> + * to add the values and calculate the difference within a specific period.
>>
>> Is it really worth adding these comments here?
> 
> BufferUsage has almost same comments. So, I removed it.
> 
>> +     * Note: regarding to WAL statistics counters, it's not enough to check
>> +     * only whether the WAL record is generated or not. If a read transaction
>> +     * is executed, WAL records aren't normally generated (although HOT makes
>> +     * WAL records). But, just writes and syncs the WAL data may happen when
>> +     * to flush buffers.
>>
>> Aren't the following comments better?
>>
>> ------------------------------
>> To determine whether any WAL activity has occurred since last time, not only
>> the number of generated WAL records but also the numbers of WAL writes and
>> syncs need to be checked. Because even transaction that generates no WAL
>> records can write or sync WAL data when flushing the data pages.
>> ------------------------------
> 
> Thanks. Yes, I fixed it.
> 
>> -     * This function can be called even if nothing at all has happened. In
>> -     * this case, avoid sending a completely empty message to the stats
>> -     * collector.
>>
>> I think that it's better to leave this comment as it is.
> 
> OK. I leave it.
> 
>> +     * First, to check the WAL stats counters were updated.
>> +     *
>> +     * Even if the 'force' is true, we don't need to send the stats if the
>> +     * counters were not updated.
>> +     *
>> +     * We can know whether the counters were updated or not to check only
>> +     * three counters. So, for performance, we don't allocate another memory
>> +     * spaces and check the all stats like pgstat_send_slru().
>>
>> Is it really worth adding these comments here?
> 
> I removed them because the following comments are enough.
> 
> * This function can be called even if nothing at all has happened. In
> * this case, avoid sending a completely empty message to the stats
> * collector.
> 
>> +     * The current 'wal_records' is the same as the previous one means that
>> +     * WAL records weren't generated. So, the counters of 'wal_fpi',
>> +     * 'wal_bytes', 'm_wal_buffers_full' are not updated neither.
>> +     *
>> +     * It's not enough to check the number of generated WAL records, for
>> +     * example the walwriter may write/sync the WAL although it doesn't
>> +     * generate WAL records. 'm_wal_write' and 'm_wal_sync' are zero means the
>> +     * counters of time spent are zero too.
>>
>> Aren't the following comments better?
>>
>> ------------------------
>> Check wal_records counter to determine whether any WAL activity has happened
>> since last time. Note that other WalUsage counters don't need to be checked
>> because they are incremented always together with wal_records counter.
>>
>> m_wal_buffers_full also doesn't need to be checked because it's incremented
>> only when at least one WAL record is generated (i.e., wal_records counter is
>> incremented). But for safely, we assert that m_wal_buffers_full is always zero
>> when no WAL record is generated
>>
>> This function can be called by a process like walwriter that normally
>> generates no WAL records. To determine whether any WAL activity has happened
>> at that process since the last time, the numbers of WAL writes and syncs are
>> also checked.
>> ------------------------
> 
> Yes, I modified them.
> 
>> + * The accumulated counters for buffer usage.
>> + *
>> + * The reason the counters are accumulated values is to avoid unexpected
>> + * reset because many callers may use them.
>>
>> Aren't the following comments better?
>>
>> ------------------------
>> These counters keep being incremented infinitely, i.e., must never be reset to
>> zero, so that we can calculate how much the counters are incremented in an
>> arbitrary period.
>> ------------------------
> 
> Yes, thanks.
> I fixed it.

Thanks for updating the patch! I modified some comments slightly and
pushed that version of the patch.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: wal stats questions

From

Masahiro Ikeda

Date:

20 May 2021, 00:40:59

On 2021/05/19 11:40, Fujii Masao wrote:
> Thanks for updating the patch! I modified some comments slightly and
> pushed that version of the patch.

Thanks a lot!

Regards,

-- 
Masahiro Ikeda
NTT DATA CORPORATION