Thread: wal stats questions
Hi, I got a few questions about the wal stats while working on the shmem stats patch: 1) What is the motivation to have both prevWalUsage and pgWalUsage, instead of just accumulating the stats since the last submission? There doesn't seem to be any comment explaining it? Computing differences to previous values, and copying to prev*, isn't free. I assume this is out of a fear that the stats could get reset before they're used for instrument.c purposes - but who knows? 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the former being used by wal writer, the latter by most other processes? There again don't seem to be comments explaining this. 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) just to figure out if there's been any changes isn't all that cheap. This is regularly exercised in read-only workloads too. Seems adding a boolean WalStats.have_pending = true or such would be better. 4) For plain backends pgstat_report_wal() is called by pgstat_report_stat() - but it is not checked as part of the "Don't expend a clock check if nothing to do" check at the top. It's probably rare to have pending wal stats without also passing one of the other conditions, but ... Generally the various patches seems to to have a lot of the boilerplate style comments (like "Prepare and send the message"), but very little in the way of explaining the design. Greetings, Andres Freund
On 2021/03/25 8:22, Andres Freund wrote: > Hi, > > I got a few questions about the wal stats while working on the shmem > stats patch: Thanks for your reviews. > 1) What is the motivation to have both prevWalUsage and pgWalUsage, > instead of just accumulating the stats since the last submission? > There doesn't seem to be any comment explaining it? Computing > differences to previous values, and copying to prev*, isn't free. I > assume this is out of a fear that the stats could get reset before > they're used for instrument.c purposes - but who knows? Is your point that it's better to call pgWalUsage = 0; after sending the stats? My understanding is as same as your assumption. For example, pg_stat_statements.c use pgWalUsage and calculate the diff. But, because the stats may be sent after the transaction is finished, it doesn't seem to lead wrong stats if pgWalUsage = 0 is called. So, I agree your suggestion. If the extension wants to know the walusage diff across two transactions, it may lead to wrong stats, but I think it won't happen. > 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the > former being used by wal writer, the latter by most other processes? > There again don't seem to be comments explaining this. To control the transmission interval for the wal writer because it may send the stats too frequency, and to omit to calculate the generated wal stats because it doesn't generate wal records. But, now I think it's better to merge them. > 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) > just to figure out if there's been any changes isn't all that > cheap. This is regularly exercised in read-only workloads too. Seems > adding a boolean WalStats.have_pending = true or such would be > better. I understood that for backends, this may leads performance degradation and this problem is not only for the WalStats but also SLRUStats. I wondered the degradation is so big because pgstat_report_stat() checks if at least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the diff. If my understanding is correct, to get timestamp is more expensive. > 4) For plain backends pgstat_report_wal() is called by > pgstat_report_stat() - but it is not checked as part of the "Don't > expend a clock check if nothing to do" check at the top. It's > probably rare to have pending wal stats without also passing one of > the other conditions, but ... (I'm not confidence my understanding of your comment is right.) You mean that we need to expend a clock check in pgstat_report_wal()? I think it's unnecessary because pgstat_report_stat() is already checked it. > Generally the various patches seems to to have a lot of the boilerplate > style comments (like "Prepare and send the message"), but very little in > the way of explaining the design. Sorry for that. I'll be careful. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Hi, On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote: > On 2021/03/25 8:22, Andres Freund wrote: > > 1) What is the motivation to have both prevWalUsage and pgWalUsage, > > instead of just accumulating the stats since the last submission? > > There doesn't seem to be any comment explaining it? Computing > > differences to previous values, and copying to prev*, isn't free. I > > assume this is out of a fear that the stats could get reset before > > they're used for instrument.c purposes - but who knows? > > Is your point that it's better to call pgWalUsage = 0; after sending the > stats? Yes. At least there should be a comment explaining why it's done the way it is. > > 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) > > just to figure out if there's been any changes isn't all that > > cheap. This is regularly exercised in read-only workloads too. Seems > > adding a boolean WalStats.have_pending = true or such would be > > better. > > I understood that for backends, this may leads performance degradation and > this problem is not only for the WalStats but also SLRUStats. > > I wondered the degradation is so big because pgstat_report_stat() checks if at > least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the > diff. If my understanding is correct, to get timestamp is more expensive. Getting a timestamp is expensive, yes. But I think we need to check for the no-pending-wal-stats *before* the clock check. See the below: > > 4) For plain backends pgstat_report_wal() is called by > > pgstat_report_stat() - but it is not checked as part of the "Don't > > expend a clock check if nothing to do" check at the top. It's > > probably rare to have pending wal stats without also passing one of > > the other conditions, but ... > > (I'm not confidence my understanding of your comment is right.) > You mean that we need to expend a clock check in pgstat_report_wal()? > I think it's unnecessary because pgstat_report_stat() is already checked it. No, I mean that right now we might can erroneously return early pgstat_report_wal() for normal backends in some workloads: void pgstat_report_stat(bool disconnect) ... /* Don't expend a clock check if nothing to do */ if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && pgStatXactCommit == 0 && pgStatXactRollback == 0 && !have_function_stats && !disconnect) return; might return if there only is pending WAL activity. This needs to check whether there are pending WAL stats. Which in turn means that the check should be fast. Greetings, Andres Freund
At Wed, 24 Mar 2021 21:07:26 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote: > > On 2021/03/25 8:22, Andres Freund wrote: > > > 1) What is the motivation to have both prevWalUsage and pgWalUsage, > > > instead of just accumulating the stats since the last submission? > > > There doesn't seem to be any comment explaining it? Computing > > > differences to previous values, and copying to prev*, isn't free. I > > > assume this is out of a fear that the stats could get reset before > > > they're used for instrument.c purposes - but who knows? > > > > Is your point that it's better to call pgWalUsage = 0; after sending the > > stats? > > Yes. At least there should be a comment explaining why it's done the way > it is. pgWalUsage was used without resetting and we (I) thought that that behavior should be preserved. On second thought, as Andres suggested, we can just reset pgWalUsage at sending since AFAICS no one takes difference crossing pgstat_report_stat() calls. > > > 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) > > > just to figure out if there's been any changes isn't all that > > > cheap. This is regularly exercised in read-only workloads too. Seems > > > adding a boolean WalStats.have_pending = true or such would be > > > better. > > > > I understood that for backends, this may leads performance degradation and > > this problem is not only for the WalStats but also SLRUStats. > > > > I wondered the degradation is so big because pgstat_report_stat() checks if at > > least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the > > diff. If my understanding is correct, to get timestamp is more expensive. > > Getting a timestamp is expensive, yes. But I think we need to check for > the no-pending-wal-stats *before* the clock check. See the below: > > > > > 4) For plain backends pgstat_report_wal() is called by > > > pgstat_report_stat() - but it is not checked as part of the "Don't > > > expend a clock check if nothing to do" check at the top. It's > > > probably rare to have pending wal stats without also passing one of > > > the other conditions, but ... > > > > (I'm not confidence my understanding of your comment is right.) > > You mean that we need to expend a clock check in pgstat_report_wal()? > > I think it's unnecessary because pgstat_report_stat() is already checked it. > > No, I mean that right now we might can erroneously return early > pgstat_report_wal() for normal backends in some workloads: > > void > pgstat_report_stat(bool disconnect) > ... > /* Don't expend a clock check if nothing to do */ > if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && > pgStatXactCommit == 0 && pgStatXactRollback == 0 && > !have_function_stats && !disconnect) > return; > > might return if there only is pending WAL activity. This needs to check > whether there are pending WAL stats. Which in turn means that the check > should be fast. Agreed that the condition is wrong. On the other hand, the counters are incremented in XLogInsertRecord() and I think we don't want add instructions there. If any wal activity has been recorded, wal_records is always positive thus we can check for wal activity just by "pgWalUsage.wal_records > 0, which should be fast enough. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2021/03/25 16:37, Kyotaro Horiguchi wrote: > At Wed, 24 Mar 2021 21:07:26 -0700, Andres Freund <andres@anarazel.de> wrote in >> Hi, >> >> On 2021-03-25 10:51:56 +0900, Masahiro Ikeda wrote: >>> On 2021/03/25 8:22, Andres Freund wrote: >>>> 1) What is the motivation to have both prevWalUsage and pgWalUsage, >>>> instead of just accumulating the stats since the last submission? >>>> There doesn't seem to be any comment explaining it? Computing >>>> differences to previous values, and copying to prev*, isn't free. I >>>> assume this is out of a fear that the stats could get reset before >>>> they're used for instrument.c purposes - but who knows? >>> >>> Is your point that it's better to call pgWalUsage = 0; after sending the >>> stats? >> >> Yes. At least there should be a comment explaining why it's done the way >> it is. > > pgWalUsage was used without resetting and we (I) thought that that > behavior should be preserved. On second thought, as Andres suggested, > we can just reset pgWalUsage at sending since AFAICS no one takes > difference crossing pgstat_report_stat() calls. Yes, I agree that we can do that since there seems no such code for now. Also if we do that, we can check, for example "pgWalUsage.wal_records > 0" as you suggested, to easily determine whether there is pending WAL stats or not. Anyway I agree it's better to add comments about the design more. >>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) >>>> just to figure out if there's been any changes isn't all that >>>> cheap. This is regularly exercised in read-only workloads too. Seems >>>> adding a boolean WalStats.have_pending = true or such would be >>>> better. >>> >>> I understood that for backends, this may leads performance degradation and >>> this problem is not only for the WalStats but also SLRUStats. >>> >>> I wondered the degradation is so big because pgstat_report_stat() checks if at >>> least PGSTAT_STAT_INTERVAL msec is passed since it last sent before check the >>> diff. If my understanding is correct, to get timestamp is more expensive. >> >> Getting a timestamp is expensive, yes. But I think we need to check for >> the no-pending-wal-stats *before* the clock check. See the below: >> >> >>>> 4) For plain backends pgstat_report_wal() is called by >>>> pgstat_report_stat() - but it is not checked as part of the "Don't >>>> expend a clock check if nothing to do" check at the top. It's >>>> probably rare to have pending wal stats without also passing one of >>>> the other conditions, but ... >>> >>> (I'm not confidence my understanding of your comment is right.) >>> You mean that we need to expend a clock check in pgstat_report_wal()? >>> I think it's unnecessary because pgstat_report_stat() is already checked it. >> >> No, I mean that right now we might can erroneously return early >> pgstat_report_wal() for normal backends in some workloads: >> >> void >> pgstat_report_stat(bool disconnect) >> ... >> /* Don't expend a clock check if nothing to do */ >> if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && >> pgStatXactCommit == 0 && pgStatXactRollback == 0 && >> !have_function_stats && !disconnect) >> return; >> >> might return if there only is pending WAL activity. This needs to check >> whether there are pending WAL stats. Which in turn means that the check >> should be fast. > > Agreed that the condition is wrong. On the other hand, the counters > are incremented in XLogInsertRecord() and I think we don't want add > instructions there. Basically yes. We should avoid that especially while WALInsertLock is being hold. But it's not so harmful to do that after the lock is released? > If any wal activity has been recorded, wal_records is always positive > thus we can check for wal activity just by "pgWalUsage.wal_records > > 0, which should be fast enough. Maybe there is the case where a backend generates no WAL records, but just writes them because it needs to do write-ahead-logging when flush the table data? If yes, "pgWalUsage.wal_records > 0" is not enough. Probably other fields also need to be checked. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Thu, 25 Mar 2021 19:01:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > On 2021/03/25 16:37, Kyotaro Horiguchi wrote: > > pgWalUsage was used without resetting and we (I) thought that that > > behavior should be preserved. On second thought, as Andres suggested, > > we can just reset pgWalUsage at sending since AFAICS no one takes > > difference crossing pgstat_report_stat() calls. > > Yes, I agree that we can do that since there seems no such code for > now. > Also if we do that, we can check, for example "pgWalUsage.wal_records > > 0" > as you suggested, to easily determine whether there is pending WAL > stats or not. > Anyway I agree it's better to add comments about the design more. ... > > If any wal activity has been recorded, wal_records is always positive > > thus we can check for wal activity just by "pgWalUsage.wal_records > > > 0, which should be fast enough. > > Maybe there is the case where a backend generates no WAL records, > but just writes them because it needs to do write-ahead-logging > when flush the table data? If yes, "pgWalUsage.wal_records > 0" is not > enough. > Probably other fields also need to be checked. (I noticed I made the discussion above unconsciously premising pgWalUsage reset.) I may be misunderstanding or missing something, but the only place where pgWalUsage counters are increased is XLogInsertRecrod. That is, pgWalUsage counts wal insertions, not writes nor flushes. AFAICS pgWalUsage.wal_records is always incremented when other counters are increased. Looking from another side, we should refrain from adding counters that incrases at a different time than pgWalUsage.wal_recrods. (That should be written as a comment there.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2021/03/26 10:08, Kyotaro Horiguchi wrote: > At Thu, 25 Mar 2021 19:01:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in >> On 2021/03/25 16:37, Kyotaro Horiguchi wrote: >>> pgWalUsage was used without resetting and we (I) thought that that >>> behavior should be preserved. On second thought, as Andres suggested, >>> we can just reset pgWalUsage at sending since AFAICS no one takes >>> difference crossing pgstat_report_stat() calls. >> >> Yes, I agree that we can do that since there seems no such code for >> now. >> Also if we do that, we can check, for example "pgWalUsage.wal_records >>> 0" >> as you suggested, to easily determine whether there is pending WAL >> stats or not. >> Anyway I agree it's better to add comments about the design more. > ... >>> If any wal activity has been recorded, wal_records is always positive >>> thus we can check for wal activity just by "pgWalUsage.wal_records > >>> 0, which should be fast enough. >> >> Maybe there is the case where a backend generates no WAL records, >> but just writes them because it needs to do write-ahead-logging >> when flush the table data? If yes, "pgWalUsage.wal_records > 0" is not >> enough. >> Probably other fields also need to be checked. > > (I noticed I made the discussion above unconsciously premising > pgWalUsage reset.) > > I may be misunderstanding or missing something, but the only place > where pgWalUsage counters are increased is XLogInsertRecrod. That is, > pgWalUsage counts wal insertions, not writes nor flushes. AFAICS Yes. And WalStats instead of pgWalUsage includes the stats about not only WAL insertions, but also writes and flushes. pgstat_send_wal() checks WalStats to determine whether there are pending WAL stats to send to the stats collector or not. That is, the counters of not only WAL insertions but also writes and flushes should be checked to determine whether there are pending stats or not, I think.. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
At Fri, 26 Mar 2021 10:32:23 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > I may be misunderstanding or missing something, but the only place > > where pgWalUsage counters are increased is XLogInsertRecrod. That is, > > pgWalUsage counts wal insertions, not writes nor flushes. AFAICS > > Yes. And WalStats instead of pgWalUsage includes the stats about > not only WAL insertions, but also writes and flushes. Ugh! I was missing a very large blob.. Ok, we need additional check for non-pgWalUsage part. Sorry. > pgstat_send_wal() checks WalStats to determine whether there are > pending WAL stats to send to the stats collector or not. That is, > the counters of not only WAL insertions but also writes and flushes > should be checked to determine whether there are pending stats or not, > I think.. I think we may have an additional flag to notify about io-stat part, in constrast to wal-insertion part . Anyway we do additional INSTR_TIME_SET_CURRENT when track_wal_io_timinge. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Thanks for many your suggestions! I made the patch to handle the issues. > 1) What is the motivation to have both prevWalUsage and pgWalUsage, > instead of just accumulating the stats since the last submission? > There doesn't seem to be any comment explaining it? Computing > differences to previous values, and copying to prev*, isn't free. I > assume this is out of a fear that the stats could get reset before > they're used for instrument.c purposes - but who knows? I removed the unnecessary code copying pgWalUsage and just reset the pgWalUsage after reporting the stats in pgstat_report_wal(). > 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the > former being used by wal writer, the latter by most other processes? > There again don't seem to be comments explaining this. I added the comments why two functions are separated. (But is it better to merge them?) > 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) > just to figure out if there's been any changes isn't all that > cheap. This is regularly exercised in read-only workloads too. Seems > adding a boolean WalStats.have_pending = true or such would be > better. > 4) For plain backends pgstat_report_wal() is called by > pgstat_report_stat() - but it is not checked as part of the "Don't > expend a clock check if nothing to do" check at the top. It's > probably rare to have pending wal stats without also passing one of > the other conditions, but ... I added the logic to check if the stats counters are updated or not in pgstat_report_stat() using not only generated wal record but also write/sync counters, and it can skip to call reporting function. Although I added the condition which the write/sync counters are updated or not, I haven't understood the following case yet...Sorry. I checked related code and tested to insert large object, but I couldn't. I'll investigate more deeply, but if you already know the function which leads the following case, please let me know. > Maybe there is the case where a backend generates no WAL records, > but just writes them because it needs to do write-ahead-logging > when flush the table data? > Ugh! I was missing a very large blob.. Ok, we need additional check > for non-pgWalUsage part. Sorry. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
Hi, On 2021-03-25 16:37:10 +0900, Kyotaro Horiguchi wrote: > On the other hand, the counters are incremented in XLogInsertRecord() > and I think we don't want add instructions there. I don't really buy this. Setting a boolean to true, in a cacheline you're already touching, isn't that much compared to all the other stuff in there. The branch to check if wal stats timing etc is enabled is much more expensive. I think we should just set a boolean to true and leave it at that. Greetings, Andres Freund
At Fri, 26 Mar 2021 10:07:45 -0700, Andres Freund <andres@anarazel.de> wrote in > Hi, > > On 2021-03-25 16:37:10 +0900, Kyotaro Horiguchi wrote: > > On the other hand, the counters are incremented in XLogInsertRecord() > > and I think we don't want add instructions there. > > I don't really buy this. Setting a boolean to true, in a cacheline > you're already touching, isn't that much compared to all the other stuff > in there. The branch to check if wal stats timing etc is enabled is much > more expensive. I think we should just set a boolean to true and leave > it at that. Hmm. Yes, I agree to you in that opinion. I (remember I) was told not to add even a cycle to the hot path as far as we can avoid when I tried something like that. So I'm happy to +1 for that if it is the consensus here, since it is cleaner. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Mon, 29 Mar 2021 11:09:00 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > At Fri, 26 Mar 2021 10:07:45 -0700, Andres Freund <andres@anarazel.de> wrote in > > Hi, > > > > On 2021-03-25 16:37:10 +0900, Kyotaro Horiguchi wrote: > > > On the other hand, the counters are incremented in XLogInsertRecord() > > > and I think we don't want add instructions there. > > > > I don't really buy this. Setting a boolean to true, in a cacheline > > you're already touching, isn't that much compared to all the other stuff > > in there. The branch to check if wal stats timing etc is enabled is much > > more expensive. I think we should just set a boolean to true and leave > > it at that. > > Hmm. Yes, I agree to you in that opinion. I (remember I) was told not It might sound differently.. To be precise, "I had the same opinion with you". > to add even a cycle to the hot path as far as we can avoid when I > tried something like that. > > So I'm happy to +1 for that if it is the consensus here, since it is > cleaner. -- Kyotaro Horiguchi NTT Open Source Software Center
I update the patch since there were my misunderstanding points. On 2021/03/26 16:20, Masahiro Ikeda wrote: > Thanks for many your suggestions! > I made the patch to handle the issues. > >> 1) What is the motivation to have both prevWalUsage and pgWalUsage, >> instead of just accumulating the stats since the last submission? >> There doesn't seem to be any comment explaining it? Computing >> differences to previous values, and copying to prev*, isn't free. I >> assume this is out of a fear that the stats could get reset before >> they're used for instrument.c purposes - but who knows? > > I removed the unnecessary code copying pgWalUsage and just reset the > pgWalUsage after reporting the stats in pgstat_report_wal(). I didn't change this. >> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the >> former being used by wal writer, the latter by most other processes? >> There again don't seem to be comments explaining this. > > I added the comments why two functions are separated. > (But is it better to merge them?) I updated the comments for following reasons. >> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) >> just to figure out if there's been any changes isn't all that >> cheap. This is regularly exercised in read-only workloads too. Seems >> adding a boolean WalStats.have_pending = true or such would be >> better. >> 4) For plain backends pgstat_report_wal() is called by >> pgstat_report_stat() - but it is not checked as part of the "Don't >> expend a clock check if nothing to do" check at the top. It's >> probably rare to have pending wal stats without also passing one of >> the other conditions, but ... > > I added the logic to check if the stats counters are updated or not in > pgstat_report_stat() using not only generated wal record but also write/sync > counters, and it can skip to call reporting function. I removed the checking code whether the wal stats counters were updated or not in pgstat_report_stat() since I couldn't understand the valid case yet. pgstat_report_stat() is called by backends when the transaction is finished. This means that the condition seems always pass. I thought to implement if the WAL stats counters were not updated, skip to send all statistics including the counters for databases and so on. But I think it's not good because it may take more time to be reflected the generated stats by read-only transaction. > Although I added the condition which the write/sync counters are updated or > not, I haven't understood the following case yet...Sorry. I checked related > code and tested to insert large object, but I couldn't. I'll investigate more > deeply, but if you already know the function which leads the following case, > please let me know. I understood the above case (Fujii-san, thanks for your advice in person). When to flush buffers, for example, to select buffer replacement victim, it requires a WAL flush if the buffer is dirty. So, to check the WAL stats counters are updated or not, I check the number of generated wal record and the counter of syncing in pgstat_report_wal(). The reason why not to check the counter of writing is that if to write is happened, to sync is happened too in the above case. I added the comments in the patch. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
At Tue, 30 Mar 2021 09:41:24 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in > I update the patch since there were my misunderstanding points. > > On 2021/03/26 16:20, Masahiro Ikeda wrote: > > Thanks for many your suggestions! > > I made the patch to handle the issues. > > > >> 1) What is the motivation to have both prevWalUsage and pgWalUsage, > >> instead of just accumulating the stats since the last submission? > >> There doesn't seem to be any comment explaining it? Computing > >> differences to previous values, and copying to prev*, isn't free. I > >> assume this is out of a fear that the stats could get reset before > >> they're used for instrument.c purposes - but who knows? > > > > I removed the unnecessary code copying pgWalUsage and just reset the > > pgWalUsage after reporting the stats in pgstat_report_wal(). > > I didn't change this. > > >> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the > >> former being used by wal writer, the latter by most other processes? > >> There again don't seem to be comments explaining this. > > > > I added the comments why two functions are separated. > > (But is it better to merge them?) > > I updated the comments for following reasons. > > > >> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) > >> just to figure out if there's been any changes isn't all that > >> cheap. This is regularly exercised in read-only workloads too. Seems > >> adding a boolean WalStats.have_pending = true or such would be > >> better. > >> 4) For plain backends pgstat_report_wal() is called by > >> pgstat_report_stat() - but it is not checked as part of the "Don't > >> expend a clock check if nothing to do" check at the top. It's > >> probably rare to have pending wal stats without also passing one of > >> the other conditions, but ... > > > > I added the logic to check if the stats counters are updated or not in > > pgstat_report_stat() using not only generated wal record but also write/sync > > counters, and it can skip to call reporting function. > > I removed the checking code whether the wal stats counters were updated or not > in pgstat_report_stat() since I couldn't understand the valid case yet. > pgstat_report_stat() is called by backends when the transaction is finished. > This means that the condition seems always pass. Doesn't the same holds for all other counters? If you are saying that "wal counters should be zero if all other stats counters are zero", we need an assertion to check that and a comment to explain that. I personally find it safer to add the WAL-stats condition to the fast-return check, rather than addin such assertion. pgstat_send_wal() is called mainly from pgstat_report_wal() which consumes pgWalStats counters and WalWriterMain() which doesn't. Checking on pgWalStats counters isn't so complex that we need to avoid that in wal writer, thus *I* think pgstat_send_wal() and pgstat_report_wal() can be consolidated. Even if you have a strong opinion that wal writer should call a separate function, the name should be something other than pgstat_send_wal() since it ignores pgWalUsage counters, which are supposed to be included in "WAL stats". > I thought to implement if the WAL stats counters were not updated, skip to > send all statistics including the counters for databases and so on. But I > think it's not good because it may take more time to be reflected the > generated stats by read-only transaction. Ur, yep. > > Although I added the condition which the write/sync counters are updated or > > not, I haven't understood the following case yet...Sorry. I checked related > > code and tested to insert large object, but I couldn't. I'll investigate more > > deeply, but if you already know the function which leads the following case, > > please let me know. > > I understood the above case (Fujii-san, thanks for your advice in person). > When to flush buffers, for example, to select buffer replacement victim, > it requires a WAL flush if the buffer is dirty. So, to check the WAL stats > counters are updated or not, I check the number of generated wal record and > the counter of syncing in pgstat_report_wal(). > > The reason why not to check the counter of writing is that if to write is > happened, to sync is happened too in the above case. I added the comments in > the patch. Mmm.. Although I couldn't read you clearly, The fact that a flush may happen without a write means the reverse at the same time, a write may happen without a flush. For asynchronous commits, WAL-write happens alone unaccompanied by a flush. And the corresponding flush would happen later without writes. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On 2021/03/30 17:28, Kyotaro Horiguchi wrote: > At Tue, 30 Mar 2021 09:41:24 +0900, Masahiro Ikeda <ikedamsh@oss.nttdata.com> wrote in >> I update the patch since there were my misunderstanding points. >> >> On 2021/03/26 16:20, Masahiro Ikeda wrote: >>> Thanks for many your suggestions! >>> I made the patch to handle the issues. >>> >>>> 1) What is the motivation to have both prevWalUsage and pgWalUsage, >>>> instead of just accumulating the stats since the last submission? >>>> There doesn't seem to be any comment explaining it? Computing >>>> differences to previous values, and copying to prev*, isn't free. I >>>> assume this is out of a fear that the stats could get reset before >>>> they're used for instrument.c purposes - but who knows? >>> >>> I removed the unnecessary code copying pgWalUsage and just reset the >>> pgWalUsage after reporting the stats in pgstat_report_wal(). >> >> I didn't change this. >> >>>> 2) Why is there both pgstat_send_wal() and pgstat_report_wal()? With the >>>> former being used by wal writer, the latter by most other processes? >>>> There again don't seem to be comments explaining this. >>> >>> I added the comments why two functions are separated. >>> (But is it better to merge them?) >> >> I updated the comments for following reasons. >> >> >>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) >>>> just to figure out if there's been any changes isn't all that >>>> cheap. This is regularly exercised in read-only workloads too. Seems >>>> adding a boolean WalStats.have_pending = true or such would be >>>> better. >>>> 4) For plain backends pgstat_report_wal() is called by >>>> pgstat_report_stat() - but it is not checked as part of the "Don't >>>> expend a clock check if nothing to do" check at the top. It's >>>> probably rare to have pending wal stats without also passing one of >>>> the other conditions, but ... >>> >>> I added the logic to check if the stats counters are updated or not in >>> pgstat_report_stat() using not only generated wal record but also write/sync >>> counters, and it can skip to call reporting function. >> >> I removed the checking code whether the wal stats counters were updated or not >> in pgstat_report_stat() since I couldn't understand the valid case yet. >> pgstat_report_stat() is called by backends when the transaction is finished. >> This means that the condition seems always pass. > > Doesn't the same holds for all other counters? If you are saying that > "wal counters should be zero if all other stats counters are zero", we > need an assertion to check that and a comment to explain that. > > I personally find it safer to add the WAL-stats condition to the > fast-return check, rather than addin such assertion. Thanks for your comments. OK, I added the condition to the fast-return check. I noticed that I misunderstood that the purpose is to avoid expanding a clock check using WAL stats counters. But, the purpose is to make the conditions stricter, right? > pgstat_send_wal() is called mainly from pgstat_report_wal() which > consumes pgWalStats counters and WalWriterMain() which > doesn't. Checking on pgWalStats counters isn't so complex that we need > to avoid that in wal writer, thus *I* think pgstat_send_wal() and > pgstat_report_wal() can be consolidated. Even if you have a strong > opinion that wal writer should call a separate function, the name > should be something other than pgstat_send_wal() since it ignores > pgWalUsage counters, which are supposed to be included in "WAL stats". OK, I consolidated them. >> I thought to implement if the WAL stats counters were not updated, skip to >> send all statistics including the counters for databases and so on. But I >> think it's not good because it may take more time to be reflected the >> generated stats by read-only transaction. > > Ur, yep. > >>> Although I added the condition which the write/sync counters are updated or >>> not, I haven't understood the following case yet...Sorry. I checked related >>> code and tested to insert large object, but I couldn't. I'll investigate more >>> deeply, but if you already know the function which leads the following case, >>> please let me know. >> >> I understood the above case (Fujii-san, thanks for your advice in person). >> When to flush buffers, for example, to select buffer replacement victim, >> it requires a WAL flush if the buffer is dirty. So, to check the WAL stats >> counters are updated or not, I check the number of generated wal record and >> the counter of syncing in pgstat_report_wal(). >> >> The reason why not to check the counter of writing is that if to write is >> happened, to sync is happened too in the above case. I added the comments in >> the patch. > > Mmm.. Although I couldn't read you clearly, The fact that a flush may > happen without a write means the reverse at the same time, a write may > happen without a flush. For asynchronous commits, WAL-write happens > alone unaccompanied by a flush. And the corresponding flush would > happen later without writes. Sorry, I didn't explain it enough. For processes which may generate WAL records like backends, I thought it's enough to check (1)the number of generated WAL records and (2)the counters of syncing(flushing) the WAL. This is checked in pgstat_report_wal(). Sorry for that I didn't mention (1) in the above thread. If a backend execute a write transaction, some WAL records must be generated. So, it's ok to check (1) only regardless of whether asynchronous commit is enabled or not. OHOT, if a backend execute a read-only transaction, WAL records won't be generated (although HOT makes a wal records exactly...). But, WAL-write and flush may happen when to flush buffers via XLogFlush(). In this case, if WAL-write happened, flush must be happen later. But, if my understanding is correct, there is no a case to flush doesn't happen, but to write happen. So, I thought (2) is needed and it's enough to check the counter of syncing(flushing). Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021/03/30 20:37, Masahiro Ikeda wrote: > OK, I added the condition to the fast-return check. I noticed that I > misunderstood that the purpose is to avoid expanding a clock check using WAL > stats counters. But, the purpose is to make the conditions stricter, right? Yes. Currently if the following condition is false even when the WAL counters are updated, nothing is sent to the stats collector. But with your patch, in this case the WAL stats are sent. if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && pgStatXactCommit == 0 && pgStatXactRollback == 0 && !have_function_stats && !disconnect) Thanks for the patch! It now fails to be applied to the master cleanly. So could you rebase the patch? pgstat_initialize() should initialize pgWalUsage with zero? + /* + * set the counters related to generated WAL data. + */ + WalStats.m_wal_records = pgWalUsage.wal_records; + WalStats.m_wal_fpi = pgWalUsage.wal_fpi; + WalStats.m_wal_bytes = pgWalUsage.wal_bytes; This should be skipped if pgWalUsage.wal_records is zero? + * Be careful that the counters are cleared after reporting them to + * the stats collector although you can use WalUsageAccumDiff() + * to computing differences to previous values. For backends, + * the counters may be reset after a transaction is finished and + * pgstat_send_wal() is invoked, so you can compute the difference + * in the same transaction only. On the second thought, I'm afraid that this can be likely to be a foot-gun in the future. So I'm now wondering if the current approach (i.e., calculate the diff between the current and previous pgWalUsage and don't reset it to zero) is better. Thought? Since the similar data structure pgBufferUsage doesn't have this kind of restriction, I'm afraid that the developer can easily miss that only pgWalUsage has the restriction. But currently the diff is calculated (i.e., WalUsageAccumDiff() is called) even when the WAL counters are not updated. Then if that calculated diff is zero, we skip sending the WAL stats. This logic should be improved. That is, probably we should be able to check whether the WAL counters are updated or not, without calculating the diff, because the calculation is not free. We can implement this by introducing new flag variable that we discussed upthread. This flag is set to true whenever the WAL counters are incremented and used to determine whether the WAL stats need to be sent. If we do this, another issue is that the data types for wal_records and wal_fpi in pgWalUsage are long. Which may lead to overflow? If yes, it's should be replaced with uint64. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/04/13 9:33, Fujii Masao wrote: > > > On 2021/03/30 20:37, Masahiro Ikeda wrote: >> OK, I added the condition to the fast-return check. I noticed that I >> misunderstood that the purpose is to avoid expanding a clock check using WAL >> stats counters. But, the purpose is to make the conditions stricter, right? > > Yes. Currently if the following condition is false even when the WAL counters > are updated, nothing is sent to the stats collector. But with your patch, > in this case the WAL stats are sent. > > if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && > pgStatXactCommit == 0 && pgStatXactRollback == 0 && > !have_function_stats && !disconnect) > > Thanks for the patch! It now fails to be applied to the master cleanly. > So could you rebase the patch? Thanks for your comments! I rebased it. > pgstat_initialize() should initialize pgWalUsage with zero? Thanks. But, I didn't handle it because I undo the logic to calculate the diff as you mentioned below. > + /* > + * set the counters related to generated WAL data. > + */ > + WalStats.m_wal_records = pgWalUsage.wal_records; > + WalStats.m_wal_fpi = pgWalUsage.wal_fpi; > + WalStats.m_wal_bytes = pgWalUsage.wal_bytes; > > This should be skipped if pgWalUsage.wal_records is zero? Yes, fixed it. > + * Be careful that the counters are cleared after reporting them to > + * the stats collector although you can use WalUsageAccumDiff() > + * to computing differences to previous values. For backends, > + * the counters may be reset after a transaction is finished and > + * pgstat_send_wal() is invoked, so you can compute the difference > + * in the same transaction only. > > On the second thought, I'm afraid that this can be likely to be a foot-gun > in the future. So I'm now wondering if the current approach (i.e., calculate > the diff between the current and previous pgWalUsage and don't reset it > to zero) is better. Thought? Since the similar data structure pgBufferUsage > doesn't have this kind of restriction, I'm afraid that the developer can > easily miss that only pgWalUsage has the restriction. > > But currently the diff is calculated (i.e., WalUsageAccumDiff() is called) > even when the WAL counters are not updated. Then if that calculated diff is > zero, we skip sending the WAL stats. This logic should be improved. That is, > probably we should be able to check whether the WAL counters are updated > or not, without calculating the diff, because the calculation is not free. > We can implement this by introducing new flag variable that we discussed > upthread. This flag is set to true whenever the WAL counters are incremented > and used to determine whether the WAL stats need to be sent. Sound reasonable. I agreed that the restriction has a risk to lead mistakes. I made the patch introducing a new flag. - v4-0001-performance-improvements-of-reporting-wal-stats.patch I think introducing a new flag is not necessary because we can know if the WAL stats are updated or not using the counters of the generated wal record, wal writes and wal syncs. It's fast compared to get timestamp. The attached patch is to check if the counters are updated or not using them. - v4-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch > If we do this, another issue is that the data types for wal_records and wal_fpi > in pgWalUsage are long. Which may lead to overflow? If yes, it's should be > replaced with uint64. Yes, I fixed. BufferUsage's counters have same issue, so I fixed them too. BTW, is it better to change PgStat_Counter from int64 to uint64 because there aren't any counters with negative number? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021-04-16 10:27, Masahiro Ikeda wrote: > On 2021/04/13 9:33, Fujii Masao wrote: >> >> >> On 2021/03/30 20:37, Masahiro Ikeda wrote: >>> OK, I added the condition to the fast-return check. I noticed that I >>> misunderstood that the purpose is to avoid expanding a clock check >>> using WAL >>> stats counters. But, the purpose is to make the conditions stricter, >>> right? >> >> Yes. Currently if the following condition is false even when the WAL >> counters >> are updated, nothing is sent to the stats collector. But with your >> patch, >> in this case the WAL stats are sent. >> >> if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && >> pgStatXactCommit == 0 && pgStatXactRollback == 0 && >> !have_function_stats && !disconnect) >> >> Thanks for the patch! It now fails to be applied to the master >> cleanly. >> So could you rebase the patch? > > Thanks for your comments! > I rebased it. Thanks for working on this! I have some minor comments on performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch. 177 @@ -3094,20 +3066,33 @@ pgstat_report_wal(void) 178 * Return true if the message is sent, and false otherwise. Since you changed the return value to void, it seems the description is not necessary anymore. 208 + * generate wal records. 'wal_writes' and 'wal_sync' are zero means the It may be better to change 'wal_writes' to 'wal_write' since single quotation seems to mean variable name. 234 + * set the counters related to generated WAL data if the counters are set -> Set? Regards,
On 2021/04/21 15:08, torikoshia wrote: > On 2021-04-16 10:27, Masahiro Ikeda wrote: >> On 2021/04/13 9:33, Fujii Masao wrote: >>> >>> >>> On 2021/03/30 20:37, Masahiro Ikeda wrote: >>>> OK, I added the condition to the fast-return check. I noticed that I >>>> misunderstood that the purpose is to avoid expanding a clock check >>>> using WAL stats counters. But, the purpose is to make the conditions >>>> stricter, right? >>> >>> Yes. Currently if the following condition is false even when the WAL >>> counters are updated, nothing is sent to the stats collector. But with >>> your patch, in this case the WAL stats are sent. >>> >>> if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && >>> pgStatXactCommit == 0 && pgStatXactRollback == 0 && >>> !have_function_stats && !disconnect) >>> >>> Thanks for the patch! It now fails to be applied to the master >>> cleanly. So could you rebase the patch? >> >> Thanks for your comments! I rebased it. > > Thanks for working on this! Hi, thanks for your comments! > I have some minor comments on > performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch. > > > > > 177 @@ -3094,20 +3066,33 @@ pgstat_report_wal(void) 178 * Return true if > the message is sent, and false otherwise. > > Since you changed the return value to void, it seems the description is not > necessary anymore. Right, I fixed it. > 208 + * generate wal records. 'wal_writes' and 'wal_sync' are zero > means the > > It may be better to change 'wal_writes' to 'wal_write' since single > quotation seems to mean variable name. Agreed. > 234 + * set the counters related to generated WAL data if the > counters are > > > set -> Set? Yes, I fixed. > BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative number? Although this is not related to torikoshi-san's comment, the above my understanding is not right. Some counters like delta_live_tuples, delta_dead_tuples and changed_tuples can be negative. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021/04/21 18:31, Masahiro Ikeda wrote: >> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative number? On second thought, it's ok even if the counters like wal_records get overflowed. Because they are always used to calculate the diff between the previous and current counters. Even when the current counters are overflowed and the previous ones are not, WalUsageAccumDiff() seems to return the right diff of them. If this understanding is right, I'd withdraw my comment and it's ok to use "long" type for those counters. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Hi On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: > > > On 2021/04/21 18:31, Masahiro Ikeda wrote: > >> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative number? > > On second thought, it's ok even if the counters like wal_records get overflowed. > Because they are always used to calculate the diff between the previous and > current counters. Even when the current counters are overflowed and > the previous ones are not, WalUsageAccumDiff() seems to return the right > diff of them. If this understanding is right, I'd withdraw my comment and > it's ok to use "long" type for those counters. Thought? Why long? It's of a platform dependent size, which doesn't seem useful here? Andres
On 2021/04/23 0:36, Andres Freund wrote: > Hi > > On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: >> On 2021/04/21 18:31, Masahiro Ikeda wrote: >>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative number? >> On second thought, it's ok even if the counters like wal_records get overflowed. >> Because they are always used to calculate the diff between the previous and >> current counters. Even when the current counters are overflowed and >> the previous ones are not, WalUsageAccumDiff() seems to return the right >> diff of them. If this understanding is right, I'd withdraw my comment and >> it's ok to use "long" type for those counters. Thought? > > Why long? It's of a platform dependent size, which doesn't seem useful here? I think it's ok to unify uint64. Although it's better to use small size for cache, the idea works well for only some platform which use 4bytes for "long". (Although I'm not familiar with the standardization...) It seems that we need to adopt unsinged long if use "long", which may be 4bytes. I though WalUsageAccumDiff() seems to return the right value too. But, I researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer never overflow(2.6.5 Types 9th section) although it doesn't define overflow of signed integer type. If my understanding is right, the definition only guarantee WalUsageAccumDiff() returns the right value only if the type is unsigned integer. So, it's safe to change "long" to "unsigned long". Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On 2021/04/23 9:26, Masahiro Ikeda wrote: > > > On 2021/04/23 0:36, Andres Freund wrote: >> Hi >> >> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: >>> On 2021/04/21 18:31, Masahiro Ikeda wrote: >>>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative number? >>> On second thought, it's ok even if the counters like wal_records get overflowed. >>> Because they are always used to calculate the diff between the previous and >>> current counters. Even when the current counters are overflowed and >>> the previous ones are not, WalUsageAccumDiff() seems to return the right >>> diff of them. If this understanding is right, I'd withdraw my comment and >>> it's ok to use "long" type for those counters. Thought? >> >> Why long? It's of a platform dependent size, which doesn't seem useful here? I'm not sure why "long" was chosen for the counters in BufferUsage. And then I guess that maybe we didn't change that because using "long" for them caused no actual issue in practice. > I think it's ok to unify uint64. Although it's better to use small size for > cache, the idea works well for only some platform which use 4bytes for "long". > > > (Although I'm not familiar with the standardization...) > It seems that we need to adopt unsinged long if use "long", which may be 4bytes. > > I though WalUsageAccumDiff() seems to return the right value too. But, I > researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer > never overflow(2.6.5 Types 9th section) although it doesn't define overflow of > signed integer type. > > If my understanding is right, the definition only guarantee > WalUsageAccumDiff() returns the right value only if the type is unsigned > integer. So, it's safe to change "long" to "unsigned long". Yes, we can change the counters so they use uint64. But if we do that, ISTM that we need to change the code more than your patch does. For example, even with the patch, pg_stat_statements uses Int64GetDatumFast() to report the counter like shared_blks_hit, but this should be changed? For example, "%ld" should be changed to "%llu" at the following code in vacuumlazy.c? I think that we should check all codes that use the counters whose types are changed to uint64. appendStringInfo(&buf, _("WAL usage: %ld records, %ld full page images, %llu bytes"), walusage.wal_records, walusage.wal_fpi, (unsigned long long) walusage.wal_bytes); Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Hi, On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote: > On 2021/04/23 0:36, Andres Freund wrote: > > On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: > >> On 2021/04/21 18:31, Masahiro Ikeda wrote: > >>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative number? > >> On second thought, it's ok even if the counters like wal_records get overflowed. > >> Because they are always used to calculate the diff between the previous and > >> current counters. Even when the current counters are overflowed and > >> the previous ones are not, WalUsageAccumDiff() seems to return the right > >> diff of them. If this understanding is right, I'd withdraw my comment and > >> it's ok to use "long" type for those counters. Thought? > > > > Why long? It's of a platform dependent size, which doesn't seem useful here? > > I think it's ok to unify uint64. Although it's better to use small size for > cache, the idea works well for only some platform which use 4bytes for "long". > > > (Although I'm not familiar with the standardization...) > It seems that we need to adopt unsinged long if use "long", which may be 4bytes. > > I though WalUsageAccumDiff() seems to return the right value too. But, I > researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer > never overflow(2.6.5 Types 9th section) although it doesn't define overflow of > signed integer type. > > If my understanding is right, the definition only guarantee > WalUsageAccumDiff() returns the right value only if the type is unsigned > integer. So, it's safe to change "long" to "unsigned long". I think this should just use 64bit counters. Emitting more than 4 billion records in one transaction isn't actually particularly rare. And obviously WalUsageAccumDiff() can't do anything about that, once overflowed it overflowed. Greetings, Andres Freund
On 2021/04/23 10:25, Andres Freund wrote: > Hi, > > On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote: >> On 2021/04/23 0:36, Andres Freund wrote: >>> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: >>>> On 2021/04/21 18:31, Masahiro Ikeda wrote: >>>>>> BTW, is it better to change PgStat_Counter from int64 to uint64 because> there aren't any counters with negative number? >>>> On second thought, it's ok even if the counters like wal_records get overflowed. >>>> Because they are always used to calculate the diff between the previous and >>>> current counters. Even when the current counters are overflowed and >>>> the previous ones are not, WalUsageAccumDiff() seems to return the right >>>> diff of them. If this understanding is right, I'd withdraw my comment and >>>> it's ok to use "long" type for those counters. Thought? >>> >>> Why long? It's of a platform dependent size, which doesn't seem useful here? >> >> I think it's ok to unify uint64. Although it's better to use small size for >> cache, the idea works well for only some platform which use 4bytes for "long". >> >> >> (Although I'm not familiar with the standardization...) >> It seems that we need to adopt unsinged long if use "long", which may be 4bytes. >> >> I though WalUsageAccumDiff() seems to return the right value too. But, I >> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer >> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of >> signed integer type. >> >> If my understanding is right, the definition only guarantee >> WalUsageAccumDiff() returns the right value only if the type is unsigned >> integer. So, it's safe to change "long" to "unsigned long". > > I think this should just use 64bit counters. Emitting more than 4 > billion records in one transaction isn't actually particularly rare. And Right. I agree to change the types of the counters to int64. I think that it's better to make this change as a separate patch from the changes for pg_stat_wal view. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/04/23 16:30, Fujii Masao wrote: > > > On 2021/04/23 10:25, Andres Freund wrote: >> Hi, >> >> On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote: >>> On 2021/04/23 0:36, Andres Freund wrote: >>>> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: >>>>> On 2021/04/21 18:31, Masahiro Ikeda wrote: >>>>>>> BTW, is it better to change PgStat_Counter from int64 to uint64 >>>>>>> because> there aren't any counters with negative number? >>>>> On second thought, it's ok even if the counters like wal_records get >>>>> overflowed. >>>>> Because they are always used to calculate the diff between the previous and >>>>> current counters. Even when the current counters are overflowed and >>>>> the previous ones are not, WalUsageAccumDiff() seems to return the right >>>>> diff of them. If this understanding is right, I'd withdraw my comment and >>>>> it's ok to use "long" type for those counters. Thought? >>>> >>>> Why long? It's of a platform dependent size, which doesn't seem useful here? >>> >>> I think it's ok to unify uint64. Although it's better to use small size for >>> cache, the idea works well for only some platform which use 4bytes for "long". >>> >>> >>> (Although I'm not familiar with the standardization...) >>> It seems that we need to adopt unsinged long if use "long", which may be >>> 4bytes. >>> >>> I though WalUsageAccumDiff() seems to return the right value too. But, I >>> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer >>> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of >>> signed integer type. >>> >>> If my understanding is right, the definition only guarantee >>> WalUsageAccumDiff() returns the right value only if the type is unsigned >>> integer. So, it's safe to change "long" to "unsigned long". >> >> I think this should just use 64bit counters. Emitting more than 4 >> billion records in one transaction isn't actually particularly rare. And > > Right. I agree to change the types of the counters to int64. > > I think that it's better to make this change as a separate patch from > the changes for pg_stat_wal view. Thanks for your comments. OK, I separate two patches. First patch has only the changes for pg_stat_wal view. ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") Second one has the changes for the type of the BufferUsage's and WalUsage's members. I change the type from long to int64. Is it better to make new thread? ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch") Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021/04/26 10:11, Masahiro Ikeda wrote: > > > On 2021/04/23 16:30, Fujii Masao wrote: >> >> >> On 2021/04/23 10:25, Andres Freund wrote: >>> Hi, >>> >>> On 2021-04-23 09:26:17 +0900, Masahiro Ikeda wrote: >>>> On 2021/04/23 0:36, Andres Freund wrote: >>>>> On Thu, Apr 22, 2021, at 06:42, Fujii Masao wrote: >>>>>> On 2021/04/21 18:31, Masahiro Ikeda wrote: >>>>>>>> BTW, is it better to change PgStat_Counter from int64 to uint64 >>>>>>>> because> there aren't any counters with negative number? >>>>>> On second thought, it's ok even if the counters like wal_records get >>>>>> overflowed. >>>>>> Because they are always used to calculate the diff between the previous and >>>>>> current counters. Even when the current counters are overflowed and >>>>>> the previous ones are not, WalUsageAccumDiff() seems to return the right >>>>>> diff of them. If this understanding is right, I'd withdraw my comment and >>>>>> it's ok to use "long" type for those counters. Thought? >>>>> >>>>> Why long? It's of a platform dependent size, which doesn't seem useful here? >>>> >>>> I think it's ok to unify uint64. Although it's better to use small size for >>>> cache, the idea works well for only some platform which use 4bytes for "long". >>>> >>>> >>>> (Although I'm not familiar with the standardization...) >>>> It seems that we need to adopt unsinged long if use "long", which may be >>>> 4bytes. >>>> >>>> I though WalUsageAccumDiff() seems to return the right value too. But, I >>>> researched deeply and found that ISO/IEC 9899:1999 defines unsinged integer >>>> never overflow(2.6.5 Types 9th section) although it doesn't define overflow of >>>> signed integer type. >>>> >>>> If my understanding is right, the definition only guarantee >>>> WalUsageAccumDiff() returns the right value only if the type is unsigned >>>> integer. So, it's safe to change "long" to "unsigned long". >>> >>> I think this should just use 64bit counters. Emitting more than 4 >>> billion records in one transaction isn't actually particularly rare. And >> >> Right. I agree to change the types of the counters to int64. >> >> I think that it's better to make this change as a separate patch from >> the changes for pg_stat_wal view. > > Thanks for your comments. > OK, I separate two patches. Thanks! > > First patch has only the changes for pg_stat_wal view. > ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") + pgWalUsage.wal_records == prevWalUsage.wal_records && + walStats.wal_write == 0 && walStats.wal_sync == 0 && WalStats.m_wal_write should be checked here instead of walStats.wal_write? Is there really the case where the number of sync is larger than zero when the number of writes is zero? If not, it's enough to check only the number of writes? + * wal records weren't generated. So, the counters of 'wal_fpi', + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. It's better to add the assertion check that confirms m_wal_buffers_full == 0 whenever wal_records is larger than zero? > > Second one has the changes for the type of the BufferUsage's and WalUsage's > members. I change the type from long to int64. Is it better to make new thread? > ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch") Will review the patch later. I'm ok to discuss that in this thread. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/04/27 21:56, Fujii Masao wrote: > > > On 2021/04/26 10:11, Masahiro Ikeda wrote: >> >> First patch has only the changes for pg_stat_wal view. >> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") >> > > + pgWalUsage.wal_records == prevWalUsage.wal_records && > + walStats.wal_write == 0 && walStats.wal_sync == 0 && > > WalStats.m_wal_write should be checked here instead of walStats.wal_write? Thanks! Yes, I'll fix it. > Is there really the case where the number of sync is larger than zero when > the number of writes is zero? If not, it's enough to check only the number > of writes? I thought that there is the case if "wal_sync_method" is fdatasync, fsync or fsync_writethrough. The example case is following. (1) backend-1 writes the wal data because wal buffer has no space. But, it doesn't sync the wal data. (2) backend-2 reads data pages. In the execution, it need to write and sync the wal because dirty pages is selected as victim pages. backend-2 need to only sync the wal data because the wal data were already written by backend-1, but they weren't synced. I'm ok to change it since it's rare case. > + * wal records weren't generated. So, the counters of 'wal_fpi', > + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. > > It's better to add the assertion check that confirms > m_wal_buffers_full == 0 whenever wal_records is larger than zero? Sorry, I couldn't understand yet. I thought that m_wal_buffers_full can be larger than 0 if wal_records > 0. Do you suggest that the following assertion is needed? - if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) - return false; + if (pgWalUsage.wal_records == prevWalUsage.wal_records && + WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0) + { + Assert(pgWalUsage.wal_fpi == 0 && pgWalUsage.wal_bytes && + WalStats.m_wal_buffers_full == 0 && WalStats.m_wal_write_time == 0 && + WalStats.m_wal_sync_time == 0); + return; + } >> Second one has the changes for the type of the BufferUsage's and WalUsage's >> members. I change the type from long to int64. Is it better to make new thread? >> ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch") > > Will review the patch later. I'm ok to discuss that in this thread. Thanks! Regards, -- Masahiro Ikeda NTT DATA CORPORATION
On 2021/04/28 9:10, Masahiro Ikeda wrote: > > > On 2021/04/27 21:56, Fujii Masao wrote: >> >> >> On 2021/04/26 10:11, Masahiro Ikeda wrote: >>> >>> First patch has only the changes for pg_stat_wal view. >>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") >>> >> >> + pgWalUsage.wal_records == prevWalUsage.wal_records && >> + walStats.wal_write == 0 && walStats.wal_sync == 0 && >>> WalStats.m_wal_write should be checked here instead of walStats.wal_write? > > Thanks! Yes, I'll fix it. Thanks! > > >> Is there really the case where the number of sync is larger than zero when >> the number of writes is zero? If not, it's enough to check only the number >> of writes? > > I thought that there is the case if "wal_sync_method" is fdatasync, fsync or > fsync_writethrough. The example case is following. > > (1) backend-1 writes the wal data because wal buffer has no space. But, it > doesn't sync the wal data. > (2) backend-2 reads data pages. In the execution, it need to write and sync > the wal because dirty pages is selected as victim pages. backend-2 need to > only sync the wal data because the wal data were already written by backend-1, > but they weren't synced. You're right. So let's leave the check of "m_wal_sync == 0". > > I'm ok to change it since it's rare case. > > >> + * wal records weren't generated. So, the counters of 'wal_fpi', >> + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. >> >> It's better to add the assertion check that confirms >> m_wal_buffers_full == 0 whenever wal_records is larger than zero? > > Sorry, I couldn't understand yet. I thought that m_wal_buffers_full can be > larger than 0 if wal_records > 0. > > Do you suggest that the following assertion is needed? > > - if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) > - return false; > + if (pgWalUsage.wal_records == prevWalUsage.wal_records && > + WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0) > + { > + Assert(pgWalUsage.wal_fpi == 0 && pgWalUsage.wal_bytes && > + WalStats.m_wal_buffers_full == 0 && > WalStats.m_wal_write_time == 0 && > + WalStats.m_wal_sync_time == 0); > + return; > + } I was thinking to add the "Assert(WalStats.m_wal_buffers_full)" as a safe-guard because only m_wal_buffers_full is incremented in different places where wal_records, m_wal_write and m_wal_sync are incremented. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/04/28 9:10, Masahiro Ikeda wrote: >>> Second one has the changes for the type of the BufferUsage's and WalUsage's >>> members. I change the type from long to int64. Is it better to make new thread? >>> ("v6-0002-change-the-data-type-of-XXXUsage-from-long-to-int64.patch") >> >> Will review the patch later. I'm ok to discuss that in this thread. > > Thanks! 0002 patch looks good to me. I think we can commit this at first. Barring any objection, I will do that. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/05/11 16:44, Fujii Masao wrote: > > > On 2021/04/28 9:10, Masahiro Ikeda wrote: >> >> >> On 2021/04/27 21:56, Fujii Masao wrote: >>> >>> >>> On 2021/04/26 10:11, Masahiro Ikeda wrote: >>>> >>>> First patch has only the changes for pg_stat_wal view. >>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") >>>> >>>> >>> >>> + pgWalUsage.wal_records == prevWalUsage.wal_records && >>> + walStats.wal_write == 0 && walStats.wal_sync == 0 && >>>> WalStats.m_wal_write should be checked here instead of walStats.wal_write? >> >> Thanks! Yes, I'll fix it. > > Thanks! Thanks for your comments! I fixed them. >>> Is there really the case where the number of sync is larger than zero when >>> the number of writes is zero? If not, it's enough to check only the number >>> of writes? >> >> I thought that there is the case if "wal_sync_method" is fdatasync, fsync or >> fsync_writethrough. The example case is following. >> >> (1) backend-1 writes the wal data because wal buffer has no space. But, it >> doesn't sync the wal data. >> (2) backend-2 reads data pages. In the execution, it need to write and sync >> the wal because dirty pages is selected as victim pages. backend-2 need to >> only sync the wal data because the wal data were already written by backend-1, >> but they weren't synced. > > You're right. So let's leave the check of "m_wal_sync == 0". OK. >>> + * wal records weren't generated. So, the counters of 'wal_fpi', >>> + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. >>> >>> It's better to add the assertion check that confirms >>> m_wal_buffers_full == 0 whenever wal_records is larger than zero? >> >> Sorry, I couldn't understand yet. I thought that m_wal_buffers_full can be >> larger than 0 if wal_records > 0. >> >> Do you suggest that the following assertion is needed? >> >> - if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) >> - return false; >> + if (pgWalUsage.wal_records == prevWalUsage.wal_records && >> + WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0) >> + { >> + Assert(pgWalUsage.wal_fpi == 0 && pgWalUsage.wal_bytes && >> + WalStats.m_wal_buffers_full == 0 && >> WalStats.m_wal_write_time == 0 && >> + WalStats.m_wal_sync_time == 0); >> + return; >> + } > > I was thinking to add the "Assert(WalStats.m_wal_buffers_full)" as a safe-guard > because only m_wal_buffers_full is incremented in different places where > wal_records, m_wal_write and m_wal_sync are incremented. Understood. I added the assertion for m_wal_buffers_full only. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021/05/11 18:46, Masahiro Ikeda wrote: > > > On 2021/05/11 16:44, Fujii Masao wrote: >> >> >> On 2021/04/28 9:10, Masahiro Ikeda wrote: >>> >>> >>> On 2021/04/27 21:56, Fujii Masao wrote: >>>> >>>> >>>> On 2021/04/26 10:11, Masahiro Ikeda wrote: >>>>> >>>>> First patch has only the changes for pg_stat_wal view. >>>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") >>>>> >>>>> >>>> >>>> + pgWalUsage.wal_records == prevWalUsage.wal_records && >>>> + walStats.wal_write == 0 && walStats.wal_sync == 0 && >>>>> WalStats.m_wal_write should be checked here instead of walStats.wal_write? >>> >>> Thanks! Yes, I'll fix it. >> >> Thanks! > > Thanks for your comments! > I fixed them. Thanks for updating the patch! if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && pgStatXactCommit == 0 && pgStatXactRollback == 0 && + pgWalUsage.wal_records == prevWalUsage.wal_records && + WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0 && I'm just wondering if the above WAL activity counters need to be checked. Maybe it's not necessary because "pgStatXactCommit == 0 && pgStatXactRollback == 0" is checked? IOW, is there really the case where WAL activity counters are updated even when both pgStatXactCommit and pgStatXactRollback are zero? + if (pgWalUsage.wal_records != prevWalUsage.wal_records) + { + WalUsage walusage; + + /* + * Calculate how much WAL usage counters were increased by substracting + * the previous counters from the current ones. Fill the results in + * WAL stats message. + */ + MemSet(&walusage, 0, sizeof(WalUsage)); + WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage); Isn't it better to move the code "prevWalUsage = pgWalUsage" into here? Because it's necessary only when pgWalUsage.wal_records != prevWalUsage.wal_records. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/05/12 19:19, Fujii Masao wrote: > > > On 2021/05/11 18:46, Masahiro Ikeda wrote: >> >> >> On 2021/05/11 16:44, Fujii Masao wrote: >>> >>> >>> On 2021/04/28 9:10, Masahiro Ikeda wrote: >>>> >>>> >>>> On 2021/04/27 21:56, Fujii Masao wrote: >>>>> >>>>> >>>>> On 2021/04/26 10:11, Masahiro Ikeda wrote: >>>>>> >>>>>> First patch has only the changes for pg_stat_wal view. >>>>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") >>>>>> >>>>>> >>>>>> >>>>> >>>>> + pgWalUsage.wal_records == prevWalUsage.wal_records && >>>>> + walStats.wal_write == 0 && walStats.wal_sync == 0 && >>>>>> WalStats.m_wal_write should be checked here instead of walStats.wal_write? >>>> >>>> Thanks! Yes, I'll fix it. >>> >>> Thanks! >> >> Thanks for your comments! >> I fixed them. > > Thanks for updating the patch! > > if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && > pgStatXactCommit == 0 && pgStatXactRollback == 0 && > + pgWalUsage.wal_records == prevWalUsage.wal_records && > + WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0 && > > I'm just wondering if the above WAL activity counters need to be checked. > Maybe it's not necessary because "pgStatXactCommit == 0 && pgStatXactRollback > == 0" > is checked? IOW, is there really the case where WAL activity counters are updated > even when both pgStatXactCommit and pgStatXactRollback are zero? Thanks for checking. I haven't found the case yet. But, I added the condition because there is a discussion that it's safer. (It seems the mail thread chain is broken, Sorry...) I copy the discussion at the time below. https://www.postgresql.org/message-id/20210330.172843.267174731834281075.horikyota.ntt%40gmail.com >>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) == 0) >>>> just to figure out if there's been any changes isn't all that >>>> cheap. This is regularly exercised in read-only workloads too. Seems >>>> adding a boolean WalStats.have_pending = true or such would be >>>> better. >>>> 4) For plain backends pgstat_report_wal() is called by >>>> pgstat_report_stat() - but it is not checked as part of the "Don't >>>> expend a clock check if nothing to do" check at the top. It's >>>> probably rare to have pending wal stats without also passing one of >>>> the other conditions, but ... >>> >>> I added the logic to check if the stats counters are updated or not in >>> pgstat_report_stat() using not only generated wal record but also write/sync >>> counters, and it can skip to call reporting function. >> >> I removed the checking code whether the wal stats counters were updated or not >> in pgstat_report_stat() since I couldn't understand the valid case yet. >> pgstat_report_stat() is called by backends when the transaction is finished. >> This means that the condition seems always pass. > > Doesn't the same holds for all other counters? If you are saying that > "wal counters should be zero if all other stats counters are zero", we > need an assertion to check that and a comment to explain that. > > I personally find it safer to add the WAL-stats condition to the > fast-return check, rather than addin such assertion. > + if (pgWalUsage.wal_records != prevWalUsage.wal_records) > + { > + WalUsage walusage; > + > + /* > + * Calculate how much WAL usage counters were increased by substracting > + * the previous counters from the current ones. Fill the results in > + * WAL stats message. > + */ > + MemSet(&walusage, 0, sizeof(WalUsage)); > + WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage); > > Isn't it better to move the code "prevWalUsage = pgWalUsage" into here? > Because it's necessary only when pgWalUsage.wal_records != > prevWalUsage.wal_records. Yes, I fixed it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021-05-13 09:05, Masahiro Ikeda wrote: > On 2021/05/12 19:19, Fujii Masao wrote: >> >> >> On 2021/05/11 18:46, Masahiro Ikeda wrote: >>> >>> >>> On 2021/05/11 16:44, Fujii Masao wrote: >>>> >>>> >>>> On 2021/04/28 9:10, Masahiro Ikeda wrote: >>>>> >>>>> >>>>> On 2021/04/27 21:56, Fujii Masao wrote: >>>>>> >>>>>> >>>>>> On 2021/04/26 10:11, Masahiro Ikeda wrote: >>>>>>> >>>>>>> First patch has only the changes for pg_stat_wal view. >>>>>>> ("v6-0001-performance-improvements-of-reporting-wal-stats-without-introducing-a-new-variable.patch") >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> + pgWalUsage.wal_records == prevWalUsage.wal_records && >>>>>> + walStats.wal_write == 0 && walStats.wal_sync == 0 && >>>>>>> WalStats.m_wal_write should be checked here instead of >>>>>>> walStats.wal_write? >>>>> >>>>> Thanks! Yes, I'll fix it. >>>> >>>> Thanks! >>> >>> Thanks for your comments! >>> I fixed them. >> >> Thanks for updating the patch! >> >> if ((pgStatTabList == NULL || pgStatTabList->tsa_used == 0) && >> pgStatXactCommit == 0 && pgStatXactRollback == 0 && >> + pgWalUsage.wal_records == prevWalUsage.wal_records && >> + WalStats.m_wal_write == 0 && WalStats.m_wal_sync == 0 && >> >> I'm just wondering if the above WAL activity counters need to be >> checked. >> Maybe it's not necessary because "pgStatXactCommit == 0 && >> pgStatXactRollback >> == 0" >> is checked? IOW, is there really the case where WAL activity counters >> are updated >> even when both pgStatXactCommit and pgStatXactRollback are zero? > > Thanks for checking. > > I haven't found the case yet. But, I added the condition because there > is a > discussion that it's safer. > > (It seems the mail thread chain is broken, Sorry...) > I copy the discussion at the time below. > > https://www.postgresql.org/message-id/20210330.172843.267174731834281075.horikyota.ntt%40gmail.com >>>>> 3) Doing if (memcmp(&WalStats, &all_zeroes, sizeof(PgStat_MsgWal)) >>>>> == 0) >>>>> just to figure out if there's been any changes isn't all that >>>>> cheap. This is regularly exercised in read-only workloads too. >>>>> Seems >>>>> adding a boolean WalStats.have_pending = true or such would be >>>>> better. >>>>> 4) For plain backends pgstat_report_wal() is called by >>>>> pgstat_report_stat() - but it is not checked as part of the >>>>> "Don't >>>>> expend a clock check if nothing to do" check at the top. It's >>>>> probably rare to have pending wal stats without also passing one >>>>> of >>>>> the other conditions, but ... >>>> >>>> I added the logic to check if the stats counters are updated or not >>>> in >>>> pgstat_report_stat() using not only generated wal record but also >>>> write/sync >>>> counters, and it can skip to call reporting function. >>> >>> I removed the checking code whether the wal stats counters were >>> updated or not >>> in pgstat_report_stat() since I couldn't understand the valid case >>> yet. >>> pgstat_report_stat() is called by backends when the transaction is >>> finished. >>> This means that the condition seems always pass. >> >> Doesn't the same holds for all other counters? If you are saying that >> "wal counters should be zero if all other stats counters are zero", we >> need an assertion to check that and a comment to explain that. >> >> I personally find it safer to add the WAL-stats condition to the >> fast-return check, rather than addin such assertion. > > >> + if (pgWalUsage.wal_records != prevWalUsage.wal_records) >> + { >> + WalUsage walusage; >> + >> + /* >> + * Calculate how much WAL usage counters were increased by >> substracting >> + * the previous counters from the current ones. Fill the >> results in >> + * WAL stats message. >> + */ >> + MemSet(&walusage, 0, sizeof(WalUsage)); >> + WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage); >> >> Isn't it better to move the code "prevWalUsage = pgWalUsage" into >> here? >> Because it's necessary only when pgWalUsage.wal_records != >> prevWalUsage.wal_records. > > Yes, I fixed it. > > > Regards, Thanks for updating the patch! > + * is executed, wal records aren't nomally generated (although HOT > makes nomally -> normally? > + * It's not enough to check the number of generated wal records, for > + * example the walwriter may write/sync the WAL although it doesn't You use both 'wal' and 'WAL' in the comments, but are there any intension? Regards,
Thanks for your comments! >> + * is executed, wal records aren't nomally generated (although HOT makes > > nomally -> normally? Yes, fixed. >> + * It's not enough to check the number of generated wal records, for >> + * example the walwriter may write/sync the WAL although it doesn't > > You use both 'wal' and 'WAL' in the comments, but are there any intension? No, I changed to use 'WAL' only. Although some other comments have 'wal' and 'WAL', it seems that 'WAL' is often used. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021/05/17 16:39, Masahiro Ikeda wrote: > > Thanks for your comments! > >>> + * is executed, wal records aren't nomally generated (although HOT makes >> >> nomally -> normally? > > Yes, fixed. > >>> + * It's not enough to check the number of generated wal records, for >>> + * example the walwriter may write/sync the WAL although it doesn't >> >> You use both 'wal' and 'WAL' in the comments, but are there any intension? > > No, I changed to use 'WAL' only. Although some other comments have 'wal' and > 'WAL', it seems that 'WAL' is often used. Thanks for updating the patch! + * Buffer and generated WAL usage counters. + * + * The counters are accumulated values. There are infrastructures + * to add the values and calculate the difference within a specific period. Is it really worth adding these comments here? + * Note: regarding to WAL statistics counters, it's not enough to check + * only whether the WAL record is generated or not. If a read transaction + * is executed, WAL records aren't normally generated (although HOT makes + * WAL records). But, just writes and syncs the WAL data may happen when + * to flush buffers. Aren't the following comments better? ------------------------------ To determine whether any WAL activity has occurred since last time, not only the number of generated WAL records but alsothe numbers of WAL writes and syncs need to be checked. Because even transaction that generates no WAL records can writeor sync WAL data when flushing the data pages. ------------------------------ - * This function can be called even if nothing at all has happened. In - * this case, avoid sending a completely empty message to the stats - * collector. I think that it's better to leave this comment as it is. + * First, to check the WAL stats counters were updated. + * + * Even if the 'force' is true, we don't need to send the stats if the + * counters were not updated. + * + * We can know whether the counters were updated or not to check only + * three counters. So, for performance, we don't allocate another memory + * spaces and check the all stats like pgstat_send_slru(). Is it really worth adding these comments here? + * The current 'wal_records' is the same as the previous one means that + * WAL records weren't generated. So, the counters of 'wal_fpi', + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. + * + * It's not enough to check the number of generated WAL records, for + * example the walwriter may write/sync the WAL although it doesn't + * generate WAL records. 'm_wal_write' and 'm_wal_sync' are zero means the + * counters of time spent are zero too. Aren't the following comments better? ------------------------ Check wal_records counter to determine whether any WAL activity has happened since last time. Note that other WalUsage countersdon't need to be checked because they are incremented always together with wal_records counter. m_wal_buffers_full also doesn't need to be checked because it's incremented only when at least one WAL record is generated(i.e., wal_records counter is incremented). But for safely, we assert that m_wal_buffers_full is always zero whenno WAL record is generated This function can be called by a process like walwriter that normally generates no WAL records. To determine whether anyWAL activity has happened at that process since the last time, the numbers of WAL writes and syncs are also checked. ------------------------ + * The accumulated counters for buffer usage. + * + * The reason the counters are accumulated values is to avoid unexpected + * reset because many callers may use them. Aren't the following comments better? ------------------------ These counters keep being incremented infinitely, i.e., must never be reset to zero, so that we can calculate how much thecounters are incremented in an arbitrary period. ------------------------ Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/05/17 22:34, Fujii Masao wrote: > > > On 2021/05/17 16:39, Masahiro Ikeda wrote: >> >> Thanks for your comments! >> >>>> + * is executed, wal records aren't nomally generated (although HOT makes >>> >>> nomally -> normally? >> >> Yes, fixed. >> >>>> + * It's not enough to check the number of generated wal records, for >>>> + * example the walwriter may write/sync the WAL although it doesn't >>> >>> You use both 'wal' and 'WAL' in the comments, but are there any intension? >> >> No, I changed to use 'WAL' only. Although some other comments have 'wal' and >> 'WAL', it seems that 'WAL' is often used. > > Thanks for updating the patch! Thanks a lot of comments! > + * Buffer and generated WAL usage counters. > + * > + * The counters are accumulated values. There are infrastructures > + * to add the values and calculate the difference within a specific period. > > Is it really worth adding these comments here? BufferUsage has almost same comments. So, I removed it. > + * Note: regarding to WAL statistics counters, it's not enough to check > + * only whether the WAL record is generated or not. If a read transaction > + * is executed, WAL records aren't normally generated (although HOT makes > + * WAL records). But, just writes and syncs the WAL data may happen when > + * to flush buffers. > > Aren't the following comments better? > > ------------------------------ > To determine whether any WAL activity has occurred since last time, not only > the number of generated WAL records but also the numbers of WAL writes and > syncs need to be checked. Because even transaction that generates no WAL > records can write or sync WAL data when flushing the data pages. > ------------------------------ Thanks. Yes, I fixed it. > - * This function can be called even if nothing at all has happened. In > - * this case, avoid sending a completely empty message to the stats > - * collector. > > I think that it's better to leave this comment as it is. OK. I leave it. > + * First, to check the WAL stats counters were updated. > + * > + * Even if the 'force' is true, we don't need to send the stats if the > + * counters were not updated. > + * > + * We can know whether the counters were updated or not to check only > + * three counters. So, for performance, we don't allocate another memory > + * spaces and check the all stats like pgstat_send_slru(). > > Is it really worth adding these comments here? I removed them because the following comments are enough. * This function can be called even if nothing at all has happened. In * this case, avoid sending a completely empty message to the stats * collector. > + * The current 'wal_records' is the same as the previous one means that > + * WAL records weren't generated. So, the counters of 'wal_fpi', > + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. > + * > + * It's not enough to check the number of generated WAL records, for > + * example the walwriter may write/sync the WAL although it doesn't > + * generate WAL records. 'm_wal_write' and 'm_wal_sync' are zero means the > + * counters of time spent are zero too. > > Aren't the following comments better? > > ------------------------ > Check wal_records counter to determine whether any WAL activity has happened > since last time. Note that other WalUsage counters don't need to be checked > because they are incremented always together with wal_records counter. > > m_wal_buffers_full also doesn't need to be checked because it's incremented > only when at least one WAL record is generated (i.e., wal_records counter is > incremented). But for safely, we assert that m_wal_buffers_full is always zero > when no WAL record is generated > > This function can be called by a process like walwriter that normally > generates no WAL records. To determine whether any WAL activity has happened > at that process since the last time, the numbers of WAL writes and syncs are > also checked. > ------------------------ Yes, I modified them. > + * The accumulated counters for buffer usage. > + * > + * The reason the counters are accumulated values is to avoid unexpected > + * reset because many callers may use them. > > Aren't the following comments better? > > ------------------------ > These counters keep being incremented infinitely, i.e., must never be reset to > zero, so that we can calculate how much the counters are incremented in an > arbitrary period. > ------------------------ Yes, thanks. I fixed it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
On 2021/05/18 9:57, Masahiro Ikeda wrote: > > > On 2021/05/17 22:34, Fujii Masao wrote: >> >> >> On 2021/05/17 16:39, Masahiro Ikeda wrote: >>> >>> Thanks for your comments! >>> >>>>> + * is executed, wal records aren't nomally generated (although HOT makes >>>> >>>> nomally -> normally? >>> >>> Yes, fixed. >>> >>>>> + * It's not enough to check the number of generated wal records, for >>>>> + * example the walwriter may write/sync the WAL although it doesn't >>>> >>>> You use both 'wal' and 'WAL' in the comments, but are there any intension? >>> >>> No, I changed to use 'WAL' only. Although some other comments have 'wal' and >>> 'WAL', it seems that 'WAL' is often used. >> >> Thanks for updating the patch! > > Thanks a lot of comments! > >> + * Buffer and generated WAL usage counters. >> + * >> + * The counters are accumulated values. There are infrastructures >> + * to add the values and calculate the difference within a specific period. >> >> Is it really worth adding these comments here? > > BufferUsage has almost same comments. So, I removed it. > >> + * Note: regarding to WAL statistics counters, it's not enough to check >> + * only whether the WAL record is generated or not. If a read transaction >> + * is executed, WAL records aren't normally generated (although HOT makes >> + * WAL records). But, just writes and syncs the WAL data may happen when >> + * to flush buffers. >> >> Aren't the following comments better? >> >> ------------------------------ >> To determine whether any WAL activity has occurred since last time, not only >> the number of generated WAL records but also the numbers of WAL writes and >> syncs need to be checked. Because even transaction that generates no WAL >> records can write or sync WAL data when flushing the data pages. >> ------------------------------ > > Thanks. Yes, I fixed it. > >> - * This function can be called even if nothing at all has happened. In >> - * this case, avoid sending a completely empty message to the stats >> - * collector. >> >> I think that it's better to leave this comment as it is. > > OK. I leave it. > >> + * First, to check the WAL stats counters were updated. >> + * >> + * Even if the 'force' is true, we don't need to send the stats if the >> + * counters were not updated. >> + * >> + * We can know whether the counters were updated or not to check only >> + * three counters. So, for performance, we don't allocate another memory >> + * spaces and check the all stats like pgstat_send_slru(). >> >> Is it really worth adding these comments here? > > I removed them because the following comments are enough. > > * This function can be called even if nothing at all has happened. In > * this case, avoid sending a completely empty message to the stats > * collector. > >> + * The current 'wal_records' is the same as the previous one means that >> + * WAL records weren't generated. So, the counters of 'wal_fpi', >> + * 'wal_bytes', 'm_wal_buffers_full' are not updated neither. >> + * >> + * It's not enough to check the number of generated WAL records, for >> + * example the walwriter may write/sync the WAL although it doesn't >> + * generate WAL records. 'm_wal_write' and 'm_wal_sync' are zero means the >> + * counters of time spent are zero too. >> >> Aren't the following comments better? >> >> ------------------------ >> Check wal_records counter to determine whether any WAL activity has happened >> since last time. Note that other WalUsage counters don't need to be checked >> because they are incremented always together with wal_records counter. >> >> m_wal_buffers_full also doesn't need to be checked because it's incremented >> only when at least one WAL record is generated (i.e., wal_records counter is >> incremented). But for safely, we assert that m_wal_buffers_full is always zero >> when no WAL record is generated >> >> This function can be called by a process like walwriter that normally >> generates no WAL records. To determine whether any WAL activity has happened >> at that process since the last time, the numbers of WAL writes and syncs are >> also checked. >> ------------------------ > > Yes, I modified them. > >> + * The accumulated counters for buffer usage. >> + * >> + * The reason the counters are accumulated values is to avoid unexpected >> + * reset because many callers may use them. >> >> Aren't the following comments better? >> >> ------------------------ >> These counters keep being incremented infinitely, i.e., must never be reset to >> zero, so that we can calculate how much the counters are incremented in an >> arbitrary period. >> ------------------------ > > Yes, thanks. > I fixed it. Thanks for updating the patch! I modified some comments slightly and pushed that version of the patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2021/05/19 11:40, Fujii Masao wrote: > Thanks for updating the patch! I modified some comments slightly and > pushed that version of the patch. Thanks a lot! Regards, -- Masahiro Ikeda NTT DATA CORPORATION