Re: pg_stat_io for the startup process - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: pg_stat_io for the startup process |
Date | |
Msg-id | 20230428.111551.498122818064927621.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: pg_stat_io for the startup process (Melanie Plageman <melanieplageman@gmail.com>) |
Responses |
Re: pg_stat_io for the startup process
|
List | pgsql-hackers |
At Thu, 27 Apr 2023 17:30:40 -0400, Melanie Plageman <melanieplageman@gmail.com> wrote in > On Wed, Apr 26, 2023 at 2:53 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote: > > > > Hi all, > > > > Robert Haas <robertmhaas@gmail.com>, 26 Nis 2023 Çar, 15:34 tarihinde şunu yazdı: > >> > >> On Wed, Apr 26, 2023 at 5:47 AM Kyotaro Horiguchi > >> <horikyota.ntt@gmail.com> wrote: > >> > 3. When should we call pgstat_report_stats on the startup process? > >> > > >> > During recovery, I think we can call pgstat_report_stats() (or a > >> > subset of it) right before invoking WaitLatch and at segment > >> > boundaries. > >> > >> I think this kind of idea is worth exploring. Andres mentioned timers, > >> but it seems to me that something where we just do it at certain > >> "convenient" points might be good enough and simpler. I'd much rather > >> have statistics that were up to date as of the last time we finished a > >> segment than have nothing at all. > > > > > > I created a rough prototype of a timer-based approach for comparison. Please see attached. > > > > Registered a new timeout named "STARTUP_STAT_FLUSH_TIMEOUT", The timeout's > > handler sets a flag indicating that io stats need to be flushed. > > HandleStartupProcInterrupts checks the flag to flush stats. It's enabled if > > any WAL record is read (in the main loop of PerformWalRecovery). And it's > > disabled before WaitLatch to avoid unnecessary wakeups if the startup process > > decides to sleep. > > I was reviewing this and found that if I remove the calls to > disable_startup_stat_flush_timeout() the number of calls to > pgstat_report_wal() on a briefly active and then idle standby are about > 8 in 100 seconds whereas with timer disablement, the calls over the same > period are about 40. I would have thought that disabling the timer would > have caused us to try and flush the stats less often. > > With the calls to disable_startup_stat_flush_timeout(), we do have much > fewer calls to setitimer(), of course. > > Given the below suggestion by Andres, I tried doing some traces of the > various approaches. > > > > + disable_startup_stat_flush_timeout(); > > > + > > > (void) WaitLatch(&XLogRecoveryCtl->recoveryWakeupLatch, > > > WL_LATCH_SET | WL_TIMEOUT | > > > WL_EXIT_ON_PM_DEATH, > > > > I think always disabling the timer here isn't quite right - we want to wake up > > *once* in WaitForWALToBecomeAvailable(), otherwise we'll not submit pending > > stats before waiting - potentially for a long time - for WAL. One way would be > > just explicitly report before the WaitLatch(). > > > > Another approach, I think better, would be to not use enable_timeout_every(), > > and to explicitly rearm the timer in HandleStartupProcInterrupts(). When > > called from WaitForWALToBecomeAvailable(), we'd not rearm, and instead do so > > at the end of WaitForWALToBecomeAvailable(). That way we also wouldn't > > repeatedly fire between calls to HandleStartupProcInterrupts(). > > After a quick example implementation of this, I found that it seemed to > try and flush the stats less often on an idle standby (good) than using > enable_timeout_every(). Just rearming with the full-interval will work that way. Our existing strategy for this is seen in PostgresMain(). stats_timeout = pgstat_report_stat(false); if (stats_timeout > 0) { if (!get_timeout_active(BLAH_TIMEOUT)) enable_timeout_after(BLAH_TIMEOUT, stats_timeout); } else { if (get_timeout_active(BLAH_TIMEOUT)) disable_timeout(BLAH_TIMEOUT, false); } WaitLatch(); regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: