Re: Enhancing Memory Context Statistics Reporting - Mailing list pgsql-hackers

From: Fujii Masao
Subject: Re: Enhancing Memory Context Statistics Reporting
Msg-id: 70146bd0-da0f-48ee-bece-1e35536189e0@oss.nttdata.com
In response to: Re: Enhancing Memory Context Statistics Reporting (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses: Re: Enhancing Memory Context Statistics Reporting
List: pgsql-hackers

On 2025/01/21 20:27, Rahila Syed wrote:
> Hi Tomas,
> 
>       I've tried the pgbench test
>     again, to see if it gets stuck somewhere, and I'm observing this on a
>     new / idle cluster:
> 
>     $ pgbench -n -f test.sql -P 1 test -T 60
>     pgbench (18devel)
>     progress: 1.0 s, 1647.9 tps, lat 0.604 ms stddev 0.438, 0 failed
>     progress: 2.0 s, 1374.3 tps, lat 0.727 ms stddev 0.386, 0 failed
>     progress: 3.0 s, 1514.4 tps, lat 0.661 ms stddev 0.330, 0 failed
>     progress: 4.0 s, 1563.4 tps, lat 0.639 ms stddev 0.212, 0 failed
>     progress: 5.0 s, 1665.0 tps, lat 0.600 ms stddev 0.177, 0 failed
>     progress: 6.0 s, 1538.0 tps, lat 0.650 ms stddev 0.192, 0 failed
>     progress: 7.0 s, 1491.4 tps, lat 0.670 ms stddev 0.261, 0 failed
>     progress: 8.0 s, 1539.5 tps, lat 0.649 ms stddev 0.443, 0 failed
>     progress: 9.0 s, 1517.0 tps, lat 0.659 ms stddev 0.167, 0 failed
>     progress: 10.0 s, 1594.0 tps, lat 0.627 ms stddev 0.227, 0 failed
>     progress: 11.0 s, 28.0 tps, lat 0.705 ms stddev 0.277, 0 failed
>     progress: 12.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>     progress: 13.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>     progress: 14.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>     progress: 15.0 s, 0.0 tps, lat 0.000 ms stddev 0.000, 0 failed
>     progress: 16.0 s, 1480.6 tps, lat 4.043 ms stddev 130.113, 0 failed
>     progress: 17.0 s, 1524.9 tps, lat 0.655 ms stddev 0.286, 0 failed
>     progress: 18.0 s, 1246.0 tps, lat 0.802 ms stddev 0.330, 0 failed
>     progress: 19.0 s, 1383.1 tps, lat 0.722 ms stddev 0.934, 0 failed
>     progress: 20.0 s, 1432.7 tps, lat 0.698 ms stddev 0.199, 0 failed
>     ...
> 
>     There's always a period of 10-15 seconds when everything seems to be
>     working fine, and then a couple seconds when it gets stuck, with the usual
> 
>        LOG:  Wait for 69454 process to publish stats timed out, trying again
> 
>     The PIDs I've seen were for checkpointer, autovacuum launcher, ... all
>     of those are processes that should be handling the signal, so how come it
>     gets stuck every now and then? The system is entirely idle, there's no
>     contention for the shmem stuff, etc. Could it be forgetting about the
>     signal in some cases, or something like that?
> 
> Yes, this occurs when, due to concurrent signals received by a backend,
> both signals are processed together and the stats are published only once.
> Once the stats are read by the first client that gains access, they are erased,
> causing the second client to wait until it times out.
> 
> If we make clients wait for the latest stats, timeouts may occur during concurrent
> operations. To avoid such timeouts, we can retain the previously published memory
> statistics for every backend and avoid waiting for the latest statistics when the
> previous statistics are less than STALE_STATS_LIMIT old. This limit can be determined
> based on the server load and how quickly the server handles memory statistics
> requests.
> 
> For example, on a server running make -j 4 installcheck-world while concurrently
> probing client backends for memory statistics using pgbench, accepting statistics
> that were approximately 1 second old helped eliminate timeouts. Conversely, on an
> idle system, waiting for new statistics when the previous ones were older than 0.1
> seconds was sufficient to avoid any timeouts caused by concurrent requests.
> 
> PFA an updated and rebased patch that includes the capability to associate
> timestamps with statistics. Additionally, I have made some minor fixes and improved
> the indentation.
> 
> Currently, I have set STALE_STATS_LIMIT to 0.5 seconds in the code, which means we
> do not wait for newer statistics if the previous statistics were published within
> 0.5 seconds of the current request.
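
To make that concrete, here is a rough sketch of such a staleness check using the
existing timestamp helpers. This is not code from the patch; the helper name and the
way the publication timestamp is stored are assumptions made only for illustration:

#include "postgres.h"

#include "utils/timestamp.h"

#define STALE_STATS_LIMIT_MS    500     /* 0.5 s, matching the value above */

/*
 * Return true if the previously published statistics are recent enough that
 * we can skip waiting for the target backend to publish fresh ones.
 * stats_timestamp is assumed to hold the time at which the target backend
 * last published its statistics.
 */
static bool
prev_stats_usable(TimestampTz stats_timestamp)
{
    /*
     * TimestampDifferenceExceeds() is true once more than the given number
     * of milliseconds separates the two timestamps, so negating it means
     * "published within the last STALE_STATS_LIMIT_MS".
     */
    return !TimestampDifferenceExceeds(stats_timestamp,
                                       GetCurrentTimestamp(),
                                       STALE_STATS_LIMIT_MS);
}

The requesting client would consult such a check before signalling the target process
and entering the wait loop.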
> 
> In short, there are the following options for designing the wait for statistics,
> depending on whether we expect concurrent requests to a backend for memory
> statistics to be common.
> 
> 1. Always get the latest statistics and time out if unable to.
> 
> This works fine for sequential probing, which is going to be the most common use case.
> However, it can lead to backend timeouts of up to MAX_TRIES * MEMSTATS_WAIT_TIMEOUT.
> 
> 2. Determine an appropriate STALE_STATS_LIMIT and do not wait for the latest stats
> if the previous statistics fall within that limit.
> This will help avoid the timeouts in the case of concurrent requests.
> 
> 3. Do what the v10 patch on this thread does:
> 
> Wait for the latest statistics for up to MEMSTATS_WAIT_TIMEOUT;
> otherwise, display the previous statistics, regardless of when they were published.
> 
> Since timeouts are likely to occur only during concurrent requests, the displayed
> statistics are unlikely to be very outdated.
> However, in this scenario we observe the behavior you mentioned, i.e., concurrent
> backends can get stuck for the duration of MEMSTATS_WAIT_TIMEOUT
> (currently 5 seconds).
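
For illustration, the third option has roughly this shape. This is only a sketch, not
the v10 patch's actual code; locking around the shared timestamp, the retry loop, and
the wait event choice (PG_WAIT_EXTENSION is used here merely as a placeholder) are
simplified:

#include "postgres.h"

#include "storage/condition_variable.h"
#include "utils/timestamp.h"
#include "utils/wait_event.h"

#define MEMSTATS_WAIT_TIMEOUT   5000    /* ms, i.e. the 5 seconds mentioned above */

/*
 * Wait until the target backend publishes statistics newer than our request,
 * or until MEMSTATS_WAIT_TIMEOUT elapses.  Returns true if fresh statistics
 * arrived; on false the caller falls back to the previously published ones,
 * regardless of their age.
 */
static bool
wait_for_fresh_stats(ConditionVariable *cv, TimestampTz request_ts,
                     volatile TimestampTz *published_ts)
{
    ConditionVariablePrepareToSleep(cv);
    while (*published_ts < request_ts)
    {
        /* ConditionVariableTimedSleep() returns true on timeout */
        if (ConditionVariableTimedSleep(cv, MEMSTATS_WAIT_TIMEOUT,
                                        PG_WAIT_EXTENSION))
            break;
    }
    ConditionVariableCancelSleep();

    /* true if we got statistics newer than our request */
    return *published_ts >= request_ts;
}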
> 
> I am inclined toward the third approach, as concurrent requests are not expected
> to be a common use case for this feature. Moreover, with the second approach,
> determining an appropriate value for STALE_STATS_LIMIT is challenging, as it
> depends on the server's load.

Just an idea: as another option, how about blocking new requests to
the target process (e.g., causing them to fail with an error or
return NULL with a warning) if a previous request is still pending?
Users can simply retry the request if it fails. IMO, failing quickly
seems preferable to getting stuck for a while in cases with concurrent
requests.
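
Roughly along these lines, for example. This is just a sketch; the slot structure and
field names below are made up, not a concrete proposal for the patch's data structures:

#include "postgres.h"

#include "storage/spin.h"

/* per-backend slot in shared memory; initialized elsewhere with SpinLockInit() */
typedef struct MemCtxReportSlot
{
    slock_t     mutex;
    bool        request_pending;    /* another client is already waiting */
} MemCtxReportSlot;

/*
 * Mark the slot as having a pending request, or fail immediately if some
 * other client already has one outstanding for the same process.
 */
static void
claim_request_or_error(MemCtxReportSlot *slot, int target_pid)
{
    bool        busy;

    SpinLockAcquire(&slot->mutex);
    busy = slot->request_pending;
    if (!busy)
        slot->request_pending = true;
    SpinLockRelease(&slot->mutex);

    if (busy)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_IN_USE),
                 errmsg("memory context statistics of PID %d are already being requested",
                        target_pid),
                 errhint("Retry the request later.")));
}

The WARNING-plus-NULL variant would look the same, just reporting at WARNING level and
returning early from the SQL-callable function instead of erroring out.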

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



