Re: Measuring replay lag - Mailing list pgsql-hackers
From | Masahiko Sawada
---|---
Subject | Re: Measuring replay lag
Date |
Msg-id | CAD21AoAC2BB18ZZ45MA8V=CzqZhHkru71U0hirm8QVqOJL=z-g@mail.gmail.com
In response to | Measuring replay lag (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses | Re: Measuring replay lag
List | pgsql-hackers
On Wed, Oct 26, 2016 at 7:34 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hi hackers,
>
> Here is a new version of my patch to add a replay_lag column to the
> pg_stat_replication view (originally proposed as part of a larger
> patch set for 9.6[1]), like this:

Thank you for working on this!

> postgres=# select application_name, replay_lag from pg_stat_replication;
> ┌──────────────────┬─────────────────┐
> │ application_name │ replay_lag      │
> ├──────────────────┼─────────────────┤
> │ replica1         │ 00:00:00.595382 │
> │ replica2         │ 00:00:00.598448 │
> │ replica3         │ 00:00:00.541597 │
> │ replica4         │ 00:00:00.551227 │
> └──────────────────┴─────────────────┘
> (4 rows)
>
> It works by taking advantage of the { time, end-of-WAL } samples that
> sending servers already include in message headers to standbys.  That
> seems to provide a pretty good proxy for when the WAL was written, if
> you ignore messages where the LSN hasn't advanced.  The patch
> introduces a new GUC replay_lag_sample_interval, defaulting to 1s, to
> control how often the standby should record these timestamped LSNs
> into a small circular buffer.  When its recovery process eventually
> replays a timestamped LSN, the timestamp is sent back to the upstream
> server in a new reply message field.  The value visible in
> pg_stat_replication.replay_lag can then be updated by comparing with
> the current time.

replay_lag_sample_interval defaults to 1s, but I got 1000s from the
SHOW command:

postgres(1:36789)=# show replay_lag_sample_interval ;
 replay_lag_sample_interval
----------------------------
 1000s
(1 row)

Also, when I set replay_lag_sample_interval = 500ms, I got 0 from the
SHOW command.
postgres(1:99850)=# select name, setting, applied from pg_file_settings
where name = 'replay_lag_sample_interval';
            name            | setting | applied
----------------------------+---------+---------
 replay_lag_sample_interval | 500ms   | t
(1 row)

postgres(1:99850)=# show replay_lag_sample_interval ;
 replay_lag_sample_interval
----------------------------
 0
(1 row)

> Compared to the usual techniques people use to estimate replay lag,
> this approach has the following advantages:
>
> 1. The lag is measured in time, not LSN difference.
> 2. The lag time is computed using two observations of a single
> server's clock, so there is no clock skew.
> 3. The lag is updated even between commits (during large data loads etc).

I agree with this approach.

> In the previous version I was effectively showing the ping time
> between the servers during idle times when the standby was fully
> caught up because there was nothing happening.  I decided that was not
> useful information and that it's more newsworthy and interesting to
> see the estimated replay lag for the most recent real replayed
> activity, so I changed that.
>
> In the last thread[1], Robert Haas wrote:
>> Well, one problem with this is that you can't put a loop inside of a
>> spinlock-protected critical section.
>
> Fixed.
>
>> In general, I think this is a pretty reasonable way of attacking this
>> problem, but I'd say it's significantly under-commented.  Where should
>> someone go to get a general overview of this mechanism?  The answer is
>> not "at place XXX within the patch".  (I think it might merit some
>> more extensive documentation, too, although I'm not exactly sure what
>> that should look like.)
>
> I have added lots of comments.
>
>> When you overflow the buffer, you could thin it out in a smarter way,
>> like by throwing away every other entry instead of the oldest one.
>> I guess you'd need to be careful how you coded that, though, because
>> replaying an entry with a timestamp invalidates some of the saved
>> entries without formally throwing them out.
>
> Done, by overwriting the newest sample rather than the oldest if the
> buffer is full.  That seems to give pretty reasonable degradation,
> effectively lowering the sampling rate, without any complicated buffer
> or rate management code.
>
>> Conceivably, 0002 could be split into two patches, one of which
>> computes "stupid replay lag" considering only records that naturally
>> carry timestamps, and a second adding the circular buffer to handle
>> the case where much time passes without finding such a record.
>
> I contemplated this but decided that it'd be best to use ONLY samples
> from walsender headers, and never use the time stamps from commit
> records for this.  If we use times from commit records, then a
> cascading sending server will not be able to compute the difference in
> time without introducing clock skew (not to mention the difficulty of
> combining timestamps from two sources if we try to do both).  I
> figured that it's better to have a value that shows a cascading
> sender->standby->cascading sender round trip time that is free of
> clock skew, than a master->cascading sender->standby->cascading sender
> incomplete round trip that includes clock skew.
>
> By the same reasoning I decided against introducing a new periodic WAL
> record or field from the master to hold extra time stamps in between
> commits to do this, in favour of the buffered transient timestamp
> approach I took in this patch.

I think that you need to change sendFeedback() in pg_recvlogical.c and
receivexlog.c as well.

> That said, I can see there are
> arguments for doing it with extra periodic WAL timestamps, if people
> don't think it'd be too invasive to mess with the WAL for this, and
> don't care about cascading standbys giving skewed readings.
> One advantage would be that persistent WAL timestamps would still be
> able to provide lag estimates if a standby has been down for a while
> and was catching up, and this approach can't until it's caught up due
> to lack of buffered transient timestamps.  Thoughts?
>
> I plan to post a new "causal reads" patch at some point which will
> depend on this, but in any case I think this is a useful feature on
> its own.  I'd be grateful for any feedback, flames, better ideas etc.
> Thanks for reading.
>
> [1] https://www.postgresql.org/message-id/CAEepm%3D31yndQ7S5RdGofoGz1yQ-cteMrePR2JLf9gWGzxKcV7w%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center