Re: Measuring replay lag - Mailing list pgsql-hackers
From | Masahiko Sawada
---|---
Subject | Re: Measuring replay lag
Date |
Msg-id | CAD21AoAC2BB18ZZ45MA8V=CzqZhHkru71U0hirm8QVqOJL=z-g@mail.gmail.com
In response to | Measuring replay lag (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses | Re: Measuring replay lag
List | pgsql-hackers
On Wed, Oct 26, 2016 at 7:34 PM, Thomas Munro
<thomas.munro@enterprisedb.com> wrote:
> Hi hackers,
>
> Here is a new version of my patch to add a replay_lag column to the
> pg_stat_replication view (originally proposed as part of a larger
> patch set for 9.6[1]), like this:

Thank you for working on this!

> postgres=# select application_name, replay_lag from pg_stat_replication;
> ┌──────────────────┬─────────────────┐
> │ application_name │ replay_lag      │
> ├──────────────────┼─────────────────┤
> │ replica1         │ 00:00:00.595382 │
> │ replica2         │ 00:00:00.598448 │
> │ replica3         │ 00:00:00.541597 │
> │ replica4         │ 00:00:00.551227 │
> └──────────────────┴─────────────────┘
> (4 rows)
>
> It works by taking advantage of the { time, end-of-WAL } samples that
> sending servers already include in message headers to standbys.  That
> seems to provide a pretty good proxy for when the WAL was written, if
> you ignore messages where the LSN hasn't advanced.  The patch
> introduces a new GUC replay_lag_sample_interval, defaulting to 1s, to
> control how often the standby should record these timestamped LSNs
> into a small circular buffer.  When its recovery process eventually
> replays a timestamped LSN, the timestamp is sent back to the upstream
> server in a new reply message field.  The value visible in
> pg_stat_replication.replay_lag can then be updated by comparing with
> the current time.

replay_lag_sample_interval defaults to 1s, but I got 1000s from the
SHOW command:

postgres(1:36789)=# show replay_lag_sample_interval ;
 replay_lag_sample_interval
----------------------------
 1000s
(1 row)

Also, when I set replay_lag_sample_interval = 500ms, I got 0 from the
SHOW command.
postgres(1:99850)=# select name, setting, applied from pg_file_settings
where name = 'replay_lag_sample_interval';
            name            | setting | applied
----------------------------+---------+---------
 replay_lag_sample_interval | 500ms   | t
(1 row)

postgres(1:99850)=# show replay_lag_sample_interval ;
 replay_lag_sample_interval
----------------------------
 0
(1 row)

> Compared to the usual techniques people use to estimate replay lag,
> this approach has the following advantages:
>
> 1. The lag is measured in time, not LSN difference.
> 2. The lag time is computed using two observations of a single
> server's clock, so there is no clock skew.
> 3. The lag is updated even between commits (during large data loads etc).

I agree with this approach.

> In the previous version I was effectively showing the ping time
> between the servers during idle times when the standby was fully
> caught up because there was nothing happening.  I decided that was not
> useful information and that it's more newsworthy and interesting to
> see the estimated replay lag for the most recent real replayed
> activity, so I changed that.
>
> In the last thread[1], Robert Haas wrote:
>> Well, one problem with this is that you can't put a loop inside of a
>> spinlock-protected critical section.
>
> Fixed.
>
>> In general, I think this is a pretty reasonable way of attacking this
>> problem, but I'd say it's significantly under-commented.  Where should
>> someone go to get a general overview of this mechanism?  The answer is
>> not "at place XXX within the patch".  (I think it might merit some
>> more extensive documentation, too, although I'm not exactly sure what
>> that should look like.)
>
> I have added lots of comments.
>
>> When you overflow the buffer, you could thin it out in a smarter way,
>> like by throwing away every other entry instead of the oldest one.
>> I guess you'd need to be careful how you coded that, though, because
>> replaying an entry with a timestamp invalidates some of the saved
>> entries without formally throwing them out.
>
> Done, by overwriting the newest sample rather than the oldest if the
> buffer is full.  That seems to give pretty reasonable degradation,
> effectively lowering the sampling rate, without any complicated buffer
> or rate management code.
>
>> Conceivably, 0002 could be split into two patches, one of which
>> computes "stupid replay lag" considering only records that naturally
>> carry timestamps, and a second adding the circular buffer to handle
>> the case where much time passes without finding such a record.
>
> I contemplated this but decided that it'd be best to use ONLY samples
> from walsender headers, and never use the time stamps from commit
> records for this.  If we use times from commit records, then a
> cascading sending server will not be able to compute the difference in
> time without introducing clock skew (not to mention the difficulty of
> combining timestamps from two sources if we try to do both).  I
> figured that it's better to have a value that shows a cascading
> sender->standby->cascading sender round trip time that is free of
> clock skew, than a master->cascading sender->standby->cascading sender
> incomplete round trip that includes clock skew.
>
> By the same reasoning I decided against introducing a new periodic WAL
> record or field from the master to hold extra time stamps in between
> commits to do this, in favour of the buffered transient timestamp
> approach I took in this patch.

I think that you need to change sendFeedback() in pg_recvlogical.c and
receivexlog.c as well.

> That said, I can see there are
> arguments for doing it with extra periodic WAL timestamps, if people
> don't think it'd be too invasive to mess with the WAL for this, and
> don't care about cascading standbys giving skewed readings.
> One advantage would be that persistent WAL timestamps would still be
> able to provide lag estimates if a standby has been down for a while
> and was catching up, and this approach can't until it's caught up due
> to lack of buffered transient timestamps.  Thoughts?
>
> I plan to post a new "causal reads" patch at some point which will
> depend on this, but in any case I think this is a useful feature on
> its own.  I'd be grateful for any feedback, flames, better ideas etc.
> Thanks for reading.
>
> [1] https://www.postgresql.org/message-id/CAEepm%3D31yndQ7S5RdGofoGz1yQ-cteMrePR2JLf9gWGzxKcV7w%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center