Re: [HACKERS] Measuring replay lag - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: [HACKERS] Measuring replay lag |
Date | |
Msg-id | CAMsr+YGX1tAKPvjmWoq1X8nOodenP5sCwuEfSuR21DjpYNuD0A@mail.gmail.com |
In response to | Re: [HACKERS] Measuring replay lag (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: [HACKERS] Measuring replay lag |
List | pgsql-hackers |
On 5 March 2017 at 15:31, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 1 March 2017 at 10:47, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
>> This seems to be problematic. Logical peers report LSN changes for
>> all three operations (write, flush, commit) only on commit. I suppose
>> that might work OK for synchronous replication, but it makes it a bit
>> difficult to get lag measurements that don't look really strange and
>> sawtoothy when you have long transactions, and overlapping
>> transactions might interfere with the measurements in odd ways.

They do, but those sawtoothy measurements are a true reflection of the
aspect of lag that's most important: what the downstream replica has
flushed to disk and made visible.

If we have xacts X1, X2 and X3, which commit in that order, then after X1
commits the lag is the difference between X1's commit time and "now". A
peer only sends feedback updating its flush position on commits, so once
X1 is confirmed replayed by the peer, the flush_location lag becomes the
difference between X2's later commit time and "now". And so on.

So I'd very much expect a sawtooth lag graph for flush_location, because
that's how logical replication really replays changes. We only start
replaying any xact once it commits on the upstream, and we replay changes
strictly in upstream commit order. The lag will rise linearly, then fall
vertically in abrupt drops.

sent_location is updated based on the last-decoded WAL position, per
src/backend/replication/walsender.c:2396 or so:

    record = XLogReadRecord(logical_decoding_ctx->reader, logical_startptr, &errm);
    logical_startptr = InvalidXLogRecPtr;

    /* xlog record was invalid */
    if (errm != NULL)
        elog(ERROR, "%s", errm);

    if (record != NULL)
    {
        LogicalDecodingProcessRecord(logical_decoding_ctx,
                                     logical_decoding_ctx->reader);

        sentPtr = logical_decoding_ctx->reader->EndRecPtr;
    }

so I would expect to see a smoother graph for sent_location, based on the
last record processed by the XLogReader. Though it's also a misleading
graph: we haven't really sent that data at all, we've just decoded it and
buffered it in a reorder buffer, to be sent once we decode a commit.

Really, pg_stat_replication isn't quite expressive enough to cover logical
replication due to its reordering behaviour. We can't easily report the
actual last LSN sent to the client, since we call into
ReorderBufferCommit() and don't return until we finish streaming the whole
buffered xact; we'd need some kind of callback to update the walsender
with the LSN of the last row we sent. And if we did that, sent_location
would sometimes go backwards, since with concurrent xacts the newest row
in an xact committed at time T usually has a higher LSN than the oldest
row in an xact with commit time T+n.

Later I'd like to add support for xact interleaving, where we can
speculatively stream rows from big in-progress xacts to the downstream
before the xact commits, and the downstream has to roll the xact back if
it aborts on the upstream. There are some snapshot management issues to
deal with there (not to mention downstream deadlock hazards), but maybe we
can limit the optimisation to xacts that made no catalog changes, to start
with. I'm not at all sure how to report sent_location then, though, or
what a reasonable measure of lag will look like.

> What we want from this patch is something that works for both, as much
> as that is possible.

If it shows a sawtooth pattern for flush lag, that's good, because it's
truthful.
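To make that concrete, here's a small self-contained C sketch of the shape
such a graph takes. It's a toy model, not code from the patch or from
PostgreSQL: the transaction names, commit times and confirmation times are
invented, and it just samples "now" once a second, measuring lag against
the oldest commit the peer hasn't yet confirmed flushing.

    /*
     * Toy simulation of sawtooth flush lag under commit-ordered replay.
     * Not PostgreSQL code; all timings are invented for illustration.
     */
    #include <stdio.h>

    typedef struct
    {
        const char *name;
        double      commit_time;    /* committed on the upstream at t (s) */
        double      confirm_time;   /* peer confirmed flushing it at t (s) */
    } Xact;

    int
    main(void)
    {
        /* Three overlapping xacts; X2 is large, so its replay takes a while. */
        Xact        xacts[] = {
            {"X1", 0.0, 4.0},
            {"X2", 1.0, 9.0},
            {"X3", 2.0, 10.0},
        };
        int         nxacts = sizeof(xacts) / sizeof(xacts[0]);

        /* Sample "now" once per second and report the flush lag. */
        for (double now = 0.0; now <= 11.0; now += 1.0)
        {
            const char *oldest = "none";
            double      lag = 0.0;

            /*
             * Lag is measured against the oldest commit the peer has not
             * yet confirmed flushing; replay is strictly in commit order,
             * so we can stop at the first unconfirmed xact we find.
             */
            for (int i = 0; i < nxacts; i++)
            {
                if (xacts[i].commit_time <= now && xacts[i].confirm_time > now)
                {
                    oldest = xacts[i].name;
                    lag = now - xacts[i].commit_time;
                    break;
                }
            }

            printf("t=%4.1fs  waiting on %-4s  flush lag = %4.1fs\n",
                   now, oldest, lag);
        }

        return 0;
    }

With those made-up timings the reported lag climbs while the bigger xact
is still being replayed and only drops when its commit is finally
confirmed, which is exactly the rising-then-dropping shape described here.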
We can only flush after we replay a commit, therefore the lag is always
going to be sawtooth, with tooth size approximating xact size and the
baseline trend representing any sustained increase or decrease in lag over
time.

This would be extremely valuable to have.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services