Re: [BUG] False indication in pg_stat_replication.sync_state - Mailing list pgsql-hackers
From | Fujii Masao |
---|---|
Subject | Re: [BUG] False indication in pg_stat_replication.sync_state |
Date | |
Msg-id | CAHGQGwGGjYH2yPFwftZFi7hHTsqjVytbPjGmB530+xudz4VonQ@mail.gmail.com Whole thread Raw |
In response to | [BUG] False indication in pg_stat_replication.sync_state (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: [BUG] False indication in
pg_stat_replication.sync_state
|
List | pgsql-hackers |
On Thu, Oct 18, 2012 at 5:42 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello. My colleague found that pg_stat_replication.sync_state > shows false state for some condition. > > This prevents Pacemaker from completing fail-over that could > safely be done. > > The point is in walsender.c, pg_stat_get_wal_senders() below, (as > of REL9_2_1) > > 1555: if (walsnd->pid != 0) > 1556: { > 1557: /* > 1558: * Treat a standby such as a pg_basebackup background process > 1559: * which always returns an invalid flush location, as an > 1560: * asynchronous standby. > 1561: */ > ! 1562: sync_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ? > 1563: 0 : walsnd->sync_standby_priority; > > Here, XLogRecPtrIsInvalid(walsnd->flush) is defined as > (walsnd->flush.xrecoff == 0) which becomes true as usual at every > WAL 'file's (not segments) boundary. xrecoff == 0 is certainly > invalid for the start point of WAL *RECORD*, but should be > considered valid in replication stream. This check was introduced > at 9.2.0 and the version up between 9.1.4 and 9.1.5. > > | DEBUG: write 4/0 flush 3/FEFFEC68 apply 3/FEFFEC68 > | DEBUG: write 4/0 flush 4/0 apply 3/FEFFEC68 > | DEBUG: HOGE: flush = 3/FEFFEC68 sync_priority[0] = 1 > | DEBUG: write 4/111C0 flush 4/0 apply 3/FEFFECC0 > !| DEBUG: HOGE: flush = 4/0 sync_priority[0] = 0 > > This value zero of sync_priority[0] makes sync_status 'async' > errorneously and confuses Pacemaker. > > # The log line marked with 'HOGE' above printed by applying the > # patch at the bottom of this message and invoking 'select > # sync_state from pg_stat_replication' periodically. To increase > # the chance to see the symptom, sleep 1 second for 'file' > # boundaries :-) > > The Heikki's recent(?) commit > 0ab9d1c4b31622e9176472b4276f3e9831e3d6ba which changes the format > of XLogRecPtr from logid:xrecoff struct to 64 bit linear address > would fix the false indication. But I suppose this patch won't be > applied to existing 9.1.x and 9.2.x because of the modification > onto streaming protocol. > > As far as I see the patch, it would'nt change the meaning of > XLogRecPtr to change XLogRecPtrIsInvalid from (xrecoff == 0) to > (xrecoff == 0 && xlogid == 0). But this change affects rather > wide portion where handling WAL nevertheless what is needed here > is only to stop the false indication. > > On the other hand, pg_basebackup seems return 0/0 as flush and > apply positions so it seems enough only to add xlogid == 0 into > the condition. The patch attached for REL9_2_1 does this and > yields the result following. > > | DEBUG: write 2/FEFFFD48 flush 2/FEFFFD48 apply 2/FEFF7AB0 > | DEBUG: write 3/0 flush 2/FEFFFD48 apply 2/FEFF7E88 > | DEBUG: write 3/0 flush 3/0 apply 2/FEFFFD48 > | DEBUG: HOGE: flush = 2/FEFFFD48 sync_priority[0] = 1 > | DEBUG: write 3/E338 flush 3/0 apply 2/FEFFFF80 > !| DEBUG: HOGE: flush = 3/0 sync_priority[0] = 1 > > I think this patch should be applied for 9.2.2 and 9.1.7. Looks good to me, though I don't think the source code comment needs to be updated in the way the patch does. Regards, -- Fujii Masao
pgsql-hackers by date: