BUG #15151: Error with wal replay after planned manual switchover. - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #15151: Error with wal replay after planned manual switchover. |
Date | |
Msg-id | 152353560977.31233.11406243320876395727@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #15151: Error with wal replay after planned manualswitchover.
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 15151 Logged by: Maxim Boguk Email address: mb@dataegret.com PostgreSQL version: 9.6.8 Operating system: Linux Description: Hi, Today after planned manual switch-over (via remove all load from master, checkpoint master, wait replica in sync, stop master, stop replica, switch master and replica roles without promote), I got very strange error on new replica (log_min_messages=debug5): 2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: database system was shut down in recovery at 2018-04-12 04:23:26 MSK 2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: entering standby mode 2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid resource manager ID 211 at C7BB/EDC42C68 2018-04-12 04:23:28 MSK 32060 rohh.app@hh from 192.168.1.2 [vxid: txid:0] [] FATAL: the database system is starting up 2018-04-12 04:23:28 MSK 32061 rohh.app@hh from [local] [vxid: txid:0] [] FATAL: the database system is starting up 2018-04-12 04:23:28 MSK 32063 [unknown]@[unknown] from [local] [vxid: txid:0] [] LOG: incomplete startup packet 2018-04-12 04:23:28 MSK 32064 postgres@postgres from [local] [vxid: txid:0] [] FATAL: the database system is starting up 2018-04-12 04:23:28 MSK 32062 @ from [vxid: txid:0] [] LOG: started streaming WAL from primary at C7BB/ED000000 on timeline 1 2018-04-12 04:23:28 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid resource manager ID 211 at C7BB/EDC42C68 2018-04-12 04:23:30 MSK 32062 @ from [vxid: txid:0] [] FATAL: terminating walreceiver process due to administrator command 2018-04-12 04:23:30 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid resource manager ID 211 at C7BB/EDC42C68 2018-04-12 04:23:30 MSK 32059 @ from [vxid: txid:0] [] LOG: invalid resource manager ID 211 at C7BB/EDC42C68 and so on. Start with log_min_messages=debug5 - doesn't add any useful information. Restart of new slave doesn't help. Stop new slave, manual clean xlog directory on slave and start slave - doesn't help (e.g. it doesn't look like the single wal file was damaged during network transfer). Server offline at this moment and open for experiments for few days. pg_xlogdump show no errors in wal files: postgres@db-hh3:~$ /usr/lib/postgresql/9.6/bin/pg_xlogdump --start=C7BB/EDC42C68 /var/lib/postgresql/9.6/main/pg_xlog/000000010000C7BB000000ED | less first record is after C7BB/EDC42C68, at C7BB/EDC433B8, skipping over 1872 bytes rmgr: XLOG len (rec/tot): 51/ 2556, tx: 0, lsn: C7BB/EDC433B8, prev C7BB/EDC42928, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/18105 blk 326257 FPW rmgr: XLOG len (rec/tot): 51/ 3140, tx: 0, lsn: C7BB/EDC43DB8, prev C7BB/EDC433B8, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/150046445 blk 32094 FPW rmgr: XLOG len (rec/tot): 51/ 2595, tx: 0, lsn: C7BB/EDC44A18, prev C7BB/EDC43DB8, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/18105 blk 326250 FPW rmgr: XLOG len (rec/tot): 51/ 3130, tx: 0, lsn: C7BB/EDC45440, prev C7BB/EDC44A18, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/150046445 blk 15989 FPW rmgr: XLOG len (rec/tot): 51/ 2590, tx: 0, lsn: C7BB/EDC46098, prev C7BB/EDC45440, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/18105 blk 326298 FPW Few records before C7BB/EDC42C68: rmgr: XLOG len (rec/tot): 51/ 2520, tx: 0, lsn: C7BB/EDC41F38, prev C7BB/EDC41348, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/18105 blk 326155 FPW rmgr: XLOG len (rec/tot): 51/ 2703, tx: 0, lsn: C7BB/EDC42928, prev C7BB/EDC41F38, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/18105 blk 326262 FPW rmgr: XLOG len (rec/tot): 51/ 2556, tx: 0, lsn: C7BB/EDC433B8, prev C7BB/EDC42928, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/18105 blk 326257 FPW rmgr: XLOG len (rec/tot): 51/ 3140, tx: 0, lsn: C7BB/EDC43DB8, prev C7BB/EDC433B8, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/150046445 blk 32094 FPW rmgr: XLOG len (rec/tot): 51/ 2595, tx: 0, lsn: C7BB/EDC44A18, prev C7BB/EDC43DB8, desc: FPI_FOR_HINT , blkref #0: rel 1663/16400/18105 blk 326250 FPW pg_xlogdump show no visible to me anomalies in 000000010000C7BB000000ED or 000000010000C7BB000000EC files. Any additional information I'll try provide, but my experience with internal wal structure outside of using pg_xlogdump output very limited.
pgsql-bugs by date: