Re: BUG #7500: hot-standby replica crash after an initial rsync - Mailing list pgsql-bugs
From | Maxim Boguk |
---|---|
Subject | Re: BUG #7500: hot-standby replica crash after an initial rsync |
Date | |
Msg-id | CAK-MWwR-FBf3bFVtD8vqPq6hErAujE447mniS6Us1ORQ6Z3XdA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #7500: hot-standby replica crash after an initial rsync (Maxim Boguk <maxim.boguk@gmail.com>) |
Responses |
Re: BUG #7500: hot-standby replica crash after an initial rsync
|
List | pgsql-bugs |
On Tue, Aug 21, 2012 at 8:08 PM, Maxim Boguk <maxim.boguk@gmail.com> wrote: > > > >> > >> > I have kept all that database files for the future investigation. >> > >> > What I should look into first? >> Could you reproduce the error with log_error_verbosity=verbose? Or even >> better >> provide a backtrace with gdb? >> >> > There log with log_error_verbosity=verbose: > > > 2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOG: 00000: > database system was interrupted while in recovery at log time 2012-08-21 > 08:21:42 MSK > 2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []HINT: If this > has occurred more than once some data might be corrupted and you might need > to choose an earlier recovery target. > 2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOCATION: > StartupXLOG, xlog.c:5757 > 2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOG: 00000: > creating missing WAL directory "pg_xlog/archive_status" > 2012-08-21 14:04:18.973 MSK 33950 @ from [vxid: txid:0] []LOCATION: > ValidateXLOGDirectoryStructure, xlog.c:3452 > 2012-08-21 14:04:19.009 MSK 33950 @ from [vxid: txid:0] []LOG: 00000: > entering standby mode > 2012-08-21 14:04:19.009 MSK 33950 @ from [vxid: txid:0] []LOCATION: > StartupXLOG, xlog.c:5827 > 2012-08-21 14:04:19.041 MSK 33951 @ from [vxid: txid:0] []LOG: 00000: > streaming replication successfully connected to primary > 2012-08-21 14:04:19.041 MSK 33951 @ from [vxid: txid:0] []LOCATION: > libpqrcv_connect, libpqwalreceiver.c:169 > 2012-08-21 14:04:20.140 MSK 33950 @ from [vxid:1/0 txid:0] []LOG: 00000: > redo starts at 925/2C0004C8 > 2012-08-21 14:04:20.140 MSK 33950 @ from [vxid:1/0 txid:0] []LOCATION: > StartupXLOG, xlog.c:6199 > 2012-08-21 14:04:20.160 MSK 33950 @ from [vxid:1/0 txid:0] []FATAL: > XX000: could not access status of transaction 361788644 > 2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []DETAIL: > Could not read from file "pg_multixact/offsets/1590" at offset 114688: > Unknown error: 0. > 2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []CONTEXT: > xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078 > 2012-08-21 14:04:20.161 MSK 33950 @ from [vxid:1/0 txid:0] []LOCATION: > SlruReportIOError, slru.c:840 > 2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOG: 00000: > startup process (PID 33950) exited with exit code 1 > 2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOCATION: > LogChildExit, postmaster.c:2840 > 2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOG: 00000: > terminating any other active server processes > 2012-08-21 14:04:20.169 MSK 33948 @ from [vxid: txid:0] []LOCATION: > HandleChildCrash, postmaster.c:2674 > > I tried A bit play with more detailed debug info and got mode detailed log: LOG: 00000: database system was interrupted while in recovery at log time 2012-08-21 08:21:42 MSK HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. LOCATION: StartupXLOG, xlog.c:5757 DEBUG: 00000: standby_mode = 'on' LOCATION: readRecoveryCommandFile, xlog.c:5269 DEBUG: 00000: primary_conninfo = 'host=10.12.22.19 port=5433 user=replica password=acilper' LOCATION: readRecoveryCommandFile, xlog.c:5276 DEBUG: 00000: trigger_file = '/db/data/failover' LOCATION: readRecoveryCommandFile, xlog.c:5283 LOG: 00000: entering standby mode LOCATION: StartupXLOG, xlog.c:5827 DEBUG: 00000: checkpoint record is at 925/2D41CAA0 LOCATION: StartupXLOG, xlog.c:5894 DEBUG: 00000: redo record is at 925/2C0004C8; shutdown FALSE LOCATION: StartupXLOG, xlog.c:5929 DEBUG: 00000: next transaction ID: 0/1768511856; next OID: 59880938 LOCATION: StartupXLOG, xlog.c:5933 DEBUG: 00000: next MultiXactId: 361788644; next MultiXactOffset: 822580965 LOCATION: StartupXLOG, xlog.c:5936 DEBUG: 00000: oldest unfrozen transaction ID: 1569182525, in database 16424 LOCATION: StartupXLOG, xlog.c:5939 DEBUG: 00000: transaction ID wrap limit is 3716666172, limited by database with OID 16424 LOCATION: SetTransactionIdLimit, varsup.c:334 DEBUG: 00000: initializing for hot standby LOCATION: StartupXLOG, xlog.c:6070 DEBUG: 00000: my backend id is 1 LOCATION: SharedInvalBackendInit, sinvaladt.c:326 LOG: 00000: redo starts at 925/2C0004C8 LOCATION: StartupXLOG, xlog.c:6199 DEBUG: 00000: record known xact 1768508078 latestObservedXid 0 CONTEXT: xlog redo commit: 2012-08-21 08:21:42.079655+04 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508078 latestObservedXid 0 CONTEXT: xlog redo commit: 2012-08-21 08:21:42.079655+04 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: remove KnownAssignedXid 1768508078 CONTEXT: xlog redo commit: 2012-08-21 08:21:42.079655+04 LOCATION: KnownAssignedXidsRemove, procarray.c:2869 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo exclusive_lock: xid 1768508080 rel 1663/16424/17282; tid 725829/8 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo update: rel 1663/16424/17282; tid 725829/8; new 726017/7 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508076 latestObservedXid 0 CONTEXT: xlog redo exclusive_lock: xid 1768508076 rel 1663/16424/47760635; tid 3075/136 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo insert: rel 1663/16424/47747; tid 43866/197 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo insert: rel 1663/16424/52546872; tid 44700/276 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508061 latestObservedXid 0 CONTEXT: xlog redo log: rel 1663/16424/19211 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508082 latestObservedXid 0 CONTEXT: xlog redo exclusive_lock: xid 1768508082 rel 1663/16424/4759107; tid 1098/7 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508076 latestObservedXid 0 CONTEXT: xlog redo hot_update: rel 1663/16424/47760635; tid 3075/136; new 3075/140 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo exclusive_lock: xid 1768508080 rel 1663/16424/17282; tid 725918/2 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508082 latestObservedXid 0 CONTEXT: xlog redo hot_update: rel 1663/16424/4759107; tid 1098/7; new 1098/39 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo update: rel 1663/16424/17282; tid 725918/2; new 725918/6 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo insert: rel 1663/16424/47747; tid 43866/201 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo insert: rel 1663/16424/52546872; tid 44700/276 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo commit: 2012-08-21 08:21:42.082508+04 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508080 latestObservedXid 0 CONTEXT: xlog redo commit: 2012-08-21 08:21:42.082508+04 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: remove KnownAssignedXid 1768508080 CONTEXT: xlog redo commit: 2012-08-21 08:21:42.082508+04 LOCATION: KnownAssignedXidsRemove, procarray.c:2869 DEBUG: 00000: record known xact 1768508061 latestObservedXid 0 CONTEXT: xlog redo insert: rel 1663/16424/64654; tid 178183/44 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508061 latestObservedXid 0 CONTEXT: xlog redo insert: rel 1663/16424/95419; tid 38650/318 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 DEBUG: 00000: record known xact 1768508061 latestObservedXid 0 CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078 LOCATION: RecordKnownAssignedTransactionIds, procarray.c:2375 FATAL: XX000: could not access status of transaction 361788644 DETAIL: Could not read from file "pg_multixact/offsets/1590" at offset 114688: Unknown error: 0. CONTEXT: xlog redo create multixact 361788644 offset 822580965: 1768508061 1768508078 LOCATION: SlruReportIOError, slru.c:840 DEBUG: 00000: shmem_exit(1): 5 callbacks to make I have some problems with debug startup process with gdb... I following next sequence of commands (and got no useful results): [pgsql@db1 ~/tmp/postgresql-9.0.8]$ /usr/local/bin/gdb /usr/local/bin/postgres ... Reading symbols from /usr/local/bin/postgres...done. (gdb) b exit Breakpoint 1 at 0x44d38c (gdb) run -D /db/data_back/ Starting program: /usr/local/bin/postgres -D /db/data_back/ Program received signal SIGUSR1, User defined signal 1. 0x00000008018ddadc in select () from /lib/libc.so.7 (gdb) bt #0 0x00000008018ddadc in select () from /lib/libc.so.7 #1 0x000000000065641c in ServerLoop () at postmaster.c:1396 #2 0x0000000000655d05 in PostmasterMain (argc=3, argv=0x7fffffffead0) at postmaster.c:1097 #3 0x00000000005d974a in main (argc=3, argv=0x7fffffffead0) at main.c:188 (gdb) cont Continuing. Breakpoint 1, 0x000000080186bab0 in exit () from /lib/libc.so.7 (gdb) bt #0 0x000000080186bab0 in exit () from /lib/libc.so.7 #1 0x000000000067d0a6 in proc_exit (code=1) at ipc.c:137 #2 0x0000000000659933 in ExitPostmaster (status=1) at postmaster.c:4172 #3 0x0000000000657d2d in reaper (postgres_signal_arg=20) at postmaster.c:2354 #4 <signal handler called> #5 <signal handler called> #6 0x00000008018ddadc in select () from /lib/libc.so.7 #7 0x000000000065641c in ServerLoop () at postmaster.c:1396 #8 0x0000000000655d05 in PostmasterMain (argc=3, argv=0x7fffffffead0) at postmaster.c:1097 #9 0x00000000005d974a in main (argc=3, argv=0x7fffffffead0) at main.c:188 (gdb) cont Continuing. [Inferior 1 (process 98857) exited with code 01] (gdb) quit
pgsql-bugs by date: