BUG #6725: hot-standby slave repeatable crashed after restart (GIN error) - Mailing list pgsql-bugs
From | maxim.boguk@gmail.com |
---|---|
Subject | BUG #6725: hot-standby slave repeatable crashed after restart (GIN error) |
Date | |
Msg-id | E1SoNn4-00086s-Ie@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #6725: hot-standby slave repeatable crashed after restart (GIN error)
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 6725 Logged by: Maxim Boguk Email address: maxim.boguk@gmail.com PostgreSQL version: 9.0.7 Operating system: Debian Linux Description:=20=20=20=20=20=20=20=20 Hi, On one of my production databases hot-standby server start repeatable crash after restart. There is log: 2012-07-10 03:30:20 MSK 12034 postgres@postgres from [local] [vxid:2/0 txid:0] [CHECKPOINT] LOG: duration: 100.262 ms statement: checkpoint; 2012-07-10 03:30:24 MSK 12034 postgres@postgres from [local] [vxid:2/0 txid:0] [CHECKPOINT] LOG: duration: 100.207 ms statement: checkpoint; 2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: received fast shutdown request 2012-07-10 03:30:41 MSK 24618 @ from [vxid: txid:0] [] LOG: aborting any active transactions 2012-07-10 03:30:41 MSK 25312 @ from [vxid: txid:0] [] FATAL: terminating walreceiver process due to administrator command 2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: shutting down 2012-07-10 03:30:41 MSK 24646 @ from [vxid: txid:0] [] LOG: database system is shut down 2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: database system was shut down in recovery at 2012-07-10 03:30:41 MSK 2012-07-10 03:30:45 MSK 12202 @ from [vxid: txid:0] [] LOG: entering standby mode 2012-07-10 03:30:45 MSK 12202 @ from [vxid:1/0 txid:0] [] LOG: redo starts at 127C/51EF6798 2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared 2012-07-10 03:31:35 MSK 12202 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo Update metapage, node: 1663/62450/27663577 blkno: 2164 2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: startup process (PID 12202) was terminated by signal 6: Aborted 2012-07-10 03:31:35 MSK 12189 @ from [vxid: txid:0] [] LOG: terminating any other active server processes 2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: database system was interrupted while in recovery at log time 2012-07-10 02:17:21 MSK 2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target. 2012-07-10 03:33:37 MSK 13071 @ from [vxid: txid:0] [] LOG: entering standby mode 2012-07-10 03:33:37 MSK 13071 @ from [vxid:1/0 txid:0] [] LOG: redo starts at 127C/51EF6798 2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] PANIC: GIN metapage disappeared 2012-07-10 03:34:15 MSK 13071 @ from [vxid:1/0 txid:0] [] CONTEXT: xlog redo Update metapage, node: 1663/62450/27663577 blkno: 2164 2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: startup process (PID 13071) was terminated by signal 6: Aborted 2012-07-10 03:34:15 MSK 13065 @ from [vxid: txid:0] [] LOG: terminating any other active server processes and so on. I almost sure there are no disk corruption or such errors. Now some additional information from the master server: 1)1663/62450/27663577 - corresponding to the pg_default tablespace and wsfilm database. However, no entries with relfilenode=3D27663577 were found= in pg_class. In the same time:=20=20 ls -la ./9.0/main/base/62450/27663577 -rw------- 1 postgres postgres 0 =D0=98=D1=8E=D0=BB 10 02:29 ./9.0/main/base/62450/27663577 In wsfilm db only one GIN index exists: wsfilm=3D# SELECT * from pg_indexes where indexdef like '%gin%'; -[ RECORD 1 ]--------------------------------------------------- schemaname | public tablename | movie indexname | movie_fti_idx tablespace | indexdef | CREATE INDEX movie_fti_idx ON movie USING gin (fti) But it has different relfilenode: wsfilm=3D# SELECT relfilenode from pg_class where relname=3D'movie_fti_idx'; -[ RECORD 1 ]--------- relfilenode | 27677714 2)Just few minutes before that index was rebuild on the master database: 2012-07-10 02:29:50 MSK 1480 postgres@wsfilm from [local] [vxid:33/3324338 txid:0] [idle] LOG: statement: CREATE INDEX CONCURRENTLY pgcompactor_tmp1436 ON movie USING gin (fti); 2012-07-10 02:29:52 MSK 1480 postgres@wsfilm from [local] [vxid:33/3324343 txid:0] [idle] LOG: statement: BEGIN; DROP INDEX public.movie_fti_idx; ALTER INDEX public.pgcompactor_tmp1436 RENAME TO movie_fti_idx; END; My theory that the wal replay trying replay GIN index information on already dropped index.
pgsql-bugs by date: