Re: WAL logging problem in 9.4.3? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: WAL logging problem in 9.4.3? |
Date | |
Msg-id | 20150702222102.GG30708@awork2.anarazel.de Whole thread Raw |
In response to | WAL logging problem in 9.4.3? (Martijn van Oosterhout <kleptog@svana.org>) |
Responses |
Re: WAL logging problem in 9.4.3?
|
List | pgsql-hackers |
Hi, On 2015-07-03 00:05:24 +0200, Martijn van Oosterhout wrote: > === Start with an empty database My guess is you have wal_level = minimal? > ctmp=# begin; > BEGIN > ctmp=# create table test(id serial primary key); > CREATE TABLE > ctmp=# truncate table test; > TRUNCATE TABLE > ctmp=# commit; > COMMIT > ctmp=# select relname, relfilenode from pg_class where relname like 'test%'; > relname | relfilenode > -------------+------------- > test | 16389 > test_id_seq | 16387 > test_pkey | 16393 > (3 rows) > > === Note the index file is 8KB. > === At this point nuke the database server (in this case it was simply > === destroying the container it was running in. How did you continue from there? The container has persistent storage? Or are you repapplying the WAL to somewhere else? > === Dump the xlogs just to show what got recorded. Note there's a > === truncate for the data file and the index file. That should be ok. > martijn@martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /data/postgres/pg_xlog/ 000000010000000000000001|grep -wE '16389|16387|16393' > rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/016A9240, prev 0/016A9200, bkp: 0000, desc: checkpoint:redo 0/16A9240; tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; oldest xid 673 in DB 1;oldest multi 1 in DB 1; oldest running xid 0; shutdown > rmgr: Storage len (rec/tot): 16/ 48, tx: 0, lsn: 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc: filecreate: base/16385/16387 > rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc: log:rel 1663/16385/16387 > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc: filecreate: base/16385/16389 > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc: filecreate: base/16385/16393 > rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc: log:rel 1663/16385/16387 > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc: filetruncate: base/16385/16389 to 0 blocks > rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc: filetruncate: base/16385/16393 to 0 blocks > pg_xlogdump: FATAL: error in WAL record at 0/16BE710: record with zero length at 0/16BE740 Note that the truncate will lead to a new, different, relfilenode. > === Start the DB up again > > database_1 | LOG: database system was interrupted; last known up at 2015-07-02 21:08:05 UTC > database_1 | LOG: database system was not properly shut down; automatic recovery in progress > database_1 | LOG: redo starts at 0/16A92A8 > database_1 | LOG: record with zero length at 0/16BE740 > database_1 | LOG: redo done at 0/16BE710 > database_1 | LOG: last completed transaction was at log time 2015-07-02 21:34:45.664989+00 > database_1 | LOG: database system is ready to accept connections > database_1 | LOG: autovacuum launcher started > > === Oops, the index file is empty now That's probably just the old index file? > martijn@martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393} > -rw------- 1 messagebus ssl-cert 8192 Jul 2 23:37 /data/postgres/base/16385/16387 > -rw------- 1 messagebus ssl-cert 0 Jul 2 23:34 /data/postgres/base/16385/16389 > -rw------- 1 messagebus ssl-cert 0 Jul 2 23:37 /data/postgres/base/16385/16393 > > martijn@martijn-jessie:$ psql ctmp -h localhost -U username > Password for user username: > psql (9.4.3) > Type "help" for help. > > === And now the index is broken. I think the only reason it doesn't > === complain about the data file is because zero bytes there is OK. But if > === the table had data before it would be gone now. > > ctmp=# select * from test; > ERROR: could not read block 0 in file "base/16385/16393": read only 0 of 8192 bytes Hm. I can't reproduce this. Can you include a bit more details about how to reproduce? Greetings, Andres Freund
pgsql-hackers by date: