WAL logging problem in 9.4.3? - Mailing list pgsql-hackers
From | Martijn van Oosterhout |
---|---|
Subject | WAL logging problem in 9.4.3? |
Date | |
Msg-id | 20150702220524.GA9392@svana.org Whole thread Raw |
Responses |
Re: WAL logging problem in 9.4.3?
|
List | pgsql-hackers |
Hoi, I ran into this in our CI setup and I thinks it's an actual bug. The issue appears to be that when a table is created *and truncated* in a single transaction, that the WAL log is logging a truncate record it shouldn't, such that if the database crashes you have a broken index. It would also lose any data that was in the table at commit time. I didn't test 9.4.4 yet, though I don't see anything in the release notes that resemble this. Detail: === Start with an empty database martijn@martijn-jessie:$ psql ctmp -h localhost -U username Password for user username: psql (9.4.3) Type "help" for help. ctmp=# begin; BEGIN ctmp=# create table test(id serial primary key); CREATE TABLE ctmp=# truncate table test; TRUNCATE TABLE ctmp=# commit; COMMIT ctmp=# select relname, relfilenode from pg_class where relname like 'test%'; relname | relfilenode -------------+-------------test | 16389test_id_seq | 16387test_pkey | 16393 (3 rows) === Note: if you do a CHECKPOINT here the issue doesn't happen === obviously. ctmp=# \q martijn@martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393} [sudo] password for martijn: -rw------- 1 messagebus ssl-cert 8192 Jul 2 23:34 /data/postgres/base/16385/16387 -rw------- 1 messagebus ssl-cert 0 Jul 2 23:34 /data/postgres/base/16385/16389 -rw------- 1 messagebus ssl-cert 8192 Jul 2 23:34 /data/postgres/base/16385/16393 === Note the index file is 8KB. === At this point nuke the database server (in this case it was simply === destroying the container it was running in. === Dump the xlogs just to show what got recorded. Note there's a === truncate for the data file and the index file. martijn@martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /data/postgres/pg_xlog/ 000000010000000000000001|grep -wE '16389|16387|16393' rmgr: XLOG len (rec/tot): 72/ 104, tx: 0, lsn: 0/016A9240, prev 0/016A9200, bkp: 0000, desc: checkpoint:redo 0/16A9240; tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; oldest xid 673 in DB 1;oldest multi 1 in DB 1; oldest running xid 0; shutdown rmgr: Storage len (rec/tot): 16/ 48, tx: 0, lsn: 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc: file create:base/16385/16387 rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc: log: rel1663/16385/16387 rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc: file create:base/16385/16389 rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc: file create:base/16385/16393 rmgr: Sequence len (rec/tot): 158/ 190, tx: 686, lsn: 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc: log: rel1663/16385/16387 rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc: file truncate:base/16385/16389 to 0 blocks rmgr: Storage len (rec/tot): 16/ 48, tx: 686, lsn: 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc: file truncate:base/16385/16393 to 0 blocks pg_xlogdump: FATAL: error in WAL record at 0/16BE710: record with zero length at 0/16BE740 === Start the DB up again database_1 | LOG: database system was interrupted; last known up at 2015-07-02 21:08:05 UTC database_1 | LOG: database system was not properly shut down; automatic recovery in progress database_1 | LOG: redo starts at 0/16A92A8 database_1 | LOG: record with zero length at 0/16BE740 database_1 | LOG: redo done at 0/16BE710 database_1 | LOG: last completed transaction was at log time 2015-07-02 21:34:45.664989+00 database_1 | LOG: database system is ready to accept connections database_1 | LOG: autovacuum launcher started === Oops, the index file is empty now martijn@martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393} -rw------- 1 messagebus ssl-cert 8192 Jul 2 23:37 /data/postgres/base/16385/16387 -rw------- 1 messagebus ssl-cert 0 Jul 2 23:34 /data/postgres/base/16385/16389 -rw------- 1 messagebus ssl-cert 0 Jul 2 23:37 /data/postgres/base/16385/16393 martijn@martijn-jessie:$ psql ctmp -h localhost -U username Password for user username: psql (9.4.3) Type "help" for help. === And now the index is broken. I think the only reason it doesn't === complain about the data file is because zero bytes there is OK. But if === the table had data before it would be gone now. ctmp=# select * from test; ERROR: could not read block 0 in file "base/16385/16393": read only 0 of 8192 bytes ctmp=# select version(); version -----------------------------------------------------------------------------------------------PostgreSQL 9.4.3 on x86_64-unknown-linux-gnu,compiled by gcc (Debian 4.9.2-10) 4.9.2, 64-bit (1 row) Hope this helps. -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > He who writes carelessly confesses thereby at the very outset that he does > not attach much importance to his own thoughts. -- Arthur Schopenhauer
pgsql-hackers by date: