Home > mailing lists

Re: WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: WAL logging problem in 9.4.3?
Date	July 2, 2015 22:21:13
Msg-id	20150702222102.GG30708@awork2.anarazel.de Whole thread Raw
In response to	WAL logging problem in 9.4.3? (Martijn van Oosterhout <kleptog@svana.org>)
Responses	Re: WAL logging problem in 9.4.3?
List	pgsql-hackers

Tree view

Hi,

On 2015-07-03 00:05:24 +0200, Martijn van Oosterhout wrote:
> === Start with an empty database

My guess is you have wal_level = minimal?

> ctmp=# begin;
> BEGIN
> ctmp=# create table test(id serial primary key);
> CREATE TABLE
> ctmp=# truncate table test;
> TRUNCATE TABLE
> ctmp=# commit;
> COMMIT
> ctmp=# select relname, relfilenode from pg_class where relname like 'test%';
>    relname   | relfilenode
> -------------+-------------
>  test        |       16389
>  test_id_seq |       16387
>  test_pkey   |       16393
> (3 rows)
> 

> === Note the index file is 8KB.
> === At this point nuke the database server (in this case it was simply 
> === destroying the container it was running in.

How did you continue from there? The container has persistent storage?
Or are you repapplying the WAL to somewhere else?

> === Dump the xlogs just to show what got recorded. Note there's a
> === truncate for the data file and the index file.

That should be ok.

> martijn@martijn-jessie:$ sudo /usr/lib/postgresql/9.4/bin/pg_xlogdump -p /data/postgres/pg_xlog/
000000010000000000000001|grep -wE '16389|16387|16393'
 
> rmgr: XLOG        len (rec/tot):     72/   104, tx:          0, lsn: 0/016A9240, prev 0/016A9200, bkp: 0000, desc:
checkpoint:redo 0/16A9240; tli 1; prev tli 1; fpw true; xid 0/686; oid 16387; multi 1; offset 0; oldest xid 673 in DB
1;oldest multi 1 in DB 1; oldest running xid 0; shutdown
 
> rmgr: Storage     len (rec/tot):     16/    48, tx:          0, lsn: 0/016A92D0, prev 0/016A92A8, bkp: 0000, desc:
filecreate: base/16385/16387
 
> rmgr: Sequence    len (rec/tot):    158/   190, tx:        686, lsn: 0/016B5E50, prev 0/016B5D88, bkp: 0000, desc:
log:rel 1663/16385/16387
 
> rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016B5F10, prev 0/016B5E50, bkp: 0000, desc:
filecreate: base/16385/16389
 
> rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016BB028, prev 0/016BAFD8, bkp: 0000, desc:
filecreate: base/16385/16393
 
> rmgr: Sequence    len (rec/tot):    158/   190, tx:        686, lsn: 0/016BE4F8, prev 0/016BE440, bkp: 0000, desc:
log:rel 1663/16385/16387
 
> rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016BE6B0, prev 0/016BE660, bkp: 0000, desc:
filetruncate: base/16385/16389 to 0 blocks
 
> rmgr: Storage     len (rec/tot):     16/    48, tx:        686, lsn: 0/016BE6E0, prev 0/016BE6B0, bkp: 0000, desc:
filetruncate: base/16385/16393 to 0 blocks
 
> pg_xlogdump: FATAL:  error in WAL record at 0/16BE710: record with zero length at 0/16BE740

Note that the truncate will lead to a new, different, relfilenode.

> === Start the DB up again
> 
> database_1 | LOG:  database system was interrupted; last known up at 2015-07-02 21:08:05 UTC
> database_1 | LOG:  database system was not properly shut down; automatic recovery in progress
> database_1 | LOG:  redo starts at 0/16A92A8
> database_1 | LOG:  record with zero length at 0/16BE740
> database_1 | LOG:  redo done at 0/16BE710
> database_1 | LOG:  last completed transaction was at log time 2015-07-02 21:34:45.664989+00
> database_1 | LOG:  database system is ready to accept connections
> database_1 | LOG:  autovacuum launcher started
> 
> === Oops, the index file is empty now

That's probably just the old index file?

> martijn@martijn-jessie:$ sudo ls -l /data/postgres/base/16385/{16389,16387,16393}
> -rw------- 1 messagebus ssl-cert 8192 Jul  2 23:37 /data/postgres/base/16385/16387
> -rw------- 1 messagebus ssl-cert    0 Jul  2 23:34 /data/postgres/base/16385/16389
> -rw------- 1 messagebus ssl-cert    0 Jul  2 23:37 /data/postgres/base/16385/16393
> 
> martijn@martijn-jessie:$ psql ctmp -h localhost -U username
> Password for user username:
> psql (9.4.3)
> Type "help" for help.
> 
> === And now the index is broken. I think the only reason it doesn't
> === complain about the data file is because zero bytes there is OK.  But if
> === the table had data before it would be gone now.
> 
> ctmp=# select * from test;
> ERROR:  could not read block 0 in file "base/16385/16393": read only 0 of 8192 bytes

Hm. I can't reproduce this. Can you include a bit more details about how
to reproduce?

Greetings,

Andres Freund

pgsql-hackers by date:

From: Jim Nasby
Date: 02 July 2015, 22:20:33
Subject: Determine operator from it's function

From: Tom Lane
Date: 02 July 2015, 23:07:57
Subject: Re: bugfix: incomplete implementation of errhidecontext

Re: WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

Previous

Next