Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption - Mailing list pgsql-bugs
From | TAKATSUKA Haruka |
---|---|
Subject | Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption |
Date | |
Msg-id | 20191220110028.471b95ff8b9443046d9603a4@sraoss.co.jp |
In response to | Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption (TAKATSUKA Haruka <harukat@sraoss.co.jp>) |
Responses |
Re: BUG #16172: failure of vacuum file truncation can cause permanent data corruption
|
List | pgsql-bugs |
I found that moving the DropRelFileNodeBuffers() call from the top to the
end of smgrtruncate() is a proper modification. It passed the regression
tests and this reproduction test.

with best regards,
Haruka Takatsuka / SRA OSS, Inc. Japan

On Fri, 20 Dec 2019 10:19:52 +0900
TAKATSUKA Haruka <harukat@sraoss.co.jp> wrote:

> I also tested PostgreSQL with the attached patch, and it avoided this
> data corruption. The patch just removes DropRelFileNodeBuffers() from
> smgrtruncate().
>
>
> On Thu, 19 Dec 2019 07:14:42 +0000
> PG Bug reporting form <noreply@postgresql.org> wrote:
>
> > The following bug has been logged on the website:
> >
> > Bug reference:      16172
> > Logged by:          TAKATSUKA Haruka
> > Email address:      harukat@sraoss.co.jp
> > PostgreSQL version: 12.1
> > Operating system:   Windows/Linux
> > Description:
> >
> > Hello, pgsql hackers,
> >
> > I found that a failure of vacuum file truncation can cause permanent
> > data corruption. The steps to reproduce it are below.
> >
> > On Windows installations, the truncation sometimes fails with a
> > "permission denied" error because of anti-virus software. It has only
> > raised an ERROR, and people have often dismissed it.
> >
> > Truncation failure can also make the standby panic with the following
> > messages when replaying Heap2/VISIBLE or Heap2/CLEAN, because the
> > truncation WAL record is emitted even if the truncation does not
> > actually complete on the primary.
> >
> >   WARNING:  page .. of relation base/..../.... does not exist
> >   CONTEXT:  WAL redo at ..... for ....: cutoff xid ... flags ...
> >   PANIC:  WAL contains references to invalid pages
> >
> > I think truncation failure should be handled at a more severe error
> > level. Any thoughts?
> >
> > with best regards,
> > Haruka Takatsuka / SRA OSS, Inc. Japan
> >
> >
> > reproduce steps (PG12)
> > ======================
> >
> > $ psql -U postgres -d db1
> > Pager usage is off.
> > psql (12.1)
> > Type "help" for help.
> >
> > db1=#
> >
> > $ gdb -p {its backend process}
> >
> > (gdb) b FileTruncate
> > Breakpoint 1 at 0x73d320: file fd.c, line 2057.
> > (gdb) c
> > Continuing.
> >
> > db1=# SHOW autovacuum;
> >  autovacuum
> > ------------
> >  off
> > (1 row)
> >
> > db1=# CREATE TABLE t1 (id int primary key, v text);
> > CREATE
> >
> > db1=# INSERT INTO t1 SELECT g, md5(g::text) FROM generate_series(1, 10000)
> > as g;
> > INSERT 0 10000
> >
> > db1=# CHECKPOINT;
> >
> > Program received signal SIGUSR1, User defined signal 1.
> > 0x00000036caae91a3 in __epoll_wait_nocancel () from /lib64/libc.so.6
> > (gdb) c
> > Continuing.
> >
> > CHECKPOINT
> >
> > db1=# DELETE FROM t1 WHERE id > 50;
> > DELETE 9950
> >
> > db1=# VACUUM t1;
> >
> > Breakpoint 1, FileTruncate (file=59, offset=8192,
> > wait_event_info=167772175)
> >     at fd.c:2057
> > 2057    {
> > (gdb) n
> > 2065            returnCode = FileAccess(file);
> > (gdb) n
> > 2066            if (returnCode < 0)
> > (gdb) p returnCode = -100
> > $6 = -100
> > (gdb) c
> > Continuing.
> >
> > ERROR:  could not truncate file "base/16384/16645" to 1 blocks: Success
> >
> > db1=# SELECT count(*) FROM t1;
> >  count
> > -------
> >   9930
> > (1 row)
> >
> (snip)

______________________________________________________________________
TAKATSUKA Haruka (高塚 遥)  harukat@sraoss.co.jp
SRA OSS, Inc.  http://www.sraoss.co.jp
2-32-8 Minami-Ikebukuro, Toshima-ku, Tokyo 171-0022, Japan
TEL: 03-5979-2701  FAX: 03-5979-2702
CellPhone: 080-1292-3396
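As a rough illustration of why the position of the DropRelFileNodeBuffers() call inside smgrtruncate() could matter, here is a toy model in C. It is not PostgreSQL code and not the patch discussed in this thread. Every name in it (ToyRel, toy_drop_buffers, toy_truncate_file, toy_visible_rows, simulate) is hypothetical, and the page layout is reduced to a row count per block. It assumes one reading of the report: the emptied pages exist only as dirty buffers, so discarding them before a truncation that then fails leaves the stale checkpointed pages visible again.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS 4               /* toy relation size in blocks */

/* Toy model only: each block is reduced to a count of live rows. */
typedef struct {
    int  disk[NBLOCKS];         /* page contents as of the last checkpoint */
    int  cache[NBLOCKS];        /* page contents in the buffer cache */
    bool cached[NBLOCKS];       /* a buffer exists for this block */
    bool dirty[NBLOCKS];        /* the buffer is newer than the disk page */
    int  nblocks;               /* current file length in blocks */
} ToyRel;

/* Discard buffers at or past first_block WITHOUT writing them out.
 * This stands in for the effect attributed to DropRelFileNodeBuffers(). */
static void toy_drop_buffers(ToyRel *r, int first_block)
{
    assert(first_block >= 0 && first_block <= NBLOCKS);
    for (int b = first_block; b < NBLOCKS; b++) {
        r->cached[b] = false;
        r->dirty[b] = false;
    }
}

/* Shorten the file. fail_injected mimics the error forced under gdb. */
static bool toy_truncate_file(ToyRel *r, int nblocks, bool fail_injected)
{
    if (fail_injected)
        return false;           /* file length is left unchanged */
    r->nblocks = nblocks;
    return true;
}

/* Rows a later scan would see: the buffer if present, else the disk page. */
static int toy_visible_rows(const ToyRel *r)
{
    int rows = 0;
    for (int b = 0; b < r->nblocks; b++)
        rows += r->cached[b] ? r->cache[b] : r->disk[b];
    return rows;
}

typedef enum { DROP_FIRST, TRUNCATE_FIRST } Order;

/* Setup: every checkpointed page holds 25 rows. DELETE plus VACUUM have
 * emptied blocks 1..3, but only in the cache (dirty, not yet flushed).
 * Then the relation is truncated to 1 block using the given call order. */
int simulate(Order order, bool fail_injected)
{
    ToyRel r;
    memset(&r, 0, sizeof r);
    r.nblocks = NBLOCKS;
    for (int b = 0; b < NBLOCKS; b++) {
        r.disk[b] = 25;
        r.cache[b] = (b == 0) ? 25 : 0;
        r.cached[b] = true;
        r.dirty[b] = (b != 0);
    }

    if (order == DROP_FIRST) {
        /* Buffers are gone before we know whether truncation worked. */
        toy_drop_buffers(&r, 1);
        (void) toy_truncate_file(&r, 1, fail_injected);
    } else {
        /* Buffers are dropped only after the file was really shortened. */
        if (toy_truncate_file(&r, 1, fail_injected))
            toy_drop_buffers(&r, 1);
    }
    return toy_visible_rows(&r);
}
```

In this model, with the failure injected, simulate(DROP_FIRST, true) returns 100 (the deleted rows reappear from the stale disk pages), while simulate(TRUNCATE_FIRST, true) returns 25. Without the failure both orders return 25. The model ignores WAL, locking, and the visibility map, so it only sketches the ordering argument made in this message.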