BUG #16172: failure of vacuum file truncation can cause permanent data corruption - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #16172: failure of vacuum file truncation can cause permanent data corruption |
Date | |
Msg-id | 16172-8d7483d5119bad9d@postgresql.org Whole thread Raw |
Responses |
Re: BUG #16172: failure of vacuum file truncation can causepermanent data corruption
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16172 Logged by: TAKATSUKA Haruka Email address: harukat@sraoss.co.jp PostgreSQL version: 12.1 Operating system: Windows/Linux Description: Hello, pgsql hackers, I found that failure of vacuum file truncation can cause permanent data corruption. I am reporting the reproduce steps below. In Windows installation, the truncation sometime fails by permission denied error because of anti-virus software. It has caused just ERROR and people have offen dismissed it. Truncation failure can also make the standby panic with the following messages when replaying Heap2/VISIBLE or Heap2/CLEAN, because truncation wal is emitted even if it doesn't complete actually in the primary. WARNING: page .. of relation base/..../.... does not exist CONTEXT: WAL redo at ..... for ....: cutoff xid ... flags ... PANIC: WAL contains references to invalid pages I think truncation failure is to be handled as more severe level. Any thoughts? with best regards, Haruka Takatsuka / SRA OSS, Inc. Japan reproduce steps (PG12) ====================== $ psql -U postgres -d db1 Pager usage is off. psql (12.1) Type "help" for help. db1=# $ gdb -p {its backend process} (gdb) b FileTruncate Breakpoint 1 at 0x73d320: file fd.c, line 2057. (gdb) c Continuing. db1=# SHOW autovacuum; autovacuum ------------ off (1 row) db1=# CREATE TABLE t1 (id int primary key, v text); CREATE db1=# INSERT INTO t1 SELECT g, md5(g::text) FROM generate_series(1, 10000) as g; INSERT 0 10000 db1=# CHECKPOINT; Program received signal SIGUSR1, User defined signal 1. 0x00000036caae91a3 in __epoll_wait_nocancel () from /lib64/libc.so.6 (gdb) c Continuing. CHECKPOINT db1=# DELETE FROM t1 WHERE id > 50; DELETE 9950 db1=# VACUUM t1; Breakpoint 1, FileTruncate (file=59, offset=8192, wait_event_info=167772175) at fd.c:2057 2057 { (gdb) n 2065 returnCode = FileAccess(file); (gdb) n 2066 if (returnCode < 0) (gdb) p returnCode = -100 $6 = -100 (gdb) c Continuing. ERROR: could not truncate file "base/16384/16645" to 1 blocks: Success db1=# SELECT count(*) FROM t1; count ------- 9930 (1 row) db1=# SELECT * FROM t1 WHERE id > 50 LIMIT 10; id | v -----+---------------------------------- 121 | 4c56ff4ce4aaf9573aa5dff913df997a 122 | a0a080f42e6f13b3a2df133f073095dd 123 | 202cb962ac59075b964b07152d234b70 124 | c8ffe9a587b126f152ed3d89a146b445 125 | 3def184ad8f4755ff269862ea77393dd 126 | 069059b7ef840f0c74a814ec9237b6ec 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2 128 | 76dc611d6ebaafc66cc0879c71b5db5c 129 | d1f491a404d6854880943e5c3cd9ca25 130 | 9b8619251a19057cff70779273e95aa6 (10 rows) db1=# VACUUM FULL FREEZE t1; VACUUM db1=# REINDEX TABLE t1; REINDEX db1=# SELECT count(*) FROM t1; count ------- 9930 (1 row) db1=# SELECT * FROM t1 WHERE id > 50 LIMIT 10; id | v -----+---------------------------------- 121 | 4c56ff4ce4aaf9573aa5dff913df997a 122 | a0a080f42e6f13b3a2df133f073095dd 123 | 202cb962ac59075b964b07152d234b70 124 | c8ffe9a587b126f152ed3d89a146b445 125 | 3def184ad8f4755ff269862ea77393dd 126 | 069059b7ef840f0c74a814ec9237b6ec 127 | ec5decca5ed3d6b8079e2e7e7bacc9f2 128 | 76dc611d6ebaafc66cc0879c71b5db5c 129 | d1f491a404d6854880943e5c3cd9ca25 130 | 9b8619251a19057cff70779273e95aa6 (10 rows) real incident log sample (from Windows PG10.x) ============================================== primary server: --------------- 2019-12-17 19:10:19 JST [14321] ERROR: 42501: could not truncate file "base/444807/447791" to 7 blocks: Permission denied 2019-12-17 19:10:19 JST [14321] CONTEXT: automatic vacuum of table "ddd.ssss.tttt" hot standby server: ------------------- 2019-12-17 19:11:20 JST [12345] WARNING: 01000: page 7 of relation base/444807/447791 does not exist 2019-12-17 19:11:20 JST [12345] CONTEXT: WAL redo at 4E/FC5AD060 for Heap2/VISIBLE: cutoff xid 0 flags 3 2019-12-17 19:11:20 JST [12345] PANIC: XX000: WAL contains references to invalid pages 2019-12-17 19:11:20 JST [12345] CONTEXT: WAL redo at 4E/FC5AD060 for Heap2/VISIBLE: cutoff xid 0 flags 3
pgsql-bugs by date: