RE: prevent immature WAL streaming - Mailing list pgsql-hackers
From | Jakub Wartak |
---|---|
Subject | RE: prevent immature WAL streaming |
Date | |
Msg-id | VI1PR0701MB69608F806BD102C8B4C3C6FDF6C69@VI1PR0701MB6960.eurprd07.prod.outlook.com Whole thread Raw |
In response to | Re: prevent immature WAL streaming ("alvherre@alvh.no-ip.org" <alvherre@alvh.no-ip.org>) |
Responses |
Re: prevent immature WAL streaming
|
List | pgsql-hackers |
Hi Álvaro, -hackers, > I attach the patch with the change you suggested. I've gave a shot to to the v02 patch on top of REL_12_STABLE (already including 5065aeafb0b7593c04d3bc5bc2a86037f32143fc).Previously(yesterday) without the v02 patch I was getting standby corruption alwaysvia simulation by having separate /pg_xlog dedicated fs, and archive_mode=on, wal_keep_segments=120, archive_commandset to rsync to different dir on same fs, wal_init_zero at default(true). Today (with v02) I've got corruption in only initial 2 runs out of ~ >30 tries on standby. Probably the 2 failures were somehowmy fault (?) or some rare condition (and in 1 of those 2 cases simply restarting standby did help). To be honest I'vetried to force this error, but with v02 I simply cannot force this error anymore, so that's good! :) > I didn't have a lot of luck with a reliable reproducer script. I was able to > reproduce the problem starting with Ryo Matsumura's script and attaching > a replica; most of the time the replica would recover by restarting from a > streaming position earlier than where the problem occurred; but a few > times it would just get stuck with a WAL segment containing a bogus > record. In order to get reliable reproducer and get proper the fault injection instead of playing with really filling up fs, apparentlyone could substitute fd with fd of /dev/full using e.g. dup2() so that every write is going to throw this errortoo: root@hive:~# ./t & # simple while(1) { fprintf() flush () } testcase root@hive:~# ls -l /proc/27296/fd/3 lrwx------ 1 root root 64 Aug 25 06:22 /proc/27296/fd/3 -> /tmp/testwrite root@hive:~# gdb -q -p 27296 -- 1089 is bitmask O_WRONLY|.. (gdb) p dup2(open("/dev/full", 1089, 0777), 3) $1 = 3 (gdb) c Continuing. ==> fflush/write(): : No space left on device So I've also tried to be malicious while writing to the DB and inject ENOSPCE near places like: a) XLogWrite()->XLogFileInit() near line 3322 // assuming: if (wal_init_zero) is true, one gets classic "PANIC: could notwrite to file "pg_wal/xlogtemp.90670": No space left on device" b) XLogWrite() near line 2547 just after pg_pwrite // one can get "PANIC: could not write to log file 000000010000003B000000A8at offset 0, length 15466496: No space left on device" (that would be possible with wal_init_zero=false?) c) XLogWrite() near line 2592 // just before issue_xlog_fsync to get "PANIC: could not fdatasync file "000000010000004300000004":Invalid argument" that would pretty much mean same as above but with last possible offset nearend of WAL? This was done with gdb voodoo: handle SIGUSR1 noprint nostop break xlog.c:<LINE> // https://github.com/postgres/postgres/blob/REL_12_STABLE/src/backend/access/transam/xlog.c#L3311 c print fd or openLogFile -- to verify it is 3 p dup2(open("/dev/full", 1089, 0777), 3) -- during most of walwriter runtime it has current log as fd=3 After restarting master and inspecting standby - in all of those above 3 cases - the standby didn't inhibit the "invalidcontrecord length" at least here, while without corruption this v02 patch it is notorious. So if it passes the worst-casecode review assumptions I would be wondering if it shouldn't even be committed as it stands right now. -J.
pgsql-hackers by date: