Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From Gasper Zejn
Subject Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date
Msg-id 75bfe2e2-90b0-d411-56b3-14d440c6b5b0@owca.info
In response to Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On 09. 04. 2018 15:42, Tomas Vondra wrote:
> On 04/09/2018 12:29 AM, Bruce Momjian wrote:
>> A crazy idea would be to have a daemon that checks the logs and
>> stops Postgres when something seems wrong.
>>
> That doesn't seem like a very practical way. It's better than nothing,
> of course, but I wonder how that would work with containers (where I
> think you may not have access to the kernel log at all). Also, I'm
> pretty sure the messages do change based on kernel version (and possibly
> filesystem) so parsing it reliably seems rather difficult. And we
> probably don't want to PANIC after I/O error on an unrelated device, so
> we'd need to understand which devices are related to PostgreSQL.
>
> regards
>

As a slightly less (or more) crazy idea: I'd imagine that writing a Linux
kernel module that uses a kprobe/kretprobe to capture the file passed to
fsync (or even the byte range within the file) and the corresponding
return value shouldn't be that hard. Kprobes have been part of the Linux
kernel for a really long time, and at first glance it seems this could be
backported to 2.6 too.

Then you could have stable log messages, or implement some kind of "fsync
error log notification" via whatever is the most sane way to get this
out of the kernel.

If the kernel is new enough and has eBPF support (seems like >=4.4),
using bcc-tools[1] should enable you to write a quick script to get
exactly that info via perf events[2].
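
A minimal sketch of what such a bcc script might look like, assuming bcc
is installed and the script runs as root. The struct layout, the names
inflight, events, trace_fsync_return, format_event and run, and the
"fsync_error ..." message format are all made up for illustration. It
probes only fsync (not fdatasync or sync_file_range) and reports only
calls that return an error:

```python
import errno

# BPF C program (hypothetical layout): remember the fd on entry to fsync,
# then emit an event on return if the call failed.
BPF_PROGRAM = r"""
#include <uapi/linux/ptrace.h>

struct event_t {
    u32 pid;
    int fd;
    int ret;
    char comm[16];
};

BPF_HASH(inflight, u64, int);
BPF_PERF_OUTPUT(events);

int syscall__fsync(struct pt_regs *ctx, int fd)
{
    u64 id = bpf_get_current_pid_tgid();
    inflight.update(&id, &fd);
    return 0;
}

int trace_fsync_return(struct pt_regs *ctx)
{
    u64 id = bpf_get_current_pid_tgid();
    int *fdp = inflight.lookup(&id);
    if (fdp == 0)
        return 0;

    struct event_t ev = {};
    ev.pid = id >> 32;
    ev.fd = *fdp;
    ev.ret = PT_REGS_RC(ctx);
    inflight.delete(&id);

    if (ev.ret >= 0)
        return 0;   /* only failures are reported */

    bpf_get_current_comm(&ev.comm, sizeof(ev.comm));
    events.perf_submit(ctx, &ev, sizeof(ev));
    return 0;
}
"""


def format_event(pid, comm, fd, ret):
    """Render one failed fsync as a fixed-format line (format is an assumption)."""
    name = errno.errorcode.get(-ret, "UNKNOWN")
    return "fsync_error pid=%d comm=%s fd=%d errno=%d (%s)" % (
        pid, comm, fd, -ret, name)


def run():
    """Attach the probes and print events until interrupted. Needs root and bcc."""
    from bcc import BPF  # imported here so the rest loads without bcc installed

    b = BPF(text=BPF_PROGRAM)
    fn = b.get_syscall_fnname("fsync")
    b.attach_kprobe(event=fn, fn_name="syscall__fsync")
    b.attach_kretprobe(event=fn, fn_name="trace_fsync_return")

    def on_event(cpu, data, size):
        ev = b["events"].event(data)
        comm = ev.comm.decode("utf-8", "replace")
        print(format_event(ev.pid, comm, ev.fd, ev.ret))

    b["events"].open_perf_buffer(on_event)
    while True:
        b.perf_buffer_poll()
```

Calling run() would print one line per failed fsync, which a watchdog
could match on without parsing the kernel log. Mapping the fd back to a
file or device (to ignore unrelated devices) is left out here.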

Obviously, that's a stopgap solution ...


Kind regards,
Gasper


[1] https://github.com/iovisor/bcc
[2] https://blog.yadutaf.fr/2016/03/30/turn-any-syscall-into-event-introducing-ebpf-kernel-probes/

