Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Date	March 30, 2018 00:18:14
Msg-id	CAEepm=1KFaVPdOxYkP6bmtevOZHfdHTNf8bjZWSkJxoxy0X+7A@mail.gmail.com
In response to	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS (Catalin Iacob <iacobcatalin@gmail.com>)
Responses	Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
List	pgsql-hackers

On Fri, Mar 30, 2018 at 5:20 AM, Catalin Iacob <iacobcatalin@gmail.com> wrote:
> Jeff's comments in the pull request that merged errseq_t are worth
> reading as well:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=088737f44bbf6378745f5b57b035e57ee3dc4750

Wow.  That raises a separate question: when did each filesystem
adopt this new infrastructure?

>> Yeah, I see why you want to PANIC.
>
> Indeed. Even doing that leaves question marks about all the kernel
> versions before v4.13, which at this point is pretty much everything
> out there, not even detecting this reliably. This is messy.

The pre-errseq_t problems are beyond our control.  There's nothing we
can do about them in userspace (except perhaps abandon OS-buffered IO,
a big project).  We just need to be aware that the problem exists in
kernels before v4.13 and be grateful to Jeff Layton for fixing it.

The dropped dirty flag problem is something we can and in my view
should do something about, whatever we might think about that design
choice.  As Andrew Gierth pointed out to me in an off-list chat about
this, by the time you've reached this state, both PostgreSQL's buffer
and the kernel's buffer are clean and might be reused for another
block at any time, so your data might be gone from the known universe
-- we don't even have the option to rewrite our buffers in general.
Recovery is the only option.
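
To make that concrete, here is a minimal sketch in C of "don't retry,
go to recovery".  The names sync_result and sync_or_recover are
hypothetical, made up for illustration; this is not PostgreSQL code and
not Craig's patch.  It assumes a POSIX system and only shows the shape
of the decision: a failed fsync() is never followed by a second fsync()
on the same data.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical result type.  There is deliberately no "retry" outcome. */
typedef enum { SYNC_OK, SYNC_MUST_RECOVER } sync_result;

/*
 * Flush fd to stable storage.
 *
 * On failure we do NOT call fsync() again.  On the kernels discussed in
 * this thread the failed pages may already have been marked clean, so a
 * second fsync() could return 0 without the data ever reaching disk.
 * The caller is expected to treat SYNC_MUST_RECOVER as fatal (for a
 * database server, something like PANIC followed by WAL replay).
 */
sync_result
sync_or_recover(int fd)
{
    if (fsync(fd) == 0)
        return SYNC_OK;

    fprintf(stderr, "fsync failed: %s; crash recovery required\n",
            strerror(errno));
    return SYNC_MUST_RECOVER;
}
```

A caller that gets SYNC_MUST_RECOVER would stop writing and rebuild
state from its log rather than looping on fsync() until it "succeeds".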

Thank you to Craig for chasing this down and +1 for his proposal, on Linux only.

-- 
Thomas Munro
http://www.enterprisedb.com