Re: fsync reliability - Mailing list pgsql-hackers

From	Greg Smith
Subject	Re: fsync reliability
Date	April 25, 2011 12:59:28
Msg-id	4DB59A8D.9060004@2ndQuadrant.com
In response to	Re: fsync reliability (Matthew Woodcraft <matthew@woodcraft.me.uk>)
Responses	Re: fsync reliability Re: fsync reliability
List	pgsql-hackers

On 04/23/2011 09:58 AM, Matthew Woodcraft wrote:
> As far as I can make out, the current situation is that this fix (the
> auto_da_alloc mount option) doesn't work as advertised, and the ext4
> maintainers are not treating this as a bug.
>
> See https://bugzilla.kernel.org/show_bug.cgi?id=15910
>    

I agree with the resolution that this isn't a bug.  As pointed out
there, XFS does the same thing, and this behavior isn't going away any
time soon.  Zero-length files get left behind when developers try to
optimize away a necessary fsync.

Here's the part where the submitter goes wrong:

"We first added a fsync() call for each extracted file. But scattered 
fsyncs resulted in a massive performance degradation during package 
installation (factor 10 or more, some reported that it took over an hour 
to unpack a linux-headers-* package!) In order to reduce the I/O 
performance degradation, fsync calls were deferred..."

Stop right there; the slow path was the only one that had any hope of
being correct.  An fsync per file can actually slow things by a factor
of 100X or more, worst-case.  "So, we currently have the choice between
filesystem corruption or major performance loss":  yes, you do.  Writing
files is tricky, and it can be either fast or safe, not both.  If you
avoid even trying to enforce the right thing here, you're going to get
really burned.

It's unfortunate that so many people have gotten used to the speed that
has been the common situation for a while now with ext3 and cheap hard
drives:  all writes are cached unsafely, but the filesystem resists a
few bad behaviors.  Much of the struggle where people say "this is so
much slower, I won't put up with it" and try to code around it is
futile, and it's hard to separate the attempts to find such
optimizations from the legitimate complaints.

Anyway, you're right to point out that the filesystem is not necessarily 
going to save anyone from some of the tricky rename situations even with 
the improvements made to delayed allocation.  They've fixed some of the
worst behavior of the earlier implementation, but it seems there are
still potential issues in that area.
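
For reference, the slow-but-safe pattern for replacing a file looks
roughly like the sketch below.  This is only an illustration, not code
from the bug report or from PostgreSQL: the function name
durable_replace is made up here, and it assumes a POSIX system where a
directory can be opened read-only and passed to fsync.  Note also that
mkstemp creates the temporary file with mode 0600, so a real
implementation would have to carry over the original file's
permissions.

```python
import os
import tempfile


def durable_replace(path, data):
    """Replace the file at `path` with `data` (bytes) without relying
    on the filesystem to order the data write before the rename.

    Hypothetical helper for illustration; assumes POSIX semantics.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # The temporary file must live in the same directory (and so the
    # same filesystem) as the target, or the rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # The step people try to optimize away: force the file's
            # contents to stable storage before the rename.
            os.fsync(tmp.fileno())
        # Atomically swap the new file into place.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the
        # rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # fsync the containing directory so the rename itself is durable.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Skipping the first fsync is what leaves a zero-length file behind when
the rename reaches disk before the data does.  Each call pays for two
fsyncs, which is where the slowdown comes from when this is done once
per extracted file.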

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
