Re: fsync reliability - Mailing list pgsql-hackers
From | Greg Smith |
---|---|
Subject | Re: fsync reliability |
Date | |
Msg-id | 4DB59A8D.9060004@2ndQuadrant.com |
In response to | Re: fsync reliability (Matthew Woodcraft <matthew@woodcraft.me.uk>) |
Responses |
Re: fsync reliability
Re: fsync reliability |
List | pgsql-hackers |
On 04/23/2011 09:58 AM, Matthew Woodcraft wrote:
> As far as I can make out, the current situation is that this fix (the
> auto_da_alloc mount option) doesn't work as advertised, and the ext4
> maintainers are not treating this as a bug.
>
> See https://bugzilla.kernel.org/show_bug.cgi?id=15910
>

I agree with the resolution that this isn't a bug. As pointed out there, XFS does the same thing, and this behavior isn't going away any time soon. Zero-length files get left behind when developers try to optimize away a necessary fsync.

Here's the part where the submitter goes wrong:

"We first added a fsync() call for each extracted file. But scattered fsyncs resulted in a massive performance degradation during package installation (factor 10 or more, some reported that it took over an hour to unpack a linux-headers-* package!) In order to reduce the I/O performance degradation, fsync calls were deferred..."

Stop right there; the slow path was the only one that had any hope of being correct. Syncing each file can actually slow things by a factor of 100X or more, worst-case.

"So, we currently have the choice between filesystem corruption or major performance loss": yes, you do. Writing files is tricky, and it can be either slow or safe. If you're going to avoid even trying to enforce the right thing here, you're going to get badly burned.

It's unfortunate that so many people are used to the speed that has been the common case for a while now with ext3 and cheap hard drives: all writes are cached unsafely, but the filesystem resists a few bad behaviors. Much of the struggle where people say "this is so much slower, I won't put up with it" and try to code around it is futile, and it's hard to separate the attempts to find such optimizations from the legitimate complaints.
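To make the "slow path" concrete, here is a minimal sketch in Python of writing one file safely, assuming a POSIX system where a directory can be opened and fsynced. The helper name `durable_write` is hypothetical; it is not taken from dpkg, PostgreSQL, or the bug report, and it only illustrates the general write, fsync, rename, fsync-the-directory sequence.

```python
import os
import tempfile


def durable_write(path, data):
    """Replace `path` with `data` (bytes), syncing before and after the rename.

    Hypothetical helper for illustration; assumes a POSIX filesystem.
    """
    directory = os.path.dirname(os.path.abspath(path))

    # Write to a temporary file in the same directory so the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()              # push Python's buffer to the kernel
            os.fsync(tmp.fileno())   # ask the kernel to push it to disk
        # Only rename once the new contents have been synced.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the directory so the rename itself is recorded on disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling `durable_write("example.conf", b"new contents\n")` costs two fsync calls per file. That cost is exactly the performance hit being complained about, and dropping the first fsync is the optimization that can leave a zero-length file behind after a crash.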
Anyway, you're right to point out that the filesystem is not necessarily going to save anyone from some of the tricky rename situations, even with the improvements made to delayed allocation. They've fixed some of the worst behavior of the earlier implementation, but it seems there are still potential issues in that area.

--
Greg Smith   2ndQuadrant US   greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us