Re: checkpointer continuous flushing - Mailing list pgsql-hackers

| From | Andres Freund |
| --- | --- |
| Subject | Re: checkpointer continuous flushing |
| Date | 2015-08-17 |
| Msg-id | 20150817151306.GB10786@awork2.anarazel.de |
| In response to | Re: checkpointer continuous flushing (Fabien COELHO <coelho@cri.ensmp.fr>) |
| Responses | Re: checkpointer continuous flushing |
| List | pgsql-hackers |
On 2015-08-17 15:21:22 +0200, Fabien COELHO wrote:
> My current thinking is "maybe yes, maybe no" :-), as it may depend on the
> OS implementation of posix_fadvise, so it may differ between OSes.

As long as fadvise has no 'undirty' option, I don't see how that problem
goes away. You're telling the OS to throw the buffer away, so unless it
ignores the request, that'll have consequences when you read the page back
in.

> This is a reason why I think that flushing should be kept a GUC, even if
> the sort GUC is removed and always on. The sync_file_range implementation
> is clearly always very beneficial on Linux, and posix_fadvise may or may
> not induce good behavior depending on the underlying system.

That's certainly an argument.

> This is also a reason why the default value for the flush GUC is
> currently set to false in the patch. The documentation should advise
> turning it on for Linux and testing otherwise. Or, if Linux is assumed to
> be a common host, maybe set the default to on and suggest that on some
> systems it may be better to have it off.

I'd say it should then be an OS-specific default. No point in making
people work for it needlessly on Linux and/or elsewhere.

> (Another reason to keep it "off" is that I'm not sure what happens with
> such HD flushing features on virtual servers.)

I don't see how that matters. Either the host will entirely ignore
flushing, in which case the sync_file_range and the fsync won't cost much,
or fsync will be honored, in which case the pre-flushing is helpful.

> Overall, I'm not pessimistic, because I've seen I/O storms on a FreeBSD
> host and it was as bad as on Linux (namely the database and even the box
> were offline for long minutes...), and if you can avoid that, having to
> read back some data may not be that bad a down payment.

I don't see how that alleviates my fear. Sure, the latency for many
workloads will be better, but I don't see how that argument says anything
about the reads. And we'll not just use this in cases where it'd be
beneficial...

> The issue is largely mitigated if the data is not removed from
> shared_buffers, because the OS buffer is then just a copy of already-held
> data. What I would do on such systems is increase shared_buffers and keep
> flushing on; that is, count less on the system cache and more on
> postgres' own cache.

That doesn't work that well, for a bunch of reasons. For one, it's
completely non-adaptive. With the OS's page cache you can rely on free
memory being used for caching *and* on it being available should a query
or another program need lots of memory.

> Overall, I'm not convinced that the practice of relying on the OS cache
> is a good one, given what it does with it, at least on Linux.

The alternatives aren't super realistic near-term, though. Using direct IO
efficiently across the set of operating systems we support is *hard*. It's
more or less trivial to hack pg up to use direct IO for
relations/shared_buffers, but it'll perform utterly horribly in many, many
cases. To pick one thing out: without the OS buffering writes, any write
will have to wait for the disks instead of being asynchronous. That'll
make writes performed by backends a massive bottleneck.

> Now, if someone could provide a dedicated box with posix_fadvise (say
> FreeBSD, maybe others...) for testing, that would allow providing data
> instead of speculating... and then maybe deciding to change the default
> value.

Testing, as an approximation, how it turns out to work on Linux would be a
good step.

Greetings,

Andres Freund
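The core of the exchange above is the semantic gap between the two flushing primitives. A minimal sketch of the difference, assuming a hypothetical wrapper name (`pg_flush_hint` is illustrative, not the patch's actual code):

```c
#define _GNU_SOURCE             /* for sync_file_range() on Linux */
#include <fcntl.h>

/*
 * Hypothetical wrapper contrasting the two primitives the thread is
 * about -- a sketch, not the patch under discussion.
 */
static void
pg_flush_hint(int fd, off_t offset, off_t nbytes)
{
#if defined(__linux__)
	/*
	 * Start asynchronous writeback of the dirty range.  The pages stay
	 * in the OS page cache, so re-reading the same blocks stays cheap.
	 */
	(void) sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE);
#else
	/*
	 * Portable alternative: POSIX_FADV_DONTNEED typically writes dirty
	 * pages back, but it also asks the kernel to evict them.  There is
	 * no "undirty but keep cached" advice -- exactly the read-back cost
	 * Andres is worried about.
	 */
	(void) posix_fadvise(fd, offset, nbytes, POSIX_FADV_DONTNEED);
#endif
}
```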
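The OS-specific default Andres suggests could be expressed at compile time along these lines; the macro name is made up for illustration:

```c
#include <stdbool.h>

/* Hypothetical compile-time default for the flush GUC. */
#if defined(__linux__)
#define DEFAULT_CHECKPOINT_FLUSH true   /* sync_file_range() is a clear win */
#else
#define DEFAULT_CHECKPOINT_FLUSH false  /* fadvise may evict still-needed pages */
#endif
```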
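The direct IO point is also visible at the syscall level: with O_DIRECT, a write() only returns once the device has accepted the data, whereas a buffered write returns as soon as it hits the page cache. A self-contained sketch under that assumption (file name arbitrary, error handling minimal):

```c
#define _GNU_SOURCE             /* for O_DIRECT */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	void   *buf;
	int		fd = open("datafile", O_WRONLY | O_CREAT | O_DIRECT, 0600);

	/* O_DIRECT requires the buffer, offset and length to be aligned. */
	if (fd < 0 || posix_memalign(&buf, 4096, 4096) != 0)
		return 1;
	memset(buf, 0, 4096);

	/*
	 * Bypassing the page cache, this write blocks until the device has
	 * the data, instead of returning once the kernel has buffered it.
	 * Issued from a backend, every such write waits on the disk.
	 */
	(void) write(fd, buf, 4096);

	close(fd);
	free(buf);
	return 0;
}
```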