Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance - Mailing list pgsql-hackers
From | Mel Gorman |
---|---|
Subject | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
Date | |
Msg-id | 20140115114909.GI4963@suse.de |
In response to | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance (James Bottomley <James.Bottomley@HansenPartnership.com>) |
Responses | Re: [Lsf-pc] Linux kernel impact on PostgreSQL performance |
List | pgsql-hackers |
On Mon, Jan 13, 2014 at 02:19:56PM -0800, James Bottomley wrote:
> On Mon, 2014-01-13 at 22:12 +0100, Andres Freund wrote:
> > On 2014-01-13 12:34:35 -0800, James Bottomley wrote:
> > > On Mon, 2014-01-13 at 14:32 -0600, Jim Nasby wrote:
> > > > Well, if we were to collaborate with the kernel community on this then
> > > > presumably we can do better than that for eviction... even to the
> > > > extent of "here's some data from this range in this file. It's (clean|
> > > > dirty). Put it in your cache. Just trust me on this."
> > >
> > > This should be the madvise() interface (with MADV_WILLNEED and
> > > MADV_DONTNEED) is there something in that interface that is
> > > insufficient?
> >
> > For one, postgres doesn't use mmap for files (and can't without major
> > new interfaces).
>
> I understand, that's why you get double buffering: because we can't
> replace a page in the range you give us on read/write. However, you
> don't have to switch entirely to mmap: you can use mmap/madvise
> exclusively for cache control and still use read/write (and still pay
> the double buffer penalty, of course). It's only read/write with
> directio that would cause problems here (unless you're planning to
> switch to DIO?).
>

There are hazards with using mmap/madvise that may or may not be a
problem for them. I think these are well known but just in case;

mmap/munmap intensive workloads may get hammered on taking mmap_sem for
write. The greatest costs are incurred if the application is threaded
and the parallel threads are fault-intensive. I do not think this is the
case for PostgreSQL as it is process based but it is a concern. Even if
it's a single-threaded process, the cost of the mmap_sem cache line
bouncing can be a concern. Outside of that, the mmap/munmap paths are
just really costly and take a lot of work.

madvise has different hazards but let's take DONTNEED as an example
because it's the most likely candidate for use. A DONTNEED hint has
three potential downsides.
The first is that mmap_sem taken for read can be very costly for
threaded applications as the cache line bounces. On NUMA machines it can
be a major problem for madvise-intensive workloads.

The second is that the page table teardown frees the pages, with the
associated costs, but most importantly an IPI is required afterwards to
flush the TLB. If that process has been running on a lot of different
CPUs then the IPI cost can be very high.

The third hazard is that a madvise(DONTNEED) region will incur page
faults on the next accesses, again hammering into mmap_sem, with all the
costs associated with faulting (allocating the same pages again, zeroing
etc).

It may be the case that mmap/madvise is still required to handle a
double buffering problem but it's far from being a free lunch and it has
costs that read/write does not have to deal with. Maybe some of these
problems can be fixed or mitigated but this is a case where a test case
demonstrating the problem is needed, even if that requires patching
PostgreSQL.

-- 
Mel Gorman
SUSE Labs
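
The third hazard can be seen from userspace with a few lines. What follows is only a minimal sketch, not a benchmark and not anything PostgreSQL does: it assumes Linux, Python 3.8 or later (for `mmap.madvise`), and a private anonymous mapping. The function name `dontneed_refault_demo` and the page count are made up for illustration. It dirties some pages, drops them with MADV_DONTNEED, and touches them again; the old contents are gone (the pages come back zero-filled) and each touch has to fault. The fault counter comes from `getrusage()` and may read as zero in environments that do not report it.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE


def dontneed_refault_demo(npages=64):
    """Dirty an anonymous mapping, MADV_DONTNEED it, then touch it again.

    Returns (first_byte_before, bytes_read_after, minor_faults_on_retouch).
    """
    # Private anonymous memory: MADV_DONTNEED discards the pages outright.
    buf = mmap.mmap(-1, npages * PAGE,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    try:
        # Write one byte per page so every page is actually allocated.
        for i in range(npages):
            buf[i * PAGE] = 0xAB
        before = buf[0]

        # Tear down the page table entries and free the pages.
        buf.madvise(mmap.MADV_DONTNEED)

        # Touch every page again; each access has to fault a page back in.
        start = resource.getrusage(resource.RUSAGE_SELF).ru_minflt
        after = [buf[i * PAGE] for i in range(npages)]
        faults = resource.getrusage(resource.RUSAGE_SELF).ru_minflt - start
        return before, after, faults
    finally:
        buf.close()


if __name__ == "__main__":
    before, after, faults = dontneed_refault_demo()
    print("byte before DONTNEED:", hex(before))
    print("all pages zero afterwards:", all(b == 0 for b in after))
    print("minor faults while re-touching:", faults)
```

This says nothing about the mmap_sem contention or the TLB-flush IPIs described above; those only show up under load on a multi-CPU machine and would need a real test case to measure.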