Re: [WIP] Double-write with Fast Checksums - Mailing list pgsql-hackers

From	Aidan Van Dyk
Subject	Re: [WIP] Double-write with Fast Checksums
Date	January 11, 2012 10:47:36
Msg-id	CAC_2qU95EtBBo0GeGfd9rimUyjs3Ot1H5X4NP_=JWR1zZWrF0w@mail.gmail.com
In response to	Re: [WIP] Double-write with Fast Checksums (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses	Re: [WIP] Double-write with Fast Checksums
List	pgsql-hackers

On Wed, Jan 11, 2012 at 7:13 AM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:

> At the moment, double-writes are done in one batch, fsyncing the
> double-write area first and the data files immediately after that. That's
> probably beneficial if you have a BBU, and/or a fairly large shared_buffers
> setting, so that pages don't get swapped between OS and PostgreSQL cache too
> much. But when those assumptions don't hold, it would be interesting to
> treat the double-write buffers more like a 2nd WAL for full-page images.
> Whenever a dirty page is evicted from shared_buffers, write it to the
> double-write area, but don't fsync it or write it back to the data file yet.
> Instead, let it sit in the double-write area, and grow the double-write
> file(s) as necessary, until the next checkpoint comes along.

Ok, but for correctness, you need to *fsync* the double-write buffer
(WAL) before you can issue the write on the normal datafile at all.

All the double write can do is move the FPW from the WAL stream (done
at commit time) to some other "double buffer space" (which can be done
at write time).

It still has to fsync the "write-ahead" part of the double write
before it can write any of the "normal" part, or you leave the
torn-page possibility open.

And you still need to keep all the "write-ahead" part of the
double-write around until all the "normal" writes have been fsynced
(checkpoint time) so you can redo them all on crash recovery.

So, I think that the work in double-writes has merit, but if it's done
correctly, it isn't this "magic bullet" that suddenly gives us atomic,
durable writes for free.

It has major advantages (including, but not limited to):
1) Moving the FPW out of normal WAL/commit processing
2) Allowing fine control of (possibly separate) FPW locations on a per
tablespace/relation basis

It does this by moving the FPW/IO penalty from the commit time of the
backend that first dirties the buffer to the eviction time of the
backend that evicts the dirty buffer.  And if you're lucky enough that
the background writer is the only one writing dirty buffers, you'll see
lots of improvements in your performance (the equivalent of running
with FPW off today).  But I have a feeling that many of us see backends
having to write dirty buffers often enough that the reduction in
commit/WAL latency will be offset (hopefully not by as much) by
increased query processing time as backends double-write dirty buffers.
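
To make the shift in who pays concrete, here is a small accounting
sketch in Python. It is purely illustrative: the event kinds, the
scheme names and the function are hypothetical, and it counts page
images only, not actual I/O time.

```python
from collections import Counter


def charge_page_images(events, scheme):
    """Count which actor pays for each page image.

    events: iterable of (kind, actor) pairs, where kind is
            "first_dirty" (first modification of a page after a
            checkpoint) or "evict" (a dirty buffer is written out).
    scheme: "wal_fpw" charges whoever first dirties the page;
            "double_write" charges whoever evicts it.
    Both scheme names are made up for this sketch.
    """
    charged = Counter()
    for kind, actor in events:
        if scheme == "wal_fpw" and kind == "first_dirty":
            charged[actor] += 1
        elif scheme == "double_write" and kind == "evict":
            charged[actor] += 1
    return charged


events = [
    ("first_dirty", "backend"),
    ("evict", "bgwriter"),
    ("first_dirty", "backend"),
    ("evict", "backend"),
]
print(charge_page_images(events, "wal_fpw"))
print(charge_page_images(events, "double_write"))
```

With this made-up event list, the backend pays for both page images
under "wal_fpw", while under "double_write" it pays for one and the
background writer absorbs the other. The more evictions the background
writer handles, the less the backends pay.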

a.

--
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
