Re: New replication mode: write - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: New replication mode: write
Date
Msg-id CAHGQGwE+Zxk_yNw0rw8bo__+YzFOvEw3HtCb+8FQL=fzTaPxJA@mail.gmail.com
In response to New replication mode: write  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: New replication mode: write
Re: New replication mode: write
List pgsql-hackers
On Fri, Jan 13, 2012 at 7:30 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On Fri, Jan 13, 2012 at 9:15 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On Fri, Jan 13, 2012 at 7:41 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>>
>>> Thought? Comments?
>>
>> This is almost exactly the same as my patch series
>> "syncrep_queues.v[1,2].patch" earlier this year. Which I know because
>> I was updating that patch myself last night for 9.2. I'm about half
>> way through doing that, since you and I agreed in Ottawa I would do
>> this. Perhaps it is better if we work together?
>
> I think this comment is mostly pointless. We don't have time to work
> together and there's no real reason to. You know what you're doing, so
> I'll leave you to do it.
>
> Please add the Apply mode.

OK, will do.

> In my patch, the reason I avoided doing WRITE mode (which we had
> previously referred to as RECV) was that no fsync of the WAL contents
> takes place. In that case we are applying changes using un-fsynced WAL
> data and in case of crash this would cause a problem.

My patch has not changed the execution order of WAL flush and replay.
WAL records are always replayed after they are flushed by walreceiver.
So, such a problem doesn't happen.

However, this means that a transaction might need to wait for a WAL flush
caused by a previous transaction even if WRITE mode is chosen. That limits
the performance gain from WRITE mode, and should be improved later, I think.

> I was going to
> make the WalWriter available during recovery to cater for that. Do you
> not think that is no longer necessary?

That's still necessary to improve the performance in sync rep further, I think.
What I'd like to do (maybe in 9.3dev) after supporting WRITE mode is:

* Allow WAL records to be replayed before they are flushed to the disk.
* Add a new GUC parameter specifying whether to allow the standby to defer
  WAL flush. If the parameter is false, walreceiver flushes WAL whenever it
  receives WAL (i.e., the same as the current behavior). If true,
  walreceiver doesn't flush WAL at all. Instead, walwriter, a backend or
  the startup process does that. Walwriter periodically checks whether
  there is an un-flushed WAL file, and flushes it if one exists. When a
  buffer page is written out, the backend or startup process forces a WAL
  flush up to the buffer's LSN.
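
To make the two settings of the proposed parameter easier to compare, here is a toy Python model of the standby. It is only a sketch under assumptions: the class and method names, the `defer_flush` flag (standing in for the not-yet-named GUC), and the use of plain integers as LSNs are all made up for illustration and are not taken from any patch.

```python
class StandbyWal:
    """Toy model of standby-side WAL positions (LSNs are plain integers)."""

    def __init__(self, defer_flush):
        # Hypothetical stand-in for the proposed GUC; the name is an assumption.
        self.defer_flush = defer_flush
        self.write_lsn = 0    # WAL handed to the OS, not necessarily durable
        self.flush_lsn = 0    # WAL known to be flushed to disk
        self.flush_calls = 0  # how many flushes were actually issued

    def _flush_up_to(self, lsn):
        if lsn > self.write_lsn:
            raise ValueError("cannot flush WAL that has not been written")
        if lsn > self.flush_lsn:
            self.flush_lsn = lsn
            self.flush_calls += 1

    def walreceiver_receive(self, end_lsn):
        """walreceiver writes incoming WAL; it flushes only if not deferred."""
        self.write_lsn = max(self.write_lsn, end_lsn)
        if not self.defer_flush:
            self._flush_up_to(self.write_lsn)

    def walwriter_tick(self):
        """Periodic check: flush whatever is written but not yet flushed."""
        self._flush_up_to(self.write_lsn)

    def write_buffer_page(self, page_lsn):
        """WAL-before-data: force WAL flush up to the page's LSN first."""
        self._flush_up_to(page_lsn)


eager = StandbyWal(defer_flush=False)  # current behavior
lazy = StandbyWal(defer_flush=True)    # proposed deferred flush

for end_lsn in (100, 200, 300):
    eager.walreceiver_receive(end_lsn)
    lazy.walreceiver_receive(end_lsn)

lazy.write_buffer_page(150)  # forces a flush up to LSN 150 only
lazy.walwriter_tick()        # picks up the remaining un-flushed WAL
```

In this run the eager standby issues one flush per received chunk, while the deferring standby issues one forced flush for the page write and one from the walwriter tick. Real flush costs depend on file boundaries and fsync behavior, which this model ignores.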
 

If the above GUC parameter is set to true (i.e., walreceiver doesn't flush
WAL at all) and WRITE mode is chosen, a transaction doesn't need to wait
for a WAL flush on the standby at all. The frequency of WAL flushes on
the standby would also become lower, which would significantly reduce I/O
load. Overall, sync rep performance should improve considerably.
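
As a rough illustration of why WRITE mode stops depending on the standby's flush in that configuration, the following Python sketch compares which standby position a commit waits on in each mode. The function name, the mode strings and the integer LSNs are assumptions for illustration only; "flush" stands for the existing sync rep behavior and "apply" for the Apply mode requested above.

```python
def standby_ack_satisfied(mode, commit_lsn, write_lsn, flush_lsn, apply_lsn):
    """Return True if the standby has progressed far enough for this mode.

    mode is one of "write", "flush" or "apply" (illustrative names).
    """
    positions = {
        "write": write_lsn,  # WAL written on the standby
        "flush": flush_lsn,  # WAL flushed to disk on the standby
        "apply": apply_lsn,  # WAL replayed on the standby
    }
    return positions[mode] >= commit_lsn


# Standby has written WAL up to 300 but has only flushed and replayed up to 100.
commit_lsn = 200
results = {
    mode: standby_ack_satisfied(mode, commit_lsn,
                                write_lsn=300, flush_lsn=100, apply_lsn=100)
    for mode in ("write", "flush", "apply")
}
print(results)
```

With these positions only the "write" check is satisfied, so the commit could be acknowledged without waiting for the standby's flush.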

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

