Thread: Reducing walreceiver latency with a latch

Reducing walreceiver latency with a latch

From

Heikki Linnakangas

Date:

13 September 2010, 08:40:30

Now that we have the wonderful latch facility, let's use it to reduce
the delay between receiving a piece of WAL and applying it in the standby.
Currently, the startup process polls every 100 ms to see if new WAL has
arrived, which adds an average delay of 50 ms between a transaction
commit in the master and the transaction appearing as committed in a hot
standby server. The latch patch already eliminated a similar polling delay
in walsender; the attached patch does the same for walreceiver.
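The 50 ms figure follows if WAL arrival is assumed to be uniformly distributed within a polling period T. A record arriving at offset t into the period waits T − t until the next poll, so the expected added delay (ignoring the time to actually replay the record) is:

```latex
\mathbb{E}[d] = \frac{1}{T}\int_0^{T} (T - t)\,dt = \frac{T}{2},
\qquad T = 100\ \text{ms} \;\Rightarrow\; \mathbb{E}[d] = 50\ \text{ms}
```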

After this patch, there are no unnecessary delays left in the streaming
replication code path. Note that this is all still asynchronous, just
with reduced latency.
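To illustrate the difference from polling, here is a minimal sketch of a latch-style primitive, assuming POSIX threads within a single process. The names init_latch, set_latch, reset_latch and wait_latch are hypothetical; PostgreSQL's real latch has to work across processes and is implemented differently. The point is only that the waiter sleeps until the producer sets the latch (or a timeout expires), instead of waking up every 100 ms to check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical latch sketch on top of a mutex + condition variable. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} SketchLatch;

static void init_latch(SketchLatch *l)
{
    pthread_mutex_init(&l->mutex, NULL);
    pthread_cond_init(&l->cond, NULL);
    l->is_set = false;
}

/* Producer side (e.g. walreceiver after flushing new WAL). */
static void set_latch(SketchLatch *l)
{
    pthread_mutex_lock(&l->mutex);
    l->is_set = true;
    pthread_cond_broadcast(&l->cond);
    pthread_mutex_unlock(&l->mutex);
}

/* Consumer side: clear the latch before re-checking for work. */
static void reset_latch(SketchLatch *l)
{
    pthread_mutex_lock(&l->mutex);
    l->is_set = false;
    pthread_mutex_unlock(&l->mutex);
}

/* Sleep until the latch is set or timeout_ms elapses.
 * Returns true if the latch is set, false on timeout. */
static bool wait_latch(SketchLatch *l, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default condvar clock */
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&l->mutex);
    int rc = 0;
    /* Loop to absorb spurious wake-ups; stop on timeout or any error. */
    while (!l->is_set && rc == 0)
        rc = pthread_cond_timedwait(&l->cond, &l->mutex, &deadline);
    bool was_set = l->is_set;
    pthread_mutex_unlock(&l->mutex);
    return was_set;
}
```

If the latch is already set when wait_latch is called, it returns immediately, which is what lets a consumer avoid missing a wake-up that arrived while it was busy.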

This is pretty straightforward, but any comments?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Attachment

walreceiver-latch-1.patch

Re: Reducing walreceiver latency with a latch

From

Thom Brown

Date:

13 September 2010, 08:48:21

On 13 September 2010 12:40, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> This is pretty straightforward, but any comments?

Is that supposed to be waiting 5000ms?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935

Re: Reducing walreceiver latency with a latch

From

Thom Brown

Date:

13 September 2010, 08:53:29

On 13 September 2010 12:47, Thom Brown <thom@linux.com> wrote:
> Is that supposed to be waiting 5000ms?

Ignore me, I can see that it's right.


Re: Reducing walreceiver latency with a latch

From

Heikki Linnakangas

Date:

13 September 2010, 08:54:22

On 13/09/10 14:47, Thom Brown wrote:
> Is that supposed to be waiting 5000ms?

Yes. It gets interrupted as soon as WAL arrives; the timeout is only
there to poll for the standby trigger file to appear, or for SIGTERM.
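A hypothetical sketch of the wait loop described here, with every check abstracted behind callbacks (none of these names come from the patch). The order shown, reset the latch, check all wake-up conditions, then sleep, is the usual latch idiom; the 5 s timeout only bounds how long it takes to notice the trigger file. A small single-threaded simulation is included so the loop can be exercised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hooks standing in for the real checks. */
typedef struct {
    bool (*shutdown_requested)(void *ctx);   /* e.g. SIGTERM received */
    bool (*wal_available)(void *ctx);        /* new WAL flushed by walreceiver */
    bool (*trigger_file_exists)(void *ctx);  /* standby trigger file present */
    void (*reset_latch)(void *ctx);
    void (*wait_latch)(void *ctx, long timeout_ms);
    void *ctx;
} RecoveryHooks;

typedef enum { WAIT_GOT_WAL, WAIT_TRIGGERED, WAIT_SHUTDOWN } WaitOutcome;

/* Reset first, then check: a set that races with the checks leaves the
 * latch set, so the following wait returns at once instead of losing it. */
static WaitOutcome wait_for_wal(const RecoveryHooks *h)
{
    for (;;) {
        h->reset_latch(h->ctx);
        if (h->shutdown_requested(h->ctx))
            return WAIT_SHUTDOWN;
        if (h->wal_available(h->ctx))
            return WAIT_GOT_WAL;
        if (h->trigger_file_exists(h->ctx))
            return WAIT_TRIGGERED;
        /* Arriving WAL ends this wait early; the timeout is for the rest. */
        h->wait_latch(h->ctx, 5000);
    }
}

/* Single-threaded simulation used to exercise the loop. */
typedef struct {
    int  wakeups;          /* times wait_latch returned */
    int  wal_after;        /* WAL appears after this many wakeups, -1 = never */
    int  trigger_after;    /* trigger file appears after N wakeups, -1 = never */
    long last_timeout_ms;  /* timeout passed to the most recent wait */
} SimCtx;

static bool sim_shutdown(void *p) { (void) p; return false; }
static bool sim_wal(void *p)
{
    SimCtx *s = p;
    return s->wal_after >= 0 && s->wakeups >= s->wal_after;
}
static bool sim_trigger(void *p)
{
    SimCtx *s = p;
    return s->trigger_after >= 0 && s->wakeups >= s->trigger_after;
}
static void sim_reset(void *p) { (void) p; }
static void sim_wait(void *p, long timeout_ms)
{
    SimCtx *s = p;
    s->last_timeout_ms = timeout_ms;
    s->wakeups++;
}

static WaitOutcome simulate(int wal_after, int trigger_after, SimCtx *out)
{
    SimCtx s = { 0, wal_after, trigger_after, 0 };
    RecoveryHooks h = { sim_shutdown, sim_wal, sim_trigger,
                        sim_reset, sim_wait, &s };
    WaitOutcome r = wait_for_wal(&h);
    *out = s;
    return r;
}
```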

BTW, I noticed that I missed incrementing the latch count in
win32_latch.c, and that the owning/disowning of the latch was not done
correctly: you get an error if you restart the master and reconnect.
I'll post an updated patch shortly.
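The reconnect error is consistent with ownership being taken again on every (re)connection without a matching release. A hypothetical sketch of that failure mode, assuming a shared latch that records its owner's pid and refuses a second ownership attempt; own_latch, disown_latch and streaming_session are illustrative names, not PostgreSQL's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ownership model: 0 means "not owned". */
typedef struct {
    int owner_pid;
} SharedLatch;

/* Returns false where the real code would raise an error. */
static bool own_latch(SharedLatch *l, int my_pid)
{
    if (l->owner_pid != 0)
        return false;          /* already owned */
    l->owner_pid = my_pid;
    return true;
}

static void disown_latch(SharedLatch *l, int my_pid)
{
    assert(l->owner_pid == my_pid);
    l->owner_pid = 0;
}

/* One streaming session as seen by the waiting process: take ownership
 * when streaming is requested and, if disown is true, release it when
 * the connection ends. */
static bool streaming_session(SharedLatch *l, int my_pid, bool disown)
{
    if (!own_latch(l, my_pid))
        return false;
    /* ... wait on the latch while WAL streams in ... */
    if (disown)
        disown_latch(l, my_pid);
    return true;
}
```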


Re: Reducing walreceiver latency with a latch

From

Heikki Linnakangas

Date:

13 September 2010, 09:13:32

On 13/09/10 14:54, Heikki Linnakangas wrote:
> BTW, I noticed that I missed incrementing the latch count in
> win32_latch.c, and the owning/disowning of the latch was not done correctly,
> so you get an error if you restart the master and reconnect. I'll post an
> updated patch shortly.

Here's an updated patch with those bugs fixed.


Attachment

walreceiver-latch-2.patch

Re: Reducing walreceiver latency with a latch

From

Fujii Masao

Date:

13 September 2010, 23:03:05

On Mon, Sep 13, 2010 at 9:13 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Here's an updated patch with those bugs fixed.

Great!

+    /*
+     * Walreceiver sets this latch every time new WAL has been received and
+     * fsync'd to disk, allowing startup process to wait for new WAL to
+     * arrive.
+     */
+    Latch        receivedLatch;

I think this latch should be usable for more than just walreceiver-to-startup
process communication. For example, it could also serve backend-to-startup
process communication, which could be used in the future to let users request
a failover via an SQL function. What about putting the latch in XLogCtl
instead of WalRcv, and calling OwnLatch at the beginning of the startup
process instead of in RequestXLogStreaming?
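A hypothetical sketch of the suggested layout, with a single address space standing in for shared memory: the wakeup latch lives in a shared control struct (called SketchXLogCtl here; it is not the real XLogCtl definition), so walreceiver and, in the future, a backend handling an SQL-level failover request can both wake the startup process. A real implementation would also need a lock or memory barrier around the shared fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool is_set;
} SketchLatch;

/* Hypothetical shared control block: the wakeup latch is not tied to
 * walreceiver, so any process with access can set it. */
typedef struct {
    SketchLatch recovery_wakeup_latch;  /* owned by the startup process */
    uint64_t    received_upto;          /* WAL flushed by walreceiver */
    bool        failover_requested;     /* hypothetical SQL-level request */
} SketchXLogCtl;

static void set_latch(SketchLatch *l) { l->is_set = true; }

/* Walreceiver side: called only after the WAL has been written and
 * fsync'd, so the startup process never sees unflushed WAL. */
static void walreceiver_flushed(SketchXLogCtl *ctl, uint64_t upto)
{
    ctl->received_upto = upto;
    set_latch(&ctl->recovery_wakeup_latch);
}

/* Backend side: a hypothetical failover request from an SQL function. */
static void backend_request_failover(SketchXLogCtl *ctl)
{
    ctl->failover_requested = true;
    set_latch(&ctl->recovery_wakeup_latch);
}
```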

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: Reducing walreceiver latency with a latch

From

Heikki Linnakangas

Date:

14 September 2010, 05:51:18

On 14/09/10 05:02, Fujii Masao wrote:
> +    /*
> +     * Walreceiver sets this latch every time new WAL has been received and
> +     * fsync'd to disk, allowing startup process to wait for new WAL to
> +     * arrive.
> +     */
> +    Latch        receivedLatch;
>
> I think that this latch should be available for other than walreceiver -
> startup process communication. For example, backend - startup process
> communication, which can be used for requesting a failover via SQL function
> by users in the future. What about putting the latch in XLogCtl instead of
> WalRcv and calling OwnLatch at the beginning of the startup process instead
> of RequestXLogStreaming?

Yes, good point. I updated the patch along those lines, attached.


Attachment

walreceiver-latch-3.patch

Re: Reducing walreceiver latency with a latch

From

Fujii Masao

Date:

14 September 2010, 10:46:21

On Tue, Sep 14, 2010 at 5:51 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Yes, good point. I updated the patch along those lines, attached.

Looks good.

+    /*
+     * Take ownership of the wakeup latch if we're going to sleep during
+     * recovery.
+     */
+    if (StandbyMode)
+        OwnLatch(&XLogCtl->recoveryWakeupLatch);

Since the automatic restart after a backend crash always performs normal
crash recovery, the startup process will never call OwnLatch more than once.
So there is probably no harm even if the startup process never disowns the
shared latch. Still, what about calling DisownLatch at the end of recovery,
just in case?
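A hypothetical sketch of the lifecycle discussed here, assuming a latch that records its owner's pid and refuses a second ownership attempt. The startup process takes ownership only in standby mode (the only case where recovery sleeps on the latch) and optionally disowns it when recovery ends; run_recovery and the other names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    int owner_pid;   /* 0 = not owned */
} SketchLatch;

static bool own_latch(SketchLatch *l, int pid)
{
    if (l->owner_pid != 0)
        return false;            /* the real code would raise an error */
    l->owner_pid = pid;
    return true;
}

static void disown_latch(SketchLatch *l, int pid)
{
    assert(l->owner_pid == pid);
    l->owner_pid = 0;
}

/* Own the wakeup latch only in standby mode; if disown_at_end is true,
 * release it when recovery finishes, as suggested in the thread. */
static bool run_recovery(SketchLatch *wakeup, int pid, bool standby_mode,
                         bool disown_at_end)
{
    if (standby_mode && !own_latch(wakeup, pid))
        return false;
    /* ... replay WAL, sleeping on the latch while waiting for more ... */
    if (standby_mode && disown_at_end)
        disown_latch(wakeup, pid);
    return true;
}
```

In the usage below, the second run models the automatic restart after a crash, which performs normal crash recovery and never touches the latch; the third shows what would happen if a standby-mode run ever followed without a disown.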

Regards,
