Thread: Reducing walreceiver latency with a latch
Now that we have the wonderful latch facility, let's use it to reduce the delay between receiving a piece of WAL and applying it in the standby. Currently, the startup process polls every 100 ms to see if new WAL has arrived, which adds an average delay of 50 ms between a transaction committing on the master and it appearing as committed in a hot standby server. The latch patch already eliminated a similar polling delay in walsender; the attached patch does the same for walreceiver.

After this patch, there are no unnecessary delays in the streaming replication code path. Note that this is all still asynchronous, just with reduced latency.

This is pretty straightforward, but any comments?

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
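The 50 ms figure is the expected wait under polling. Under the simplifying assumption that WAL arrives at a time uniformly distributed within a polling interval of length T = 100 ms, the startup process only notices it at the next poll, so the added delay D is itself uniform on [0, T):

```latex
D \sim \mathcal{U}[0, T), \qquad
\mathbb{E}[D] = \int_0^T \frac{t}{T}\,dt = \frac{T}{2}
             = \frac{100\ \mathrm{ms}}{2} = 50\ \mathrm{ms}.
```

The worst case approaches the full 100 ms. With a latch, the startup process is woken when the walreceiver sets it, so this polling term drops out.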
On 13 September 2010 12:40, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> [...]
> This is pretty straightforward, but any comments?

Is that supposed to be waiting 5000ms?

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On 13 September 2010 12:47, Thom Brown <thom@linux.com> wrote:
> On 13 September 2010 12:40, Heikki Linnakangas
> <heikki.linnakangas@enterprisedb.com> wrote:
> [...]
> Is that supposed to be waiting 5000ms?

Ignore me, I can see that it's right.

-- 
Thom Brown
Twitter: @darkixion
IRC (freenode): dark_ixion
Registered Linux user: #516935
On 13/09/10 14:47, Thom Brown wrote:
> Is that supposed to be waiting 5000ms?

Yes. The wait is interrupted as soon as WAL arrives; the timeout is only there so that we periodically check whether the standby trigger file has appeared or a SIGTERM has been received.

BTW, I noticed that I missed incrementing the latch count in win32_latch.c, and that owning/disowning the latch was not done correctly: you get an error if you restart the master and reconnect. I'll post an updated patch shortly.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
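The wait Heikki describes (wake immediately when WAL arrives, otherwise time out after a few seconds to re-check the trigger file and SIGTERM) can be sketched with a toy latch. This is an illustration under stated assumptions, not PostgreSQL's latch code: `ToyLatch`, the `toy_latch_*` functions, `wait_for_wal` and its two flags are hypothetical names, and the sketch uses POSIX threads inside one process, whereas real latches work across processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical toy latch: a boolean guarded by a mutex, plus a condition
 * variable so a waiter can sleep until the flag is set or a timeout expires. */
typedef struct ToyLatch {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            is_set;
} ToyLatch;

void toy_latch_init(ToyLatch *latch)
{
    pthread_mutex_init(&latch->mutex, NULL);
    pthread_cond_init(&latch->cond, NULL);
    latch->is_set = false;
}

/* Called by the "receiver" side after new WAL is safely on disk. */
void toy_latch_set(ToyLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    latch->is_set = true;
    pthread_cond_broadcast(&latch->cond);
    pthread_mutex_unlock(&latch->mutex);
}

void toy_latch_reset(ToyLatch *latch)
{
    pthread_mutex_lock(&latch->mutex);
    latch->is_set = false;
    pthread_mutex_unlock(&latch->mutex);
}

/* Sleep until the latch is set or timeout_ms elapses.
 * Returns true if the latch was set, false on timeout. */
bool toy_latch_wait(ToyLatch *latch, long timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);  /* default clock for cond waits */
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&latch->mutex);
    int rc = 0;
    /* rc == 0 covers normal and spurious wakeups; loop until set or timeout. */
    while (!latch->is_set && rc == 0)
        rc = pthread_cond_timedwait(&latch->cond, &latch->mutex, &deadline);
    bool was_set = latch->is_set;
    pthread_mutex_unlock(&latch->mutex);
    return was_set;
}

/* Startup-process-style loop. Returns 1 when new WAL is available,
 * 0 when a failover (trigger file / shutdown) was requested. */
int wait_for_wal(ToyLatch *latch, atomic_bool *wal_ready,
                 atomic_bool *trigger, long poll_ms)
{
    for (;;) {
        /* Reset BEFORE checking: a set() racing with the checks below is
         * not lost, it just makes the next wait return immediately. */
        toy_latch_reset(latch);
        if (atomic_load(wal_ready))
            return 1;
        if (atomic_load(trigger))
            return 0;
        /* Woken early by toy_latch_set(); poll_ms only bounds how long we
         * go without re-checking the trigger condition. */
        (void) toy_latch_wait(latch, poll_ms);
    }
}
```

The design point worth noting is the order inside the loop: reset, then check, then wait. If the latch were reset after the checks instead, a wakeup arriving between the check and the reset would be erased and the waiter would sleep for the full timeout. In a two-thread setup, the receiver would store `true` into `wal_ready` and then call `toy_latch_set()`, and the waiter would return without waiting out `poll_ms`.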
On 13/09/10 14:54, Heikki Linnakangas wrote:
> BTW, I noticed that I missed incrementing the latch count in
> win32_latch.c, [...]
> I'll post an updated patch shortly.

Here's an updated patch with those bugs fixed.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Mon, Sep 13, 2010 at 9:13 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Here's an updated patch with those bugs fixed.

Great!

+ /*
+  * Walreceiver sets this latch every time new WAL has been received and
+  * fsync'd to disk, allowing startup process to wait for new WAL to
+  * arrive.
+  */
+ Latch receivedLatch;

I think that this latch should be available for communication other than between walreceiver and the startup process. For example, communication from a backend to the startup process could be used in the future to let users request a failover via an SQL function. What about putting the latch in XLogCtl instead of WalRcv, and calling OwnLatch at the beginning of the startup process instead of in RequestXLogStreaming?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
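Fujii's suggestion works because a latch carries no payload: it only says "wake up and look around". Placing it in a general shared control structure lets any process set it, provided each wakeup reason has its own flag that the startup process checks after waking. A minimal single-process sketch, assuming hypothetical names (`RecoveryCtl`, `new_wal_received`, `failover_requested`, and the notify functions); waiting is omitted and the latch is reduced to an atomic flag so only the placement and the check order are shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared control block, standing in for a structure like XLogCtl. */
typedef struct RecoveryCtl {
    atomic_bool recovery_wakeup_latch;  /* toy latch: set == "something happened" */
    atomic_bool new_wal_received;       /* reason 1: set by the WAL receiver */
    atomic_bool failover_requested;     /* reason 2: e.g. set by a backend */
} RecoveryCtl;

typedef enum { WAKE_NONE, WAKE_NEW_WAL, WAKE_FAILOVER } WakeReason;

void recovery_ctl_init(RecoveryCtl *ctl)
{
    atomic_init(&ctl->recovery_wakeup_latch, false);
    atomic_init(&ctl->new_wal_received, false);
    atomic_init(&ctl->failover_requested, false);
}

/* Each setter records WHY it is waking the startup process first, then sets
 * the latch, so the reason is visible by the time the woken side looks. */
void walreceiver_notify(RecoveryCtl *ctl)
{
    atomic_store(&ctl->new_wal_received, true);
    atomic_store(&ctl->recovery_wakeup_latch, true);
}

void backend_request_failover(RecoveryCtl *ctl)
{
    atomic_store(&ctl->failover_requested, true);
    atomic_store(&ctl->recovery_wakeup_latch, true);
}

/* Startup-process side, reset-then-check: clear the latch first, then consume
 * one reason flag. Failover is checked first so a steady WAL stream cannot
 * starve it; an unconsumed flag is still seen on the next call. */
WakeReason startup_check_wakeup(RecoveryCtl *ctl)
{
    atomic_store(&ctl->recovery_wakeup_latch, false);
    if (atomic_exchange(&ctl->failover_requested, false))
        return WAKE_FAILOVER;
    if (atomic_exchange(&ctl->new_wal_received, false))
        return WAKE_NEW_WAL;
    return WAKE_NONE;
}
```

Because the reasons live in separate flags, adding a new kind of waker (such as the SQL-level failover request mentioned above) only needs a new flag, not a new latch.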
On 14/09/10 05:02, Fujii Masao wrote:
> [...]
> What about putting the latch in XLogCtl instead of
> WalRcv and calling OwnLatch at the beginning of the startup process instead
> of RequestXLogStreaming?

Yes, good point. I updated the patch along those lines, attached.

-- 
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
On Tue, Sep 14, 2010 at 5:51 PM, Heikki Linnakangas
<heikki.linnakangas@enterprisedb.com> wrote:
> Yes, good point. I updated the patch along those lines, attached.

Looks good.

+ /*
+  * Take ownership of the wakeup latch if we're going to sleep during
+  * recovery.
+  */
+ if (StandbyMode)
+     OwnLatch(&XLogCtl->recoveryWakeupLatch);

Since automatic restart after a backend crash always performs a normal crash recovery, the startup process will never call OwnLatch more than once. So there may be no harm even if the startup process doesn't disown the shared latch. But... what about calling DisownLatch at the end of recovery, just in case?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
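The ownership rule under discussion can be illustrated with a toy latch that records its owner: owning an already-owned latch fails, and disowning releases it. This single-process sketch assumes hypothetical names (`ToyLatch`, `toy_own_latch`, `toy_disown_latch`, `toy_run_recovery`, and integer owner ids standing in for process IDs); it only mirrors the shape of OwnLatch/DisownLatch as discussed in this thread, not their implementation, and it does not model shared-memory reinitialization across restarts.

```c
#include <stdbool.h>

/* Hypothetical toy latch that tracks its owner; 0 means "unowned". */
typedef struct ToyLatch {
    bool is_set;
    int  owner_id;
} ToyLatch;

void toy_init_shared_latch(ToyLatch *latch)
{
    latch->is_set = false;
    latch->owner_id = 0;
}

/* Refuse to take ownership if someone already owns the latch.
 * Returns false where a real implementation would report an error. */
bool toy_own_latch(ToyLatch *latch, int owner_id)
{
    if (latch->owner_id != 0)
        return false;
    latch->owner_id = owner_id;
    return true;
}

/* In this toy, only the current owner may release the latch. */
bool toy_disown_latch(ToyLatch *latch, int owner_id)
{
    if (latch->owner_id != owner_id)
        return false;
    latch->owner_id = 0;
    return true;
}

/* A hypothetical recovery run: own the latch only in standby mode and,
 * if disown_at_end is true, release it when recovery finishes. */
bool toy_run_recovery(ToyLatch *latch, int startup_id, bool standby_mode,
                      bool disown_at_end)
{
    if (standby_mode && !toy_own_latch(latch, startup_id))
        return false;  /* latch still held by a previous owner */
    /* ... replay WAL, sleeping on the latch between records ... */
    if (standby_mode && disown_at_end)
        toy_disown_latch(latch, startup_id);
    return true;
}
```

Without the final disown, a second run by a different owner against the same latch fails, which is the situation a "just in case" DisownLatch at the end of recovery would guard against. Whether that can actually happen in the server depends on how shared memory is handled across restarts, which this toy does not model.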