Re: Logical replication timeout problem - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Logical replication timeout problem |
Date | |
Msg-id | CAA4eK1JN9Ary4PTqFdoW==HQdJi_VGb52K=8RyV4QOOMDmjhhg@mail.gmail.com |
In response to | Re: Logical replication timeout problem (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses | Re: Logical replication timeout problem |
List | pgsql-hackers |
On Wed, Mar 16, 2022 at 7:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Mar 16, 2022 at 11:57 AM wangw.fnst@fujitsu.com
> <wangw.fnst@fujitsu.com> wrote:
> > > But it really depends on the workload, the server condition, and the
> > > timeout value, right? The logical decoding might involve disk I/O much
> > > to spill/load intermediate data and the system might be under the
> > > high-load condition. Why don't we check both the count and the time?
> > > That is, I think we can send a keep-alive either if we skipped 10000
> > > changes or if we didn't sent anything for wal_sender_timeout / 2.
> > Yes, you are right.
> > Do you mean that when skipping every change, check if it has been more than
> > (wal_sender_timeout / 2) without sending anything?
> > IIUC, I tried to send keep-alive messages based on time before[1], but after
> > testing, I found that it will brings slight overhead. So I am not sure, in a
> > function(pgoutput_change) that is invoked frequently, should this kind of
> > overhead be introduced?
> >
> > > Also, the patch changes the current behavior of wal senders; with the
> > > patch, we send keep-alive messages even when wal_sender_timeout = 0.
> > > But I'm not sure it's a good idea. The subscriber's
> > > wal_receiver_timeout might be lower than wal_sender_timeout. Instead,
> > > I think it's better to periodically check replies and send a reply to
> > > the keep-alive message sent from the subscriber if necessary, for
> > > example, every 10000 skipped changes.
> > Sorry, I could not follow what you said. I am not sure, do you mean the
> > following?
> > 1. When we didn't sent anything for (wal_sender_timeout / 2) or we skipped
> > 10000 changes continuously, we will invoke the function WalSndKeepalive in the
> > function WalSndUpdateProgress, and send a keepalive message to the subscriber
> > with requesting an immediate reply.
> > 2. If after sending a keepalive message, and then 10000 changes are skipped
> > continuously again. In this case, we need to handle the reply from the
> > subscriber-side when processing the 10000th change. The handling approach is to
> > reply to the confirmation message from the subscriber.
>
> After more thought, can we check only wal_sender_timeout without
> skip-count? That is, in WalSndUpdateProgress(), if we have received
> any reply from the subscriber in last (wal_sender_timeout / 2), we
> don't need to do anything in terms of keep-alive. If not, we do
> ProcessRepliesIfAny() (and probably WalSndCheckTimeOut()?) then
> WalSndKeepalivesIfNecessary(). That way, we can send keep-alive
> messages every (wal_sender_timeout / 2). And since we don't call them
> for every change, we would not need to worry about the overhead much.
>

But won't that lead to a call to GetCurrentTimestamp() for each change
we skip? IIUC from previous replies, that led to a slight slowdown in
Wang-San's previous tests.

> Actually, WalSndWriteData() does similar things;
>

That also seems to call GetCurrentTimestamp() every time. I think it
might be okay when we are sending the change, but I am not sure the
overhead is negligible when we are skipping the changes.

--
With Regards,
Amit Kapila.
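To make the trade-off in this thread concrete, here is a minimal standalone sketch of one way to combine the two ideas: a cheap skip counter decides when to read the clock, and the time check (half of the timeout) decides whether a keep-alive is due. This is not PostgreSQL code and not the patch under discussion. `KeepaliveState`, `keepalive_init`, `should_check_clock`, and `keepalive_due` are hypothetical names invented for illustration, and the caller is assumed to supply the current time in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical millisecond timestamp; stands in for a real clock reading. */
typedef int64_t TimestampMs;

/* Illustrative state only; none of these fields exist in PostgreSQL. */
typedef struct KeepaliveState
{
    TimestampMs last_send;   /* when something was last sent */
    TimestampMs timeout;     /* analogue of wal_sender_timeout; 0 disables */
    uint32_t    check_every; /* read the clock once per this many skips */
    uint32_t    skipped;     /* skipped changes since the last clock read */
} KeepaliveState;

void
keepalive_init(KeepaliveState *st, TimestampMs now,
               TimestampMs timeout, uint32_t check_every)
{
    assert(check_every > 0);
    st->last_send = now;
    st->timeout = timeout;
    st->check_every = check_every;
    st->skipped = 0;
}

/*
 * Call once per skipped change. Only increments a counter, so the
 * per-change cost does not include a clock read. Returns true when the
 * caller should fetch the current time and call keepalive_due().
 */
bool
should_check_clock(KeepaliveState *st)
{
    if (st->timeout <= 0)
        return false;        /* timeout disabled: never send keep-alives */
    if (++st->skipped < st->check_every)
        return false;
    st->skipped = 0;
    return true;
}

/*
 * Returns true if at least timeout / 2 has elapsed since the last send,
 * and records 'now' as the new last-send time in that case.
 */
bool
keepalive_due(KeepaliveState *st, TimestampMs now)
{
    if (now - st->last_send < st->timeout / 2)
        return false;
    st->last_send = now;
    return true;
}
```

With `check_every` set to 1 this degenerates to the time-only check, which reads the clock for every skipped change. Larger values reduce clock reads, but a keep-alive can then be late by up to the time it takes to skip `check_every` changes, which is the workload dependence raised earlier in the thread.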