Re: Switching XLog source from archive to streaming when primary available - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Switching XLog source from archive to streaming when primary available |
Date | |
Msg-id | CALj2ACVYVgq+QxeLmhY5DO+D6+7PwerdsVTeBxPuSeFyrgOPtg@mail.gmail.com Whole thread Raw |
In response to | Re: Switching XLog source from archive to streaming when primary available (Nathan Bossart <nathandbossart@gmail.com>) |
Responses |
Re: Switching XLog source from archive to streaming when primary available
|
List | pgsql-hackers |
On Tue, Mar 5, 2024 at 7:34 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > cfbot claims that this one needs another rebase. Yeah, the conflict was with the new TAP test file name in src/test/recovery/meson.build. > I've spent some time thinking about this one. I'll admit I'm a bit worried > about adding more complexity to this state machine, but I also haven't > thought of any other viable approaches, Right. I understand that the WaitForWALToBecomeAvailable()'s state machine is a complex piece. > and this still seems like a useful > feature. So, for now, I think we should continue with the current > approach. Yes, the feature is useful like mentioned in the docs as below: + Reading WAL from archive may not always be as efficient and fast as + reading from primary. This can be due to the differences in disk types, + IO costs, network latencies etc. All of these can impact the recovery + performance on standby, and can increase the replication lag on + primary. In addition, the primary keeps accumulating WAL needed for the + standby while the standby reads WAL from archive, because the standby + replication slot stays inactive. To avoid these problems, one can use + this parameter to make standby switch to stream mode sooner. > + fails to switch to stream mode, it falls back to archive mode. If this > + parameter value is specified without units, it is taken as > + milliseconds. Default is <literal>5min</literal>. With a lower value > > Does this really need to be milliseconds? I would think that any > reasonable setting would at least on the order of seconds. Agreed. Done that way. > + attempts. To avoid this, it is recommended to set a reasonable value. > > I think we might want to suggest what a "reasonable value" is. It really depends on the WAL generation rate on the primary. If the WAL files grow faster, the disk runs out of space sooner, so setting a value to make frequent WAL source switch attempts can help. It's hard to suggest a one-size-fits-all value. Therefore, I've tweaked the docs a bit to reflect the fact that it depends on the WAL generation rate. > + static bool canSwitchSource = false; > + bool switchSource = false; > > IIUC "canSwitchSource" indicates that we are trying to force a switch to > streaming, but we are currently exhausting anything that's present in the > pg_wal directory, Right. > while "switchSource" indicates that we should force a > switch to streaming right now. It's not indicating force switch, it says "previously I was asked to switch source via canSwitchSource, now that I've exhausted all the WAL from the pg_wal directory, I'll make a source switch attempt now". > Furthermore, "canSwitchSource" is static > while "switchSource" is not. This is because the WaitForWALToBecomeAvailable() has to remember the decision (that streaming_replication_retry_interval has occurred) across the calls. And, switchSource is decided within WaitForWALToBecomeAvailable() for every function call. > Is there any way to simplify this? For > example, would it be possible to make an enum that tracks the > streaming_replication_retry_interval state? I guess the way it is right now looks simple IMHO. If the suggestion is to have an enum like below; it looks overkill for just two states. typedef enum { CAN_SWITCH_SOURCE, SWITCH_SOURCE } XLogSourceSwitchState; > /* > * Don't allow any retry loops to occur during nonblocking > - * readahead. Let the caller process everything that has been > - * decoded already first. > + * readahead if we failed to read from the current source. Let the > + * caller process everything that has been decoded already first. > */ > - if (nonblocking) > + if (nonblocking && lastSourceFailed) > return XLREAD_WOULDBLOCK; > > Why do we skip this when "switchSource" is set? It was leftover from the initial version of the patch - I was then encountering some issue and had that piece there. Removed it now. > + /* Reset the WAL source switch state */ > + if (switchSource) > + { > + Assert(canSwitchSource); > + Assert(currentSource == XLOG_FROM_STREAM); > + Assert(oldSource == XLOG_FROM_ARCHIVE); > + switchSource = false; > + canSwitchSource = false; > + } > > How do we know that oldSource is guaranteed to be XLOG_FROM_ARCHIVE? Is > there no way it could be XLOG_FROM_PG_WAL? No. switchSource is set to true only when canSwitchSource is set to true, which happens only when currentSource is XLOG_FROM_ARCHIVE (see SwitchWALSourceToPrimary()). > +#streaming_replication_retry_interval = 5min # time after which standby > + # attempts to switch WAL source from archive to > + # streaming replication > + # in milliseconds; 0 disables > > I think we might want to turn this feature off by default, at least for the > first release. Agreed. Done that way. Please see the attached v21 patch. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: