Home > mailing lists

Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From	Drouvot, Bertrand
Subject	Re: Synchronizing slots from primary to standby
Date	December 5, 2023 08:48:34
Msg-id	40e1d73f-a09d-4608-8efa-750c998c3287@gmail.com Whole thread Raw
In response to	Re: Synchronizing slots from primary to standby (shveta malik <shveta.malik@gmail.com>)
Responses	Re: Synchronizing slots from primary to standby
List	pgsql-hackers

Tree view

Hi,

On 12/5/23 6:08 AM, shveta malik wrote:
> On Mon, Dec 4, 2023 at 10:07 PM Drouvot, Bertrand
> <bertranddrouvot.pg@gmail.com> wrote:
>> Maybe another option could be to have the walreceiver a way to let the slot sync
>> worker knows that it (the walreceiver) was not able to start due to non existing
>> replication slot on the primary? (that way we'd avoid the slot sync worker having
>> to talk to the primary).
> 
> Few points:
> 1) I think if we do it, we should do it in generic way i.e. slotsync
> worker should go to no-op if walreceiver is not able to start due to
> any reason and not only due to invalid primary_slot_name.

Agree.

> 2) Secondly, slotsync worker needs to make sure it has synced the
> slots so far i.e. worker should not go to no-op immediately on seeing
> missing WalRcv process if there are pending slots to be synced.

Agree.

> So the generic way I see to have this optimization is:
> 1) Slotsync worker can use 'WalRcv->pid' to figure out if WalReceiver
> is running or not.

Not sure that would work because the walreceiver keeps try re-starting
and so get a pid before reaching the "could not start WAL streaming: ERROR:  replication slot "XXXX" does not exist"
error.

We may want to add an extra check on walrcv->walRcvState (or should/could be enough by its own).
But walrcv->walRcvState is set to WALRCV_STREAMING way before walrcv_startstreaming().

Wouldn't that make sense to move it once we are sure that
walrcv_startstreaming() returns true and first_stream is true, here?

"
                         if (first_stream)
+                       {
                                 ereport(LOG,
                                                 (errmsg("started streaming WAL from primary at %X/%X on timeline %u",
                                                                 LSN_FORMAT_ARGS(startpoint), startpointTLI)));
+                               SpinLockAcquire(&walrcv->mutex);
+                               walrcv->walRcvState = WALRCV_STREAMING;
+                               SpinLockRelease(&walrcv->mutex);
+                       }
"

> 2) Slotsync worker should check null 'WalRcv->pid' only when
> no-activity is observed for threshold time i.e. it can do it during
> existing logic of increasing naptime.
> 3) On finding null  'WalRcv->pid', worker can mark a flag to go to
> no-op unless WalRcv->pid becomes valid again. Marking this flag during
> increasing naptime will guarantee that the worker has taken all the
> changes so far i.e. standby is not lagging in terms of slots.
> 

2) and 3) looks good to me but with a check on walrcv->walRcvState
looking for WALRCV_STREAMING state instead of looking for a non null
WalRcv->pid.

And only if it makes sense to move the walrcv->walRcvState = WALRCV_STREAMING as
mentioned above (I think it does).

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

From: Nikita Malakhov
Date: 05 December 2023, 08:38:58
Subject: Re: Avoid detoast overhead when possible

From: zhihuifan1213@163.com
Date: 05 December 2023, 08:54:59
Subject: Re: Avoid detoast overhead when possible

Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

Previous

Next