Re: Excessive number of replication slots for 12->14 logical replication - Mailing list pgsql-bugs

From Masahiko Sawada
Subject Re: Excessive number of replication slots for 12->14 logical replication
Date
Msg-id CAD21AoBGXi9nFsQTvgF-NfqcvoGOA9SrgSRwoEdGevEHvTm53g@mail.gmail.com
Whole thread Raw
In response to Re: Excessive number of replication slots for 12->14 logical replication  (Ajin Cherian <itsajin@gmail.com>)
Responses Re: Excessive number of replication slots for 12->14 logical replication
Re: Excessive number of replication slots for 12->14 logical replication
List pgsql-bugs
On Mon, Aug 1, 2022 at 3:46 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Mon, Aug 1, 2022 at 11:50 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Here is my review comment for v4-0001.
> >
> > (Just a nitpick about comments)
> >
> > ======
> >
> > 1. src/backend/replication/logical/tablesync.c - process_syncing_tables_for_sync
> >
> > There are really just 2 main things that are being done for the cleanup here:
> > - Drop the origin tracking
> > - Drop the tablesync slot
> >
> > So, IMO the code will have more clarity by having one bigger comment
> > for each of those drops, instead of commenting every step separately.
> >
> > e.g.
> >
> > BEFORE
> > /*
> > * Cleanup the tablesync origin tracking if exists.
> > */
> > ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
> >   MyLogicalRepWorker->relid,
> >   originname,
> >   sizeof(originname));
> >
> > /*
> > * Reset the origin session before dropping.
> > *
> > * This is required to reset the ownership of the slot
> > * and allow the origin to be dropped.
> > */
> > replorigin_session_reset();
> >
> > replorigin_session_origin = InvalidRepOriginId;
> > replorigin_session_origin_lsn = InvalidXLogRecPtr;
> > replorigin_session_origin_timestamp = 0;
> >
> > /*
> > * There is a chance that the user is concurrently performing
> > * refresh for the subscription where we remove the table
> > * state and its origin and by this time the origin might be
> > * already removed. So passing missing_ok = true.
> > */
> > replorigin_drop_by_name(originname, true, false);
> >
> > /* Cleanup the tablesync slot. */
> > ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
> > MyLogicalRepWorker->relid,
> > syncslotname,
> > sizeof(syncslotname));
> >
> > /*
> > * It is important to give an error if we are unable to drop the slot,
> > * otherwise, it won't be dropped till the corresponding subscription
> > * is dropped. So passing missing_ok = false.
> > */
> > ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false);
> >
> >
> > SUGGESTION
> >
> > /*
> > * Cleanup the tablesync origin tracking.
> > *
> > * Resetting the origin session removes the ownership of the slot.
> > * This is needed to allow the origin to be dropped.
> > *
> > * Also, there is a chance that the user is concurrently performing
> > * refresh for the subscription where we remove the table
> > * state and its origin and by this time the origin might be
> > * already removed. So passing missing_ok = true.
> > */
> > ReplicationOriginNameForTablesync(MyLogicalRepWorker->subid,
> >   MyLogicalRepWorker->relid,
> >   originname,
> >   sizeof(originname));
> > replorigin_session_reset();
> > replorigin_session_origin = InvalidRepOriginId;
> > replorigin_session_origin_lsn = InvalidXLogRecPtr;
> > replorigin_session_origin_timestamp = 0;
> >
> > replorigin_drop_by_name(originname, true, false);
> >
> > /*
> > * Cleanup the tablesync slot.
> > *
> > * It is important to give an error if we are unable to drop the slot,
> > * otherwise, it won't be dropped till the corresponding subscription
> > * is dropped. So passing missing_ok = false.
> > */
> > ReplicationSlotNameForTablesync(MyLogicalRepWorker->subid,
> > MyLogicalRepWorker->relid,
> > syncslotname,
> > sizeof(syncslotname));
> > ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false);
> >
> Updated.

Thank you for working on this. I have a comment and a question:

                  * This has to be done after updating the state
because otherwise if
                  * there is an error while doing the database
operations we won't be
-                 * able to rollback dropped slot.
+                 * able to rollback dropped slot or origin tracking.

I think we can actually roll back dropping the replication origin. So
the above comment is true for only replication slots.

---
+       replorigin_session_reset();
+       replorigin_session_origin = InvalidRepOriginId;
+       replorigin_session_origin_lsn = InvalidXLogRecPtr;
+       replorigin_session_origin_timestamp = 0;
+
+       replorigin_drop_by_name(originname, true, false);

With this change, the committing the ongoing transaction will be done
without replication origin. Is this okay? it's probably okay, but
since tablesync worker commits other changes with replication origin
I'm concerned a bit there might be a corner case.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/



pgsql-bugs by date:

Previous
From: "Ravulapati, Gautham"
Date:
Subject: ERROR: unterminated dollar-quoted string at or near "$$"
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: [PATCH] BUG FIX: inconsistent page found in BRIN_REGULAR_PAGE