Re: Fix slot synchronization with two_phase decoding enabled - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Fix slot synchronization with two_phase decoding enabled |
Date | |
Msg-id | CAA4eK1LhrjqfFFQY4zc=UVd3Hazv4OMPg9F10NaWpko5hFanPw@mail.gmail.com Whole thread Raw |
In response to | Re: Fix slot synchronization with two_phase decoding enabled (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: Fix slot synchronization with two_phase decoding enabled
|
List | pgsql-hackers |
On Fri, Apr 25, 2025 at 9:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Fri, Apr 25, 2025 at 3:43 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Fri, Apr 25, 2025 at 6:02 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > > > I realized that users who create a logical slot using > > > pg_create_logical_replication_slot() would not be able to enable both > > > options at slot creation, and there is no easy way to enable the > > > failover after two_phase-enabled-slot creation. Users would need to > > > use ALTER_REPLICATION_SLOT replication command, which seems > > > unrealistics for users to use. On the other hand, if we allow creating > > > a logical slot with enabling failover and two_phase using SQL API, > > > there is still a chance for this bug to occur. Would it be worth > > > considering that if a logical slot is created with enabling failover > > > and two_phase using SQL API, we create the slot with only > > > two_phase=true, then advance the slot until the slot satisfies > > > restart_lsn >= two_phase_at, and then enable the failover? > > > > > > > This means we either need to maintain somewhere that user has provided > > failover flag till restart_lsn >= two_phase_at or and then set > > failover flag in the slot > > I was thinking of this idea. > > > or initially mark it but enable the > > functionality of failover when we reach the condition restart_lsn >= > > two_phase_at. > > IIUC the slot could be synchronized to the standby as soon as we > complete DecodingContextFindStartpoint() for a failover-enabled slot. > So we would need some mechanisms to make sure that the slot is not > synchronized while we're waiting to reach the condition restart_lsn >= > two_phase_at even if the failover is enabled. > So, then we need any state or persistent flag for this. > > Both seem to have different kinds of problems. The first > > idea seems to have an issue with persistence, which means we can lose > > track of the flag after the restart. > > I think we can do this series of operations while the slot is not > persistent, that is the slot is still RS_EPHEMERAL. > But we still need a persistent flag to indicate such slots shouldn't be synced to standby till we reach the condition restart_lsn >= two_phase_at. > > The second can mislead the user > > for a long period in cases where prepare and commit have a large time > > gap. I feel this will introduce complexity either in the form of code > > or in giving the information to the user. > > Agreed. Both ways introduce complexity so we need to consider the > user-unfriendliness (by not having a proper way to enable failover for > the two_phase-enabled-slot using SQL API) vs. risk (of introducing > complexity). > Right, to me it sounds risky to provide such functionality for SQL API in the back branch. -- With Regards, Amit Kapila.
pgsql-hackers by date: