Re: Physical replication slot advance is not persistent - Mailing list pgsql-hackers
From | Alexey Kondratov |
---|---|
Subject | Re: Physical replication slot advance is not persistent |
Date | |
Msg-id | ada09c82-df58-8204-9296-283514adeb6e@postgrespro.ru Whole thread Raw |
In response to | Re: Physical replication slot advance is not persistent (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
Responses |
Re: Physical replication slot advance is not persistent
|
List | pgsql-hackers |
On 25.12.2019 07:03, Kyotaro Horiguchi wrote: > At Tue, 24 Dec 2019 20:12:32 +0300, Alexey Kondratov <a.kondratov@postgrespro.ru> wrote in >> I dig into the code and it happens because of this if statement: >> >> /* Update the on disk state when lsn was updated. */ >> if (XLogRecPtrIsInvalid(endlsn)) >> { >> ReplicationSlotMarkDirty(); >> ReplicationSlotsComputeRequiredXmin(false); >> ReplicationSlotsComputeRequiredLSN(); >> ReplicationSlotSave(); >> } > Yes, it seems just broken. > >> Attached is a small patch, which fixes this bug. I have tried to >> stick to the same logic in this 'if (XLogRecPtrIsInvalid(endlsn))' >> and now pg_logical_replication_slot_advance and >> pg_physical_replication_slot_advance return InvalidXLogRecPtr if >> no-op. >> >> What do you think? > I think we shoudn't change the definition of > pg_*_replication_slot_advance since the result is user-facing. Yes, that was my main concern too. OK. > The functions return a invalid value only when the slot had the > invalid value and failed to move the position. I think that happens > only for uninitalized slots. > > Anyway what we should do there is dirtying the slot when the operation > can be assumed to have been succeeded. > > As the result I think what is needed there is just checking if the > returned lsn is equal or larger than moveto. Doen't the following > change work? > > - if (XLogRecPtrIsInvalid(endlsn)) > + if (moveto <= endlsn) Yep, it helps with physical replication slot persistence after advance, but the whole validation (moveto <= endlsn) does not make sense for me. The value of moveto should be >= than minlsn == confirmed_flush / restart_lsn, while endlsn == retlsn is also always initialized with confirmed_flush / restart_lsn. Thus, your condition seems to be true in any case, even if it was no-op one, which we were intended to catch. Actually, if we do not want to change pg_*_replication_slot_advance, we can just add straightforward validation that either confirmed_flush, or restart_lsn changed after slot advance guts execution. It will be a little bit bulky, but much more clear and will never be affected by pg_*_replication_slot_advance logic change. Another weird part I have found is this assignment inside pg_logical_replication_slot_advance: /* Initialize our return value in case we don't do anything */ retlsn = MyReplicationSlot->data.confirmed_flush; It looks redundant, since later we do the same assignment, which should be reachable in any case. I will recheck everything again and try to come up with something during this week. Regards -- Alexey Kondratov Postgres Professional https://www.postgrespro.com Russian Postgres Company
pgsql-hackers by date: