Re: Physical replication slot advance is not persistent - Mailing list pgsql-hackers
| From | Kyotaro Horiguchi |
|---|---|
| Subject | Re: Physical replication slot advance is not persistent |
| Date | |
| Msg-id | 20191226.173349.2159296357637287391.horikyota.ntt@gmail.com Whole thread Raw |
| In response to | Re: Physical replication slot advance is not persistent (Alexey Kondratov <a.kondratov@postgrespro.ru>) |
| Responses |
Re: Physical replication slot advance is not persistent
|
| List | pgsql-hackers |
At Wed, 25 Dec 2019 20:28:04 +0300, Alexey Kondratov <a.kondratov@postgrespro.ru> wrote in
> > Yep, it helps with physical replication slot persistence after
> > advance, but the whole validation (moveto <= endlsn) does not make
> > sense for me. The value of moveto should be >= than minlsn ==
> > confirmed_flush / restart_lsn, while endlsn == retlsn is also always
> > initialized with confirmed_flush / restart_lsn. Thus, your condition
> > seems to be true in any case, even if it was no-op one, which we were
> > intended to catch.
...
> If I get it correctly, then we already keep previous slot position in
> the minlsn, so we just have to compare endlsn with minlsn and treat
> endlsn <= minlsn as a no-op without slot state flushing.
I think you're right about the condition. (endlsn cannot be less than
minlsn, though) But I came to think that we shouldn't use locations in
that decision.
> Attached is a patch that does this, so it fixes the bug without
> affecting any user-facing behavior. Detailed comment section and DEBUG
> output are also added. What do you think now?
>
> I have also forgotten to mention that all versions down to 11.0 should
> be affected with this bug.
pg_replication_slot_advance is the only caller of
pg_logical/physical_replication_slot_advacne so there's no apparent
determinant on who-does-what about dirtying and other housekeeping
calculation like *ComputeRequired*() functions, but the current shape
seems a kind of inconsistent between logical and physical.
I think pg_logaical/physical_replication_slot_advance should dirty the
slot if they actually changed anything. And
pg_replication_slot_advance should do the housekeeping if the slots
are dirtied. (Otherwise both the caller function should dirty the
slot in lieu of the two.)
The attached does that.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
From 264b187aac17f0ce10e8748ea86fc967d08a0eb0 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Thu, 26 Dec 2019 17:17:25 +0900
Subject: [PATCH] Make sure to save updated physical slot.
Back to 11, changes of a physical repliation slot made by
pg_replication_slot_advance doesn't survive restart. The cause was
pg_physical_replication_slot_advance forgot to dirty the slot when
updated. Fix it.
---
src/backend/replication/slot.c | 19 +++++++++++++++++++
src/backend/replication/slotfuncs.c | 10 ++++++++--
src/include/replication/slot.h | 1 +
3 files changed, 28 insertions(+), 2 deletions(-)
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 21ae8531b3..40149d0d99 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -672,6 +672,25 @@ ReplicationSlotMarkDirty(void)
SpinLockRelease(&slot->mutex);
}
+/*
+ * Return if the currently acquired slot is dirty.
+ */
+bool
+ReplicationSlotIsDirty(void)
+{
+ bool dirty;
+
+ ReplicationSlot *slot = MyReplicationSlot;
+
+ Assert(MyReplicationSlot != NULL);
+
+ SpinLockAcquire(&slot->mutex);
+ dirty = MyReplicationSlot->dirty;
+ SpinLockRelease(&slot->mutex);
+
+ return dirty;
+}
+
/*
* Convert a slot that's marked as RS_EPHEMERAL to a RS_PERSISTENT slot,
* guaranteeing it will be there after an eventual crash.
diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
index 6683fc3f9b..7b08ed691b 100644
--- a/src/backend/replication/slotfuncs.c
+++ b/src/backend/replication/slotfuncs.c
@@ -370,6 +370,13 @@ pg_physical_replication_slot_advance(XLogRecPtr moveto)
MyReplicationSlot->data.restart_lsn = moveto;
SpinLockRelease(&MyReplicationSlot->mutex);
retlsn = moveto;
+
+ /*
+ * We don't need to dirty the slot only for the above change, but dirty
+ * this slot for the same reason with
+ * pg_logical_replication_slot_advance.
+ */
+ ReplicationSlotMarkDirty();
}
return retlsn;
@@ -574,9 +581,8 @@ pg_replication_slot_advance(PG_FUNCTION_ARGS)
nulls[0] = false;
/* Update the on disk state when lsn was updated. */
- if (XLogRecPtrIsInvalid(endlsn))
+ if (ReplicationSlotIsDirty())
{
- ReplicationSlotMarkDirty();
ReplicationSlotsComputeRequiredXmin(false);
ReplicationSlotsComputeRequiredLSN();
ReplicationSlotSave();
diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 3a5763fb07..f76b84571f 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -189,6 +189,7 @@ extern void ReplicationSlotRelease(void);
extern void ReplicationSlotCleanup(void);
extern void ReplicationSlotSave(void);
extern void ReplicationSlotMarkDirty(void);
+extern bool ReplicationSlotIsDirty(void);
/* misc stuff */
extern bool ReplicationSlotValidateName(const char *name, int elevel);
--
2.23.0
pgsql-hackers by date: