Re: replication slot restart_lsn initialization - pgsql-hackers mailing list

From: Andres Freund
Subject: Re: replication slot restart_lsn initialization
Msg-id: 20150810141254.GF16192@awork2.anarazel.de
In response to: Re: replication slot restart_lsn initialization (Gurjeet Singh <gurjeet@singh.im>)
Responses: Re: replication slot restart_lsn initialization
List: pgsql-hackers
On 2015-07-07 09:42:54 -0700, Gurjeet Singh wrote:
> --- a/src/backend/replication/slot.c
> +++ b/src/backend/replication/slot.c
> @@ -40,10 +40,10 @@
>  #include <sys/stat.h>
>
>  #include "access/transam.h"
> +#include "access/xlog_internal.h"
>  #include "common/string.h"
>  #include "miscadmin.h"
>  #include "replication/slot.h"
> -#include "storage/fd.h"
>  #include "storage/proc.h"
>  #include "storage/procarray.h"

Why did you remove fd.h? The file definitely uses infrastructure from
there. We're not terribly consistent about that, but I'd rather not rely
on it being included via xlog_internal.h -> xlog.h.

>  /*
> + * Grab and save an LSN value to prevent WAL recycling past that point.
> + */
> +void
> +ReplicationSlotRegisterRestartLSN()
> +{

I didn't like that description and function name very much. What does
'grabbing' mean here? The comment should probably mention that the
function works on the currently active slot and modifies it. It's now
ReplicationSlotReserveWal().

> +    ReplicationSlot *slot = MyReplicationSlot;
> +
> +    Assert(slot != NULL);
> +    Assert(slot->data.restart_lsn == InvalidXLogRecPtr);
> +
> +    /*
> +     * The replication slot mechanism is used to prevent removal of required
> +     * WAL. As there is no interlock between this and checkpoints required, WAL
> +     * segment could be removed before ReplicationSlotsComputeRequiredLSN() has
> +     * been called to prevent that. In the very unlikely case that this happens
> +     * we'll just retry.
> +     */

You removed some punctuation in that sentence, converting a sentence in
bad English into one without the original meaning ;). See the attached
for a new version.

> +    while (true)
> +    {
> +        XLogSegNo   segno;
> +
> +        /*
> +         * Let's start with enough information if we can, so log a standby
> +         * snapshot and start logical decoding at exactly that position.
> +         */
> +        if (!RecoveryInProgress())
> +        {
> +            XLogRecPtr  flushptr;
> +
> +            /* start at current insert position */
> +            slot->data.restart_lsn = GetXLogInsertRecPtr();
> +
> +            /*
> +             * Log an xid snapshot for logical replication. This snapshot is not
> +             * needed for physical replication, as it relies on the snapshot
> +             * created by checkpoint when the base backup starts.
> +             */
> +            if (slot->data.database != InvalidOid)
> +            {
> +                /* make sure we have enough information to start */
> +                flushptr = LogStandbySnapshot();
> +
> +                /* and make sure it's fsynced to disk */
> +                XLogFlush(flushptr);
> +            }
> +        }
> +        else
> +            slot->data.restart_lsn = GetRedoRecPtr();
> +
> +        /* prevent WAL removal as fast as possible */
> +        ReplicationSlotsComputeRequiredLSN();
> +
> +        /*
> +         * If all required WAL is still there, great, otherwise retry. The
> +         * slot should prevent further removal of WAL, unless there's a
> +         * concurrent ReplicationSlotsComputeRequiredLSN() after we've written
> +         * the new restart_lsn above, so normally we should never need to loop
> +         * more than twice.
> +         */
> +        XLByteToSeg(slot->data.restart_lsn, segno);
> +        if (XLogGetLastRemovedSegno() < segno)
> +            break;
> +    }
> +}

The way you added the check for logical vs. physical slots in there
looks wrong to me. For a physical slot created !InRecovery we'll
possibly return an xlog position from the future (it's the insert
position *and* not flushed to disk), which then cannot be received.

> +/*
>  * Flush all replication slots to disk.
>  *
>  * This needn't actually be part of a checkpoint, but it's a convenient
> @@ -876,7 +942,7 @@ StartupReplicationSlots(void)
>  }
>
>  /* ----
> - * Manipulation of ondisk state of replication slots
> + * Manipulation of on-disk state of replication slots
>  *
>  * NB: none of the routines below should take any notice whether a slot is the
>  * current one or not, that's all handled a layer above.
> diff --git a/src/backend/replication/slotfuncs.c b/src/backend/replication/slotfuncs.c
> index 9a2793f..01b376a 100644
> --- a/src/backend/replication/slotfuncs.c
> +++ b/src/backend/replication/slotfuncs.c
> @@ -40,6 +40,7 @@ Datum
>  pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
>  {
>      Name        name = PG_GETARG_NAME(0);
> +    bool        immediately_reserve = PG_GETARG_BOOL(1);
>      Datum       values[2];
>      bool        nulls[2];
>      TupleDesc   tupdesc;
> @@ -58,10 +59,28 @@ pg_create_physical_replication_slot(PG_FUNCTION_ARGS)
>      /* acquire replication slot, this will check for conflicting names */
>      ReplicationSlotCreate(NameStr(*name), false, RS_PERSISTENT);
>
> -    values[0] = NameGetDatum(&MyReplicationSlot->data.name);
> +    if (immediately_reserve)
> +    {
> +        /* Allocate restart-LSN, if the user asked for it */
> +        ReplicationSlotRegisterRestartLSN();
> +
> +        /* Write this slot to disk */
> +        ReplicationSlotMarkDirty();
> +        ReplicationSlotSave();
>
> -    nulls[0] = false;
> -    nulls[1] = true;
> +        values[0] = NameGetDatum(&MyReplicationSlot->data.name);
> +        values[1] = LSNGetDatum(MyReplicationSlot->data.restart_lsn);
> +
> +        nulls[0] = false;
> +        nulls[1] = false;
> +    }
> +    else
> +    {
> +        values[0] = NameGetDatum(&MyReplicationSlot->data.name);
> +
> +        nulls[0] = false;
> +        nulls[1] = true;
> +    }

I moved

    values[0] = NameGetDatum(&MyReplicationSlot->data.name);
    nulls[0] = false;

outside the conditional block; there seems to be no reason to have it in
there. I also removed a bunch of unrelated minor cleanups that I plan to
commit & backpatch separately.

What do you think?

Andres