Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | Re: Minimal logical decoding on standbys |
Date | |
Msg-id | CAJ3gD9d4oN6ik+0ORrgVTOoO=Z3jsEGsfLLZTwtvUK1ZpF7YWQ@mail.gmail.com Whole thread Raw |
In response to | Re: Minimal logical decoding on standbys (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Minimal logical decoding on standbys
Re: Minimal logical decoding on standbys |
List | pgsql-hackers |
On Wed, 10 Jul 2019 at 08:44, Andres Freund <andres@anarazel.de> wrote: > > Hi, > > Thanks for the new version! Looks like we're making progress towards > something committable here. > > I think it'd be good to split the patch into a few pieces. I'd maybe do > that like: > 1) WAL format changes (plus required other changes) > 2) Recovery conflicts with slots > 3) logical decoding on standby > 4) tests All right. Will do that in the next patch set. For now, I have quickly done the below changes in a single patch again (attached), in order to get early comments if any. > > > > @@ -589,6 +590,7 @@ gistXLogPageReuse(Relation rel, BlockNumber blkno, TransactionId latestRemovedXi > > */ > > > > /* XLOG stuff */ > > + xlrec_reuse.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel); > > xlrec_reuse.node = rel->rd_node; > > xlrec_reuse.block = blkno; > > xlrec_reuse.latestRemovedXid = latestRemovedXid; > > Hm. I think we otherwise only ever use > RelationIsAccessibleInLogicalDecoding() on tables, not on indexes. And > while I think this would mostly work for builtin catalog tables, it > won't work for "user catalog tables" as RelationIsUsedAsCatalogTable() > won't perform any useful checks for indexes. > > So I think we either need to look up the table, or pass it down. Done. Passed down the heap rel. > > > > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c > > index d768b9b..10b7857 100644 > > --- a/src/backend/access/heap/heapam.c > > +++ b/src/backend/access/heap/heapam.c > > @@ -7149,12 +7149,13 @@ heap_compute_xid_horizon_for_tuples(Relation rel, > > * see comments for vacuum_log_cleanup_info(). > > */ > > XLogRecPtr > > -log_heap_cleanup_info(RelFileNode rnode, TransactionId latestRemovedXid) > > +log_heap_cleanup_info(Relation rel, TransactionId latestRemovedXid) > > { > > xl_heap_cleanup_info xlrec; > > XLogRecPtr recptr; > > > > - xlrec.node = rnode; > > + xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(rel); > > + xlrec.node = rel->rd_node; > > xlrec.latestRemovedXid = latestRemovedXid; > > > > XLogBeginInsert(); > > @@ -7190,6 +7191,7 @@ log_heap_clean(Relation reln, Buffer buffer, > > /* Caller should not call me on a non-WAL-logged relation */ > > Assert(RelationNeedsWAL(reln)); > > > > + xlrec.onCatalogTable = RelationIsAccessibleInLogicalDecoding(reln); > > It'd probably be a good idea to add a comment to > RelationIsUsedAsCatalogTable() that it better never invoke anything > performing catalog accesses. Otherwise there's quite the danger with > recursion (some operation doing RelationIsAccessibleInLogicalDecoding(), > that then accessing the catalog, which in turn could again need to > perform said operation, loop). Added comments in RelationIsUsedAsCatalogTable() as well as RelationIsAccessibleInLogicalDecoding() : * RelationIsAccessibleInLogicalDecoding * True if we need to log enough information to have access via * decoding snapshot. * This definition should not invoke anything that performs catalog * access. Otherwise, e.g. logging a WAL entry for catalog relation may * invoke this function, which will in turn do catalog access, which may * in turn cause another similar WAL entry to be logged, leading to * infinite recursion. > > /* Entry in pending-list of TIDs we need to revisit */ > > @@ -502,6 +503,7 @@ vacuumRedirectAndPlaceholder(Relation index, Buffer buffer) > > OffsetNumber itemnos[MaxIndexTuplesPerPage]; > > spgxlogVacuumRedirect xlrec; > > > > + xlrec.onCatalogTable = get_rel_logical_catalog(index->rd_index->indrelid); > > xlrec.nToPlaceholder = 0; > > xlrec.newestRedirectXid = InvalidTransactionId; > > We should document that it is safe to do catalog acceses here, because > spgist is never used to back catalogs. Otherwise there would be an a > endless recursion danger here. Comments added. > > Did you check how hard it we to just pass down the heap relation? It does look hard. Check my comments in an earlier reply, that I have pasted below : > This one seems harder, but I'm not actually sure why we make it so > hard. It seems like we just ought to add the table to IndexVacuumInfo. This means we have to add heapRel assignment wherever we initialize IndexVacuumInfo structure, namely in lazy_vacuum_index(), lazy_cleanup_index(), validate_index(), analyze_rel(), and make sure these functions have a heap rel handle. Do you think we should do this as part of this patch ? > > > > /* > > + * Get the wal_level from the control file. > > + */ > > +WalLevel > > +GetActiveWalLevel(void) > > +{ > > + return ControlFile->wal_level; > > +} > > What does "Active" mean here? I assume it's supposed to indicate that it > could be different than what's configured in postgresql.conf, for a > replica? If so, that should be mentioned. Done. Here are the new comments : * Get the wal_level from the control file. For a standby, this value should be * considered as its active wal_level, because it may be different from what * was originally configured on standby. > > > > +/* > > * Initialization of shared memory for XLOG > > */ > > Size > > @@ -9843,6 +9852,19 @@ xlog_redo(XLogReaderState *record) > > /* Update our copy of the parameters in pg_control */ > > memcpy(&xlrec, XLogRecGetData(record), sizeof(xl_parameter_change)); > > > > + /* > > + * Drop logical slots if we are in hot standby and master does not have > > + * logical data. > > nitpick: s/master/the primary/ (mostly adding the "the", but I > personally also prefer primary over master) > > s/logical data/a WAL level sufficient for logical decoding/ > > > > Don't bother to search for the slots if standby is > > + * running with wal_level lower than logical, because in that case, > > + * we would have either disallowed creation of logical slots or dropped > > + * existing ones. > > s/Don't bother/No need/ > s/slots/potentially conflicting logically slots/ Done. > > > + if (InRecovery && InHotStandby && > > + xlrec.wal_level < WAL_LEVEL_LOGICAL && > > + wal_level >= WAL_LEVEL_LOGICAL) > > + ResolveRecoveryConflictWithLogicalSlots(InvalidOid, InvalidTransactionId, > > + gettext_noop("Logical decoding on standby requires wal_level >= logical on master.")); > > > > > diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c > > index 151c3ef..c1bd028 100644 > > --- a/src/backend/replication/logical/decode.c > > +++ b/src/backend/replication/logical/decode.c > > @@ -190,11 +190,23 @@ DecodeXLogOp(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) > > * can restart from there. > > */ > > break; > > + case XLOG_PARAMETER_CHANGE: > > + { > > + xl_parameter_change *xlrec = > > + (xl_parameter_change *) XLogRecGetData(buf->record); > > + /* Cannot proceed if master itself does not have logical data */ > > This needs an explanation as to how this is reachable... Done. Here are the comments : * If wal_level on primary is reduced to less than logical, then we * want to prevent existing logical slots from being used. * Existing logical slot on standby gets dropped when this WAL * record is replayed; and further, slot creation fails when the * wal level is not sufficient; but all these operations are not * synchronized, so a logical slot may creep in while the wal_level * is being reduced. Hence this extra check. > > > > + if (xlrec->wal_level < WAL_LEVEL_LOGICAL) > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("logical decoding on standby requires " > > + "wal_level >= logical on master"))); > > + break; > > Hm, this strikes me as a not quite good enough error message (same in > other copies of the message). Perhaps something roughly like "could not > continue with logical decoding, the primary's wal level is now too low > (%u)"? Haven't changed this. There is another reply from Robert. I think what you want to emphasize is that we can't *continue*. I am not sure why user can't infer that the "logical decoding could not continue" when we say "logical decoding requires wal_level >= ...." > > > > if (RecoveryInProgress()) > > - ereport(ERROR, > > - (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > > - errmsg("logical decoding cannot be used while in recovery"))); > > + { > > + /* > > + * This check may have race conditions, but whenever > > + * XLOG_PARAMETER_CHANGE indicates that wal_level has changed, we > > + * verify that there are no existing logical replication slots. And to > > + * avoid races around creating a new slot, > > + * CheckLogicalDecodingRequirements() is called once before creating > > + * the slot, and once when logical decoding is initially starting up. > > + */ > > + if (GetActiveWalLevel() < WAL_LEVEL_LOGICAL) > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("logical decoding on standby requires " > > + "wal_level >= logical on master"))); > > + } > > } > > > > /* > > @@ -241,6 +240,8 @@ CreateInitDecodingContext(char *plugin, > > LogicalDecodingContext *ctx; > > MemoryContext old_context; > > > > + CheckLogicalDecodingRequirements(); > > + > > This should reference the above explanation. Done. > > > > > > /* > > + * Permanently drop a conflicting replication slot. If it's already active by > > + * another backend, send it a recovery conflict signal, and then try again. > > + */ > > +static void > > +ReplicationSlotDropConflicting(ReplicationSlot *slot) > > > > +void > > +ResolveRecoveryConflictWithLogicalSlots(Oid dboid, TransactionId xid, > > + char *conflict_reason) > > +{ > > + /* > > + * Build the conflict_str which will look like : > > + * "Slot conflicted with xid horizon which was being increased > > + * to 9012 (slot xmin: 1234, slot catalog_xmin: 5678)." > > + */ > > + initStringInfo(&conflict_xmins); > > + if (TransactionIdIsValid(slot_xmin) && > > + TransactionIdPrecedesOrEquals(slot_xmin, xid)) > > + { > > + appendStringInfo(&conflict_xmins, "slot xmin: %d", slot_xmin); > > + } > > + if (TransactionIdIsValid(slot_catalog_xmin) && > > + TransactionIdPrecedesOrEquals(slot_catalog_xmin, xid)) > > + appendStringInfo(&conflict_xmins, "%sslot catalog_xmin: %d", > > + conflict_xmins.len > 0 ? ", " : "", > > + slot_catalog_xmin); > > + > > + if (conflict_xmins.len > 0) > > + { > > + initStringInfo(&conflict_str); > > + appendStringInfo(&conflict_str, "%s %d (%s).", > > + conflict_sentence, xid, conflict_xmins.data); > > + found_conflict = true; > > + conflict_reason = conflict_str.data; > > + } > > + } > > > I think this is going to be a nightmare for translators, no? For translators, I think the .po files will have the required text, because I have used gettext_noop() for both conflict_sentence and the passed in conflict_reason parameter. And the "dropped conflicting slot." is passed to ereport() as usual. The rest portion of errdetail is not language specific. E.g. "slot" remains "slot". > I'm not clear as to why any of this is needed? The conflict can happen for either xmin or catalog_xmin or both, right ? The purpose of the above is to show only conflicting xmin out of the two. > > > > > + /* ReplicationSlotDropPtr() would acquire the lock below */ > > + LWLockRelease(ReplicationSlotControlLock); > > "would acquire"? I think it *does* acquire, right? Yes, Changed to "will". > > > > > @@ -2879,6 +2882,25 @@ RecoveryConflictInterrupt(ProcSignalReason reason) > > case PROCSIG_RECOVERY_CONFLICT_LOCK: > > case PROCSIG_RECOVERY_CONFLICT_TABLESPACE: > > case PROCSIG_RECOVERY_CONFLICT_SNAPSHOT: > > + case PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT: > > + /* > > + * For conflicts that require a logical slot to be dropped, the > > + * requirement is for the signal receiver to release the slot, > > + * so that it could be dropped by the signal sender. So for > > + * normal backends, the transaction should be aborted, just > > + * like for other recovery conflicts. But if it's walsender on > > + * standby, then it has to be killed so as to release an > > + * acquired logical slot. > > + */ > > + if (am_cascading_walsender && > > + reason == PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT && > > + MyReplicationSlot && SlotIsLogical(MyReplicationSlot)) > > + { > > + RecoveryConflictPending = true; > > + QueryCancelPending = true; > > + InterruptPending = true; > > + break; > > + } > > Huh, I'm not following as to why that's needed for walsenders? For normal backends, we ignore this signal if we aren't in a transaction (block). But for walsender, there is no transaction, but we cannot ignore the signal. This is because walsender can keep a logical slot acquired when it was spawned by "pg_recvlogical --start". So we can't ignore the signal. So the only way that we can make it release the acquired slot is to kill it. > > > > @@ -1499,6 +1499,7 @@ pg_stat_get_db_conflict_all(PG_FUNCTION_ARGS) > > dbentry->n_conflict_tablespace + > > dbentry->n_conflict_lock + > > dbentry->n_conflict_snapshot + > > + dbentry->n_conflict_logicalslot + > > dbentry->n_conflict_bufferpin + > > dbentry->n_conflict_startup_deadlock); > > I think this probably needs adjustments in a few more places, > e.g. monitoring.sgml... Oops, yeah, to search for similar additions, I had looked for "conflict_snapshot" using cscope. I should have done the same using "git grep". Done now. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Attachment
pgsql-hackers by date: