RE: How can end users know the cause of LR slot sync delays? - Mailing list pgsql-hackers
From | Hayato Kuroda (Fujitsu) |
---|---|
Subject | RE: How can end users know the cause of LR slot sync delays? |
Date | |
Msg-id | OSCPR01MB14966A618A8C61EC3DEE486A4F517A@OSCPR01MB14966.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: How can end users know the cause of LR slot sync delays? (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
Responses |
Re: How can end users know the cause of LR slot sync delays?
|
List | pgsql-hackers |
Dear Shlok, Thanks for creating the patch. Personally I prefer approach2; approach1 cannot indicate the current status of synchronization, it just shows the history. I feel approach2 has more information than approach1. Below are my comments for your patch. 01. ``` + Number of times the slot sync is skipped. ``` "slot sync" should be "slot synchronization". Same thing can be saied for other attributes. 02. ``` + Reason of the last slot sync skip. ``` Possible values must be clarified. 03. ``` + s.slot_sync_skip_count, + s.last_slot_sync_skip, + s.slot_sync_skip_reason, ``` Last line has tab-blank but others have space-blank. 04. ``` +typedef enum SlotSyncSkipReason +{ + SLOT_SYNC_SKIP_NONE, /* No skip */ + SLOT_SYNC_SKIP_STANDBY_BEHIND, /* Standby is behind the remote slot */ + SLOT_SYNC_SKIP_REMOTE_BEHIND, /* Remote slot is behind the local slot */ + SLOT_SYNC_SKIP_NO_CONSISTENT_SNAPSHOT /* Standby could not reach a + * consistent snapshot */ +} SlotSyncSkipReason ``` a. Can we add comment atop the enum? b. SLOT_SYNC_SKIP_STANDBY_BEHIND is misleading; it indicates the standby server has not received WALs required by slots, but enum does not clarify it. How about SLOT_SYNC_SKIP_MISSING_WAL_RECORDS or something? Better naming are very welcome. c s/reach/build/. 05. ``` @@ -646,11 +652,13 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid) remote_slot->name, LSN_FORMAT_ARGS(latestFlushPtr))); - return false; + /* If the slot is not present on the local */ + if (!(slot = SearchNamedReplicationSlot(remote_slot->name, true))) + return false; } ``` Looks like if users try to sync slots via SQL interfaces, the statistics cannot be updated. 06. update_slot_sync_skip_stats ``` + /* Update the slot sync reason */ + SpinLockAcquire(&slot->mutex); + slot->slot_sync_skip_reason = skip_reason; + SpinLockRelease(&slot->mutex); ``` I feel no need to update the reason if the slot->slot_sync_skip_reason and skip_reason are the same. 07. synchronize_one_slot ``` + /* + * If standby is behind remote slot and the synced slot is present on + * local. + */ + if (remote_slot->confirmed_lsn > latestFlushPtr) + { + if (synced) + update_slot_sync_skip_stats(slot, skip_reason); + return false; + } ``` This condition exist in the same function; can we combine? 08. GetSlotSyncSkipReason() Do we have to do pstrdup() here? I found a similar function get_snapbuild_state_desc(), and it does not use. 09. Can you consider a test for added codes? Best regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: