Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData? - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData? |
Date | |
Msg-id | 20211215.120157.422744426665374116.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData? (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?
|
List | pgsql-hackers |
At Tue, 14 Dec 2021 19:04:09 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Fri, Dec 3, 2021 at 7:39 PM Bharath Rupireddy > <bharath.rupireddyforpostgres@gmail.com> wrote: > > > > On Wed, Dec 1, 2021 at 9:50 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > I think the PIDs are log-worthy for sure, but it's not clear to me that > > > it is desirable to write them to the persistent state file. In case of > > > crashes, the log should serve just fine to aid root cause investigation > > > -- in fact even better than the persistent file, where the data would be > > > lost as soon as the next client acquires that slot. > > > > Thanks. +1 to log a message at LOG level whenever a replication slot > > becomes active (gets assigned a valid pid to active_pid) and becomes > > inactive(gets assigned 0 to active_pid). Having said that, isn't it > > also helpful if we write a bool (1 byte character) whenever the slot > > becomes active and inactive to the disk? > > Here's the patch that adds a LOG message whenever a replication slot > becomes active and inactive. These logs will be extremely useful on > production servers to debug and analyze inactive replication slot > issues. > > Thoughts? If I create a replication slot, I saw the following lines in server log. [22054:client backend] LOG: replication slot "s1" becomes active [22054:client backend] DETAIL: The process with PID 22054 acquired it. [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); [22054:client backend] LOG: replication slot "s1" becomes inactive [22054:client backend] DETAIL: The process with PID 22054 released it. [22054:client backend] STATEMENT: select pg_drop_replication_slot('s1'); They are apparently too much as if they were DEBUG3 lines. The process PID shown is of the process the slot operations took place so I think it conveys no information. The STATEMENT lines are also noisy for non-ERROR emssages. Couldn't we hide that line? That is, how about making the log lines as simple as the follows? [17156:walsender] LOG: acquired replication slot "s1" [17156:walsender] LOG: released replication slot "s1" I think in the first place we don't need this log lines at slot creation since it is actually not acquirement nor releasing of a slot. It behaves in a strange way when executing pg_basebackup. [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active [22864:walsender] DETAIL: The process with PID 22864 acquired it. [22864:walsender] STATEMENT: CREATE_REPLICATION_SLOT "pg_basebackup_22864" TEMPORARY PHYSICAL ( RESERVE_WAL) [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes active [22864:walsender] DETAIL: The process with PID 22864 acquired it. [22864:walsender] STATEMENT: START_REPLICATION SLOT "pg_basebackup_22864" 0/6000000 TIMELINE 1 [22864:walsender] LOG: replication slot "pg_basebackup_22864" becomes inactive [22864:walsender] DETAIL: The process with PID 22864 released it. The slot is acquired twice then released once. It is becuase the patch doesn't emit "becomes inactive" line when releasing a temporary slot. However, I'd rather think we don't need the first 'become active' line like the previous example. @@ -658,6 +690,13 @@ ReplicationSlotDropPtr(ReplicationSlot *slot) slot->active_pid = 0; slot->in_use = false; LWLockRelease(ReplicationSlotControlLock); + + if (pid > 0) + ereport(LOG, + (errmsg("replication slot \"%s\" becomes inactive", + NameStr(slot->data.name)), + errdetail("The process with PID %d released it.", pid))); + This is wrong. I see a "become inactive" message if I droped an "inactive" replication slot. The reason the inactive slot looks as if it were acquired is it is temporarily aquired as a preparing step of dropping. Even assuming that the log lines are simplified to this extent, I still see it a bit strange that the "becomes acitve (or acruied)" message shows alone without having a message like "replication connection accepted". But that would be another issue even if it is true. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: