Thread: Re: Mention idle_replication_slot_timeout in pg_replication_slots docs

On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> Hi,
>
> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
> as a condition under which the wal_status column can show unreserved or lost.
> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
> cause this behavior when it is set. This has not been documented yet.
> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
>

+1 to the doc update.

> So, how about updating the documentation to also mention
> idle_replication_slot_timeout as a factor that can cause wal_status to
> become unreserved or lost? Patch attached.
>

Since idle_replication_slot_timeout can only cause wal_status to
become 'lost' and not 'unreserved', perhaps we can reword the sentence
slightly for clarity, suggestion -
"The last two states are seen when max_slot_wal_keep_size is
non-negative and, the 'lost' state may also appear when
idle_replication_slot_timeout is greater than zero."

Please feel free to rephrase if needed.

--
Thanks,
Nisha




On 2025/06/26 15:46, Nisha Moond wrote:
> On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>> Hi,
>>
>> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
>> as a condition under which the wal_status column can show unreserved or lost.
>> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
>> cause this behavior when it is set. This has not been documented yet.
>> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
>>
>
> +1 to the doc update.

Thanks for the review!


>> So, how about updating the documentation to also mention
>> idle_replication_slot_timeout as a factor that can cause wal_status to
>> become unreserved or lost? Patch attached.
>>
>
> Since idle_replication_slot_timeout can only cause wal_status to
> become 'lost' and not 'unreserved', perhaps we can reword the sentence
> slightly for clarity, suggestion -
> "The last two states are seen when max_slot_wal_keep_size is
> non-negative and, the 'lost' state may also appear when
> idle_replication_slot_timeout is greater than zero."

I was thinking that when idle_replication_slot_timeout triggers,
the following functions are called, and that wal_status can become
"unreserved" before ReplicationSlotRelease() runs. It's very short
period, though. Am I wrong?

ReplicationSlotMarkDirty();
ReplicationSlotSave();
ReplicationSlotRelease();

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2025/06/26 15:46, Nisha Moond wrote:
> > On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >> Hi,
> >>
> >> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
> >> as a condition under which the wal_status column can show unreserved or lost.
> >> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
> >> cause this behavior when it is set. This has not been documented yet.
> >> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
> >>
> >
> > +1 to the doc update.
>
> Thanks for the review!
>
>
> >> So, how about updating the documentation to also mention
> >> idle_replication_slot_timeout as a factor that can cause wal_status to
> >> become unreserved or lost? Patch attached.
> >>
> >
> > Since idle_replication_slot_timeout can only cause wal_status to
> > become 'lost' and not 'unreserved', perhaps we can reword the sentence
> > slightly for clarity, suggestion -
> > "The last two states are seen when max_slot_wal_keep_size is
> > non-negative and, the 'lost' state may also appear when
> > idle_replication_slot_timeout is greater than zero."
>
> I was thinking that when idle_replication_slot_timeout triggers,
> the following functions are called, and that wal_status can become
> "unreserved" before ReplicationSlotRelease() runs. It's very short
> period, though. Am I wrong?
>
> ReplicationSlotMarkDirty();
> ReplicationSlotSave();
> ReplicationSlotRelease();
>

Thank you for pointing it out.
You are correct that while the checkpointer is in the process of
invalidating a slot, it sets its PID as the slot’s active_pid. During
this short window, if a user queries pg_replication_slot, the
underlying function pg_get_replication_slots will compute the
wal_status as 'unreserved' for the invalidated slot because the slot
has a valid active_pid.

That said, it's reasonable to mention in the doc that 'unreserved' may
appear when idle_replication_slot_timeout is greater than zero, as
this can indeed happen. So, let's retain the current description.

However, this behavior isn’t specific to
idle_replication_slot_timeout. For example, when a slot is being
invalidated due to a different cause "wal_level_insufficient",
'unreserved' may also briefly appear in wal_status.

The current patch LGTM.

--
Thanks,
Nisha



On Fri, Jun 27, 2025 at 5:40 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2025/06/27 15:32, Nisha Moond wrote:
> > On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2025/06/26 15:46, Nisha Moond wrote:
> >>> On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
> >>>> as a condition under which the wal_status column can show unreserved or lost.
> >>>> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
> >>>> cause this behavior when it is set. This has not been documented yet.
> >>>> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
> >>>>
> >>>
> >>> +1 to the doc update.
> >>
> >> Thanks for the review!
> >>
> >>
> >>>> So, how about updating the documentation to also mention
> >>>> idle_replication_slot_timeout as a factor that can cause wal_status to
> >>>> become unreserved or lost? Patch attached.
> >>>>
> >>>
> >>> Since idle_replication_slot_timeout can only cause wal_status to
> >>> become 'lost' and not 'unreserved', perhaps we can reword the sentence
> >>> slightly for clarity, suggestion -
> >>> "The last two states are seen when max_slot_wal_keep_size is
> >>> non-negative and, the 'lost' state may also appear when
> >>> idle_replication_slot_timeout is greater than zero."
> >>
> >> I was thinking that when idle_replication_slot_timeout triggers,
> >> the following functions are called, and that wal_status can become
> >> "unreserved" before ReplicationSlotRelease() runs. It's very short
> >> period, though. Am I wrong?
> >>
> >> ReplicationSlotMarkDirty();
> >> ReplicationSlotSave();
> >> ReplicationSlotRelease();
> >>
> >
> > Thank you for pointing it out.
> > You are correct that while the checkpointer is in the process of
> > invalidating a slot, it sets its PID as the slot’s active_pid. During
> > this short window, if a user queries pg_replication_slot, the
> > underlying function pg_get_replication_slots will compute the
> > wal_status as 'unreserved' for the invalidated slot because the slot
> > has a valid active_pid.
> >
> > That said, it's reasonable to mention in the doc that 'unreserved' may
> > appear when idle_replication_slot_timeout is greater than zero, as
> > this can indeed happen. So, let's retain the current description.
> >
> > However, this behavior isn’t specific to
> > idle_replication_slot_timeout. For example, when a slot is being
> > invalidated due to a different cause "wal_level_insufficient",
> > 'unreserved' may also briefly appear in wal_status.
>
> Yes, and "lost" can appear for various reasons, including wal_level_insufficient,
> so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost"
> status in the note. It would probably be better to remove the mention of "lost"
> from that note.
>

+1

> As for "unreserved", it can also occur for different reasons, but typically,
> it happens when max_slot_wal_keep_size is set to a non-negative value.
> So it might make sense to keep the explanation focused just on "unreserved"
> and max_slot_wal_keep_size. For example:
>
> ----------------------
>           <listitem>
>            <para>
>             <literal>unreserved</literal> means that the slot no longer
>             retains the required WAL files and some of them are to be removed at
> -          the next checkpoint.  This state can return
> +          the next checkpoint.  This can occur when
> +          <xref linkend="guc-max-slot-wal-keep-size"/> is set to
> +          a non-negative value.  This state can return
>             to <literal>reserved</literal> or <literal>extended</literal>.
>            </para>
>           </listitem>
>           <listitem>
> ----------------------
>
> What do you think?
>

The change LGTM, only a minor suggestion to add "typically", as “This
can typically occur when…” to indicate that max_slot_wal_keep_size is
one possible reason, not the only one.

>
> Also, I noticed the note that says “If <structfield>restart_lsn</structfield>
> is NULL, this field is null” seems inaccurate. For example, when "wal_removed"
> happens, restart_lsn is NULL but wal_status is "lost". So maybe we should remove
> that note as well?

You're right, the statement is not accurate.
We could rephrase it as: "If <structfield>restart_lsn</structfield> is
NULL, this field is either null or lost." But since 'unreserved' can
also appear briefly during invalidation, it might be better to remove
it altogether.

--
Thanks,
Nisha




On 2025/06/30 20:32, Nisha Moond wrote:
> On Fri, Jun 27, 2025 at 5:40 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>
>>
>>
>> On 2025/06/27 15:32, Nisha Moond wrote:
>>> On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2025/06/26 15:46, Nisha Moond wrote:
>>>>> On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
>>>>>> as a condition under which the wal_status column can show unreserved or lost.
>>>>>> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
>>>>>> cause this behavior when it is set. This has not been documented yet.
>>>>>> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
>>>>>>
>>>>>
>>>>> +1 to the doc update.
>>>>
>>>> Thanks for the review!
>>>>
>>>>
>>>>>> So, how about updating the documentation to also mention
>>>>>> idle_replication_slot_timeout as a factor that can cause wal_status to
>>>>>> become unreserved or lost? Patch attached.
>>>>>>
>>>>>
>>>>> Since idle_replication_slot_timeout can only cause wal_status to
>>>>> become 'lost' and not 'unreserved', perhaps we can reword the sentence
>>>>> slightly for clarity, suggestion -
>>>>> "The last two states are seen when max_slot_wal_keep_size is
>>>>> non-negative and, the 'lost' state may also appear when
>>>>> idle_replication_slot_timeout is greater than zero."
>>>>
>>>> I was thinking that when idle_replication_slot_timeout triggers,
>>>> the following functions are called, and that wal_status can become
>>>> "unreserved" before ReplicationSlotRelease() runs. It's very short
>>>> period, though. Am I wrong?
>>>>
>>>> ReplicationSlotMarkDirty();
>>>> ReplicationSlotSave();
>>>> ReplicationSlotRelease();
>>>>
>>>
>>> Thank you for pointing it out.
>>> You are correct that while the checkpointer is in the process of
>>> invalidating a slot, it sets its PID as the slot’s active_pid. During
>>> this short window, if a user queries pg_replication_slot, the
>>> underlying function pg_get_replication_slots will compute the
>>> wal_status as 'unreserved' for the invalidated slot because the slot
>>> has a valid active_pid.
>>>
>>> That said, it's reasonable to mention in the doc that 'unreserved' may
>>> appear when idle_replication_slot_timeout is greater than zero, as
>>> this can indeed happen. So, let's retain the current description.
>>>
>>> However, this behavior isn’t specific to
>>> idle_replication_slot_timeout. For example, when a slot is being
>>> invalidated due to a different cause "wal_level_insufficient",
>>> 'unreserved' may also briefly appear in wal_status.
>>
>> Yes, and "lost" can appear for various reasons, including wal_level_insufficient,
>> so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost"
>> status in the note. It would probably be better to remove the mention of "lost"
>> from that note.
>>
>
> +1

Is this true starting from v16, when logical replication from standby was introduced?
In other words, in v15 and earlier, only max_slot_wal_keep_size could cause
the wal_status to become "unreserved" or "lost"? I'm wondering where to back-patch
this fix to.


>> As for "unreserved", it can also occur for different reasons, but typically,
>> it happens when max_slot_wal_keep_size is set to a non-negative value.
>> So it might make sense to keep the explanation focused just on "unreserved"
>> and max_slot_wal_keep_size. For example:
>>
>> ----------------------
>>            <listitem>
>>             <para>
>>              <literal>unreserved</literal> means that the slot no longer
>>              retains the required WAL files and some of them are to be removed at
>> -          the next checkpoint.  This state can return
>> +          the next checkpoint.  This can occur when
>> +          <xref linkend="guc-max-slot-wal-keep-size"/> is set to
>> +          a non-negative value.  This state can return
>>              to <literal>reserved</literal> or <literal>extended</literal>.
>>             </para>
>>            </listitem>
>>            <listitem>
>> ----------------------
>>
>> What do you think?
>>
>
> The change LGTM, only a minor suggestion to add "typically", as “This
> can typically occur when…” to indicate that max_slot_wal_keep_size is
> one possible reason, not the only one.

OK.


>> Also, I noticed the note that says “If <structfield>restart_lsn</structfield>
>> is NULL, this field is null” seems inaccurate. For example, when "wal_removed"
>> happens, restart_lsn is NULL but wal_status is "lost". So maybe we should remove
>> that note as well?
>
> You're right, the statement is not accurate.
> We could rephrase it as: "If <structfield>restart_lsn</structfield> is
> NULL, this field is either null or lost." But since 'unreserved' can
> also appear briefly during invalidation, it might be better to remove
> it altogether.
I agree with removing the description. Unless I'm missing something,
it has been incorrect since at least v13, so we should back-patch this fix
to all supported versions.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation




On Mon, Jun 30, 2025 at 6:12 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2025/06/30 20:32, Nisha Moond wrote:
> > On Fri, Jun 27, 2025 at 5:40 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>
> >>
> >>
> >> On 2025/06/27 15:32, Nisha Moond wrote:
> >>> On Thu, Jun 26, 2025 at 1:33 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2025/06/26 15:46, Nisha Moond wrote:
> >>>>> On Wed, Jun 25, 2025 at 9:56 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> The pg_replication_slots documentation mentions only max_slot_wal_keep_size
> >>>>>> as a condition under which the wal_status column can show unreserved or lost.
> >>>>>> However, since commit ac0e33136ab, idle_replication_slot_timeout can also
> >>>>>> cause this behavior when it is set. This has not been documented yet.
> >>>>>> https://www.postgresql.org/docs/devel/view-pg-replication-slots.html
> >>>>>>
> >>>>>
> >>>>> +1 to the doc update.
> >>>>
> >>>> Thanks for the review!
> >>>>
> >>>>
> >>>>>> So, how about updating the documentation to also mention
> >>>>>> idle_replication_slot_timeout as a factor that can cause wal_status to
> >>>>>> become unreserved or lost? Patch attached.
> >>>>>>
> >>>>>
> >>>>> Since idle_replication_slot_timeout can only cause wal_status to
> >>>>> become 'lost' and not 'unreserved', perhaps we can reword the sentence
> >>>>> slightly for clarity, suggestion -
> >>>>> "The last two states are seen when max_slot_wal_keep_size is
> >>>>> non-negative and, the 'lost' state may also appear when
> >>>>> idle_replication_slot_timeout is greater than zero."
> >>>>
> >>>> I was thinking that when idle_replication_slot_timeout triggers,
> >>>> the following functions are called, and that wal_status can become
> >>>> "unreserved" before ReplicationSlotRelease() runs. It's very short
> >>>> period, though. Am I wrong?
> >>>>
> >>>> ReplicationSlotMarkDirty();
> >>>> ReplicationSlotSave();
> >>>> ReplicationSlotRelease();
> >>>>
> >>>
> >>> Thank you for pointing it out.
> >>> You are correct that while the checkpointer is in the process of
> >>> invalidating a slot, it sets its PID as the slot’s active_pid. During
> >>> this short window, if a user queries pg_replication_slot, the
> >>> underlying function pg_get_replication_slots will compute the
> >>> wal_status as 'unreserved' for the invalidated slot because the slot
> >>> has a valid active_pid.
> >>>
> >>> That said, it's reasonable to mention in the doc that 'unreserved' may
> >>> appear when idle_replication_slot_timeout is greater than zero, as
> >>> this can indeed happen. So, let's retain the current description.
> >>>
> >>> However, this behavior isn’t specific to
> >>> idle_replication_slot_timeout. For example, when a slot is being
> >>> invalidated due to a different cause "wal_level_insufficient",
> >>> 'unreserved' may also briefly appear in wal_status.
> >>
> >> Yes, and "lost" can appear for various reasons, including wal_level_insufficient,
> >> so it seems odd to highlight max_slot_wal_keep_size as the cause of the "lost"
> >> status in the note. It would probably be better to remove the mention of "lost"
> >> from that note.
> >>
> >
> > +1
>
> Is this true starting from v16, when logical replication from standby was introduced?
> In other words, in v15 and earlier, only max_slot_wal_keep_size could cause
> the wal_status to become "unreserved" or "lost"? I'm wondering where to back-patch
> this fix to.
>

I also think we should back-patch this till v16, since that’s when
additional slot invalidation causes were also introduced(commit
be87200). And since then “max_slot_wal_keep_size” is no longer the
sole reason for “unreserved” or “lost” status.

--
Thanks,
Nisha




On 2025/07/02 16:12, Fujii Masao wrote:
> 
> 
> On 2025/07/01 13:52, Nisha Moond wrote:
>> On Mon, Jun 30, 2025 at 6:12 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>>> Is this true starting from v16, when logical replication from standby was introduced?
>>> In other words, in v15 and earlier, only max_slot_wal_keep_size could cause
>>> the wal_status to become "unreserved" or "lost"? I'm wondering where to back-patch
>>> this fix to.
>>>
>>
>> I also think we should back-patch this till v16, since that’s when
>> additional slot invalidation causes were also introduced(commit
>> be87200). And since then “max_slot_wal_keep_size” is no longer the
>> sole reason for “unreserved” or “lost” status.
> 
> Okay, I've prepared two patches:
> 
> - 0001 removes the incorrect line: "If restart_lsn is NULL, this field is null."
>    This should be back-patched to v13.
> - 0002 updates the description of the wal_status to reflect that max_slot_wal_keep_size
>    is not the only cause of the lost state. This should be back-patched to v16.
> 
> Barrng objections, I will commit these patches.

I've pushed the patches. Thanks!

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation