Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Date
Msg-id CAA4eK1+VbBxYY-rG4B7Jq6TsYvdjt3d2nbUrs7p9-1sP5mH+sw@mail.gmail.com
Whole thread Raw
In response to Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly  ("Vitaly Davydov" <v.davydov@postgrespro.ru>)
List pgsql-hackers
On Thu, May 29, 2025 at 5:29 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, May 27, 2025 at 2:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Yeah, we should be able to change ABI during beta, but I can't comment
> > on the idea of effective_restart_lsn without seeing the patch or a
> > detailed explanation of this idea.
>
> Could you, please, check the patch [1].  It implements this idea
> except it names new field restart_lsn_flushed instead of
> effective_restart_lsn.
>

This appears to be a better direction than the other patch, at least
for HEAD. I noticed a few points while looking at the patch.

1. restart_lsn_flushed: Can we name it as last_saved_restart_lsn based
on existing variable last_saved_confirmed_flush?
2. There are no comments as to why this is considered only for
persistent slots when CheckPointReplicationSlots doesn't have any such
check.
3. Please see if it makes sense to copy it in the copy_replication_slot.

Apart from these, I am not sure if there are still any pending
comments in the thread to be handled for this patch, so please see to
avoid missing anything.

> > Now, you see my point related to restart_lsn computation for logical
> > slots, it is better to also do some analysis of the problem related to
> > xmin I have highlighted in one of my previous emails [1]. I see your
> > response to it, but I feel someone needs to give it a try by writing a
> > test and see the behavior. I am saying because logical slots took
> > precaution of flushing to disk before updating shared values of xmin
> > for a reason, whereas similar precautions are not taken for physical
> > slots, so there could be a problem with that computation as well.
>
> I see LogicalConfirmReceivedLocation() performs correctly while
> updating effective_catalog_xmin only after syncing the slot to the
> disk.  I don't see how effective_xmin gets updates with the logical
> replication progress though.  Could you get me some clue on this,
> please?
>

As per my understanding, for logical slots, effective_xmin is only set
during the initial copy phase (or say if one has to export a
snapshot), after that, its value won't change. Please read the
comments in CreateInitDecodingContext() where we set its value. If you
still have questions about it, we can discuss further.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Dimitrios Apostolou
Date:
Subject: Re: [PING] fallocate() causes btrfs to never compress postgresql files
Next
From: John Naylor
Date:
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations