Re: [PoC] pg_upgrade: allow to upgrade publisher node - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
Date | |
Msg-id | CAD21AoBT3BCzafjHvH+=SN23j1DTdNJ=LE_iKDtKwkrGuXF0Sg@mail.gmail.com |
In response to | Re: [PoC] pg_upgrade: allow to upgrade publisher node (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: [PoC] pg_upgrade: allow to upgrade publisher node |
List | pgsql-hackers |
On Tue, Aug 15, 2023 at 12:06 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Aug 15, 2023 at 7:51 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Mon, Aug 14, 2023 at 2:07 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Mon, Aug 14, 2023 at 7:57 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> > > > Another idea (which might have already been discussed, though) is that we check if the latest shutdown checkpoint LSN in the control file matches the confirmed_flush_lsn in the pg_replication_slots view. That way, we can ensure that the slot has consumed all WAL records before the last shutdown. We don't need to worry about WAL records generated after starting the old cluster during the upgrade, at least for logical replication slots.
> > > >
> > >
> > > Right, this is somewhat closer to what Patch is already doing. But
> > > remember in this case we need to remember and use the latest
> > > checkpoint from the control file before the old cluster is started
> > > because otherwise the latest checkpoint location could be even updated
> > > during the upgrade. So, instead of reading from WAL, we need to change
> > > so that we rely on the control file's latest LSN.
> >
> > Yes, I was thinking the same idea.
> >
> > But it works for only replication slots for logical replication. Do we
> > want to check if no meaningful WAL records are generated after the
> > latest shutdown checkpoint, for manually created slots (or non-logical
> > replication slots)? If so, we would need to have something reading WAL
> > records in the end.
> >
>
> This feature only targets logical replication slots. I don't see a
> reason to be different for manually created logical replication slots.
> Is there something particular that you think we could be missing?

Sorry, I was not clear. I meant the logical replication slots that are *not* used by logical replication, i.e., slots that are created manually and used by third-party tools that periodically consume decoded changes. As we discussed before, these slots will never be able to pass that confirmed_flush_lsn check.

After some thought, one thing we might need to consider is that in practice, the upgrade is performed during a maintenance window and has a backup plan that reverts the upgrade process in case something bad happens. If we require the users to drop such logical replication slots, they cannot resume using the old cluster in that case, since they would need to create new slots, missing some changes. Other checks in pg_upgrade seem to be compatibility checks that would eventually be required for the upgrade anyway. Do we need to consider this case? For example, we could do that confirmed_flush_lsn check for only the slots with the pgoutput plugin.

>
> > >
> > > Yet another thing I am trying to consider is whether we can allow to
> > > upgrade slots from 16 or 15 to later versions. As of now, the patch
> > > has the following check:
> > > getLogicalReplicationSlots()
> > > {
> > > ...
> > > +     /* Check whether we should dump or not */
> > > +     if (fout->remoteVersion < 170000)
> > > +         return;
> > > ...
> > > }
> > >
> > > If we decide to use the existing view pg_replication_slots then can we
> > > consider upgrading slots from the prior version to 17? Now, if we want
> > > to invent any new API similar to pg_replslotdata then we can't do this
> > > because it won't exist in prior versions but OTOH using existing view
> > > pg_replication_slots can allow us to fetch slot info from older
> > > versions as well. So, I think it is worth considering.
> >
> > I think that without 0001 patch the replication slots will not be able
> > to pass the confirmed_flush_lsn check.
> >
>
> Right, but we can think of backpatching the same. Anyway, we can do
> that as a separate work by starting a new thread to see if there is a
> broader agreement for backpatching such a change. For now, we can
> focus on >=v17.
>

Agreed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
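
The check proposed in this message (compare each slot's confirmed_flush_lsn with the latest shutdown checkpoint LSN taken from the control file) can be sketched roughly as below. This is only an illustration, not code from the patch: the helper names parse_lsn() and slot_consumed_all_wal() are hypothetical, and the sketch assumes both LSNs are already available in their usual textual "hi/lo" hexadecimal form, with the checkpoint LSN captured before the old cluster is started.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): parse an LSN given in its
 * textual "hi/lo" hexadecimal form, e.g. "0/3000060", into a 64-bit value.
 * Returns false if the text is not exactly in that form.
 */
static bool
parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi;
    unsigned int lo;
    char         extra;

    /* Exactly two fields must match; trailing characters are rejected. */
    if (sscanf(text, "%X/%X%c", &hi, &lo, &extra) != 2)
        return false;

    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/*
 * Hypothetical check: a slot is treated as having consumed all WAL up to
 * the last shutdown only if its confirmed_flush_lsn equals the shutdown
 * checkpoint LSN recorded in the control file. Assumption: the checkpoint
 * LSN was read before the old cluster was started for the upgrade.
 */
static bool
slot_consumed_all_wal(const char *confirmed_flush_lsn,
                      const char *shutdown_checkpoint_lsn)
{
    uint64_t flushed;
    uint64_t checkpoint;

    if (!parse_lsn(confirmed_flush_lsn, &flushed) ||
        !parse_lsn(shutdown_checkpoint_lsn, &checkpoint))
        return false;

    return flushed == checkpoint;
}
```

Under this sketch, a manually created slot whose consumer has not caught up fails the comparison, which is the case the message asks about when it suggests limiting the check to slots using the pgoutput plugin.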