Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption? - Mailing list pgsql-hackers
From | Nikolay Samokhvalov |
---|---|
Subject | Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption? |
Date | |
Msg-id | CAM527d8pSQ3pa1u5GocxpGNuY2yteHhTRKuwGPPUJC0V9Qzf5Q@mail.gmail.com Whole thread Raw |
In response to | Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption? (Michael Banck <mbanck@gmx.net>) |
Responses |
Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption?
Re: pg_upgrade instructions involving "rsync --size-only" might lead to standby corruption? |
List | pgsql-hackers |
On Mon, Jul 10, 2023 at 2:02 PM Michael Banck <mbanck@gmx.net> wrote:
Thanks for that!
Thanks for the fast review.
> I agree with Andrey – without it, we don't have any good way to upgrade
> large clusters in short time. Default rsync mode (without "--size-only")
> takes a lot of time too, if the load is heavy.
>
> With these adjustments, can "rsync --size-only" remain in the docs as the
> *fast* and safe method to upgrade standbys, or there are still some
> concerns related to corruption risks?
I hope somebody can answer that definitively, but I read Stephen's mail
to indicate that this procedure should be safe in principle (if you know
what you are doing).
right, this is my understanding too
> +++ b/doc/src/sgml/ref/pgupgrade.sgml
> @@ -380,22 +380,28 @@ NET STOP postgresql-&majorversion;
> </para>
>
> <para>
> - Streaming replication and log-shipping standby servers can
> + Streaming replication and log-shipping standby servers must
> remain running until a later step.
> </para>
> </step>
>
> - <step>
> + <step id="pgupgrade-step-prepare-standbys">
>
> <para>
> - If you are upgrading standby servers using methods outlined in section <xref
> - linkend="pgupgrade-step-replicas"/>, verify that the old standby
> - servers are caught up by running <application>pg_controldata</application>
> - against the old primary and standby clusters. Verify that the
> - <quote>Latest checkpoint location</quote> values match in all clusters.
> - (There will be a mismatch if old standby servers were shut down
> - before the old primary or if the old standby servers are still running.)
> + If you are upgrading standby servers using methods outlined in
> + <xref linkend="pgupgrade-step-replicas"/>,
You dropped the "section" before the xref, I think that should be kept
around.
Seems to be a problem in discussing source code that looks quite different than the final result.
In the result – the docs – we currently have "section Step 9", looking weird. I still think it's good to remove it. We also have "in Step 17 below" (without the extra word "section") in a different place on the same page.
> + ensure that they were running when
> + you shut down the primaries in the previous step, so all the latest changes
You talk of primaries in plural here, that is a bit weird for PostgreSQL
documentation.
The same docs already discuss two primaries ("8. Stop both primaries"), but I agree it might look confusing if you read only a part of the doc, jumping into middle of it, like I do all the time when using the docs in "check the reference" style.
I agree with this comment, but it tells me we need even more improvements of this doc, beyond my original goal – e.g., I don't like section 8 saying "Make sure both database servers", it should be "both primaries".
> + and the shutdown checkpoint record were received. You can verify this by running
> + <application>pg_controldata</application> against the old primary and standby
> + clusters. The <quote>Latest checkpoint location</quote> values must match in all
> + nodes. A mismatch might occur if old standby servers were shut down before
> + the old primary. To fix a mismatch, start all old servers and return to the
> + previous step; proceeding with mismatched
> + <quote>Latest checkpoint location</quote> may lead to standby corruption.
> + </para>
> +
> + <para>
> Also, make sure <varname>wal_level</varname> is not set to
> <literal>minimal</literal> in the <filename>postgresql.conf</filename> file on the
> new primary cluster.
> @@ -497,7 +503,6 @@ pg_upgrade.exe
> linkend="warm-standby"/>) standby servers, you can follow these steps to
> quickly upgrade them. You will not be running <application>pg_upgrade</application> on
> the standby servers, but rather <application>rsync</application> on the primary.
> - Do not start any servers yet.
> </para>
>
> <para>
> @@ -508,6 +513,15 @@ pg_upgrade.exe
> is running.
> </para>
>
> + <para>
> + Before running rsync, to avoid standby corruption, it is absolutely
> + critical to ensure that both primaries are shut down and standbys
> + have received the last changes (see <xref linkend="pgupgrade-step-prepare-standbys"/>).
I think this should be something like "ensure both that the primary is
shut down and that the standbys have received all the changes".
Well, we have two primary servers – old and new. I tried to clarify it in the new version.
> + Standbys can be running at this point or fully stopped.
"or be fully stopped." I think.
> + If they
> + are still running, you can stop, upgrade, and start them one by one; this
> + can be useful to keep the cluster open for read-only transactions.
> + </para>
Maybe this is clear from the context, but "upgrade" in the above should
maybe more explicitly refer to the rsync method or else people might
think one can run pg_upgrade on them after all?
Maybe. It will require changes in other parts of this doc.
Thinking (here: https://gitlab.com/postgres/postgres/-/merge_requests/18/diffs)
Meanwhile, attached is v2
thanks for the comments
Attachment
pgsql-hackers by date: