Re: Avoiding data loss with synchronous replication - Mailing list pgsql-hackers
From | Andrey Borodin |
---|---|
Subject | Re: Avoiding data loss with synchronous replication |
Date | |
Msg-id | D46D857F-5465-4688-BD6C-280942D28C39@yandex-team.ru |
In response to | Re: Avoiding data loss with synchronous replication ("Bossart, Nathan" <bossartn@amazon.com>) |
Responses |
Re: Avoiding data loss with synchronous replication
|
List | pgsql-hackers |
> On 23 July 2021, at 22:54, Bossart, Nathan <bossartn@amazon.com> wrote:
>
> On 7/23/21, 4:33 AM, "Andrey Borodin" <x4mmm@yandex-team.ru> wrote:
>> Thanks for your interest in the topic. I think in the thread [0] we almost agreed on general design.
>> The only left question is that we want to treat pg_ctl stop and kill SIGTERM differently to pg_terminate_backend().
>
> I didn't get the idea that there was a tremendous amount of support
> for the approach to block canceling waits for synchronous replication.
> FWIW this was my initial approach as well, but I've been trying to
> think of alternatives.
>
> If we can gather support for some variation of the block-cancels
> approach, I think that would be preferred over my proposal from a
> complexity standpoint.

Let's clearly enumerate the problems of blocking. It's been mentioned that the backend is not responsive when cancelation is blocked. But on the contrary, it's very responsive.

postgres=# alter system set synchronous_standby_names to 'bogus';
ALTER SYSTEM
postgres=# alter system set synchronous_commit_cancelation TO off ;
ALTER SYSTEM
postgres=# select pg_reload_conf();
2021-07-24 15:35:03.054 +05 [10452] LOG: received SIGHUP, reloading configuration files
l
---
t
(1 row)

postgres=# begin;
BEGIN
postgres=*# insert into t1 values(0);
INSERT 0 1
postgres=*# commit ;
^CCancel request sent
WARNING: canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL: The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.
^CCancel request sent
WARNING: canceling wait for synchronous replication requested, but cancelation is not allowed
DETAIL: The COMMIT record has already flushed to WAL locally and might not have been replicated to the standby. We must wait here.

It clearly tells what's wrong. If that's still not enough, let's add a hint about synchronous standby names. Are there any other problems with blocking cancels?
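
To make the behaviour in the session above concrete, here is a minimal Python sketch of it. It is only a toy model, not PostgreSQL code: the class SyncRepWait, its methods, and the state names "waiting", "committed" and "returned-unreplicated" are hypothetical, and the cancelation_allowed flag merely stands in for the proposed synchronous_commit_cancelation setting.

```python
from dataclasses import dataclass, field


@dataclass
class SyncRepWait:
    """Toy model of a backend that has committed locally and now waits
    for a synchronous standby. All names here are hypothetical."""

    # Stands in for the proposed synchronous_commit_cancelation setting.
    cancelation_allowed: bool
    standby_acked: bool = False
    warnings: list = field(default_factory=list)

    def on_cancel_request(self) -> str:
        """React to a cancel request (for example ^C in psql)."""
        if self.standby_acked:
            # Nothing left to wait for; the commit is already replicated.
            return "committed"
        if self.cancelation_allowed:
            # The wait ends, but the commit may exist only locally.
            self.warnings.append("wait canceled; commit may not be replicated")
            return "returned-unreplicated"
        # Blocking variant: answer the request with a warning, keep waiting.
        self.warnings.append("cancel requested, but cancelation is not allowed")
        return "waiting"

    def on_standby_ack(self) -> str:
        """The standby confirmed the commit record; the wait is over."""
        self.standby_acked = True
        return "committed"


if __name__ == "__main__":
    wait = SyncRepWait(cancelation_allowed=False)
    print(wait.on_cancel_request())   # backend answers, but keeps waiting
    print(wait.on_cancel_request())
    print(wait.warnings)
    print(wait.on_standby_ack())
```

In this model each cancel request gets an immediate answer in the form of a warning, which is the sense in which the backend stays responsive, and only the standby's acknowledgement ends the wait.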
> Robert's idea to provide a way to understand
> the intent of the cancellation/termination request [0] could improve
> matters. Perhaps adding an argument to pg_cancel/terminate_backend()
> and using different signals to indicate that we want to cancel the
> wait would be something that folks could get on board with.

The semantics of cancelation assume that the query can be interrupted correctly. That is no longer possible once we have committed locally: there cannot be any correct cancelation. And I don't think it is worth adding an incorrect one.

Interestingly, converting the transaction to 2PC when the backend is terminated is a neat idea. It provides more guarantees that the transaction will commit correctly even after a restart. But we may run short of max_prepared_xacts slots...

Anyway, backend termination bothers me a lot less than cancelation: drivers do not terminate queries on their own, but they do cancel queries by default.

Thanks!

Best regards, Andrey Borodin.