Re: Disallow cancellation of waiting for synchronous replication - Mailing list pgsql-hackers
From | Andrey Borodin |
---|---|
Subject | Re: Disallow cancellation of waiting for synchronous replication |
Date | |
Msg-id | 323FC6A9-2DDA-44EF-AAF5-B18A161E2735@yandex-team.ru |
In response to | Re: Disallow cancellation of waiting for synchronous replication (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: Disallow cancellation of waiting for synchronous replication |
List | pgsql-hackers |
> On 2 Jan 2020, at 19:13, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sun, Dec 29, 2019 at 4:13 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>> Not losing data - is a nice property of the database either.
>
> Sure, but there's more than one way to fix that problem, as I pointed
> out in my first response.

Sorry, it took me several more readings of your message to understand the problem you are describing. You proposed two solutions:

1. The client analyzes the warning and understands that the data is not actually committed. This, as you pointed out, does not solve the problem: data is lost for another client, who never saw the warning. In practice, the "client" is a stateless set of connections that cannot communicate with each other by any means other than the database. They cannot share information about uncommitted transactions (they would need a database for that, hence a chicken-and-egg problem).

2. Add another message, "CANCEL --force", to stop synchronous replication for a specific backend. We already have a way to stop synchronous replication: "alter system set synchronous_standby_names to 'working.stand.by'; select pg_reload_conf();". This stops it for every backend, whereas "CANCEL --force" would be more selective. The user can still lose data when they issue an idempotent query based on data committed by "CANCEL --force". Moreover, the user can lose data if their upsert is based on data committed by someone else with "set synchronous_commit to off".

We could fix upserts: make them wait for replication even if nothing was changed. But this will not cover the case where the user does a SELECT and decides not to insert anything.

We can fix SELECT: if the user asks for synchronous_commit=remote_write, give them a snapshot no newer than the synchronously committed data. ISTM this would solve all of the above problems, but I do not see the full implications of this approach. We would have to add all XIDs to XIP if their commit LSN > sync rep LSN. But I'm not sure all the other transactional mechanics will be OK with this.
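To make the SELECT idea concrete, here is a toy model of the rule "add all XIDs to XIP if their commit LSN > sync rep LSN". It is only a sketch, not PostgreSQL code: the names Txn, build_snapshot_xip and is_visible are hypothetical, LSNs are plain integers, and real visibility checks also involve xmin/xmax and more, all of which is ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    """A locally committed transaction (hypothetical toy type)."""
    xid: int
    commit_lsn: int  # LSN of the commit record, modeled as a plain integer


def build_snapshot_xip(committed, in_progress_xids, sync_rep_lsn):
    """Return the set of XIDs a snapshot should treat as in progress.

    Besides the genuinely running transactions, it includes every locally
    committed transaction whose commit LSN is beyond the LSN already
    acknowledged by the synchronous standby.
    """
    xip = set(in_progress_xids)
    for txn in committed:
        if txn.commit_lsn > sync_rep_lsn:
            xip.add(txn.xid)
    return xip


def is_visible(xid, xip):
    """Toy visibility check: a transaction is visible unless it is in XIP."""
    return xid not in xip


# Three local commits; the standby has acknowledged everything up to LSN 2000.
committed = [Txn(100, 1000), Txn(101, 2000), Txn(102, 3000)]
xip = build_snapshot_xip(committed, in_progress_xids={103}, sync_rep_lsn=2000)
```

In this model a reader sees transactions 100 and 101 but not 102, whose commit has not been acknowledged by the standby yet, nor 103, which is still running.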
From a practical point of view, when all writing queries use the same synchronous_commit level, the easiest solution is to just disallow cancellation of sync replication. In psql we can simply reset the connection on the second CTRL+C. That's more generic than "CANCEL --force".

When all queries run with the same synchronous_commit, there is no point in a protocol message for canceling sync rep for a single connection. Just drop that connection. Ignoring cancel is the only way to satisfy the synchronous_commit level, which is constant for a transaction.

When queries run with various synchronous_commit levels, things are much more complicated. Adding a protocol message to change synchronous_commit for running queries does not seem to be a viable option.

> I continue to think that the root cause of this issue is that we can't
> distinguish between cancelling the query and cancelling the sync rep
> wait.

Yes, it is. But canceling the sync rep wait exists already: just change synchronous_standby_names. Canceling sync rep for one client is, effectively, changing the synchronous commit level for a running transaction. It opens the door to far more difficult complications.

> The client in this case is asking for both when it really only
> wants the former, and then ignoring the warning that the latter is
> what actually occurred.

The client is not ignoring warnings. Data is lost for the client that never received the warning. If we could just fix our code, I would not be making so much noise. There are workarounds, but they are very pleasant to explain.

Best regards, Andrey Borodin.