From: Mihail Nikalayeu
Subject: Logical replication: lost updates/deletes and invalid log messages caused by SnapshotDirty + concurrent updates
Msg-id: CADzfLwXZVmbo11tFS_G2i+6TfFVwHU4VUUSeoqb+8UQfuoJs8A@mail.gmail.com
List: pgsql-hackers
Hello, everyone!

Initially this was discussed in "[BUG?] check_exclusion_or_unique_constraint false negative" [0], but as Amit recommended [1], I decided to start a dedicated thread, since the original one is too deep and contains some unrelated discussion. The commitfest entry [2] contains reproducers for most of the issues described below - both injection_point-based and pgbench-based (to show that such things can happen in real workloads).

------
Source of the problem: any scan using SnapshotDirty/SnapshotSelf may miss a logical tuple entirely (as if it did not exist) if the tuple is concurrently updated. It happens like this:

* a SnapshotDirty heap scan starts;
* some page X has already been processed by the scan, but page Y has not been visited yet;
* a concurrent transaction updates a logical row: the new version is placed on page X (already visited), and the old version, located on page Y (not yet visited), is marked as deleted;
* the scan finally visits page Y and skips the tuple because its xmax belongs to a committed transaction;
* so both the old and the new version of the tuple are skipped.

In the MVCC view the row exists the whole time (just in different versions), but from the perspective of SnapshotDirty it does not exist at all at that moment.

The same may happen in a btree scan, even within a single index page, because the btree scan copies all TIDs from an index page before starting to process them. If, after such a copy is taken, the TID of a new tuple version is inserted into that page, the scan will not see it, while the "cached" TID may meanwhile be marked as deleted by a committed transaction - resulting in zero tuples being found.

------
Minor consequence: check_exclusion_or_unique_constraint efficiency.

It may fail to find a record in a UNIQUE index during INSERT ... ON CONFLICT DO UPDATE. However, this is just a minor performance issue - the retry logic takes care of it anyway.

------
Major consequence: weird behavior of logical replication when rows are concurrently updated on the subscriber.

It is caused by the fact that RelationFindReplTupleByIndex (or, less likely, RelationFindReplTupleSeq) may miss the row an action should be applied to, so the action is silently skipped. A simplified sketch of the affected code path follows case 1 below.

---case 1: LOST DELETE---

Scenario: a DELETE on the publisher races with an UPDATE on the subscriber.

Setup, on publisher:

    CREATE TABLE t(a int PRIMARY KEY, data text);
    INSERT INTO t(a, data) VALUES (1, 'value_from_publisher');

On subscriber:

    CREATE TABLE t(a int PRIMARY KEY, data text);
    CREATE INDEX data_index ON t(data); -- index is required to prevent HOT

Commands, on publisher:

    DELETE FROM t WHERE a = 1;

On subscriber:

    UPDATE t SET data = 'value_from_subscriber' WHERE a = 1;

Expected result: the tuple is deleted on both publisher and subscriber; both DELETE-before-UPDATE and UPDATE-before-DELETE orderings end with the row absent.

Actual result: either as expected, or the tuple is deleted on the publisher while 'value_from_subscriber' remains on the subscriber.
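For context, here is the relevant part of RelationFindReplTupleByIndex (src/backend/executor/execReplication.c) as a simplified sketch - not verbatim source, argument lists are abbreviated - with comments marking where the window described above opens:

    InitDirtySnapshot(snap);
    scan = index_beginscan(rel, idxrel, &snap, ...);

    retry:
    found = false;
    index_rescan(scan, skey, ...);

    /* btree copies the matching TIDs of an index page at this point;
     * a TID inserted by a concurrent UPDATE afterwards is invisible
     * to this scan attempt */
    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
    {
        /* the old tuple version may already carry the xmax of a
         * committed transaction and is rejected by SnapshotDirty;
         * together with the missed new TID this yields "not found" */
        found = true;
        break;
    }

    if (found)
    {
        res = table_tuple_lock(rel, &(outslot->tts_tid),
                               GetLatestSnapshot(), outslot, ...);
        if (res == TM_Updated)
            goto retry;
    }
    /* if nothing was found, the replicated action is silently skipped */

Note that the retry only happens when a tuple was found but concurrently modified; the "found nothing" case gets no second chance.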
---case 2: LOST UPDATE---

Scenario: an UPDATE on the publisher races with an UPDATE of a subscriber-only column on the subscriber.

Setup, on publisher:

    CREATE TABLE t(a int PRIMARY KEY, data text);
    INSERT INTO t(a, data) VALUES (1, 'initial_data');

On subscriber:

    -- note the additional subscriber-only column
    CREATE TABLE t(a int PRIMARY KEY, data text, sub_only varchar(40) DEFAULT 'default_data');
    CREATE INDEX sub_only_index ON t(sub_only);

Commands, on publisher:

    UPDATE t SET data = 'update_from_publisher' WHERE a = 1;

On subscriber:

    UPDATE t SET sub_only = 'update_from_subscriber' WHERE a = 1;

Expected result: on the subscriber, the tuple (a=1, data='update_from_publisher', sub_only='update_from_subscriber'). The update on the subscriber touches only the subscriber-only column, so I expect the publisher-driven columns to be replicated.

Actual result: either as expected, or the publisher's update is lost, leaving (a=1, data='initial_data', sub_only='update_from_subscriber') on the subscriber.

------
Another consequence: for the same reasons, logical replication may emit invalid log messages (and statistics) about conflicts:

* correct: update_origin_differs; in logs: update_missing
* correct: update_origin_differs; in logs: update_deleted (reporting invalid deleted-tuple data)
* correct: delete_origin_differs; in logs: delete_missing

This is not a critical issue by itself, but invalid logs are something no one wants to deal with - especially logs meant for debugging conflicts. And for any future conflict-resolution mechanics it may become a critical flaw.

------
Proposed fix: the patch in [2] addresses the issue by rewriting RelationFindReplTupleByIndex/RelationFindReplTupleSeq to use GetLatestSnapshot for each attempt to find the target row (a rough sketch follows the links below). Since the code calls GetLatestSnapshot before table_tuple_lock anyway, no performance regression is expected.

As a result:

* an MVCC scan will not miss updated tuples, while a dirty scan may;
* an MVCC scan cannot see not-yet-committed new rows, while a dirty scan can - but in both variants table_tuple_lock waits for the updating transaction to commit before the whole scan is retried.

In any case, I think we should at least document this behavior.

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CANtu0oiktqQ2pwExoXqDpByXNCJa-KE5vQRodTRnmFHN_%2BqwHg%40mail.gmail.com
[1]: https://www.postgresql.org/message-id/flat/CAA4eK1LZxzORgAoDhix9MWrOqYOsNZuZLW2sTfGsJFM99yRgrg%40mail.gmail.com#02be86f7e2d24a038878f03ac1b93e95
[2]: https://commitfest.postgresql.org/patch/5151/
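P.S. For illustration, the shape of the reworked lookup in [2] is roughly the following (a simplified sketch, not the patch itself; argument lists abbreviated):

    retry:
    /* fresh MVCC snapshot per lookup attempt instead of SnapshotDirty:
     * an MVCC scan always sees exactly one visible version of a
     * concurrently updated row */
    snap = GetLatestSnapshot();
    scan = index_beginscan(rel, idxrel, snap, ...);
    index_rescan(scan, skey, ...);

    found = false;
    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
    {
        found = true;
        break;
    }

    if (found)
    {
        /* same locking as today: if the row is being updated under us,
         * table_tuple_lock waits for the updater to commit... */
        res = table_tuple_lock(rel, &(outslot->tts_tid), snap, outslot, ...);
        if (res == TM_Updated)
        {
            index_endscan(scan);
            goto retry;        /* ...and the whole scan is retried */
        }
    }

The concurrent-update case is thus handled by the TM_Updated retry path instead of being silently skipped.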