Re: long-standing data loss bug in initial sync of logical replication - Mailing list pgsql-hackers
From: Amit Kapila
Subject: Re: long-standing data loss bug in initial sync of logical replication
Msg-id: CAA4eK1KUoa1yVOXfGUvgA2kVQ2VvfqWU1LSzzeGHM+6=sbcVkw@mail.gmail.com
In response to: Re: long-standing data loss bug in initial sync of logical replication (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Tue, Apr 22, 2025 at 10:57 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Tue, Mar 18, 2025 at 2:55 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 17, 2025 at 4:56 PM Hayato Kuroda (Fujitsu)
> > <kuroda.hayato@fujitsu.com> wrote:
> > >
> > > > Regarding PG13, it cannot be applied as-is; thus some adjustments
> > > > are needed. I will share it in upcoming posts.
> > >
> > > Here is a patch set for PG13. Unlike the patches for PG14-17, it
> > > could not be created as-is, because:
> > >
> > > 1. The WAL record for invalidation messages (XLOG_XACT_INVALIDATIONS)
> > >    does not exist.
> > > 2. Thus, the ReorderBufferChange for invalidations does not exist.
> > >    Our patch tries to distribute invalidations, but that cannot be
> > >    done as-is.
> > > 3. The code assumed that invalidation messages can be added only once.
> > > 4. The timing when invalidation messages are consumed is limited to:
> > >    a. when a COMMAND_ID change is popped,
> > >    b. the start of decoding a transaction, or
> > >    c. the end of decoding a transaction.
> > >
> > > The above means that invalidations cannot be executed while a
> > > transaction is being decoded. I created two patch sets to resolve
> > > the data loss issue: 0001 has fewer code changes but resolves only
> > > part of the issue; 0002 has huge changes but provides a complete
> > > solution.
> > >
> > > 0001 - mostly the same as the patches for the other versions.
> > >        ReorderBufferAddInvalidations() was adjusted to allow being
> > >        called several times. As I said above, 0001 cannot execute
> > >        inval messages while decoding the transactions.
> > > 0002 - introduces a new ReorderBufferChange type to indicate inval
> > >        messages. They would be handled as in PG14+.
> > >
> > > Here is an example. Assume that the table foo exists on both nodes,
> > > with a publication "pub" which publishes all tables and a
> > > subscription "sub" which subscribes to "pub". What happens if the
> > > following workload is executed?
> > >
> > > ```
> > > S1                          S2
> > > BEGIN;
> > > INSERT INTO foo VALUES (1);
> > >                             ALTER PUBLICATION pub RENAME TO pub_renamed;
> > > INSERT INTO foo VALUES (2);
> > > COMMIT;
> > > LR -> ?
> > > ```
> > >
> > > With 0001, tuples (1) and (2) would be replicated to the subscriber,
> > > and an error "publication "pub" does not exist" would be raised when
> > > new changes are made later.
> > >
> > > 0001+0002 works more aggressively; the error would be raised when
> > > the S1 transaction is decoded. The behavior is the same as for
> > > patched PG14-PG17.
> > >
> >
> > I understand that with 0001 the fix is partial in the sense that,
> > because invalidations are processed at transaction end, the changes
> > from concurrent DDL will only be reflected for the next transaction.
> > Now, on one hand, it is prudent not to add a new type of
> > ReorderBufferChange in a back-branch (v13), but the change is not
> > that invasive, so we could go with it as well. My preference would be
> > to go with just 0001 for v13 to minimize the risk of adding new bugs
> > or breaking something unintentionally.
> >
> > Sawada-san, and others involved here, do you have any suggestions on
> > this matter?
>
> Sorry for the late response.
>
> I agree with just 0001 for v13, as 0002 seems invasive. Given that v13
> will have only a few releases until EOL and 0001 can deal with some of
> the cases in question, I'd like to avoid such invasive changes in v13.

Fair enough. OTOH, we could leave the v13 branch alone, considering the
following: (a) it is near EOL; (b) the bug happens only in rare cases
(when DDLs like ALTER PUBLICATION ... ADD TABLE ... or ALTER TYPE ...,
which don't take a strong lock on the table, run concurrently with DMLs
on the tables involved in the DDL); and (c) the complete fix is
invasive, and even the partial fix is not simple. I have a slight fear
that if we make any mistake in fixing it partially (of course, we don't
see any today), we may not even get a chance to fix it. Now, if the
above convinces you or someone else not to push the partial fix to v13,
then fine; otherwise, I'll push 0001 to v13 the day after tomorrow.

--
With Regards,
Amit Kapila.
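For reference, below is a minimal SQL sketch of the two-node scenario
quoted above. The table foo, publication pub, subscription sub, and the
S1/S2 workload come from Kuroda-san's example; the connection string and
the session annotations are illustrative assumptions, not part of the
original thread.

```
-- Node 1 (publisher): table and publication, as in the quoted example
CREATE TABLE foo (a int);
CREATE PUBLICATION pub FOR ALL TABLES;

-- Node 2 (subscriber): matching table and subscription
-- (the CONNECTION string is a placeholder; adjust it for the actual setup)
CREATE TABLE foo (a int);
CREATE SUBSCRIPTION sub
    CONNECTION 'host=node1 dbname=postgres'
    PUBLICATION pub;

-- Node 1, session S1: open a transaction and modify foo
BEGIN;
INSERT INTO foo VALUES (1);

-- Node 1, session S2, while S1 is still open; the rename does not take
-- a lock on foo that would conflict with S1's inserts
ALTER PUBLICATION pub RENAME TO pub_renamed;

-- Node 1, session S1 again
INSERT INTO foo VALUES (2);
COMMIT;
```

Per the discussion above, with 0001 alone both rows are replicated and
the "publication "pub" does not exist" error surfaces only when later
changes are decoded; with 0001+0002 the error is raised while decoding
S1's transaction itself, matching the behavior of patched PG14-PG17.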