Re: Single transaction in the tablesync worker? - Mailing list pgsql-hackers
From | Ajin Cherian |
---|---|
Subject | Re: Single transaction in the tablesync worker? |
Date | |
Msg-id | CAFPTHDaZw5o+wMbv3aveOzuLyz_rqZebXAj59rDKTJbwXFPYgw@mail.gmail.com |
In response to | Re: Single transaction in the tablesync worker? (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Single transaction in the tablesync worker?
Re: Single transaction in the tablesync worker? |
List | pgsql-hackers |
On Mon, Feb 1, 2021 at 11:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> I have updated the patch to display WARNING for each of the tablesync
> slots during DropSubscription. As discussed, I have moved the drop
> slot related code towards the end in AlterSubscription_refresh. Apart
> from this, I have fixed one more issue in tablesync code where in
> after catching the exception we were not clearing the transaction
> state on the publisher, see changes in LogicalRepSyncTableStart. I
> have also fixed other comments raised by you. Additionally, I have
> removed the test because it was creating the same name slot as the
> tablesync worker and tablesync worker removed the same due to new
> logic in LogicalRepSyncStart. Earlier, it was not failing because of
> the bug in that code which I have fixed in the attached.
>

I was testing this patch. I had a table on the subscriber which had a row that would cause a PK constraint violation during the table copy. This results in the subscriber trying to roll back the table copy and failing.

2021-02-01 23:28:16.041 EST [23738] LOG: logical replication apply worker for subscription "tap_sub" has started
2021-02-01 23:28:16.051 EST [23740] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.118 EST [23740] ERROR: table copy could not rollback transaction on publisher
2021-02-01 23:28:21.118 EST [23740] DETAIL: The error was: another command is already in progress
2021-02-01 23:28:21.122 EST [8028] LOG: background worker "logical replication worker" (PID 23740) exited with exit code 1
2021-02-01 23:28:21.125 EST [23908] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:21.138 EST [23908] ERROR: could not create replication slot "pg_16398_sync_16384": ERROR: replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:21.139 EST [8028] LOG: background worker "logical replication worker" (PID 23908) exited with exit code 1
2021-02-01 23:28:26.168 EST [24048] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.244 EST [24048] ERROR: table copy could not rollback transaction on publisher
2021-02-01 23:28:34.244 EST [24048] DETAIL: The error was: another command is already in progress
2021-02-01 23:28:34.251 EST [8028] LOG: background worker "logical replication worker" (PID 24048) exited with exit code 1
2021-02-01 23:28:34.254 EST [24337] LOG: logical replication table synchronization worker for subscription "tap_sub", table "tab_rep" has started
2021-02-01 23:28:34.263 EST [24337] ERROR: could not create replication slot "pg_16398_sync_16384": ERROR: replication slot "pg_16398_sync_16384" already exists
2021-02-01 23:28:34.264 EST [8028] LOG: background worker "logical replication worker" (PID 24337) exited with exit code 1

And one more thing I see is that we now error out in PG_CATCH() in LogicalRepSyncTableStart() with the above error and, as a result, the tablesync slot is not dropped. This causes the slot creation to fail on the next restart. I think this can be avoided. We could attempt a rollback only on specific failures, and drop the slot prior to erroring out.

regards,
Ajin Cherian
Fujitsu Australia
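
To make the ordering problem described above concrete, here is a toy Python model of the control flow. It is not PostgreSQL code: `Publisher`, `Connection`, `sync_table` and `run_attempts` are hypothetical names invented for this sketch, and it assumes the slot drop can succeed on a connection where the rollback fails, which may not hold for a real connection that is still mid-COPY. It only illustrates why an error raised inside the catch block skips the slot cleanup, and how dropping the slot before the rollback attempt avoids the leftover slot.

```python
class Publisher:
    """Toy publisher: only remembers which replication slots exist."""

    def __init__(self):
        self.slots = set()


class Connection:
    """Toy per-worker connection; each tablesync attempt opens a new one."""

    def __init__(self, publisher):
        self.publisher = publisher
        self.copy_in_progress = False

    def create_slot(self, name):
        if name in self.publisher.slots:
            raise RuntimeError(f'replication slot "{name}" already exists')
        self.publisher.slots.add(name)

    def drop_slot(self, name):
        # Assumption of this sketch: the drop works even with a COPY pending.
        self.publisher.slots.discard(name)

    def start_copy(self):
        self.copy_in_progress = True

    def rollback(self):
        # Mirrors the error text from the log when a command is still active.
        if self.copy_in_progress:
            raise RuntimeError("another command is already in progress")


def sync_table(publisher, slot, drop_slot_before_rollback):
    """One tablesync attempt whose local copy always fails (simulated PK violation)."""
    conn = Connection(publisher)
    conn.create_slot(slot)
    try:
        conn.start_copy()
        raise ValueError("duplicate key violates primary key")
    except ValueError:
        if drop_slot_before_rollback:
            conn.drop_slot(slot)
        conn.rollback()       # raises, so the cleanup below is never reached
        conn.drop_slot(slot)
        raise


def run_attempts(drop_slot_before_rollback, attempts=2):
    """Restart the worker a few times against the same publisher; collect errors."""
    publisher = Publisher()
    errors = []
    for _ in range(attempts):
        try:
            sync_table(publisher, "pg_16398_sync_16384", drop_slot_before_rollback)
        except (RuntimeError, ValueError) as exc:
            errors.append(str(exc))
    return errors, publisher.slots
```

In this model, `run_attempts(False)` reproduces the first two errors from the log (the rollback failure, then the "already exists" failure on restart), while `run_attempts(True)` still reports the rollback failure on every attempt but leaves no slot behind.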