RE: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From Kumar, Sachin
Subject RE: Initial Schema Sync for Logical Replication
Date
Msg-id a01ec64c94a5481d9e9508f95f18b709@amazon.com
Whole thread Raw
In response to Re: Initial Schema Sync for Logical Replication  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
> From: Masahiko Sawada <sawada.mshk@gmail.com>
> So I've implemented a different approach; doing schema synchronization at a
> CREATE SUBSCRIPTION time. The backend executing CREATE SUBSCRIPTION
> uses pg_dump and restores the table schemas including both partitioned tables
> and their partitions regardless of publish_via_partition_root option, and then
> creates pg_subscription_rel entries for tables while respecting
> publish_via_partition_root option.
> 
> There is a window between table creations and the tablesync workers starting to
> process the tables. If DDLs are executed in this window, the tablesync worker
> might fail because the table schema might have already been changed. We need
> to mention this note in the documentation. BTW, I think we will be able to get
> rid of this downside if we support DDL replication. DDLs executed in the window
> are applied by the apply worker and it takes over the data copy to the tablesync
> worker at a certain LSN.

I don’t think even with DDL replication we will be able to get rid of this window. 
There are some issues
1. Even with tablesync worker taking over at certain LSN, publisher can make more changes till
Table sync acquires lock on publisher table via copy table.
2. how we will make sure that applier worker has caught up will all the changes from publisher
Before it starts tableSync worker. It can be lag behind publisher.

I think the easiest option would be to just recreate the table , this way we don’t have to worry about 
complex race conditions, tablesync already makes a slot for copy data we can use same slot for 
getting upto date table definition, dropping the table won't be much expensive since there won't be any data
in it.Apply worker will skip all the DDLs/DMLs till table is synced.

Although for partitioned tables we will be able to keep with published table schema changes only when 
publish_by_partition_root is true.

Regards
Sachin
Amazon Web Services: https://aws.amazon.com

pgsql-hackers by date:

Previous
From: Yugo NAGATA
Date:
Subject: Re: pg_column_toast_chunk_id: a function to get a chunk ID of a TOASTed value
Next
From: Peter Smith
Date:
Subject: Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication