RE: Initial Schema Sync for Logical Replication - Mailing list pgsql-hackers

From	Kumar, Sachin
Subject	RE: Initial Schema Sync for Logical Replication
Date	July 7, 2023 09:16:01
Msg-id	a01ec64c94a5481d9e9508f95f18b709@amazon.com
In response to	Re: Initial Schema Sync for Logical Replication (Masahiko Sawada <sawada.mshk@gmail.com>)
List	pgsql-hackers

> From: Masahiko Sawada <sawada.mshk@gmail.com>
> So I've implemented a different approach; doing schema synchronization at a
> CREATE SUBSCRIPTION time. The backend executing CREATE SUBSCRIPTION
> uses pg_dump and restores the table schemas including both partitioned tables
> and their partitions regardless of publish_via_partition_root option, and then
> creates pg_subscription_rel entries for tables while respecting
> publish_via_partition_root option.
> 
> There is a window between table creations and the tablesync workers starting to
> process the tables. If DDLs are executed in this window, the tablesync worker
> might fail because the table schema might have already been changed. We need
> to mention this note in the documentation. BTW, I think we will be able to get
> rid of this downside if we support DDL replication. DDLs executed in the window
> are applied by the apply worker and it takes over the data copy to the tablesync
> worker at a certain LSN.

I don’t think we will be able to get rid of this window even with DDL replication.
There are some issues:
1. Even if the tablesync worker takes over at a certain LSN, the publisher can make more changes
until the tablesync worker acquires a lock on the publisher table via COPY.
2. How will we make sure that the apply worker has caught up with all the changes from the publisher
before it starts the tablesync worker? It can lag behind the publisher.

I think the easiest option would be to just recreate the table; this way we don’t have to worry about
complex race conditions. Tablesync already creates a slot for copying the data, so we can use the same
slot to get the up-to-date table definition. Dropping the table won't be very expensive, since there
won't be any data in it. The apply worker will skip all DDLs/DMLs until the table is synced.
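
To make the idea above concrete, here is a toy sketch in Python (not backend code). All names in it (Change, SubscriberTable, tablesync, apply_change) are hypothetical, LSNs are plain integers, the log is assumed to be sorted by LSN, and a DDL is reduced to "add a column". It only illustrates the ordering: tablesync rebuilds the table from the definition visible at the slot's LSN, and the apply worker skips everything for that table until then.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Change:
    lsn: int
    kind: str      # "ddl" (adds a column) or "dml" (adds a row)
    payload: str


@dataclass
class SubscriberTable:
    columns: List[str] = field(default_factory=list)
    rows: List[str] = field(default_factory=list)
    synced_at_lsn: Optional[int] = None   # None means "not synced yet"


def publisher_state_at(log, lsn):
    """Replay the publisher's log up to and including lsn (log sorted by LSN)."""
    columns, rows = ["id"], []
    for ch in log:
        if ch.lsn > lsn:
            break
        if ch.kind == "ddl":
            columns.append(ch.payload)
        else:
            rows.append(ch.payload)
    return columns, rows


def tablesync(table, log, slot_lsn):
    """Drop and recreate the table as it looks at slot_lsn, then copy the data."""
    table.columns, table.rows = publisher_state_at(log, slot_lsn)
    table.synced_at_lsn = slot_lsn


def apply_change(table, ch):
    """Apply one change; skip it if the table is unsynced or the copy already covers it."""
    if table.synced_at_lsn is None or ch.lsn <= table.synced_at_lsn:
        return False
    if ch.kind == "ddl":
        table.columns.append(ch.payload)
    else:
        table.rows.append(ch.payload)
    return True
```

In this model a DDL executed between CREATE SUBSCRIPTION and the start of tablesync does no harm, because the stale definition is thrown away and rebuilt from the slot's snapshot.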

For partitioned tables, though, we will be able to keep up with schema changes to the published table
only when publish_via_partition_root is true.

Regards
Sachin
Amazon Web Services: https://aws.amazon.com
