Help diagnosing replication (copy) error - Mailing list pgsql-general

From Steve Baldwin
Subject Help diagnosing replication (copy) error
Msg-id CAKE1Aib-6yvpd1mvn02haEr=EHYO2Hoq1BLQcw3JE5Ob95jZYw@mail.gmail.com
List pgsql-general
Hi,

I'm in the process of migrating a cluster from 15.3 to 16.2. We have a 'zero downtime' requirement so I'm using logical replication to create the new cluster and then perform the switch in the application.

All but one table have completed their initial copy. The remaining table is the largest (of course), and the replication slot assigned to its copy (pg_378075177_sync_60067_7343845372910323059) shows active = false when I query pg_replication_slots on the publisher.
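
For reference, a check along the following lines shows that slot's state on the publisher. It is only a sketch: the slot name is the one quoted above, and the choice of columns is an assumption about what is useful to look at, not something taken from the original query.

```sql
-- Run on the publisher (15.3).
-- Shows whether any process is attached to the table-sync slot
-- and whether the WAL it needs is still available.
SELECT slot_name,
       slot_type,
       active,
       active_pid,           -- NULL when no process is using the slot
       restart_lsn,
       confirmed_flush_lsn,
       wal_status            -- 'lost' means required WAL has been removed
FROM pg_replication_slots
WHERE slot_name = 'pg_378075177_sync_60067_7343845372910323059';
```

If active stays false and restart_lsn does not move across repeated runs, that would suggest no sync worker is currently connected for this table.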

I've checked the recent logs for both the publishing cluster and the subscribing cluster but I can't see any replication errors. I could have missed them, but nothing appears to be 'retried' the way I've seen with replication errors in the past.
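
Besides the logs, the subscriber's catalogs can show whether a sync worker exists for the table. The queries below are a sketch, assuming the session has permission to read these views on RDS; the column selection is illustrative.

```sql
-- Run on the subscriber (16.2).
-- Per-table sync state for tables that are not yet fully ready:
-- i = initialize, d = data is being copied, f = finished table copy,
-- s = synchronized, r = ready.
SELECT srrelid::regclass AS table_name,
       srsubstate,
       srsublsn
FROM pg_subscription_rel
WHERE srsubstate <> 'r';

-- Workers currently running for each subscription.
-- A table-sync worker has a non-NULL relid; the main apply worker has NULL.
SELECT subname,
       pid,
       relid::regclass AS syncing_table,
       received_lsn,
       last_msg_receipt_time
FROM pg_stat_subscription;
```

A table stuck in state d with no matching row in pg_stat_subscription would be consistent with the inactive slot seen on the publisher.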

I've used this mechanism for zero-downtime upgrades multiple times in the past, and have recently used it to upgrade smaller clusters from 15.x to 16.2 without issue.

The clusters are hosted on AWS RDS, so I have no access to the servers, but if that's the only way to diagnose the issue, I can create a support case.

Does anyone have any suggestions as to where I should look for the issue?

Thanks,

Steve
