Home > mailing lists

Re: Applying logical replication changes by more than one process - Mailing list pgsql-hackers

From	konstantin knizhnik
Subject	Re: Applying logical replication changes by more than one process
Date	March 22, 2016 06:32:37
Msg-id	015FC6E9-B213-4993-AB5B-3D3E4032A13A@postgrespro.ru Whole thread Raw
In response to	Re: Applying logical replication changes by more than one process (Petr Jelinek <petr@2ndquadrant.com>)
Responses	Re: Applying logical replication changes by more than one process Re: Applying logical replication changes by more than one process
List	pgsql-hackers

Tree view

On Mar 21, 2016, at 4:30 PM, Petr Jelinek wrote:

On 21/03/16 14:25, Andres Freund wrote:
On 2016-03-21 14:18:27 +0100, Petr Jelinek wrote:
On 21/03/16 14:15, Andres Freund wrote:
Only when the origin is actually setup for the current session. You
need
to call the replorigin_advance yourself from your apply code.

That's problematic from a durability POV.

Huh? How come?

If you use the session mechanism the replication progress is synced with
the apply process, even if there are crashes. Crash recovery updates the
progress. There's no such interlock with apply otherwise, and I don't
see how you can build one with reasonable effort.

Ah you mean because with wal_log=true the origin advance is in different WAL record than commit? OK yeah you might be one transaction behind then, true.

It actually means that we can not enforce database consistency. If we do replorigin_advance before commit and then crash happen, then we will loose some changes.

If we call replorigin_advance after commit but crash happen before, then some changes can be applied multiple times. For example we can insert some record twice (if there are no unique constraints).

Look likes the only working scenario is to setup replication session for each commit and use locking to prevent concurrent session setup for the same slot by multiple process, doesn't it?

I have tried it, fortunately it doesn't cause any noticeable performance degradation. But unfortunately can't consider such approach as elegant.

Why it is actually necessary to bind replication slot to process? Why it is not possible to have multiple concurrent sessions for the same slot?

Also I concern about using sequential search for slot location in replorigin_session_setup and many other functions - there is loop through all max_replication_slots.

It seems to be not a problem when number of slots is less than 10. For multimaster this assumption is true - even Oracle RAC rarely has two-digit number of nodes.

But if we want to perform sharding and use logical replication for providing redundancy, then number of nodes and slots can be essentially larger.

I didn't think much about such configuration - may be it possible to propose more efficient mechanism for replication in this case.

--
Petr Jelinek http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

From: Jeff Janes
Date: 22 March 2016, 06:11:28
Subject: trivial typo in vacuum progress doc

From: Fabien COELHO
Date: 22 March 2016, 06:35:15
Subject: Re: checkpointer continuous flushing

Re: Applying logical replication changes by more than one process - Mailing list pgsql-hackers

Previous

Next