Re: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Slow catchup of 2PC (twophase) transactions on replica in LR |
Date | |
Msg-id | CAA4eK1KOs3s6syZqUgrd2WvjTz64SGf0ToZcRoPMCKKH+M0YFQ@mail.gmail.com |
In response to | Slow catchup of 2PC (twophase) transactions on replica in LR (Давыдов Виталий <v.davydov@postgrespro.ru>) |
Responses |
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
|
List | pgsql-hackers |
On Thu, Feb 22, 2024 at 6:59 PM Давыдов Виталий <v.davydov@postgrespro.ru> wrote:
>
> I'd like to present and discuss a problem where 2PC transactions are applied quite slowly on a replica during logical replication. There is a master and a replica with logical replication established from the master to the replica with twophase = true. At some load level on the master, the replica starts to lag behind the master, and the lag keeps increasing. We have to significantly decrease the load on the master to allow the replica to complete the catchup. Such a problem may create significant difficulties in production. The problem appears at least on the REL_16_STABLE branch.
>
> To reproduce the problem:
>
> Set up logical replication from master to replica with subscription parameter twophase = true.
> Create some intermediate load on the master (use pgbench with custom sql with prepare+commit).
> Optionally switch off the replica for some time (keep load on master).
> Switch on the replica and wait until it reaches the master.
>
> The replica will never reach the master, even with a fairly low load on the master. If the load is removed, the replica takes much longer than expected to reach the master. I tried the same for regular transactions, but the problem doesn't appear even with a decent load.
>
> I think the main cause of the poor 2PC catchup performance is the lack of asynchronous commit support for 2PC. For regular transactions, asynchronous commit is used on the replica by default (subscription synchronous_commit = off). It allows the replication worker process on the replica to avoid fsync (XLogFlush) and to utilize 100% CPU (the background wal writer or checkpointer will do the fsync). I agree that 2PC is mostly used in multimaster configurations with two or more nodes that operate synchronously, but when a node is in catchup (the node is not online in a multimaster cluster), asynchronous commit has to be used to speed up the catchup.
>

I don't see that we do anything specific for 2PC transactions to make them behave differently from regular transactions with respect to the synchronous_commit setting. What makes you think so? Can you pinpoint the code you are referring to?

> There is another thing that affects the imbalance between master and replica performance. When the master executes requests from multiple clients, an fsync optimization takes place in XLogFlush. It decreases the number of fsyncs when a number of parallel backends write to the WAL simultaneously. The replica applies received transactions sequentially in one thread, so this optimization is not applied.
>

Right, I think for this we need to implement parallel apply.

> I see some possible solutions:
>
> Implement asynchronous commit for 2PC transactions.
> Do some hacking with enableFsync when it is possible.
>

Can you be a bit more specific about what exactly you have in mind to achieve the above solutions?

--
With Regards,
Amit Kapila.
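
The fsync imbalance described in the quoted text can be illustrated with a toy model. This is only a sketch and not PostgreSQL code. It assumes that each 2PC transaction issues two flush requests (one at prepare, one at commit), that concurrent requests on the master arrive in batches of `n_clients`, and that a single fsync covers every WAL position up to the highest one requested in the batch. The function names are hypothetical.

```python
from typing import Iterable, List


def count_fsyncs(batches: Iterable[List[int]]) -> int:
    """Count fsync calls in a toy model of WAL flushing.

    Each batch holds the WAL positions (LSNs) that concurrent backends
    want flushed at about the same moment. One fsync flushes up to the
    highest LSN in the batch, so the other requests in that batch find
    their record already durable and skip their own fsync.
    """
    flushed_upto = 0
    fsyncs = 0
    for batch in batches:
        # Assumption: the request with the highest LSN flushes first.
        for lsn in sorted(batch, reverse=True):
            if lsn > flushed_upto:
                flushed_upto = lsn
                fsyncs += 1
    return fsyncs


def master_batches(n_txn: int, n_clients: int) -> List[List[int]]:
    """Flush requests on the master: n_clients requests arrive together."""
    # Assumption: two flush requests per 2PC transaction.
    lsns = list(range(1, 2 * n_txn + 1))
    return [lsns[i:i + n_clients] for i in range(0, len(lsns), n_clients)]


def replica_batches(n_txn: int) -> List[List[int]]:
    """Flush requests on the replica: one apply worker, one at a time."""
    return [[lsn] for lsn in range(1, 2 * n_txn + 1)]


if __name__ == "__main__":
    print("master fsyncs: ", count_fsyncs(master_batches(1000, 10)))
    print("replica fsyncs:", count_fsyncs(replica_batches(1000)))
```

Under these assumptions the sequential apply worker pays one fsync per flush request, while the master amortizes each fsync over all clients in a batch. Real behavior depends on timing and on how much WAL has been written when the flush happens.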