Re: Parallel Apply - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: Parallel Apply
Date
Msg-id d25177ef-0d1d-4489-9b2d-b6c43ad6677c@garret.ru
In response to Parallel Apply  (Amit Kapila <amit.kapila16@gmail.com>)
Responses RE: Parallel Apply
List pgsql-hackers


On 11/08/2025 7:45 AM, Amit Kapila wrote:
Hi,


4. Triggers and Constraints
For the initial version, exclude tables with user-defined triggers or
constraints from parallel apply due to complexity in dependency
detection. We may need some parallel-apply-safe marking to allow this.

I think that the problem is wider than just triggers and constraints.
Even if the database has no triggers and constraints, there can still be causality violations.

If transactions at the subscriber are executed in a different order than on the publisher, then it is possible to observe some "invalid" database state which is never possible at the publisher. Assume a very simple example: you withdraw some money at an ATM from one account and then deposit it into another account. These are two different transactions, and there are no dependencies between them (they update different records). But if the second transaction is committed before the first, we can get an incorrect report in which the total amount of money across all accounts exceeds the real balance. Another case is persisting a stream of events (with timestamps): it may be confusing if the monotonicity of events is violated at the subscriber.

And there can be many other similar situations where there are no "direct" data dependencies between transactions, but there are hidden "indirect" dependencies. The most popular case you have already mentioned: foreign keys. Support for referential integrity constraints can certainly be added. But such dependencies can also exist without corresponding constraints in the database schema.

You have also suggested adding an option to force preservation of commit order. But my experiments with `debug_logical_replication_streaming=immediate` show that in this case, for short transactions, performance with parallel workers is even worse than with a single apply worker.

Maybe it is possible to enforce some weaker commit order: do not try to commit transactions in exactly the same order as at the publisher, but if transaction T1 at the publisher started after T2 committed, then T1 cannot be committed before T2 at the subscriber. Unfortunately it is not clear how to enforce such a "partial order": `LogicalRepBeginData` contains `finish_lsn`, but not `start_lsn`.
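
To make that rule concrete, here is a minimal standalone C sketch (not PostgreSQL code), assuming the protocol were extended so that BEGIN also carried the transaction's publisher start LSN; all names here are invented for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* stand-in for PostgreSQL's XLogRecPtr */

typedef struct TxnInfo
{
    XLogRecPtr  start_lsn;          /* publisher LSN at which the xact began (hypothetical) */
    XLogRecPtr  commit_lsn;         /* publisher LSN at which the xact committed */
    bool        committed_locally;  /* already committed on the subscriber? */
} TxnInfo;

/*
 * May "txn" commit on the subscriber now?  Every transaction whose publisher
 * commit LSN precedes txn's publisher start LSN is a causal predecessor and
 * must already be committed locally; all other transactions may commit in
 * any order.
 */
bool
can_commit_now(const TxnInfo *txn, const TxnInfo *all, int ntxns)
{
    for (int i = 0; i < ntxns; i++)
    {
        if (&all[i] == txn)
            continue;
        if (all[i].commit_lsn < txn->start_lsn && !all[i].committed_locally)
            return false;           /* a causal predecessor is not applied yet */
    }
    return true;
}
```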

When I first read your proposal, and especially after seeing the concrete results of its implementation, I decided that the parallel apply approach is definitely better than the prefetch approach. But now I am not so sure. Yes, parallel apply is about 2x faster than parallel prefetch. But parallel prefetch still allows a 2-3x increase in LR speed without causing any problems with deadlocks, constraints, triggers, ...




Replication Progress Tracking
-----------------------------------------
Parallel apply introduces out-of-order commit application,
complicating replication progress tracking. To handle restarts and
ensure consistency:

Track Three Key Metrics:
lowest_remote_lsn: Starting point for applying transactions.
highest_remote_lsn: Highest LSN that has been applied.
list_remote_lsn: List of commit LSNs applied between the lowest and highest.

Mechanism:
Store these in ReplicationState: lowest_remote_lsn,
highest_remote_lsn, list_remote_lsn. Flush these to disk during
checkpoints similar to CheckPointReplicationOrigin.
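
As a rough illustration (a sketch, not the actual patch), the state described above could look like this; the field names follow the text, but the array bound and layout are guesses, and in the real design the fields would live in ReplicationState and be flushed at checkpoint time as described:

```c
#include <stdint.h>

typedef uint64_t XLogRecPtr;        /* stand-in for PostgreSQL's XLogRecPtr */

#define MAX_GAP_XACTS 1024          /* arbitrary cap, just for this sketch */

typedef struct ReplicationStateSketch
{
    XLogRecPtr  lowest_remote_lsn;      /* restart point for applying transactions */
    XLogRecPtr  highest_remote_lsn;     /* highest commit LSN applied so far */
    XLogRecPtr  list_remote_lsn[MAX_GAP_XACTS]; /* commit LSNs applied between the two */
    int         nlist_remote_lsn;       /* number of valid entries above */
} ReplicationStateSketch;
```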

After restart, start from lowest_remote_lsn and, for each transaction,
if its commit LSN is in list_remote_lsn, skip it; otherwise, apply it.
Once the commit LSN > highest_remote_lsn, apply without checking the list.
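
The restart rule could then read roughly as follows, continuing the same sketch (same file as the struct above):

```c
#include <stdbool.h>

/* Has the transaction with this publisher commit LSN already been applied? */
static bool
already_applied(const ReplicationStateSketch *rs, XLogRecPtr commit_lsn)
{
    /* Above the highest applied commit LSN: certainly not applied yet. */
    if (commit_lsn > rs->highest_remote_lsn)
        return false;

    /*
     * At or below the restart point: applied before the restart.  (Normally
     * nothing below lowest_remote_lsn is streamed again anyway.)
     */
    if (commit_lsn <= rs->lowest_remote_lsn)
        return true;

    /* In the gap: applied iff its commit LSN was recorded. */
    for (int i = 0; i < rs->nlist_remote_lsn; i++)
    {
        if (rs->list_remote_lsn[i] == commit_lsn)
            return true;
    }
    return false;
}
```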

During apply, the leader maintains list_in_progress_xacts in
increasing commit order. On commit, update highest_remote_lsn. If the
commit LSN matches the first in-progress xact in
list_in_progress_xacts, update lowest_remote_lsn; otherwise, add it to
list_remote_lsn. After the commit, also remove it from
list_in_progress_xacts. We need to clean up entries below
lowest_remote_lsn in list_remote_lsn while updating its value.
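
The leader-side bookkeeping just described might look roughly like this, again continuing the same sketch; in the real patch this work would be split between RecordTransactionCommit and apply_handle_commit as in the steps below:

```c
#include <string.h>

/*
 * Called when a parallel apply worker's transaction commits locally.
 * list_in_progress_xacts holds the publisher commit LSNs of in-flight
 * transactions in increasing commit order.  (No bounds checks in this
 * sketch.)
 */
static void
record_parallel_commit(ReplicationStateSketch *rs,
                       XLogRecPtr *list_in_progress_xacts,
                       int *n_in_progress,
                       XLogRecPtr commit_lsn)
{
    /* Always track the highest commit LSN applied so far. */
    if (commit_lsn > rs->highest_remote_lsn)
        rs->highest_remote_lsn = commit_lsn;

    if (*n_in_progress > 0 && list_in_progress_xacts[0] == commit_lsn)
    {
        /* The oldest in-progress xact committed: advance the restart point. */
        rs->lowest_remote_lsn = commit_lsn;
    }
    else
    {
        /* Committed out of order: remember it in list_remote_lsn. */
        rs->list_remote_lsn[rs->nlist_remote_lsn++] = commit_lsn;
    }

    /* Remove this commit LSN from list_in_progress_xacts. */
    for (int i = 0; i < *n_in_progress; i++)
    {
        if (list_in_progress_xacts[i] == commit_lsn)
        {
            memmove(&list_in_progress_xacts[i],
                    &list_in_progress_xacts[i + 1],
                    (*n_in_progress - i - 1) * sizeof(XLogRecPtr));
            (*n_in_progress)--;
            break;
        }
    }

    /* Drop list_remote_lsn entries that fell below the new restart point. */
    int         j = 0;

    for (int i = 0; i < rs->nlist_remote_lsn; i++)
    {
        if (rs->list_remote_lsn[i] >= rs->lowest_remote_lsn)
            rs->list_remote_lsn[j++] = rs->list_remote_lsn[i];
    }
    rs->nlist_remote_lsn = j;
}
```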

To illustrate how this mechanism works, consider the following four
transactions:

Transaction ID    Commit LSN
501               1000
502               1100
503               1200
504               1300

Assume:
Transactions 501 and 502 take longer to apply whereas transactions 503
and 504 finish earlier. Parallel apply workers are assigned as
follows:
pa-1 → 501
pa-2 → 502
pa-3 → 503
pa-4 → 504

Initial state: list_in_progress_xacts = [501, 502, 503, 504]

Step 1: Transaction 503 commits first and in RecordTransactionCommit,
it updates highest_remote_lsn to 1200. In apply_handle_commit, since
503 is not the first in list_in_progress_xacts, add 1200 to
list_remote_lsn. Remove 503 from list_in_progress_xacts.
Step 2: Transaction 504 commits. Update highest_remote_lsn to 1300.
Add 1300 to list_remote_lsn. Remove 504 from list_in_progress_xacts.
ReplicationState now:
lowest_remote_lsn = 0
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [501, 502]

Step 3: Transaction 501 commits. Since 501 is now the first in
list_in_progress_xacts, update lowest_remote_lsn to 1000. Remove 501
from list_in_progress_xacts. Clean up list_remote_lsn to remove
entries < lowest_remote_lsn (none in this case).
ReplicationState now:
lowest_remote_lsn = 1000
list_remote_lsn = [1200, 1300]
highest_remote_lsn = 1300
list_in_progress_xacts = [502]

Step 4: System crash and restart
Upon restart, start replication from lowest_remote_lsn = 1000. The first
transaction encountered is 502 with commit LSN 1100; since it is not
present in list_remote_lsn, apply it. As transactions 503 and 504's
respective commit LSNs [1200, 1300] are present in list_remote_lsn, we
skip them. Note that each transaction's end_lsn/commit_lsn, which the
apply worker receives along with the first transaction command BEGIN,
has to be compared against this list. This ensures correctness and
avoids duplicate application of already committed transactions.
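
Putting the sketches above together in one file, a small hypothetical driver reproduces this example: commits arrive in the order 503, 504, 501, and after the simulated restart only 502 has to be re-applied:

```c
#include <stdio.h>

int
main(void)
{
    ReplicationStateSketch rs = {0};
    XLogRecPtr  in_progress[] = {1000, 1100, 1200, 1300};  /* xacts 501..504 */
    int         n_in_progress = 4;

    /* Steps 1-3: 503 (1200), 504 (1300) and 501 (1000) commit, in that order. */
    record_parallel_commit(&rs, in_progress, &n_in_progress, 1200);
    record_parallel_commit(&rs, in_progress, &n_in_progress, 1300);
    record_parallel_commit(&rs, in_progress, &n_in_progress, 1000);

    /* Step 4: after restart, only 502 (commit LSN 1100) must be re-applied. */
    printf("skip 502? %d\n", already_applied(&rs, 1100));  /* prints 0 */
    printf("skip 503? %d\n", already_applied(&rs, 1200));  /* prints 1 */
    printf("skip 504? %d\n", already_applied(&rs, 1300));  /* prints 1 */

    return 0;
}
```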

Now, it is possible that some users may want to parallelize
transactions but still want to maintain commit order because they don't
explicitly declare FKs or PKs on columns and instead maintain integrity
via the application. In such cases, since we won't be able to detect
transaction dependencies, it would be better to make out-of-order
commits optional.

Thoughts?
