Re: Parallel Apply - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: Parallel Apply
Msg-id: d25177ef-0d1d-4489-9b2d-b6c43ad6677c@garret.ru
In response to: Parallel Apply (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
Hi,

> 4. Triggers and Constraints
> For the initial version, exclude tables with user-defined triggers or
> constraints from parallel apply due to complexity in dependency detection.
> We may need some parallel-apply-safe marking to allow this.
I think the problem is wider than just triggers and constraints.
Even if the database has no triggers and constraints, there can still be causality violations.
If transactions at the subscriber are executed in a different order than on the publisher, it is possible to observe some "invalid" database state which is never possible at the publisher. Assume a very simple example: you withdraw some money from one account at an ATM and then deposit it to some other account. These are two different transactions, and there are no dependencies between them (they update different records). But if the second transaction is committed before the first, we can see an incorrect report where the total amount of money across all accounts exceeds the real balance. Another case is persisting some stream of events (with timestamps): it may be confusing if the monotonicity of events is violated at the subscriber.
And there can be many other similar situations where there are no "direct" data dependencies between transactions, but there are hidden "indirect" dependencies. The most popular case you have mentioned: foreign keys. Certainly, support for referential integrity constraints can be added. But such dependencies can also exist without corresponding constraints in the database schema.
You have also suggested adding an option which forces preserving commit order. But my experiments with `debug_logical_replication_streaming=immediate` show that in this case, for short transactions, performance with parallel workers is even worse than with a single apply worker.
Maybe it is possible to enforce some weaker commit order: do not try to commit transactions in exactly the same order as at the publisher, but if transaction T1 at the publisher started after T2 committed, then T1 cannot be committed before T2 at the subscriber. Unfortunately it is not clear how to enforce such a "partial order": `LogicalRepBeginData` contains `final_lsn` (the commit LSN), but nothing like a `start_lsn`.
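To make the idea more concrete, here is a rough standalone sketch of such a check (not actual PostgreSQL code). It assumes the begin message also carried the transaction's start LSN at the publisher, which it currently does not, and all names below are invented for illustration:

```c
/*
 * Sketch of a "weaker commit order" check.  Rule to enforce: if T1 started
 * at the publisher after T2 committed there, then T1 must not be committed
 * at the subscriber before T2.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/*
 * Publisher commit LSNs of transactions not yet committed at the subscriber,
 * maintained by the leader in increasing order; element 0 is the oldest.
 */
static XLogRecPtr in_progress_commit_lsn[64];
static int n_in_progress;

/*
 * A parallel apply worker may commit a transaction whose publisher start LSN
 * is 'start_lsn' once every transaction that committed at the publisher
 * before that point has already committed at the subscriber.  Because a
 * transaction's own commit LSN is always greater than its start LSN, being
 * the oldest in-progress transaction also satisfies the condition.
 */
static bool
can_commit_now(XLogRecPtr start_lsn)
{
    return n_in_progress == 0 || in_progress_commit_lsn[0] >= start_lsn;
}
```

Such a check would let unrelated long-running transactions commit in any order while still ruling out anomalies like the ATM example above.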
When I first read your proposal, and especially after seeing the concrete results of its implementation, I decided that the parallel apply approach is definitely better than the prefetch approach. But now I am not so sure. Yes, parallel apply is about 2x faster than parallel prefetch. But parallel prefetch still allows a 2-3x increase in LR speed without causing any problems with deadlocks, constraints, triggers, ...
> Replication Progress Tracking
> -----------------------------------------
> Parallel apply introduces out-of-order commit application, complicating
> replication progress tracking. To handle restarts and ensure consistency,
> track three key metrics:
>
> lowest_remote_lsn: starting point for applying transactions.
> highest_remote_lsn: highest LSN that has been applied.
> list_remote_lsn: list of commit LSNs applied between the lowest and highest.
>
> Mechanism: store these in ReplicationState (lowest_remote_lsn,
> highest_remote_lsn, list_remote_lsn) and flush them to disk during
> checkpoints, similar to CheckPointReplicationOrigin. After a restart, start
> from lowest_remote_lsn and, for each transaction, skip it if its commit LSN
> is in list_remote_lsn, otherwise apply it. Once the commit LSN is greater
> than highest_remote_lsn, apply without checking the list.
>
> During apply, the leader maintains list_in_progress_xacts in increasing
> commit order. On commit, update highest_remote_lsn. If the commit LSN
> matches the first in-progress xact of list_in_progress_xacts, update
> lowest_remote_lsn; otherwise, add it to list_remote_lsn. After commit, also
> remove it from list_in_progress_xacts. Entries below lowest_remote_lsn need
> to be cleaned up from list_remote_lsn while updating its value.
>
> To illustrate how this mechanism works, consider the following four
> transactions:
>
> Transaction ID   Commit LSN
> 501              1000
> 502              1100
> 503              1200
> 504              1300
>
> Assume transactions 501 and 502 take longer to apply, whereas transactions
> 503 and 504 finish earlier. Parallel apply workers are assigned as follows:
> pa-1 -> 501, pa-2 -> 502, pa-3 -> 503, pa-4 -> 504.
>
> Initial state: list_in_progress_xacts = [501, 502, 503, 504]
>
> Step 1: Transaction 503 commits first and, in RecordTransactionCommit,
> updates highest_remote_lsn to 1200. In apply_handle_commit, since 503 is
> not the first in list_in_progress_xacts, add 1200 to list_remote_lsn.
> Remove 503 from list_in_progress_xacts.
>
> Step 2: Transaction 504 commits. Update highest_remote_lsn to 1300. Add
> 1300 to list_remote_lsn. Remove 504 from list_in_progress_xacts.
> ReplicationState now: lowest_remote_lsn = 0, list_remote_lsn = [1200, 1300],
> highest_remote_lsn = 1300, list_in_progress_xacts = [501, 502].
>
> Step 3: Transaction 501 commits. Since 501 is now the first in
> list_in_progress_xacts, update lowest_remote_lsn to 1000. Remove 501 from
> list_in_progress_xacts. Clean up list_remote_lsn to remove entries <
> lowest_remote_lsn (none in this case).
> ReplicationState now: lowest_remote_lsn = 1000, list_remote_lsn =
> [1200, 1300], highest_remote_lsn = 1300, list_in_progress_xacts = [502].
>
> Step 4: System crash and restart. Upon restart, start replication from
> lowest_remote_lsn = 1000. The first transaction encountered is 502 with
> commit LSN 1100; since it is not present in list_remote_lsn, apply it. As
> transactions 503 and 504's respective commit LSNs [1200, 1300] are present
> in list_remote_lsn, we skip them. Note that each transaction's
> end_lsn/commit_lsn, which the apply worker receives along with the first
> transaction command BEGIN, has to be compared. This ensures correctness and
> avoids duplicate application of already committed transactions.
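If I understand this mechanism correctly, the restart decision boils down to something like the following standalone sketch (simplified types, invented names, not taken from the actual patch):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

typedef struct ReplicationProgress
{
    XLogRecPtr  lowest_remote_lsn;   /* everything at or below this committed
                                      * in order before the restart */
    XLogRecPtr  highest_remote_lsn;  /* highest commit LSN applied so far */
    XLogRecPtr *list_remote_lsn;     /* commit LSNs applied out of order,
                                      * all in (lowest, highest] */
    int         nlist;
} ReplicationProgress;

/*
 * After a restart the stream is requested from lowest_remote_lsn; decide for
 * each incoming transaction, based on the commit LSN received with its BEGIN
 * message, whether it still has to be applied.
 */
static bool
must_apply_after_restart(const ReplicationProgress *p, XLogRecPtr commit_lsn)
{
    if (commit_lsn <= p->lowest_remote_lsn)
        return false;           /* committed in order before the crash */
    if (commit_lsn > p->highest_remote_lsn)
        return true;            /* nothing at or beyond this point was applied */

    /* in between: skip only if it was already committed out of order */
    for (int i = 0; i < p->nlist; i++)
    {
        if (p->list_remote_lsn[i] == commit_lsn)
            return false;
    }
    return true;
}
```

Plugging in the state from step 3 (lowest_remote_lsn = 1000, highest_remote_lsn = 1300, list_remote_lsn = [1200, 1300]) gives: apply 502 (1100), skip 503 (1200) and 504 (1300), which matches the walk-through.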
> Now, it is possible that some users may want to parallelize the transaction
> but still want to maintain commit order, because they don't explicitly
> annotate FK/PK for columns but maintain the integrity via the application.
> So, in such cases, as we won't be able to detect transaction dependencies,
> it would be better to allow out-of-order commits optionally. Thoughts?