Re: Parallel Apply - Mailing list pgsql-hackers

From Nisha Moond
Subject Re: Parallel Apply
Msg-id CABdArM4gv08OWF5Gxndf8cVgO3MVeU9T8z47sZR=rUfL1N9bqw@mail.gmail.com
In response to RE: Parallel Apply  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Wed, Aug 13, 2025 at 4:17 PM Zhijie Hou (Fujitsu)
<houzj.fnst@fujitsu.com> wrote:
>
> Here is the initial POC patch for this idea.
>

Thank you Hou-san for the patch.

I benchmarked the patch, and overall the results show substantial
performance improvements.
The details are as follows:

Source code:
----------------
pgHead (572c0f1b0e) and v1-0001 patch

Setup:
---------
Pub --> Sub
 - Two nodes are set up in a pub-sub logical replication configuration.
 - Both nodes have the same set of pgbench tables created with scale=300.
 - The sub node is subscribed to all the changes from the pub node's
pgbench tables.

Workload Run:
--------------------
 - Disable the subscription on the Sub node.
 - Run the default pgbench (read-write) workload only on the Pub node
with #clients=40 and a run duration of 10 minutes.
 - Enable the subscription on Sub once pgbench completes, and then
measure the time taken in replication.
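
The three steps above can be sketched as command lines. This is only an illustration: the ports, database name, and subscription name are hypothetical placeholders, and the attached scripts may do this differently. How the replication time itself is measured is not shown here.

```python
import shlex


def workload_commands(pub_port=5432, sub_port=5433, dbname="postgres",
                      sub_name="sub", clients=40, duration_s=600):
    """Return the three workload steps as argv lists, in execution order.

    All defaults except clients=40 and duration_s=600 (10 minutes) are
    hypothetical placeholders, not values taken from the email.
    """
    def psql(port, sql):
        # Run a single SQL command on the node listening on the given port.
        return ["psql", "-p", str(port), "-d", dbname, "-c", sql]

    return [
        # 1. Stop applying changes on the subscriber.
        psql(sub_port, f"ALTER SUBSCRIPTION {sub_name} DISABLE"),
        # 2. Run the default (read-write) pgbench script on the publisher.
        ["pgbench", "-p", str(pub_port), "-c", str(clients),
         "-T", str(duration_s), dbname],
        # 3. Resume apply; the replication time is measured from here.
        psql(sub_port, f"ALTER SUBSCRIPTION {sub_name} ENABLE"),
    ]


if __name__ == "__main__":
    for argv in workload_commands():
        print(shlex.join(argv))
```

The function only builds the commands; it does not run them.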
~~~

Test-01: Measure Replication lag
----------------------------------------
Observations:
---------------
 - Replication time improved as the number of parallel workers
increased with the patch.
 - On pgHead, replicating a 10-minute publisher workload took ~46 minutes.
 - With just 2 parallel workers (default), replication time was nearly
cut in half (~24 minutes), and with 8 workers it completed in ~13
minutes (3.5x faster).
 - With 16 parallel workers, the patch achieved a ~3.7x speedup over pgHead.
 - With 32 workers, the gains plateaued (~3.5x, slightly below the
16-worker result), likely because more workers are running on the
machine while there is not enough parallelizable work to show further
improvement.

Detailed Result:
-----------------
Case                     Time_taken_in_replication(sec)   rep_time_in_minutes   faster_than_head
1. pgHead                2760.791                         46.01318333           -
2. patched_#worker=2     1463.853                         24.3975               1.88 times
3. patched_#worker=4     1031.376                         17.1896               2.68 times
4. patched_#worker=8      781.007                         13.0168               3.54 times
5. patched_#worker=16     741.108                         12.3518               3.73 times
6. patched_#worker=32     787.203                         13.1201               3.51 times
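
For reference, the faster_than_head column is the pgHead time divided by each patched time. A minimal Python sketch that recomputes it from the figures above follows; the names are illustrative and not taken from the attached scripts, and the last digit can differ from the table depending on rounding.

```python
# Replication times in seconds, copied from the table above.
HEAD_SECONDS = 2760.791
PATCHED_SECONDS = {2: 1463.853, 4: 1031.376, 8: 781.007, 16: 741.108, 32: 787.203}


def speedup(head_seconds: float, patched_seconds: float) -> float:
    """Return how many times faster the patched run is than pgHead."""
    return head_seconds / patched_seconds


if __name__ == "__main__":
    for workers, seconds in PATCHED_SECONDS.items():
        # Print the worker count, the time in minutes, and the speedup.
        print(f"workers={workers:2d}  {seconds / 60:7.2f} min  "
              f"{speedup(HEAD_SECONDS, seconds):.2f}x")
```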
~~~~

Test-02: Measure number of transactions parallelized
-----------------------------------------------------
 - Used a top-up patch to LOG the number of transactions applied by
parallel workers, applied by the leader, and found to be dependent.
 - Example LOG output:
  ```
LOG:  parallelized_nxact: 11497254 dependent_nxact: 0 leader_applied_nxact: 600
```
 - parallelized_nxact: gives the number of parallelized transactions
 - dependent_nxact: gives the dependent transactions
 - leader_applied_nxact: gives the transactions applied by leader worker
 (the required top-up v1-002 patch is attached.)
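
A LOG line in the format quoted above can be turned into percentages with a short script. The sketch below is an assumption about how one might post-process the output, not the method used for the results here; in particular, it takes the total to be the sum of the three counters, which assumes they are disjoint and together cover all applied transactions.

```python
import re

# Pattern for the counter line emitted by the top-up patch, as quoted above.
LOG_PATTERN = re.compile(
    r"parallelized_nxact:\s*(\d+)\s+"
    r"dependent_nxact:\s*(\d+)\s+"
    r"leader_applied_nxact:\s*(\d+)"
)


def parse_counters(line: str) -> dict:
    """Extract the three counters from one LOG line; raise if it does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        raise ValueError(f"not a counter line: {line!r}")
    parallelized, dependent, leader = (int(g) for g in match.groups())
    return {"parallelized": parallelized, "dependent": dependent,
            "leader_applied": leader}


def shares(counters: dict) -> dict:
    """Return each counter as a percentage of the sum of all three (assumed total)."""
    total = sum(counters.values())
    return {name: 100.0 * value / total for name, value in counters.items()}


if __name__ == "__main__":
    sample = ("LOG:  parallelized_nxact: 11497254 dependent_nxact: 0 "
              "leader_applied_nxact: 600")
    for name, pct in shares(parse_counters(sample)).items():
        print(f"{name}: {pct:.4f}%")
```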

Observations:
----------------
 - With 4 to 8 parallel workers, ~80%-98% of transactions are parallelized.
 - As the number of workers increased, the parallelized percentage
increased, reaching 99.99% with 16 and 32 workers.

Detailed Result:
-----------------
case1: #parallel_workers = 2(default)
  #total_pgbench_txns = 24745648
    parallelized_nxact = 14439480 (58.35%)
    dependent_nxact    = 16 (0.00006%)
    leader_applied_nxact = 10306153 (41.64%)

case2: #parallel_workers = 4
  #total_pgbench_txns = 24776108
    parallelized_nxact = 19666593 (79.37%)
    dependent_nxact    = 212 (0.0008%)
    leader_applied_nxact = 5109304 (20.62%)

case3: #parallel_workers = 8
  #total_pgbench_txns = 24821333
    parallelized_nxact = 24397431 (98.29%)
    dependent_nxact    = 282 (0.001%)
    leader_applied_nxact = 423621 (1.71%)

case4: #parallel_workers = 16
  #total_pgbench_txns = 24938255
    parallelized_nxact = 24937754 (99.99%)
    dependent_nxact    = 142 (0.0005%)
    leader_applied_nxact = 360 (0.0014%)

case5: #parallel_workers = 32
  #total_pgbench_txns = 24769474
    parallelized_nxact = 24769135 (99.99%)
    dependent_nxact    = 312 (0.0013%)
    leader_applied_nxact = 28 (0.0001%)

~~~~~
The scripts used for the above tests are attached.

Next, I plan to extend the testing to larger workloads by running
pgbench for 20–30 minutes.
We will also benchmark performance across different workload types to
evaluate the improvements once the patch has matured further.

--
Thanks,
Nisha
