Re: Optimize LISTEN/NOTIFY - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: Optimize LISTEN/NOTIFY |
| Date | |
| Msg-id | 803584.1768503335@sss.pgh.pa.us Whole thread Raw |
| In response to | Re: Optimize LISTEN/NOTIFY (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-hackers |
I wrote:
> Fascinating. I was doing my testing on Intel (RHEL8). I'd bet a good
> deal that this is more about the OS than the hardware. I wonder if
> newer Linux versions behave differently.
> I can try to reproduce your results tomorrow on macOS (M4 Pro chip).
It does seem like macOS behaves noticeably differently than Linux.
On a 2024 Mac Mini M4 Pro, running current macOS (Tahoe 26.2),
I get results like these from your preferred test case:
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 1000 --sleep 0.01 --sleep-exp 1.01 --duration 10
today's HEAD:
10 s: 6211 sent (657/s), 6217 received (566/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms 0 (0.0%) avg: 0.000ms
1.00-10.00ms 0 (0.0%) avg: 0.000ms
10.00-100.00ms # 77 (1.2%) avg: 58.837mss
>100.00ms ######### 6140 (98.8%) avg: 725.985ms
v34 patch:
10 s: 143733 sent (14618/s), 143733 received (14630/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms # 11384 (7.9%) avg: 0.084ms
0.10-1.00ms ###### 93357 (65.0%) avg: 0.266ms
1.00-10.00ms ## 36566 (25.4%) avg: 3.525ms
10.00-100.00ms # 2426 (1.7%) avg: 20.367mss
>100.00ms 0 (0.0%) avg: 0.000ms
v35 patch:
10 s: 43872 sent (4652/s), 43870 received (4651/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms # 267 (0.6%) avg: 0.084ms
0.10-1.00ms # 2105 (4.8%) avg: 0.337mss
1.00-10.00ms # 3861 (8.8%) avg: 5.372mss
10.00-100.00ms ####### 31756 (72.4%) avg: 51.454ms
>100.00ms # 5881 (13.4%) avg: 117.087ms
But on my Intel workstation (Xeon W-2245, up-to-date RHEL 8)
it looks like this:
HEAD:
10 s: 15324 sent (1565/s), 15326 received (1567/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms 0 (0.0%) avg: 0.000ms
1.00-10.00ms # 15 (0.1%) avg: 6.743ms
10.00-100.00ms # 92 (0.6%) avg: 56.650ms
>100.00ms ######### 15219 (99.3%) avg: 253.127ms
v34:
10 s: 198891 sent (20011/s), 198890 received (20010/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms ####### 158004 (79.4%) avg: 0.067ms
0.10-1.00ms # 39661 (19.9%) avg: 0.183ms
1.00-10.00ms # 1051 (0.5%) avg: 2.832ms
10.00-100.00ms # 175 (0.1%) avg: 14.321ms
>100.00ms 0 (0.0%) avg: 0.000ms
v35:
10 s: 190192 sent (19966/s), 190192 received (19966/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms ####### 151957 (79.9%) avg: 0.063ms
0.10-1.00ms # 31932 (16.8%) avg: 0.207ms
1.00-10.00ms # 2693 (1.4%) avg: 3.768ms
10.00-100.00ms # 3610 (1.9%) avg: 25.191mss
>100.00ms 0 (0.0%) avg: 0.000ms
This doesn't make a lot of sense if you compare the hardware specs:
the M4 Pro has more than double the geekbench ratings of the W-2245,
yet it runs these tests much more slowly. I think perhaps there is
something about the way we do sleep/wakeup on macOS that is not as
well optimized as Linux.
Also, I was testing various cases that just go as fast as possible,
no --sleep. On the Mac (ignoring plain HEAD, it's not in the
same league):
v34:
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 100 --duration 10
10 s: 674530 sent (67781/s), 674529 received (67781/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms # 1 (0.0%) avg: 0.572ms
1.00-10.00ms ######### 674528 (100.0%) avg: 1.513ms
10.00-100.00ms 0 (0.0%) avg: 0.000ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 10 --notifiers 1 --channels 100 --duration 10
10 s: 81097 sent (8015/s), 810962 received (80163/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms # 64 (0.0%) avg: 0.497ms
1.00-10.00ms ## 224131 (27.6%) avg: 9.105ms
10.00-100.00ms ####### 586772 (72.4%) avg: 14.290ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 1000 --duration 10
10 s: 363167 sent (36772/s), 363167 received (36772/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms 0 (0.0%) avg: 0.000ms
1.00-10.00ms # 2 (0.0%) avg: 5.912ms
10.00-100.00ms ######### 362165 (99.7%) avg: 27.450ms
>100.00ms # 1000 (0.3%) avg: 129.015ms
v35:
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 100 --duration 10
10 s: 707180 sent (70699/s), 707179 received (70698/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms # 1 (0.0%) avg: 0.084ms
0.10-1.00ms # 43 (0.0%) avg: 0.862ms
1.00-10.00ms ######### 707135 (100.0%) avg: 1.441ms
10.00-100.00ms 0 (0.0%) avg: 0.000ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 10 --notifiers 1 --channels 100 --duration 10
10 s: 76301 sent (7632/s), 763000 received (76325/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms # 28 (0.0%) avg: 0.886ms
1.00-10.00ms # 91294 (12.0%) avg: 9.351ms
10.00-100.00ms ######## 671680 (88.0%) avg: 13.953ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 1000 --duration 10
10 s: 313326 sent (36772/s), 313325 received (36771/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms 0 (0.0%) avg: 0.000ms
1.00-10.00ms 0 (0.0%) avg: 0.000ms
10.00-100.00ms ######### 302951 (96.7%) avg: 27.686ms
>100.00ms # 10375 (3.3%) avg: 164.892mss
But on the Linux box:
v34:
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 100 --duration 10
10 s: 914323 sent (92641/s), 914322 received (92641/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms # 13 (0.0%) avg: 0.045ms
0.10-1.00ms # 17 (0.0%) avg: 0.598ms
1.00-10.00ms ######### 914292 (100.0%) avg: 1.105ms
10.00-100.00ms 0 (0.0%) avg: 0.000ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 10 --notifiers 1 --channels 100 --duration 10
10 s: 205412 sent (20655/s), 2054112 received (206550/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms # 149 (0.0%) avg: 0.455ms
1.00-10.00ms ######### 2053945 (100.0%) avg: 4.888ms
10.00-100.00ms # 18 (0.0%) avg: 10.430ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 1000 --duration 10
10 s: 331651 sent (33458/s), 331649 received (33458/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms # 1 (0.0%) avg: 0.999ms
1.00-10.00ms # 1 (0.0%) avg: 1.035ms
10.00-100.00ms ######### 331648 (100.0%) avg: 30.134ms
>100.00ms 0 (0.0%) avg: 0.000ms
v35:
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 100 --duration 10
10 s: 940448 sent (95016/s), 940447 received (95016/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms # 11 (0.0%) avg: 0.075ms
0.10-1.00ms # 557 (0.1%) avg: 0.660ms
1.00-10.00ms ######### 939880 (99.9%) avg: 1.074ms
10.00-100.00ms 0 (0.0%) avg: 0.000ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 10 --notifiers 1 --channels 100 --duration 10
10 s: 208430 sent (21033/s), 2084298 received (210329/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms # 121 (0.0%) avg: 0.071ms
0.10-1.00ms # 455 (0.0%) avg: 0.493ms
1.00-10.00ms ######### 2082795 (99.9%) avg: 4.813ms
10.00-100.00ms # 929 (0.0%) avg: 11.882ms
>100.00ms 0 (0.0%) avg: 0.000ms
$ ./pg_async_notify_test --listeners 1 --notifiers 1 --channels 1000 --duration 10
10 s: 351154 sent (35975/s), 351153 received (35976/s)
Notification Latency Distribution:
0.00-0.01ms 0 (0.0%) avg: 0.000ms
0.01-0.10ms 0 (0.0%) avg: 0.000ms
0.10-1.00ms 0 (0.0%) avg: 0.000ms
1.00-10.00ms 0 (0.0%) avg: 0.000ms
10.00-100.00ms ########## 351154 (100.0%) avg: 28.460ms
>100.00ms 0 (0.0%) avg: 0.000ms
Also, I looked at "perf" results for the as-fast-as-possible
runs, and was interested to see that the directly notify-related
logic accounts for only 10%-15% of the runtime. The rest is going
into generic transaction housekeeping, client I/O, kernel overhead,
and so on. So that bolsters my feeling that we should be minimizing
process wakeups rather than trying to optimize anything more in
the notify processing itself.
Anyway, at this point I'm content to go ahead with v35, and
I'll push that in a little bit. Perhaps we should take a TODO
to figure out why this test scenario runs so poorly on macOS;
but I'll bet that the answer is not anywhere near async.c itself.
regards, tom lane
pgsql-hackers by date: