Hi, Nathan.
I just realized that I almost forgot about this thread :)
> The result looks great, but the discussion in [0] shows that the result may
> vary among different ARM chips. Could you provide the chip model of this
> test? So that we can do a cross validation of this patch. Not sure if compiler
> version is necessary too. I'm willing to test it on Alibaba Cloud Yitian 710
> if I have time.

I ran some benchmarks on Yitian 710.

On c8y.16xlarge (64 cores):

Without the patch:

    80.31%  postgres               [.] __aarch64_swp4_acq
     1.77%  postgres               [.] __aarch64_ldadd4_acq_rel
     1.13%  postgres               [.] hash_search_with_hash_value
     0.87%  pg_stat_statements.so  [.] __aarch64_swp4_acq
     0.72%  postgres               [.] perform_spin_delay
     0.44%  postgres               [.] _bt_compare

    tps = 295272.628421 (including connections establishing)
    tps = 295335.660323 (excluding connections establishing)

Patched:

     9.94%  postgres               [.] s_lock
     6.07%  postgres               [.] __aarch64_swp4_acq
     5.73%  postgres               [.] hash_search_with_hash_value
     2.81%  postgres               [.] perform_spin_delay
     2.29%  postgres               [.] _bt_compare
     2.15%  postgres               [.] PinBuffer

    tps = 864519.764125 (including connections establishing)
    tps = 864638.244443 (excluding connections establishing)
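
It seems that a large gain is possible when s_lock contention is severe: on
this machine throughput rose from about 295k to about 865k tps, and
__aarch64_swp4_acq in postgres dropped from 80.31% to 6.07% of the profile.
Such contention is probably more likely on machines with more cores.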
On c8y.2xlarge (8 cores), I could not make s_lock severely contended, and
as a result the patch made no difference beyond the noise.

Regards,
Jingtang