On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote:
> On Fri, Jan 10, 2025 at 11:10:03AM +0000, Chiranmoy.Bhattacharya@fujitsu.com wrote:
>> We tried auto-vectorization and observed no performance improvement.
>
> Do you mean that the auto-vectorization worked and you observed no
> performance improvement, or the auto-vectorization had no effect on the
> code generated?
I was able to get auto-vectorization to take effect on Apple clang 16 with
the following addition to src/backend/utils/adt/Makefile:

	encode.o: CFLAGS += ${CFLAGS_VECTORIZE} -mllvm -force-vector-width=8

This gave the following results with your hex_encode_test() function:
  buf  | HEAD  | patch | % diff
-------+-------+-------+--------
    16 |    21 |    16 |     24
    64 |    54 |    41 |     24
   256 |   138 |   100 |     28
  1024 |   441 |   300 |     32
  4096 |  1671 |  1106 |     34
 16384 |  6890 |  4570 |     34
 65536 | 27393 | 18054 |     34
This doesn't compare with the gains you are claiming to see with
intrinsics, but it's not bad for a one line change. I bet there are ways
to adjust the code so that the auto-vectorization is more effective, too.
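
As a rough illustration of the kind of adjustment meant here (a sketch
only, not the existing hex_encode() in encode.c, and not benchmarked): if
the loop relies on a per-byte table lookup, replacing the lookup with plain
arithmetic and marking the buffers as non-overlapping may give the
vectorizer an easier loop to work with.  The names nibble_to_hex() and
hex_encode_branchless() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a nibble (0..15) to its lowercase hex digit
 * using arithmetic only, with no table lookup and no branch.
 */
static inline uint8_t
nibble_to_hex(uint8_t n)
{
	/* (n > 9) is 0 or 1: digits become '0' + n, letters 'a' + n - 10 */
	return (uint8_t) ('0' + n + (n > 9) * ('a' - '0' - 10));
}

/*
 * Encode len bytes from src into 2 * len hex characters in dst.  No
 * terminating NUL is written.  Returns the number of bytes written.
 *
 * "restrict" is an assumption that src and dst never overlap; it lets the
 * compiler skip runtime alias checks when it vectorizes the loop.
 */
size_t
hex_encode_branchless(const uint8_t *restrict src, size_t len,
					  uint8_t *restrict dst)
{
	for (size_t i = 0; i < len; i++)
	{
		dst[2 * i] = nibble_to_hex(src[i] >> 4);
		dst[2 * i + 1] = nibble_to_hex(src[i] & 0xF);
	}
	return len * 2;
}
```

Whether this actually vectorizes better than the current loop depends on
the compiler and flags, so it would need to be measured with the same
hex_encode_test() function before drawing any conclusions.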
--
nathan