Re: [PATCH] Hex-coding optimizations using SVE on ARM. - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Msg-id Z4GHNfhRKuA0r_Wn@nathan
In response to Re: [PATCH] Hex-coding optimizations using SVE on ARM.  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: [PATCH] Hex-coding optimizations using SVE on ARM.
List pgsql-hackers
On Fri, Jan 10, 2025 at 09:38:14AM -0600, Nathan Bossart wrote:
> On Fri, Jan 10, 2025 at 11:10:03AM +0000, Chiranmoy.Bhattacharya@fujitsu.com wrote:
>> We tried auto-vectorization and observed no performance improvement.
> 
> Do you mean that the auto-vectorization worked and you observed no
> performance improvement, or the auto-vectorization had no effect on the
> code generated?

I was able to get auto-vectorization to take effect on Apple clang 16 with
the following addition to src/backend/utils/adt/Makefile:

    encode.o: CFLAGS += ${CFLAGS_VECTORIZE} -mllvm -force-vector-width=8

This gave the following results with your hex_encode_test() function:

    buf  | HEAD  | patch | % diff
  -------+-------+-------+--------
      16 |    21 |    16 |   24
      64 |    54 |    41 |   24
     256 |   138 |   100 |   28
    1024 |   441 |   300 |   32
    4096 |  1671 |  1106 |   34
   16384 |  6890 |  4570 |   34
   65536 | 27393 | 18054 |   34

This doesn't compare with the gains you are claiming to see with
intrinsics, but it's not bad for a one-line change.  I bet there are ways
to adjust the code so that the auto-vectorization is more effective, too.
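
As an illustration of the kind of adjustment that could help the
auto-vectorizer, here is a minimal sketch.  It is not the code in encode.c
and not the code from the patch: hex_encode_sketch() and nibble_to_hex()
are hypothetical names.  It assumes that replacing the lookup table with
branch-free arithmetic and marking the buffers as non-overlapping gives the
compiler an easier loop to work with.  Whether a particular compiler
actually vectorizes it is not guaranteed; with clang,
-Rpass=loop-vectorize reports which loops were vectorized.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a nibble (0..15) to a lowercase hex digit
 * using arithmetic instead of a table lookup.
 */
static inline uint8_t
nibble_to_hex(uint8_t n)
{
    /* For 10..15, add ('a' - '0' - 10) == 39 so that 10 maps to 'a'. */
    return (uint8_t) (n + '0' + (n > 9 ? 'a' - '0' - 10 : 0));
}

/*
 * Illustrative sketch only: encode len bytes from src as 2 * len hex
 * characters in dst.  No NUL terminator is written.  The restrict
 * qualifiers are an assumption that src and dst never overlap.
 * Returns the number of bytes written.
 */
size_t
hex_encode_sketch(const uint8_t *restrict src, size_t len,
                  uint8_t *restrict dst)
{
    for (size_t i = 0; i < len; i++)
    {
        uint8_t     b = src[i];

        dst[2 * i] = nibble_to_hex(b >> 4);        /* high nibble */
        dst[2 * i + 1] = nibble_to_hex(b & 0x0F);  /* low nibble */
    }
    return 2 * len;
}
```

The loop body has no data-dependent branches or loads other than the input
byte itself, which is the property this sketch is trying to demonstrate.
Any real change would need to be benchmarked against HEAD in the same way
as the numbers above.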

-- 
nathan


