Re: [PATCH] Hex-coding optimizations using SVE on ARM. - Mailing list pgsql-hackers
From | Ranier Vilela |
---|---|
Subject | Re: [PATCH] Hex-coding optimizations using SVE on ARM. |
Date | |
Msg-id | CAEudQAqsYN2+i_05NvyS0csQbukmgoP2xX7RAp6niHTapO7i1w@mail.gmail.com |
In response to | Re: [PATCH] Hex-coding optimizations using SVE on ARM. (John Naylor <johncnaylorls@gmail.com>) |
List | pgsql-hackers |
Hi.
On Wed, Jan 15, 2025 at 07:57, John Naylor <johncnaylorls@gmail.com> wrote:
On Wed, Jan 15, 2025 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Couple of thoughts:
>
> 1. I was actually hoping for a comment on the constant's definition,
> perhaps along the lines of
>
> /*
> * The hex expansion of each possible byte value (two chars per value).
> */
Works for me. With that, did you mean we then wouldn't need a comment
in the code?
> 2. Since "src" is defined as "const char *", I'm pretty sure that
> pickier compilers will complain that
>
> + unsigned char usrc = *((unsigned char *) src);
>
> results in casting away const. Recommend
>
> + unsigned char usrc = *((const unsigned char *) src);
Thanks for the reminder!
> 3. I really wonder if
>
> + memcpy(dst, &hextbl[2 * usrc], 2);
>
> is faster than copying the two bytes manually, along the lines of
>
> + *dst++ = hextbl[2 * usrc];
> + *dst++ = hextbl[2 * usrc + 1];
>
> Compilers that inline memcpy() may arrive at the same machine code,
> but why rely on the compiler to make that optimization? If the
> compiler fails to do so, an out-of-line memcpy() call will surely
> be a loser.
See measurements at the end. As for compilers, gcc 3.4.6 and clang
3.0.0 can inline the memcpy. The manual copy above only gets combined
to a single word starting with gcc 12 and clang 15, and latest MSVC
still can't do it (4A in the godbolt link below). Are there any
buildfarm animals around that may not inline memcpy for word-sized
input?
> A variant could be
>
> + const char *hexptr = &hextbl[2 * usrc];
> + *dst++ = hexptr[0];
> + *dst++ = hexptr[1];
>
> but this supposes that the compiler fails to see the common
> subexpression in the other formulation, which I believe
> most modern compilers will see.
This combines to a single word starting with clang 5, but does not
work on gcc 14.2 or gcc trunk (4B below). I have gcc 14.2 handy, and
on my machine bytewise load/stores are somewhere in the middle:
master 1158.969 ms
v3 776.791 ms
variant 4A 775.777 ms
variant 4B 969.945 ms
https://godbolt.org/z/ajToordKq
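For readers without the godbolt link at hand, here is a minimal sketch of the three inner-loop variants measured above (v3 with memcpy, 4A with indexed bytewise stores, 4B with a temporary pointer). The function names are hypothetical, and the table is filled at run time only to keep the sketch short, whereas the patch uses a constant string literal. It only checks that the variants produce the same bytes and says nothing about generated code or speed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Assumption: stand-in for the patch's hextbl, filled at run time here
 * instead of being a 512-character constant string literal.
 */
static char hextbl[512];

void
init_hextbl(void)
{
	static const char digits[] = "0123456789abcdef";
	int			i;

	for (i = 0; i < 256; i++)
	{
		hextbl[2 * i] = digits[i >> 4];
		hextbl[2 * i + 1] = digits[i & 0x0F];
	}
}

/* v3: copy both hex characters with a fixed-size memcpy */
size_t
hex_encode_memcpy(const char *src, size_t len, char *dst)
{
	size_t		i;

	for (i = 0; i < len; i++)
	{
		unsigned char usrc = *((const unsigned char *) &src[i]);

		memcpy(dst, &hextbl[2 * usrc], 2);
		dst += 2;
	}
	return len * 2;
}

/* 4A: two bytewise stores, indexing the table twice */
size_t
hex_encode_4a(const char *src, size_t len, char *dst)
{
	size_t		i;

	for (i = 0; i < len; i++)
	{
		unsigned char usrc = *((const unsigned char *) &src[i]);

		*dst++ = hextbl[2 * usrc];
		*dst++ = hextbl[2 * usrc + 1];
	}
	return len * 2;
}

/* 4B: two bytewise stores through a temporary pointer */
size_t
hex_encode_4b(const char *src, size_t len, char *dst)
{
	size_t		i;

	for (i = 0; i < len; i++)
	{
		unsigned char usrc = *((const unsigned char *) &src[i]);
		const char *hexptr = &hextbl[2 * usrc];

		*dst++ = hexptr[0];
		*dst++ = hexptr[1];
	}
	return len * 2;
}

/* Usage example: all three variants must produce identical output. */
void
check_variants_agree(void)
{
	static const char input[] = "\x00\x7f\xffPG";
	size_t		len = sizeof(input) - 1;	/* 5 bytes, terminator excluded */
	char		a[16], b[16], c[16];

	init_hextbl();
	hex_encode_memcpy(input, len, a);
	hex_encode_4a(input, len, b);
	hex_encode_4b(input, len, c);
	assert(memcmp(a, b, 2 * len) == 0);
	assert(memcmp(a, c, 2 * len) == 0);
}
```

Since all three produce the same bytes, the timing differences reported above come down to how each compiler lowers the two-byte copy.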
Your example on godbolt has an important difference, which changes the generated assembly.
-static const char hextbl[] = "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff";
+static const char hextbl[512] = "000102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f202122232425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f404142434445464748494a4b4c4d4e4f505152535455565758595a5b5c5d5e5f606162636465666768696a6b6c6d6e6f707172737475767778797a7b7c7d7e7f808182838485868788898a8b8c8d8e8f909192939495969798999a9b9c9d9e9fa0a1a2a3a4a5a6a7a8a9aaabacadaeafb0b1b2b3b4b5b6b7b8b9babbbcbdbebfc0c1c2c3c4c5c6c7c8c9cacbcccdcecfd0d1d2d3d4d5d6d7d8d9dadbdcdddedfe0e1e2e3e4e5e6e7e8e9eaebecedeeeff0f1f2f3f4f5f6f7f8f9fafbfcfdfeff";

best regards,
Ranier Vilela
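
To illustrate what differs between the two declarations in the diff above, here is a minimal sketch using shortened, hypothetical tables (8 characters instead of 512; all names are made up for illustration). In C, an array of unspecified size takes its size from the string literal including the terminating NUL, while an explicitly sized array that exactly fits the characters stores no terminator. How that affects code generation is compiler-specific and is not shown here.

```c
#include <stddef.h>

/* Size inferred from the literal: 8 characters plus the terminating '\0'. */
static const char tbl_unsized[] = "00010203";

/*
 * Explicit size that exactly fits the 8 characters: no '\0' is stored.
 * This is valid C, though not valid C++, and some compilers can warn
 * about the unterminated initializer.
 */
static const char tbl_sized[8] = "00010203";

size_t
unsized_table_bytes(void)
{
	return sizeof(tbl_unsized);
}

size_t
sized_table_bytes(void)
{
	return sizeof(tbl_sized);
}

/* Both tables hold the same hex characters at the same offsets. */
int
tables_match(void)
{
	size_t		i;

	for (i = 0; i < sizeof(tbl_sized); i++)
	{
		if (tbl_unsized[i] != tbl_sized[i])
			return 0;
	}
	return 1;
}
```

By the same rule, the full-size `hextbl[]` occupies 513 bytes and `hextbl[512]` occupies 512.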