Re: [PATCH] Hex-coding optimizations using SVE on ARM. - Mailing list pgsql-hackers

From David Rowley
Subject Re: [PATCH] Hex-coding optimizations using SVE on ARM.
Msg-id CAApHDvpJr7r0n6TY_0psumA1uLy7+nmdq==L3jjDT-RYPMoHRw@mail.gmail.com
In response to Re: [PATCH] Hex-coding optimizations using SVE on ARM.  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: [PATCH] Hex-coding optimizations using SVE on ARM.
List pgsql-hackers
On Wed, 15 Jan 2025 at 23:57, John Naylor <johncnaylorls@gmail.com> wrote:
>
> On Wed, Jan 15, 2025 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Compilers that inline memcpy() may arrive at the same machine code,
> > but why rely on the compiler to make that optimization?  If the
> > compiler fails to do so, an out-of-line memcpy() call will surely
> > be a loser.
>
> See measurements at the end. As for compilers, gcc 3.4.6 and clang
> 3.0.0 can inline the memcpy. The manual copy above only gets combined
> to a single word starting with gcc 12 and clang 15, and latest MSVC
> still can't do it (4A in the godbolt link below). Are there any
> buildfarm animals around that may not inline memcpy for word-sized
> input?
>
> > A variant could be
> >
> > +               const char *hexptr = &hextbl[2 * usrc];
> > +               *dst++ = hexptr[0];
> > +               *dst++ = hexptr[1];

I'd personally much rather see us using memcpy() for this sort of
stuff. If the compiler is too braindead to inline tiny
constant-and-power-of-two-sized memcpys then we'd probably also have
plenty of other performance issues with that compiler already. I don't
think contorting the code into something less human-readable and
something the compiler may struggle even more to optimise is a good
idea.  The naive way to implement the above requires two MOVs of
single bytes and two increments of dst. I imagine it's easier for the
compiler to inline a small constant-sized memcpy() than to figure out
that it's safe to implement the above with a single word-sized MOV
rather than two byte-sized MOVs due to the "dst++" in between the two.
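
To make the comparison concrete, here is a minimal sketch of the two
forms side by side. The names hextbl and usrc-style indexing follow the
quoted snippet; init_hextbl(), hex_encode_bytes() and
hex_encode_memcpy() are hypothetical helpers made up for illustration
and are not the code from the patch. Whether either loop compiles down
to a single 2-byte store depends on the compiler and version, as John's
measurements show.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed layout: 512 bytes, two hex digits per possible byte value. */
static char hextbl[512];

/* Hypothetical helper: fill the lookup table once before encoding. */
static void
init_hextbl(void)
{
	static const char digits[] = "0123456789abcdef";

	for (int i = 0; i < 256; i++)
	{
		hextbl[2 * i] = digits[i >> 4];
		hextbl[2 * i + 1] = digits[i & 0xF];
	}
}

/* Variant from the quoted mail: two byte-sized stores per input byte. */
static size_t
hex_encode_bytes(const unsigned char *src, size_t len, char *dst)
{
	char	   *start = dst;

	assert(src != NULL && dst != NULL);
	for (size_t i = 0; i < len; i++)
	{
		const char *hexptr = &hextbl[2 * src[i]];

		*dst++ = hexptr[0];
		*dst++ = hexptr[1];
	}
	return (size_t) (dst - start);
}

/* memcpy() variant: one constant-sized, 2-byte copy per input byte. */
static size_t
hex_encode_memcpy(const unsigned char *src, size_t len, char *dst)
{
	char	   *start = dst;

	assert(src != NULL && dst != NULL);
	for (size_t i = 0; i < len; i++)
	{
		memcpy(dst, &hextbl[2 * src[i]], 2);
		dst += 2;
	}
	return (size_t) (dst - start);
}
```

Both functions write the same bytes and return the number of bytes
written (2 * len); the caller must provide a dst buffer at least that
large. The only difference is how much work the compiler has to do to
see that the two stores can be merged.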

I agree that the evidence you (John) gathered is enough reason to use memcpy().

David


