RE: Review/Pull Request: Adding new CRC32C implementation for IBM S390X - Mailing list pgsql-hackers

From Eduard Stefes
Subject RE: Review/Pull Request: Adding new CRC32C implementation for IBM S390X
Msg-id 62e18f3d02e770a3790e68c8dd0a4e9d576db841.camel@ibm.com
In response to Re: Review/Pull Request: Adding new CRC32C implementation for IBM S390X  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: Review/Pull Request: Adding new CRC32C implementation for IBM S390X
List pgsql-hackers
Hi,

I reworked the algorithm so that it also handles buffers of 16 to 64
bytes. Then I ran the performance measurement for two datasets
[^raw_data_1] [^raw_data_2] and created two diagrams [^attachment].

My findings so far:

- The optimized crc32cvx is faster for buffers of 16 bytes and more:
about 3x at 64 bytes and about 16x at 8192 bytes in raw_data_1.
- The sb8 performance depends heavily on alignment (see the ripples
every 8 bytes).
- The 8-byte ripple is also visible in the vx implementation. Because it
can only process 16- or 64-byte chunks, it still uses sb8 for the
remaining bytes.
- There is no obvious speed regression in the vx algorithm, except for
the 28-byte row in raw_data_2 (15.957 ms vs. 17.211 ms), which I assume
is a fluke, since I am sharing the system with a bunch of other devs.
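
To make the chunking point concrete, here is a minimal Python sketch (not
the patch code) of how a buffer length could be divided between the vector
path and sb8. The function name split_length and the greedy order (64-byte
blocks first, then 16-byte blocks) are assumptions for illustration only;
alignment handling is not modeled.

```python
def split_length(nbytes):
    """Return (blocks64, blocks16, tail) for a buffer of nbytes bytes.

    Hypothetical model: the vector code consumes whole 64-byte blocks,
    then whole 16-byte blocks, and the tail (< 16 bytes) is left to sb8.
    """
    if nbytes < 0:
        raise ValueError("nbytes must be non-negative")
    blocks64, rest = divmod(nbytes, 64)
    blocks16, tail = divmod(rest, 16)
    return blocks64, blocks16, tail


# Lengths taken from the measurement tables.
for n in (14, 16, 28, 48, 52, 64, 8192):
    b64, b16, tail = split_length(n)
    print(f"{n:>5} bytes: {b64} x 64, {b16} x 16, {tail} tail bytes for sb8")
```

Under this model, lengths that are multiples of 16 (16, 32, 48, 64) leave no
tail for sb8, which would be consistent with those rows showing the lowest
vx timings in raw_data_2.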


I hope this is acceptable as a performance measurement. However, we
will set up a dedicated performance test and try to get precise numbers
without side effects, but it may take some time until we get to that.

I'll post the updated code together with the other requested
updates.

cheers, Eddy



[^raw_data_1]
bytes   crc32c_sb8    crc32c_vx
    4      6.54 ms     6.548 ms
    8     4.476 ms      4.47 ms
   10     7.346 ms     7.348 ms
   12    10.955 ms    10.958 ms
   14    14.548 ms    14.546 ms
   16     6.837 ms     6.193 ms
   32     12.23 ms     6.741 ms
   64    22.826 ms       7.6 ms
   80    28.536 ms     8.307 ms
   96    34.426 ms      9.09 ms
  112    40.295 ms     9.844 ms
  128    46.053 ms    10.825 ms
  144    51.868 ms    11.712 ms
  160     65.91 ms    12.122 ms
  176    71.649 ms    13.055 ms
  192    77.465 ms    11.716 ms
  208    83.286 ms    13.532 ms
  224    88.991 ms    13.165 ms
  240    94.875 ms    13.881 ms
  256   100.653 ms    13.147 ms
 8192  2967.477 ms   182.911 ms

[^raw_data_2]
bytes   crc32c_sb8    crc32c_vx
    4     6.543 ms     6.536 ms
    8     4.476 ms      4.47 ms
   10      7.35 ms     7.345 ms
   12     10.96 ms    10.954 ms
   14    14.552 ms    14.588 ms
   16     6.843 ms     6.189 ms
   18    10.253 ms     9.814 ms
   24     9.645 ms     9.924 ms
   28    15.957 ms    17.211 ms
   32    12.226 ms     6.726 ms
   36    18.823 ms    14.484 ms
   42    17.855 ms    14.271 ms
   48    17.342 ms     7.344 ms
   52    24.208 ms    15.306 ms
   58    23.525 ms    14.695 ms
   64    22.818 ms     7.593 ms
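
For reading the numbers, a small Python sketch that turns a few raw_data_2
rows into speedup ratios (sb8 time divided by vx time). The selection of
rows and the helper name speedup are my own choices for illustration; the
timings are copied from the table above.

```python
# (bytes, sb8 time in ms, vx time in ms), copied from raw_data_2
ROWS = [
    (16, 6.843, 6.189),
    (28, 15.957, 17.211),
    (32, 12.226, 6.726),
    (48, 17.342, 7.344),
    (52, 24.208, 15.306),
    (64, 22.818, 7.593),
]


def speedup(sb8_ms, vx_ms):
    """Ratio > 1 means vx is faster than sb8; < 1 means it is slower."""
    return sb8_ms / vx_ms


RATIOS = {nbytes: speedup(sb8, vx) for nbytes, sb8, vx in ROWS}

for nbytes, ratio in RATIOS.items():
    note = "" if ratio >= 1.0 else "  <- vx slower"
    print(f"{nbytes:>3} bytes: {ratio:.2f}x{note}")
```

A ratio below 1 marks a row where vx measured slower than sb8; in this
selection that is only the 28-byte row.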



On Thu, 2025-05-08 at 05:32 +0700, John Naylor wrote:
> On Wed, May 7, 2025 at 8:15 PM Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> >
> > I didn't review the patch but wanted to point out that when it comes
> > to performance improvements it's typically useful to provide some
> > benchmarks.
>
> +1 -- It's good to have concrete numbers for the commit message, and
> also to verify improvement on short inputs. There is a test harness in
> the v7-0002 patch from here:
>
> https://www.postgresql.org/message-id/CANWCAZaD5niydBF6q3V_cjApNV05cw-LpxxFtMbwDPLsz-PjbQ@mail.gmail.com
>  
>
> After building, run the "test-crc.sh" script here after executing
> "CREATE EXTENSION test_crc32c;":
>
> https://www.postgresql.org/message-id/CANWCAZahvhE-%2BhtZiUyzPiS5e45ukx5877mD-dHr-KSX6LcdjQ%40mail.gmail.com
>  
>
>
>
> --
> John Naylor
> Amazon Web Services

Attachment