Re: define pg_structiszero(addr, s, r) - Mailing list pgsql-hackers
From | Ranier Vilela |
---|---|
Subject | Re: define pg_structiszero(addr, s, r) |
Date | |
Msg-id | CAEudQAqWbLS1q4VZ+d4Tx5T-DHryQ8juu=Kgn9wfiV8iB-pzyA@mail.gmail.com |
In response to | Re: define pg_structiszero(addr, s, r) (David Rowley <dgrowleyml@gmail.com>) |
List | pgsql-hackers |
On Tue, 5 Nov 2024 at 00:23, David Rowley <dgrowleyml@gmail.com> wrote:
On Tue, 5 Nov 2024 at 06:39, Ranier Vilela <ranier.vf@gmail.com> wrote:
> I think we can add a small optimization to this last patch [1].
> The variable *aligned_end* is only needed in the second loop (for).
> So, only before the for loop do we actually declare it.
>
> Result before this change:
> check zeros using BERTRAND 1 0.000031s
>
> Result after this change:
> check zeros using BERTRAND 1 0.000018s
>
> + const unsigned char *aligned_end;
>
> + /* Multiple bytes comparison(s) at once */
> + aligned_end = (const unsigned char *) ((uintptr_t) end & (~(sizeof(size_t) - 1)));
> + for (; p < aligned_end; p += sizeof(size_t))
I think we all need to stop using Godbolt's servers to run benchmarks
on. These servers are likely to be running various other workloads in
highly virtualised environments and are not going to be stable servers
that would give consistent benchmark results.
I tried your optimisation in the attached allzeros.c and here are my results:
# My version
$ gcc allzeros.c -O2 -o allzeros && for i in {1..3}; do ./allzeros; done
char: done in 1566400 nanoseconds
size_t: done in 195400 nanoseconds (8.01638 times faster than char)
char: done in 1537500 nanoseconds
size_t: done in 196300 nanoseconds (7.8324 times faster than char)
char: done in 1543600 nanoseconds
size_t: done in 196300 nanoseconds (7.86347 times faster than char)
# Ranier's optimization
$ gcc allzeros.c -O2 -D RANIERS_OPTIMIZATION -o allzeros && for i in {1..3}; do ./allzeros; done
char: done in 1943100 nanoseconds
size_t: done in 531700 nanoseconds (3.6545 times faster than char)
char: done in 1957200 nanoseconds
size_t: done in 458400 nanoseconds (4.26963 times faster than char)
char: done in 1949500 nanoseconds
size_t: done in 469000 nanoseconds (4.15672 times faster than char)
Seems to be about half as fast with gcc on -O2
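For readers without the attachment, here is a minimal sketch of the kind of check being compared. It is not the attached allzeros.c and not the proposed patch: the function name all_zeros_sketch is hypothetical, and the word-sized load goes through memcpy (an assumption made here to stay within strict-aliasing rules) rather than the pointer cast a patch might use. It only illustrates where aligned_end from the quoted hunk fits: bytes are compared one at a time until the pointer is size_t-aligned, then one size_t at a time up to aligned_end, then byte by byte for the tail.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper for illustration only; not the function from the patch. */
static bool
all_zeros_sketch(const void *ptr, size_t len)
{
	const unsigned char *p = (const unsigned char *) ptr;
	const unsigned char *end = p + len;
	const unsigned char *aligned_end;

	/* Compare byte by byte until p sits on a size_t boundary. */
	while (((uintptr_t) p & (sizeof(size_t) - 1)) != 0)
	{
		if (p == end)
			return true;
		if (*p++ != 0)
			return false;
	}

	/* Round end down to a size_t boundary, as in the quoted hunk. */
	aligned_end = (const unsigned char *)
		((uintptr_t) end & ~(uintptr_t) (sizeof(size_t) - 1));

	/* Compare sizeof(size_t) bytes at once. */
	for (; p < aligned_end; p += sizeof(size_t))
	{
		size_t		word;

		/* memcpy compiles to a single load and avoids aliasing problems */
		memcpy(&word, p, sizeof(word));
		if (word != 0)
			return false;
	}

	/* Compare the remaining tail bytes one at a time. */
	while (p < end)
	{
		if (*p++ != 0)
			return false;
	}

	return true;
}
```

Whether aligned_end is computed at its declaration or just before the for loop should not change what the function returns; the timings above only measure how each compiler happens to schedule it.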
Thanks for coding the test.
I tried it with MSVC 2022 (32-bit).
Here are the results:
C:\usr\src\tests\allzeros>allzeros
char: done in 71431900 nanoseconds
size_t: done in 18010900 nanoseconds (3.96604 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 71070100 nanoseconds
size_t: done in 19654300 nanoseconds (3.61601 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 68682400 nanoseconds
size_t: done in 19841100 nanoseconds (3.46162 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 63215100 nanoseconds
size_t: done in 17920200 nanoseconds (3.52759 times faster than char)
C:\usr\src\tests\allzeros>c /DRANIERS_OPTIMIZATION
Microsoft (R) Program Maintenance Utility Version 14.40.33813.0
Copyright (C) Microsoft Corporation. All rights reserved.
C:\usr\src\tests\allzeros>allzeros
char: done in 67213800 nanoseconds
size_t: done in 15049200 nanoseconds (4.46627 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 51505900 nanoseconds
size_t: done in 13645700 nanoseconds (3.77452 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 62852600 nanoseconds
size_t: done in 17863800 nanoseconds (3.51843 times faster than char)
C:\usr\src\tests\allzeros>allzeros
char: done in 51877200 nanoseconds
size_t: done in 13759900 nanoseconds (3.77017 times faster than char)
The function used to replace clock_gettime is:
timespec_get(ts, TIME_UTC)
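As a rough sketch of how that replacement can be wrapped (the helper names now_ns_sketch and elapsed_ns_sketch are hypothetical and are not taken from allzeros.c): timespec_get() is the C11 function from <time.h>, and it returns the base value on success and 0 on failure. Note that TIME_UTC is wall-clock time, so if the original test read a monotonic clock through clock_gettime, the two are not strictly equivalent.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: current time in nanoseconds using C11 timespec_get().
 * Returns -1 if the call fails.
 */
static int64_t
now_ns_sketch(void)
{
	struct timespec ts;

	/* On success timespec_get() returns the base that was passed in. */
	if (timespec_get(&ts, TIME_UTC) != TIME_UTC)
		return -1;

	return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical helper: elapsed nanoseconds between two readings. */
static int64_t
elapsed_ns_sketch(int64_t start_ns, int64_t end_ns)
{
	return end_ns - start_ns;
}
```

A timed section would then take one reading before the loop and one after it, and print elapsed_ns_sketch(start, end).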
best regards,
Ranier Vilela