Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c) - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c) |
Date | |
Msg-id | CAApHDvruee_36_fWSjeCkXiUg04FJKQBVBZZpF9rY7qEdchNPA@mail.gmail.com |
In response to | Re: Avoid unecessary MemSet call (src/backend/utils/cache/relcache.c) (Ranier Vilela <ranier.vf@gmail.com>) |
List | pgsql-hackers |
On Thu, 19 May 2022 at 02:08, Ranier Vilela <ranier.vf@gmail.com> wrote:
> That would initialize the content at compilation and not at runtime, correct?

Your mental model of compilation and run-time might be flawed here. There's no such thing as zeroing memory at compile time. There's only emitting instructions that perform those tasks at run-time. https://godbolt.org/ might help your understanding.

> There are a lot of cases using MemSet (with struct variables) and at Windows 64 bits, long are 4 (four) bytes.
> So I believe that MemSet is less efficient on Windows than on Linux.
> "The size of the '_vstart' buffer is not a multiple of the element size of the type 'long'."
> message from PVS-Studio static analysis tool.

I've been wondering for a while if we really need to have the MemSet() macro. I see it was added in 8cb415449 (1997). I think compilers have evolved quite a bit in the past 25 years, so it could be time to revisit that.

Your comment on the sizeof(long) on win64 is certainly true. I wrote the attached C program to test the performance difference.

(windows 64-bit)
>cl memset.c /Ox
>memset 200000000
Running 200000000 loops
MemSet: size 8: 1.833000 seconds
MemSet: size 16: 1.841000 seconds
MemSet: size 32: 1.838000 seconds
MemSet: size 64: 1.851000 seconds
MemSet: size 128: 3.228000 seconds
MemSet: size 256: 5.278000 seconds
MemSet: size 512: 3.943000 seconds
memset: size 8: 0.065000 seconds
memset: size 16: 0.131000 seconds
memset: size 32: 0.262000 seconds
memset: size 64: 0.530000 seconds
memset: size 128: 1.169000 seconds
memset: size 256: 2.950000 seconds
memset: size 512: 3.191000 seconds

It seems like there are no cases there where MemSet is faster than memset. I was careful to only provide MemSet() with inputs that result in it not using the memset fallback. I also provided constants so that the decision about which method to use was known at compile time.

It's not clear to me why MemSet at size 512 is faster than at size 256. I saw the same on a repeat run.
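
For anyone who hasn't looked at the macro, the word-at-a-time idea being measured can be sketched roughly as below. This is a simplified illustration and not the actual definition in PostgreSQL's headers: the names zero_wordwise, zero_word_t and ZERO_LOOP_LIMIT are made up for the example, and the 1024-byte cut-off is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical word type for the store loop.  With "long" this is 4 bytes
 * on 64-bit Windows and 8 bytes on 64-bit Linux; change it to "long long"
 * to get 8-byte stores on both.
 */
typedef long zero_word_t;

#define ZERO_WORD_MASK  (sizeof(zero_word_t) - 1)
#define ZERO_LOOP_LIMIT 1024	/* assumed cut-off, for illustration only */

/*
 * Zero "len" bytes at "start".  Uses a word-at-a-time loop when both the
 * address and the length are multiples of the word size and the length is
 * small; otherwise falls back to memset().
 *
 * Returns 1 if the word loop was used, 0 if memset() was used.
 */
static int
zero_wordwise(void *start, size_t len)
{
	assert(start != NULL);

	if ((((uintptr_t) start) & ZERO_WORD_MASK) == 0 &&
		(len & ZERO_WORD_MASK) == 0 &&
		len <= ZERO_LOOP_LIMIT)
	{
		zero_word_t *p = (zero_word_t *) start;
		zero_word_t *stop = (zero_word_t *) ((char *) start + len);

		while (p < stop)
			*p++ = 0;
		return 1;
	}

	memset(start, 0, len);
	return 0;
}
```

With a 4-byte word type the loop needs twice as many stores as with an 8-byte one for the same length, which is the concern raised above about long on 64-bit Windows. The return value is only there to make it easy to check which path was taken.
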
Changing "long" to "long long" it looks like:

>memset 200000000
Running 200000000 loops
MemSet: size 8: 0.066000 seconds
MemSet: size 16: 1.978000 seconds
MemSet: size 32: 1.982000 seconds
MemSet: size 64: 1.973000 seconds
MemSet: size 128: 1.970000 seconds
MemSet: size 256: 3.225000 seconds
MemSet: size 512: 5.366000 seconds
memset: size 8: 0.069000 seconds
memset: size 16: 0.132000 seconds
memset: size 32: 0.265000 seconds
memset: size 64: 0.527000 seconds
memset: size 128: 1.161000 seconds
memset: size 256: 2.976000 seconds
memset: size 512: 3.179000 seconds

The situation is a little different on my Linux machine:

$ gcc memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.000002 seconds
MemSet: size 16: 0.000000 seconds
MemSet: size 32: 0.094041 seconds
MemSet: size 64: 0.184618 seconds
MemSet: size 128: 1.781503 seconds
MemSet: size 256: 2.547910 seconds
MemSet: size 512: 4.005173 seconds
memset: size 8: 0.046156 seconds
memset: size 16: 0.046123 seconds
memset: size 32: 0.092291 seconds
memset: size 64: 0.184509 seconds
memset: size 128: 1.781518 seconds
memset: size 256: 2.577104 seconds
memset: size 512: 4.004757 seconds

It looks like part of the work might be getting optimised away in the 8-16 MemSet() calls.

clang seems to have the opposite for size 8.

$ clang memset.c -o memset -O2
$ ./memset 200000000
Running 200000000 loops
MemSet: size 8: 0.007653 seconds
MemSet: size 16: 0.005771 seconds
MemSet: size 32: 0.011539 seconds
MemSet: size 64: 0.023095 seconds
MemSet: size 128: 0.046130 seconds
MemSet: size 256: 0.092269 seconds
MemSet: size 512: 0.968564 seconds
memset: size 8: 0.000000 seconds
memset: size 16: 0.005776 seconds
memset: size 32: 0.011559 seconds
memset: size 64: 0.023069 seconds
memset: size 128: 0.046129 seconds
memset: size 256: 0.092243 seconds
memset: size 512: 0.968534 seconds

There does not seem to be any significant reduction in the size of the binary from changing the MemSet macro to directly use memset. It went from 9865008 bytes down to 9860800 bytes (4208 bytes less).

David
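
On the near-zero timings above: one way a timing loop could make it harder for the compiler to discard the stores is to read the buffer back through a volatile variable after each call. The sketch below is only an illustration of that idea, not the attached program. The names time_memset, bench_buf and bench_sink are hypothetical, and a volatile read-back does not guarantee that every store survives optimisation.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical benchmark buffer, large enough for the biggest size tested. */
static unsigned char bench_buf[512];

/* Volatile sink: the read-back below must be performed on every iteration. */
static volatile unsigned char bench_sink;

/*
 * Time "loops" calls of memset() over the first "size" bytes of bench_buf.
 * Returns elapsed CPU time in seconds, or -1.0 if "size" is out of range.
 */
static double
time_memset(size_t size, long loops)
{
	clock_t		begin;

	if (size == 0 || size > sizeof(bench_buf))
		return -1.0;

	begin = clock();
	for (long i = 0; i < loops; i++)
	{
		memset(bench_buf, 0, size);
		/* Read one byte back so the buffer contents are observed. */
		bench_sink = bench_buf[(size_t) i % size];
	}
	return (double) (clock() - begin) / CLOCKS_PER_SEC;
}
```

A caller would run something like time_memset(64, 200000000) once per size and print the result. The extra read adds a small constant cost per iteration, so absolute numbers from a loop like this are not directly comparable with the figures above.
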