Re: Use POPCNT on MSVC - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: Use POPCNT on MSVC |
Date | |
Msg-id | CAFBsxsHfGai31H2_ha2ZAMmGEZbkDgCop8OzdR05AX6CQkWvtQ@mail.gmail.com Whole thread Raw |
In response to | Re: Use POPCNT on MSVC (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: Use POPCNT on MSVC
|
List | pgsql-hackers |
On Tue, Aug 3, 2021 at 11:36 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 3 Aug 2021 at 22:43, John Naylor <john.naylor@enterprisedb.com> wrote:
> > 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics coded, but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could easily write ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt instruction. Is MSVC different in that regard? If so, it might be worth a comment.
>
> Yeah, the names no longer fit so well after stuffing the MSVC
> intrinsic into the asm function. The reason I did it that way is down
> to what I read in the docs. Namely:
>
> "If you run code that uses these intrinsics on hardware that doesn't
> support the popcnt instruction, the results are unpredictable."
>
> So, it has to go somewhere where we're sure the CPU supports POPCNT
> and that seemed like the correct place.
>
> From what I understand of GCC and __builtin_popcountl(), the code
> it'll output will depend on the -mpopcnt flag. So having
> __builtin_popcountl() does not mean the popcnt instruction will be
> used. The Microsoft documentation indicates that's not the case for
> __popcnt().
Okay, "unpredictable" sounds bad.
> > 2. (defined(_MSC_VER) && defined(_WIN64) lead to a runtime check for the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks strange because the latter symbol comes from a specific configure test -- the two don't seem equivalent, but maybe they are because of what MSVC does? That would be nice to spell out here.
>
> The problem there is that we define HAVE_X86_64_POPCNTQ based on the
> outcome of configure so it does not get set for MSVC. Maybe it's
> worth coming up with some other constant that can be defined or we
> could just do:
>
> #if defined(_MSC_VER) && defined(_WIN64)
> #define HAVE_X86_64_POPCNTQ
> #endif
That seems fine. I don't know PG can build with Arm on Windows, but for the cpuid to work, it seems safer to also check for __x86_64?
> I think the reason for the asm is that __builtin_popcountl does not
> mean popcnt will be used. Maybe we could have done something like
> compile pg_bitutils.c with -mpopcnt, and have kept the run-time check,
> but the problem there is that means that the compiler might end up
> using that instruction in some other function in that file that we
> don't want it to. It looks like my patch in [1] did pass the -mpopcnt
> flag which Tom fixed.
Ah, that makes sense. (If we someday offer a configure option for x86-64-v2, we can use intrinsics in the asm functions, and call them directly. Yet another different thread.)
--
John Naylor
EDB: http://www.enterprisedb.com
>
> On Tue, 3 Aug 2021 at 22:43, John Naylor <john.naylor@enterprisedb.com> wrote:
> > 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics coded, but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could easily write ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt instruction. Is MSVC different in that regard? If so, it might be worth a comment.
>
> Yeah, the names no longer fit so well after stuffing the MSVC
> intrinsic into the asm function. The reason I did it that way is down
> to what I read in the docs. Namely:
>
> "If you run code that uses these intrinsics on hardware that doesn't
> support the popcnt instruction, the results are unpredictable."
>
> So, it has to go somewhere where we're sure the CPU supports POPCNT
> and that seemed like the correct place.
>
> From what I understand of GCC and __builtin_popcountl(), the code
> it'll output will depend on the -mpopcnt flag. So having
> __builtin_popcountl() does not mean the popcnt instruction will be
> used. The Microsoft documentation indicates that's not the case for
> __popcnt().
Okay, "unpredictable" sounds bad.
> > 2. (defined(_MSC_VER) && defined(_WIN64) lead to a runtime check for the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks strange because the latter symbol comes from a specific configure test -- the two don't seem equivalent, but maybe they are because of what MSVC does? That would be nice to spell out here.
>
> The problem there is that we define HAVE_X86_64_POPCNTQ based on the
> outcome of configure so it does not get set for MSVC. Maybe it's
> worth coming up with some other constant that can be defined or we
> could just do:
>
> #if defined(_MSC_VER) && defined(_WIN64)
> #define HAVE_X86_64_POPCNTQ
> #endif
That seems fine. I don't know PG can build with Arm on Windows, but for the cpuid to work, it seems safer to also check for __x86_64?
> I think the reason for the asm is that __builtin_popcountl does not
> mean popcnt will be used. Maybe we could have done something like
> compile pg_bitutils.c with -mpopcnt, and have kept the run-time check,
> but the problem there is that means that the compiler might end up
> using that instruction in some other function in that file that we
> don't want it to. It looks like my patch in [1] did pass the -mpopcnt
> flag which Tom fixed.
Ah, that makes sense. (If we someday offer a configure option for x86-64-v2, we can use intrinsics in the asm functions, and call them directly. Yet another different thread.)
--
John Naylor
EDB: http://www.enterprisedb.com
pgsql-hackers by date: