Re: Use compiler intrinsics for bit ops in hash - Mailing list pgsql-hackers
From | Jesse Zhang |
---|---|
Subject | Re: Use compiler intrinsics for bit ops in hash |
Date | |
Msg-id | CAGf+fX4nqk4utcdC3ZGRFA-xcU4=JZ+rE_Zq3ex+pLCsi=8soA@mail.gmail.com |
In response to | Re: Use compiler intrinsics for bit ops in hash (David Fetter <david@fetter.org>) |
Responses | Re: Use compiler intrinsics for bit ops in hash |
List | pgsql-hackers |
On Tue, Jan 14, 2020 at 2:09 PM David Fetter <david@fetter.org> wrote:
> > The changes in hash AM and SIMPLEHASH do look like a net positive
> > improvement. My biggest cringe might be in pg_bitutils:
> >
> > 1. Is ceil_log2_64 dead code?
>
> Let's call it nascent code. I suspect there are places it could go, if
> I look for them. Also, it seemed silly to have one without the other.

While not absolutely required, I'd like us to find at least one place
and start using it. (Clang also nags at me when we have unused
functions).

> On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote:
> > 4. It seems like you *really* would like an operation like LZCNT in x86
> > (first appearing in Haswell) that is well defined on zero input. ISTM
> > the alternatives are:
> >
> > a) Special case 1. That seems straightforward, but the branching cost
> > on a seemingly unlikely condition seems to be a lot of performance
> > loss
> >
> > b) Use architecture specific intrinsic (and possibly with CPUID
> > shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ
> > intrinsic elsewhere. The CLZ GCC intrinsic seems to map to
> > instructions that are well defined on zero in most ISA's other than
> > x86, so maybe we can get away with special-casing x86?

i. We can detect the LZCNT instruction by checking one of the "extended
feature" (EAX=80000001) bits using CPUID. Unlike the "basic features"
(EAX=1), extended feature flags have been more vendor-specific, but
fortunately it seems that the feature bit for LZCNT is the same [1][2].

ii. We'll most likely still need to provide a fallback implementation
for processors that don't have LZCNT (either because they are from a
different vendor, or an older Intel/AMD processor). I wonder if simply
checking for 1 is "good enough". Maybe a micro benchmark is in order?

> Is there some way to tilt the tools so that this happens?

We have a couple options here:

1. Use a separate object (a la our SSE 4.2 implementation of CRC). On
Clang and GCC (I don't have MSVC at hand), -mabm or -mlzcnt should
cause __builtin_clz to generate the LZCNT instruction, which is well
defined on zero input. The default configuration would translate
__builtin_clz to code that subtracts BSR from the width of the input,
but BSR leaves the destination undefined on zero input.

2. (My least favorite) use inline asm (a la our popcount implementation).

> b) seems much more attractive. Is there some way to tilt the tools so
> that this happens? What should I be reading up on?

The enclosed references hopefully are good places to start. Let me know
if you have more ideas.

Cheers,
Jesse

References:
[1] "How to detect New Instruction support in the 4th generation Intel® Core™ processor family"
https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
[2] "Bit Manipulation Instruction Sets"
https://en.wikipedia.org/wiki/Bit_Manipulation_Instruction_Sets
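For concreteness, here is a minimal sketch of the CPUID check described in
(i) together with a zero-safe fallback along the lines of (ii). It assumes
GCC or Clang on x86; have_lzcnt() and ceil_log2_64_fallback() are
hypothetical names used purely for illustration, not code from the patch.

```c
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && \
	(defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>

/*
 * CPUID leaf 0x80000001: bit 5 of ECX advertises LZCNT (ABM on AMD),
 * the leading-zero count that is well defined on zero input.
 */
static inline int
have_lzcnt(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx))
		return (ecx & (1U << 5)) != 0;
	return 0;
}
#endif

/*
 * Fallback ceil(log2(num)) that special-cases inputs of 0 and 1, since
 * __builtin_clzll is undefined on zero when it lowers to BSR.
 */
static inline uint64_t
ceil_log2_64_fallback(uint64_t num)
{
	if (num <= 1)
		return 0;
	return 64 - (uint64_t) __builtin_clzll(num - 1);
}
```

Compiling a translation unit like this separately with -mlzcnt (option 1
above) should also let __builtin_clzll itself lower to LZCNT, which, as
noted above, is well defined on zero at the instruction level.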