On Wed, Mar 5, 2025 at 12:16 PM Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Wed, 2025-03-05 at 11:33 -0800, James Hunter wrote:
> > For a bitfield, however, the CPU has to read from or write to the
> > byte
> > that contains the bit, but then it also has to mask out the *other*
> > bits in that bitfield. This is a data dependency, so it stalls the
> > CPU
> > pipeline.
>
> Here the bits aren't changing, so we're only talking about mask-and-
> test, right? My intuition is that wouldn't cause much of a problem.

Right, so that's just one extra pipeline stall (load, mask, and test;
vs. just load and test). But you can imagine microbenchmarks /
situations where that extra "mask" matters (like some of the
benchmarks David ran). In "mask + test", the test has to wait for the
mask to complete before it can run, so it's slower than two
independent instructions would be.
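To make the "load, mask, and test" vs. "load and test" difference
concrete, here is a minimal sketch in C. The struct names, field names,
and FLAG_* constants are hypothetical and not taken from the PostgreSQL
tree. What the compiler actually emits depends on the target and the
optimization level (some targets can fold the mask and the test into one
instruction), so treat the comments as the logical steps, not as a
guaranteed instruction sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: one byte per flag. */
struct FlagsAsBools
{
    bool        is_a;
    bool        is_b;
    bool        is_c;
};

/* Hypothetical layout: one bit per flag, packed by the compiler. */
struct FlagsAsBits
{
    unsigned int is_a:1;
    unsigned int is_b:1;
    unsigned int is_c:1;
};

/* The same packing written out by hand, so the mask is visible. */
#define FLAG_A 0x01
#define FLAG_B 0x02
#define FLAG_C 0x04

/* Load the byte and test it: no other field shares that byte. */
static bool
bool_is_c(const struct FlagsAsBools *f)
{
    return f->is_c;
}

/* Load the storage unit, mask off the other bits, then test. */
static bool
bits_is_c(const struct FlagsAsBits *f)
{
    return f->is_c;
}

/* Explicit version of the above: the test depends on the mask result. */
static bool
mask_is_c(uint8_t flags)
{
    return (flags & FLAG_C) != 0;
}
```

All three functions answer the same question; only the first one can
skip the masking step, because the flag has a whole byte to itself.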

But -- cost vs. benefit: a Boolean is typically 1 byte; a cache line
is typically 64 bytes, with maybe CPU prefetch making it behave like
128 bytes. So replacing 6 bytes of Booleans with 6 bits (which still
occupy 1 byte) saves us about 5 bytes, or < 10% of a cache line. It's
"only" an 8-to-1 compression ratio per flag -- which might or might
not be worth it, as David's benchmarks show...
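The arithmetic above can be checked with a small sketch. It assumes a
64-byte cache line and a 1-byte bool, both typical but not universal,
and it uses a hand-packed uint8_t rather than real bitfields, since
bitfield packing is implementation-defined. All names here are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64      /* assumed; typical, not universal */

/* Six flags stored as six separate Booleans. */
struct SixBools
{
    bool        f1, f2, f3, f4, f5, f6;
};

/* The same six flags packed into the low bits of one byte. */
struct SixBits
{
    uint8_t     flags;
};

/* Bytes saved by packing: 6 - 1 = 5 where sizeof(bool) is 1. */
static size_t
bytes_saved(void)
{
    return sizeof(struct SixBools) - sizeof(struct SixBits);
}

/* Savings as a whole-number percentage of one cache line. */
static size_t
percent_of_cache_line_saved(void)
{
    return bytes_saved() * 100 / CACHE_LINE_SIZE;
}
```

On a platform where those assumptions hold, this works out to 5 bytes,
i.e. 5 * 100 / 64, which rounds down to 7% of a cache line.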

James Hunter