On Thu, Oct 30, 2025 at 5:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Right. I wasn't excited about building out 16-bit atomics, not least
> because I'm unsure that those exist on every supported architecture.
> Instead I spent a little time thinking about how we could use a 32-bit
> atomic op here. Clearly, that should theoretically work, but you'd
> have to identify where is the start of the 32-bit word (t_infomask2
> sadly is not at a 4-byte boundary) and which bit within that word is
> the target bit (that's gonna vary depending on endianness at least).
> Seems like a pain in the rear, but probably still less work than
> creating 16-bit atomic ops.
I read that GCC and Clang will do exactly that for us automatically if
we use builtins (generic-gcc.h) or eventually <stdatomic.h> on
hardware without small atomics. AFAIK that's only RISC-V, which I
don't have, so I tried cross-compiling

    void f(atomic_char *x) { atomic_fetch_or(x, 1); }

and GCC happily generated an lr.w.aqrl/sc.w.rl sequence with a bunch of
bit-swizzling. With the right magic
switches I think it should be able to spit out a single amoor.b
instruction instead (looks like it's the "Zabha" extension that added
sub-word atomics quite recently), but I couldn't immediately figure
out how to turn it on. I think every other ISA on our list can do it
in hardware except MIPS (RIP?).
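
To make the comparison concrete, here is a rough sketch of both routes:
letting the GCC/Clang builtin pick the instructions for a 16-bit
fetch-or, and the hand-rolled version that does a 32-bit atomic on the
containing aligned word. All the demo_* names and DEMO_MATCH_BIT are
made up for illustration, not anything in the tree. It assumes the
16-bit field is 2-byte aligned, that the compiler provides the
__atomic builtins and __BYTE_ORDER__, and that we build with
-fno-strict-aliasing as PostgreSQL does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bit, standing in for the match bit in t_infomask2. */
#define DEMO_MATCH_BIT ((uint16_t) 0x8000)

/* Route 1: let the compiler choose the sequence (GCC/Clang builtin). */
static inline uint16_t
demo_fetch_or_u16(uint16_t *p, uint16_t bits)
{
    return __atomic_fetch_or(p, bits, __ATOMIC_SEQ_CST);
}

/*
 * Route 2: hand-rolled equivalent using a 32-bit atomic on the aligned
 * word that contains *p.  Assumes p is 2-byte aligned and that casting
 * to uint32_t * is acceptable (i.e. -fno-strict-aliasing).
 */
static inline uint16_t
demo_fetch_or_u16_via_u32(uint16_t *p, uint16_t bits)
{
    uintptr_t   addr = (uintptr_t) p;
    uint32_t   *word = (uint32_t *) (addr & ~(uintptr_t) 3);
    unsigned    shift = (unsigned) (addr & 2) * 8;  /* 0 or 16 */
    uint32_t    old;

    assert((addr & 1) == 0);

#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    /* On big-endian the lower-addressed half holds the high bits. */
    shift = 16 - shift;
#endif

    old = __atomic_fetch_or(word, (uint32_t) bits << shift,
                            __ATOMIC_SEQ_CST);
    return (uint16_t) (old >> shift);
}
```

Both functions return the previous value of the 16-bit field, so a
caller can tell whether it was the one that set the bit. The second one
is roughly what the compiler has to emit on targets without sub-word
atomics, minus the LL/SC retry loop that the 32-bit builtin hides.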
> A vaguer thought is that we're not bound to represent the match
> bit in exactly the way it's done now, if there's some other choice
> that would be easier to fit into these concerns. The only hard
> limit I'd lay down is that making the struct bigger isn't OK.
Yeah. Some alternative match flag storage sketches came up in the PHJ
right join thread. I've wondered a few times about tagged pointers.
What if we stole the lower two bits of the pointers to tuples to mean
"matched" and "every tuple that follows me in the chain is also
matched"? In right join unmatched scans, and clearly in this case
too, we'd often avoid following pointers to matched tuples we're not
interested in, but we'd have to mask the tagged bits before
dereferencing. Otherwise I think it should be much like the earlier
description, e.g. specialisation could remove all the match-tracking,
masking and/or atomic ops depending on the plan.
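
A minimal sketch of the tagged-pointer idea, just to show the shape of
it. Everything here (DemoTuple, the demo_* functions, the DEMO_TAG_*
names) is hypothetical and not the real hash join tuple layout. It
assumes tuples are at least 4-byte aligned so the low two bits of a
pointer are free, and uses the GCC/Clang __atomic builtins to set the
tag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tags stored in the low bits of a pointer TO a tuple. */
#define DEMO_TAG_MATCHED       ((uintptr_t) 1)  /* target tuple is matched */
#define DEMO_TAG_REST_MATCHED  ((uintptr_t) 2)  /* all later tuples matched */
#define DEMO_TAG_MASK          ((uintptr_t) 3)

typedef struct DemoTuple
{
    uintptr_t   next;       /* tagged pointer to next tuple, or 0 */
    int         payload;
} DemoTuple;

/* Strip the tag bits; must be done before dereferencing. */
static inline DemoTuple *
demo_untag(uintptr_t tagged)
{
    return (DemoTuple *) (tagged & ~DEMO_TAG_MASK);
}

static inline uintptr_t
demo_tag(DemoTuple *tuple, uintptr_t tags)
{
    assert(((uintptr_t) tuple & DEMO_TAG_MASK) == 0);
    return (uintptr_t) tuple | tags;
}

/* Atomically mark the tuple that *slot points to; returns 1 if we set it. */
static inline int
demo_mark_matched(uintptr_t *slot)
{
    uintptr_t   old = __atomic_fetch_or(slot, DEMO_TAG_MATCHED,
                                        __ATOMIC_RELAXED);

    return (old & DEMO_TAG_MATCHED) == 0;
}

/* Unmatched scan: sum payloads of unmatched tuples in one chain. */
static int
demo_sum_unmatched(uintptr_t head)
{
    int         sum = 0;
    uintptr_t   cur = head;

    while (demo_untag(cur) != NULL)
    {
        if ((cur & DEMO_TAG_MATCHED) == 0)
            sum += demo_untag(cur)->payload;
        if (cur & DEMO_TAG_REST_MATCHED)
            break;          /* nothing unmatched beyond this tuple */
        cur = demo_untag(cur)->next;
    }
    return sum;
}
```

Note that the tag lives in the slot that points at a tuple (the bucket
head or the previous tuple's next field), so marking a match means
doing the fetch-or on that slot rather than on the tuple itself. A
non-parallel or non-tracking specialisation could use plain stores or
skip the tags entirely.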