Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected - Mailing list pgsql-hackers

From Burd, Greg
Subject Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected
Date
Msg-id B03D0B7E-0521-4E74-8884-312C1D530D67@burd.me
Whole thread Raw
In response to Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
> On Aug 13, 2025, at 8:01 PM, Thomas Munro <thomas.munro@gmail.com> wrote:
>
> On Wed, Aug 13, 2025 at 7:38 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Here's a new attempt at that.  It picks futexes automatically, unless
>> you export MACOSX_DEPLOYMENT_TARGET=14.3 or lower, and then it picks
>> sysv as before.

Thomas, thanks for digging into this platform specific issue.  I've read
your patch and done some research.  Thoughts below...

> I compared my hand-rolled atomics logic against FreeBSD's user space
> semaphores[1].  sem_post() and sem_trywait() are about the same, and I
> guess there just aren't many ways to write those.  Minor differences:
>
> 1. They use CAS in sem_post() because they want to report EOVERFLOW if
> you exceed SEM_VALUE_MAX, but POSIX doesn't seem to require that, so I
> just used fetch-and-add.  Is that bad?  I noticed their alternative
> older version (remove "_new" from [1]) also uses a whole 32 bit
> counter like me, so perhaps isn't so worried about exceeding a narrow
> field of bits, and does it my way.

The Open Group Base Specifications Issue 7 (POSIX.1-2008) [1] says that
the sem_post() function shall fail if the maximum allowable value of the
semaphore would be exceeded [EOVERFLOW].  So, maybe a CAS is the way to
go after all?

> 2.  In sem_trywait() they reload the value while our
> pg_atomic_compare_exchange_u32() does that for you, and I also prefer
> to avoid assignment expression syntax if I can.
>
> On the other hand my v1 sem_wait() seemed a little janky and weird in
> comparison to theirs, and I was a bit worried about bugs.  I thought
> about rewriting it to look more like that, but then I realised that I
> could use sem_trywait() to implement sem_wait() for the same net
> effect.  It's much shorter and sweeter and more intuitively
> understandable, I think?

So if I understand it the way your sem_trywait() works is:
 * Try non-blocking acquire with pg_sem_trywait()
 * If fails (semaphore = 0): Call pg_futex_wait_u32() to block efficiently
 * When woken up: Loop back to try again

This is essentially how sem_trywait() implementations work in Linux IIRC,
they use futexes or similar kernel primitives for efficient blocking.

> I also didn't bother to supply a couple of functions that posix_sema.c
> doesn't use.

I think that's fine for now.

> There's no immediately obvious performance difference, but I wasn't
> expecting there to be: it's used in the slow path of our LWLocks and
> they already have similar user space atomics in front, so you don't
> really avoid entering the kernel with this technique except perhaps
> just occasionally if you wait just as someone posts.  This is all just
> about the convenience of bannishing System V cruft.  It doesn't seem
> to be any slower though, in simple testing, and CI might even be a bit
> faster based on spot checks of a few runs... but proper statistical
> tools might be needed to see if that is a real phenomenon.

That lines up with my intuition after reading the patch.  I don't know
that it is even worth diving too deep in performance testing this.

> Anyone got any better ideas for how to organise the build scripts,
> control the feature, naming etc?  I came up with is this, which
> automatically falls back to sysv if it can't find what it needs
> (respecting Apple's deployment target thing following what we did for
> preadv):
>
>      meson.build:        sema_type = "unnamed_posix+emulation"
>      configure template: PREFERRED_SEMAPHORES="UNNAMED_POSIX+EMULATION"

I'm not an expert on Meson, but I do have years of Autoconf work under
my belt.  Offhand this seems reasonable, but I'm new to this project.

> The pg_futex.h abstraction layer doesn't have any support for other
> OSes yet because I tried to pare this patch down to the absolute
> minimum to solve this problem, but I did try at least a little bit to
> anticipate that (having removed it from earlier versions from a few
> years ago) when designing the futex contracts.  We could trivially add
> at least OpenBSD support to use emulated semaphores there.  I never
> quite figured out whether the NetBSD futex API is really available
> outside its Linux syscall emulator, but that might be doable too.  And
> futexes might of course turn out to have applications in other
> synchronisation primitives.
>
> Any thoughts on whether this is worth pursuing, and what kinds of
> validation might be useful?
>
> [1] https://github.com/freebsd/freebsd-src/blob/main/lib/libc/gen/sem_new.c
> <v2-0001-Use-futex-based-semaphore-emulation-on-macOS.patch>

Nit: pg_sem_init() doesn't return EINVAL if value > SEM_VALUE_MAX

Overall, I like the approach, it addresses a real limitation on macOS and
could be expanded in the future to cover other similar issues on other
platforms.

best.

-greg

[1] https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/sem_post.html




pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Remove Instruction Synchronization Barrier in spin_delay() for ARM64 architecture
Next
From: Peter Geoghegan
Date:
Subject: Re: index prefetching