Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected - Mailing list pgsql-hackers
From | Burd, Greg |
---|---|
Subject | Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected |
Date | |
Msg-id | B03D0B7E-0521-4E74-8884-312C1D530D67@burd.me Whole thread Raw |
In response to | Re: `pg_ctl init` crashes when run concurrently; semget(2) suspected (Thomas Munro <thomas.munro@gmail.com>) |
List | pgsql-hackers |
> On Aug 13, 2025, at 8:01 PM, Thomas Munro <thomas.munro@gmail.com> wrote: > > On Wed, Aug 13, 2025 at 7:38 PM Thomas Munro <thomas.munro@gmail.com> wrote: >> Here's a new attempt at that. It picks futexes automatically, unless >> you export MACOSX_DEPLOYMENT_TARGET=14.3 or lower, and then it picks >> sysv as before. Thomas, thanks for digging into this platform specific issue. I've read your patch and done some research. Thoughts below... > I compared my hand-rolled atomics logic against FreeBSD's user space > semaphores[1]. sem_post() and sem_trywait() are about the same, and I > guess there just aren't many ways to write those. Minor differences: > > 1. They use CAS in sem_post() because they want to report EOVERFLOW if > you exceed SEM_VALUE_MAX, but POSIX doesn't seem to require that, so I > just used fetch-and-add. Is that bad? I noticed their alternative > older version (remove "_new" from [1]) also uses a whole 32 bit > counter like me, so perhaps isn't so worried about exceeding a narrow > field of bits, and does it my way. The Open Group Base Specifications Issue 7 (POSIX.1-2008) [1] says that the sem_post() function shall fail if the maximum allowable value of the semaphore would be exceeded [EOVERFLOW]. So, maybe a CAS is the way to go after all? > 2. In sem_trywait() they reload the value while our > pg_atomic_compare_exchange_u32() does that for you, and I also prefer > to avoid assignment expression syntax if I can. > > On the other hand my v1 sem_wait() seemed a little janky and weird in > comparison to theirs, and I was a bit worried about bugs. I thought > about rewriting it to look more like that, but then I realised that I > could use sem_trywait() to implement sem_wait() for the same net > effect. It's much shorter and sweeter and more intuitively > understandable, I think? So if I understand it the way your sem_trywait() works is: * Try non-blocking acquire with pg_sem_trywait() * If fails (semaphore = 0): Call pg_futex_wait_u32() to block efficiently * When woken up: Loop back to try again This is essentially how sem_trywait() implementations work in Linux IIRC, they use futexes or similar kernel primitives for efficient blocking. > I also didn't bother to supply a couple of functions that posix_sema.c > doesn't use. I think that's fine for now. > There's no immediately obvious performance difference, but I wasn't > expecting there to be: it's used in the slow path of our LWLocks and > they already have similar user space atomics in front, so you don't > really avoid entering the kernel with this technique except perhaps > just occasionally if you wait just as someone posts. This is all just > about the convenience of bannishing System V cruft. It doesn't seem > to be any slower though, in simple testing, and CI might even be a bit > faster based on spot checks of a few runs... but proper statistical > tools might be needed to see if that is a real phenomenon. That lines up with my intuition after reading the patch. I don't know that it is even worth diving too deep in performance testing this. > Anyone got any better ideas for how to organise the build scripts, > control the feature, naming etc? I came up with is this, which > automatically falls back to sysv if it can't find what it needs > (respecting Apple's deployment target thing following what we did for > preadv): > > meson.build: sema_type = "unnamed_posix+emulation" > configure template: PREFERRED_SEMAPHORES="UNNAMED_POSIX+EMULATION" I'm not an expert on Meson, but I do have years of Autoconf work under my belt. Offhand this seems reasonable, but I'm new to this project. > The pg_futex.h abstraction layer doesn't have any support for other > OSes yet because I tried to pare this patch down to the absolute > minimum to solve this problem, but I did try at least a little bit to > anticipate that (having removed it from earlier versions from a few > years ago) when designing the futex contracts. We could trivially add > at least OpenBSD support to use emulated semaphores there. I never > quite figured out whether the NetBSD futex API is really available > outside its Linux syscall emulator, but that might be doable too. And > futexes might of course turn out to have applications in other > synchronisation primitives. > > Any thoughts on whether this is worth pursuing, and what kinds of > validation might be useful? > > [1] https://github.com/freebsd/freebsd-src/blob/main/lib/libc/gen/sem_new.c > <v2-0001-Use-futex-based-semaphore-emulation-on-macOS.patch> Nit: pg_sem_init() doesn't return EINVAL if value > SEM_VALUE_MAX Overall, I like the approach, it addresses a real limitation on macOS and could be expanded in the future to cover other similar issues on other platforms. best. -greg [1] https://pubs.opengroup.org/onlinepubs/9799919799.2024edition/functions/sem_post.html
pgsql-hackers by date: