Re: GNU/Hurd portability patches - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: GNU/Hurd portability patches
Msg-id CA+hUKGJE_moUF74c97GkoP6RaknRMoeFOednXe2FyXnS_bOTFQ@mail.gmail.com
In response to Re: GNU/Hurd portability patches  (Michael Banck <mbanck@gmx.net>)
List pgsql-hackers
[Using this as a general GNU/Hurd problem thread]

An interesting fruitcrow failure:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fruitcrow&dt=2025-09-30%2007%3A28%3A50

TRAP: failed Assert("postgres_signal_arg < PG_NSIG"), File: "pqsignal.c", Line: 91, PID: 25731
postgres(ExceptionalCondition+0x5a) [0x1006b1d0a]
postgres(+0x711cf2) [0x100711cf2]
/lib/x86_64-gnu/libc.so.0.3(+0x39fee) [0x102bdffee]
/lib/x86_64-gnu/libc.so.0.3(+0x39fdd) [0x102bdffdd]
2025-09-30 08:38:59.451 BST [24668:6] LOG:  client backend (PID 25731) was terminated by signal 6: Aborted

Our definition of PG_NSIG is:

#ifdef PG_SIGNAL_COUNT          /* Windows */
#define PG_NSIG (PG_SIGNAL_COUNT)
#elif defined(NSIG)
#define PG_NSIG (NSIG)
#else
#define PG_NSIG (64)            /* XXX: wild guess */
#endif

Is NSIG defined?  Where on the internet can we see the SIGXXX signal
numbers and the glibc source that is actually used on these systems?
The assertion is in code that only runs for signals whose handlers
were installed by pqsignal(), so I guess it's probably not the
synchronous SIGABRT from the abort() call in ExceptionalCondition()
(assuming that abort() is implemented as raise(SIGABRT) in the
traditional way, which might not be true).  That suggests an
asynchronous signal, but which one?

Searching for that error in our archives brought up another platform
that saw the same assertion fail[1].  There it smelled a bit like an
uninitialised value somehow finishing up in there, maybe related to
valgrind, but I have no idea whether or how that relates to this
failure.

The main thing I learned while failing to find the values for those
symbols for myself was that GNU/Hurd implements asynchronous signals
in an unorthodox way, akin to Windows' SIGINT mechanism (where a new
thread is created to handle the interrupt):

"The UNIX signalling mechanism is implemented for the GNU Hurd by
means of a separate signal thread that is part of every user-space
process. This makes handling of signals a separate thread of control.
GNU Mach itself has no idea what a signal is and kill is not a system
call (as it typically is in a UNIX system): it's implemented in
glibc." - glibc docs[2]

I haven't investigated the details or implications, but huh, I wonder
what that can break in our code...  We're working on booting
asynchronous signals out of the code for various reasons so this might
already or at least soon be a non-issue, but still.

I've so far resisted the urge to spin up a Debian GNU/Hurd box to
figure any of that out for myself, but maybe someone has a clue...

[1] https://www.postgresql.org/message-id/flat/Z8z6EaT89FL7UUBU%40nathan#ed792121e7d146c44c2941f50a1d3142
[2] https://www.gnu.org/software/hurd/glibc/signal.html
