Re: pgsql: Move named LWLock tranche requests to shared memory. - Mailing list pgsql-committers

From Michael Paquier
Subject Re: pgsql: Move named LWLock tranche requests to shared memory.
Date
Msg-id aMoejB3iTWy1SxfF@paquier.xyz
In response to pgsql: Move named LWLock tranche requests to shared memory.  (Nathan Bossart <nathan@postgresql.org>)
Responses Re: pgsql: Move named LWLock tranche requests to shared memory.
List pgsql-committers
Hi Nathan,

On Thu, Sep 11, 2025 at 09:15:12PM +0000, Nathan Bossart wrote:
> Move named LWLock tranche requests to shared memory.
>
> In EXEC_BACKEND builds, GetNamedLWLockTranche() can segfault when
> called outside of the postmaster process, as it might access
> NamedLWLockTrancheRequestArray, which won't be initialized.  Given
> the lack of reports, this is apparently unusual, presumably because
> it is usually called from a shmem_startup_hook like this:

Since this commit has been merged, batta has kept failing.  Here is
the first failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=batta&dt=2025-09-12%2002%3A05%3A01

I use this animal with a specific configuration:
shared_preload_libraries = 'pg_stat_statements'
compute_query_id = regress
regress_dump_restore
wal_consistency_checking
--enable-injection-points

The recovery tests 013_crash_restart.pl, 022_crash_temp_files.pl and
041_checkpoint_at_promote.pl stress some restart scenarios; not all of
them use injection points.  I could not get a backtrace from the host.

However, I have come up with the following change in 013 that's able
to reproduce what I think is the same crash:
--- a/src/test/recovery/t/013_crash_restart.pl
+++ b/src/test/recovery/t/013_crash_restart.pl
@@ -21,6 +21,8 @@ my $psql_timeout = IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default);

 my $node = PostgreSQL::Test::Cluster->new('primary');
 $node->init(allows_streaming => 1);
+$node->append_conf('postgresql.conf',
+                   "shared_preload_libraries = 'pg_stat_statements'");
 $node->start();

And here is the backtrace:
#0  0x000055fcdf6bc97a in NumLWLocksForNamedTranches () at lwlock.c:385
385 numLocks += NamedLWLockTrancheRequestArray[i].num_lwlocks;
(gdb) bt
#0  0x000055fcdf6bc97a in NumLWLocksForNamedTranches () at lwlock.c:385
#1  0x000055fcdf6bc9b3 in LWLockShmemSize () at lwlock.c:400
#2  0x000055fcdf65bda5 in CalculateShmemSize (num_semaphores=0x7ffcaf7a78e4) at ipci.c:130
#3  0x000055fcdf65c0b1 in CreateSharedMemoryAndSemaphores () at ipci.c:210
#4  0x000055fcdf42830c in PostmasterStateMachine () at postmaster.c:3223
#5  0x000055fcdf42703f in process_pm_child_exit () at postmaster.c:2558
#6  0x000055fcdf425729 in ServerLoop () at postmaster.c:1696
#7  0x000055fcdf424be1 in PostmasterMain (argc=4, argv=0x55fd0a8faa10) at postmaster.c:1403
#8  0x000055fcdef80a19 in main (argc=4, argv=0x55fd0a8faa10) at main.c:231
(gdb) p i
$3 = 0
(gdb) p NamedLWLockTrancheRequestArray[0]
Cannot access memory at address 0x7f15ee4ccc08

Thanks,
--
Michael

