Re: Refactoring postmaster's code to cleanup after child exit - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Refactoring postmaster's code to cleanup after child exit |
Date | |
Msg-id | 20250306044933.7a.nmisch@google.com Whole thread Raw |
In response to | Re: Refactoring postmaster's code to cleanup after child exit (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Refactoring postmaster's code to cleanup after child exit
|
List | pgsql-hackers |
On Tue, Mar 04, 2025 at 05:50:34PM -0500, Andres Freund wrote: > On 2024-12-09 00:12:32 +0100, Tomas Vondra wrote: > > [23:48:44.444](1.129s) ok 3 - reserved_connections limit > > [23:48:44.445](0.001s) ok 4 - reserved_connections limit: matches > > process ended prematurely at > > /home/user/work/postgres/src/test/postmaster/../../../src/test/perl/PostgreSQL/Test/BackgroundPsql.pm > > line 154. > > # Postmaster PID for node "primary" is 198592 > > > I just saw this failure on skink in the BF: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-03-04%2015%3A43%3A23 > > [17:05:56.438](0.247s) ok 3 - reserved_connections limit > [17:05:56.438](0.000s) ok 4 - reserved_connections limit: matches > process ended prematurely at /home/bf/bf-build/skink-master/HEAD/pgsql/src/test/perl/PostgreSQL/Test/BackgroundPsql.pmline 160. > > > > That BackgroundPsql.pm line is this in wait_connect() > > > > $self->{run}->pump() > > until $self->{stdout} =~ /$banner/ || $self->{timeout}->is_expired; > > A big part of the problem here imo is the exception behaviour that > IPC::Run::pump() has: > > If pump() is called after all harnessed activities have completed, a "process > ended prematurely" exception to be thrown. This allows for simple scripting > of external applications without having to add lots of error handling code at > each step of the script: > > Which is, uh, not very compatible with how we use IPC::Run (here and > elsewhere). Just ending the test because a connection failed is pretty awful. Historically, I think we've avoided this sort of trouble by doing pipe I/O only on processes where we feel able to predict when the process will exit. Commit f44b9b6 is one example (simpler case, not involving pump()). It would be a nice improvement to do better, since there's always some risk of unexpected exit. > This behaviour makes it really hard to debug problems. It'd have been a lot > easier to understand the problem if we'd seen psql's stderr before the test > died. > > I guess that mean at the very least we'd need to put an eval {} around the > ->pump() call., print $self->{stdout}, ->{stderr} and reraise an error? That sounds right. Officially, you could call ->pumpable() before ->pump(). It's defined as 'Returns TRUE if calling pump() won't throw an immediate "process ended prematurely" exception.' I lack high confidence that it avoids the exception, because the pump() still calls pumpable()->reap_nb()->waitpid(WNOHANG) and may decide "process ended prematurely" based on the new finding. In other words, I bet there would be a TOCTOU defect in "$h->pump if $h->pumpable". > Presumably not just in in wait_connect(), but also at least in pump_until()? If the goal is to have it capture maximum data from processes that exit when we don't expect it (seems good to me), yes.
pgsql-hackers by date: