Thread: IPC::Run accepts bug reports
Separating this from the pytest thread: On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote: > The one > thing I know about that *I* think is a pretty big problem about Perl > is that IPC::Run is not really maintained. I don't see in https://github.com/cpan-authors/IPC-Run/issues anything affecting PostgreSQL. If you know of IPC::Run defects, please report them. If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 NetBSD-10-specific behavior coping.
On Sat, Jun 15, 2024 at 7:48 PM Noah Misch <noah@leadboat.com> wrote: > Separating this from the pytest thread: > > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote: > > The one > > thing I know about that *I* think is a pretty big problem about Perl > > is that IPC::Run is not really maintained. > > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything > affecting PostgreSQL. If you know of IPC::Run defects, please report them. > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 > NetBSD-10-specific behavior coping. I'm not concerned about any specific open issue; my concern is about the health of that project. https://metacpan.org/pod/IPC::Run says that this module is seeking new maintainers, and it looks like the people listed as current maintainers are mostly inactive. Instead, you're fixing stuff. That's great, but we ideally want PostgreSQL's dependencies to be things that are used widely enough that we don't end up maintaining them ourselves. I apologize if my comment came across as disparaging your efforts; that was not my intent. -- Robert Haas EDB: http://www.enterprisedb.com
Hi, On 2024-06-15 16:48:24 -0700, Noah Misch wrote: > Separating this from the pytest thread: > > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote: > > The one > > thing I know about that *I* think is a pretty big problem about Perl > > is that IPC::Run is not really maintained. > > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything > affecting PostgreSQL. If you know of IPC::Run defects, please report them. > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 > NetBSD-10-specific behavior coping. 1) Sometimes hangs hard on windows if started processes have not been shut down before script exits. I've mostly encountered this via the buildfarm / CI, so I never had a good way of narrowing this down. It's very painful because things seem to often just get stuck once that happens. 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack Broken pipe:" (in _do_filters()). There's plenty reports of this on the list, and I've hit this several times personally. It seems to be timing dependent, I've encountered it after seemingly irrelevant ordering changes. I suspect I could create a reproducer with a bit of time. 3) It's very slow on windows (in addition to the windows process slowness). That got to be fixable to some degree. Greetings, Andres Freund
On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote: > On 2024-06-15 16:48:24 -0700, Noah Misch wrote: > > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote: > > > The one > > > thing I know about that *I* think is a pretty big problem about Perl > > > is that IPC::Run is not really maintained. > > > > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything > > affecting PostgreSQL. If you know of IPC::Run defects, please report them. > > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on > > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 > > NetBSD-10-specific behavior coping. > > 1) Sometimes hangs hard on windows if started processes have not been shut > down before script exits. I've mostly encountered this via the buildfarm / > CI, so I never had a good way of narrowing this down. It's very painful > because things seem to often just get stuck once that happens. That's bad. Do you have a link to a log, a thread discussing it, or even just one of the test names experiencing it? > 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack > Broken pipe:" (in _do_filters()). There's plenty reports of this on the > list, and I've hit this several times personally. It seems to be timing > dependent, I've encountered it after seemingly irrelevant ordering changes. > > I suspect I could create a reproducer with a bit of time. I've seen that one. If the harness has data to write to a child, the child exiting before the write is one way to reach that. Perhaps before exec(), IPC::Run should do a non-blocking write from each pending IO. That way, small writes never experience the timing-dependent behavior. > 3) It's very slow on windows (in addition to the windows process > slowness). That got to be fixable to some degree. Agreed. For the next release, today's git has some optimizations. There are other known-possible Windows optimizations not implemented.
Hi, On 2024-06-18 10:10:17 -0700, Noah Misch wrote: > On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote: > > On 2024-06-15 16:48:24 -0700, Noah Misch wrote: > > > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote: > > > > The one > > > > thing I know about that *I* think is a pretty big problem about Perl > > > > is that IPC::Run is not really maintained. > > > > > > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything > > > affecting PostgreSQL. If you know of IPC::Run defects, please report them. > > > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on > > > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 > > > NetBSD-10-specific behavior coping. > > > > 1) Sometimes hangs hard on windows if started processes have not been shut > > down before script exits. I've mostly encountered this via the buildfarm / > > CI, so I never had a good way of narrowing this down. It's very painful > > because things seem to often just get stuck once that happens. > > That's bad. Do you have a link to a log, a thread discussing it, or even just > one of the test names experiencing it? I'm unfortunately blanking on the right keyword right now. I think it basically required not shutting down a process started in the background with IPC::Run. I'll try to repro it by removing some ->finish or ->quit calls. There's also a bunch of tests that have blocks like # some Windows Perls at least don't like IPC::Run's start/kill_kill regime. skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32'; Some of them may have been related to this. > > 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack > > Broken pipe:" (in _do_filters()). There's plenty reports of this on the > > list, and I've hit this several times personally. It seems to be timing > > dependent, I've encountered it after seemingly irrelevant ordering changes. > > > > I suspect I could create a reproducer with a bit of time. > > I've seen that one. If the harness has data to write to a child, the child > exiting before the write is one way to reach that. Perhaps before exec(), > IPC::Run should do a non-blocking write from each pending IO. That way, small > writes never experience the timing-dependent behavior. I think the question is rather, why is ipc run choosing to die in this situation and can that be fixed? > > 3) It's very slow on windows (in addition to the windows process > > slowness). That got to be fixable to some degree. > > Agreed. For the next release, today's git has some optimizations. There are > other known-possible Windows optimizations not implemented. Yay! Greetings, Andres Freund
On 2024-06-18 Tu 3:00 PM, Andres Freund wrote:
Hi, On 2024-06-18 10:10:17 -0700, Noah Misch wrote:On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote:On 2024-06-15 16:48:24 -0700, Noah Misch wrote:On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote:The one thing I know about that *I* think is a pretty big problem about Perl is that IPC::Run is not really maintained.I don't see in https://github.com/cpan-authors/IPC-Run/issues anything affecting PostgreSQL. If you know of IPC::Run defects, please report them. If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 NetBSD-10-specific behavior coping.1) Sometimes hangs hard on windows if started processes have not been shut down before script exits. I've mostly encountered this via the buildfarm / CI, so I never had a good way of narrowing this down. It's very painful because things seem to often just get stuck once that happens.That's bad. Do you have a link to a log, a thread discussing it, or even just one of the test names experiencing it?I'm unfortunately blanking on the right keyword right now. I think it basically required not shutting down a process started in the background with IPC::Run. I'll try to repro it by removing some ->finish or ->quit calls. There's also a bunch of tests that have blocks like # some Windows Perls at least don't like IPC::Run's start/kill_kill regime. skip "Test fails on Windows perl", 2 if $Config{osname} eq 'MSWin32'; Some of them may have been related to this.
I only found one of those, in src/test/recovery/t/006_logical_decoding.pl, which seems to be the only place we use kill_kill at all. That comment dates back to 2017, so maybe a more modern perl and/or IPC::Run will improve matters.
It's not clear to me why that code isn't calling finish() before trying kill_kill(). That's what the IPC::Run docs seem to suggest you should do.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Hi, On 2024-06-18 12:00:13 -0700, Andres Freund wrote: > On 2024-06-18 10:10:17 -0700, Noah Misch wrote: > > > 1) Sometimes hangs hard on windows if started processes have not been shut > > > down before script exits. I've mostly encountered this via the buildfarm / > > > CI, so I never had a good way of narrowing this down. It's very painful > > > because things seem to often just get stuck once that happens. > > > > That's bad. Do you have a link to a log, a thread discussing it, or even just > > one of the test names experiencing it? > > I'm unfortunately blanking on the right keyword right now. > > I think it basically required not shutting down a process started in the > background with IPC::Run. > > I'll try to repro it by removing some ->finish or ->quit calls. Yep, that did it. It reliably reproduces if I comment out the lines below # explicitly shut down psql instances gracefully - to avoid hangs # or worse on windows in 021_row_visibility.pl The logfile ends in Warning: unable to close filehandle GEN25 properly: Bad file descriptor during global destruction. Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction. Even if I cancel the test, I can't rerun it because due to a leftover psql a) a new temp install can't be made (could be solved by rm -rf) b) the test's logfile can't be removed (couldn't even rename the directory) The psql instance needs to be found and terminated first. Greetings, Andres Freund
On Tue, Jun 18, 2024 at 08:07:27PM -0700, Andres Freund wrote: > > > > 1) Sometimes hangs hard on windows if started processes have not been shut > > > > down before script exits. > It reliably reproduces if I comment out > the lines below > # explicitly shut down psql instances gracefully - to avoid hangs > # or worse on windows > in 021_row_visibility.pl > > The logfile ends in > Warning: unable to close filehandle GEN25 properly: Bad file descriptor during global destruction. > Warning: unable to close filehandle GEN20 properly: Bad file descriptor during global destruction. > > > Even if I cancel the test, I can't rerun it because due to a leftover psql > a) a new temp install can't be made (could be solved by rm -rf) > b) the test's logfile can't be removed (couldn't even rename the directory) > > The psql instance needs to be found and terminated first. Thanks for that recipe. I've put that in my queue to fix. On Tue, Jun 18, 2024 at 12:00:13PM -0700, Andres Freund wrote: > On 2024-06-18 10:10:17 -0700, Noah Misch wrote: > > On Mon, Jun 17, 2024 at 11:11:17AM -0700, Andres Freund wrote: > > > 2) If a subprocess dies in an inopportune moment, IPC::Run dies with "ack > > > Broken pipe:" (in _do_filters()). There's plenty reports of this on the > > > list, and I've hit this several times personally. It seems to be timing > > > dependent, I've encountered it after seemingly irrelevant ordering changes. > > > > > > I suspect I could create a reproducer with a bit of time. > > > > I've seen that one. If the harness has data to write to a child, the child > > exiting before the write is one way to reach that. Perhaps before exec(), > > IPC::Run should do a non-blocking write from each pending IO. That way, small > > writes never experience the timing-dependent behavior. > > I think the question is rather, why is ipc run choosing to die in this > situation and can that be fixed? With default signal handling, the process would die to SIGPIPE. Since PostgreSQL::Test ignores SIGPIPE, this happens instead. The IPC::Run source tree has no discussion of ignoring SIGPIPE, so I bet it didn't get a conscious decision. Perhaps it can do better.
On Mon, Jun 17, 2024 at 01:56:46PM -0400, Robert Haas wrote: > On Sat, Jun 15, 2024 at 7:48 PM Noah Misch <noah@leadboat.com> wrote: > > Separating this from the pytest thread: > > > > On Sat, Jun 15, 2024 at 01:26:57PM -0400, Robert Haas wrote: > > > The one > > > thing I know about that *I* think is a pretty big problem about Perl > > > is that IPC::Run is not really maintained. > > > > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything > > affecting PostgreSQL. If you know of IPC::Run defects, please report them. > > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on > > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 > > NetBSD-10-specific behavior coping. > > I'm not concerned about any specific open issue; my concern is about > the health of that project. https://metacpan.org/pod/IPC::Run says > that this module is seeking new maintainers, and it looks like the > people listed as current maintainers are mostly inactive. Instead, > you're fixing stuff. That's great, but we ideally want PostgreSQL's > dependencies to be things that are used widely enough that we don't > end up maintaining them ourselves. That's reasonable to want. > I apologize if my comment came across as disparaging your efforts; > that was not my intent. It did not come across as disparaging.
Hello Noah, 16.06.2024 02:48, Noah Misch wrote: > I don't see in https://github.com/cpan-authors/IPC-Run/issues anything > affecting PostgreSQL. If you know of IPC::Run defects, please report them. > If I knew of an IPC::Run defect affecting PostgreSQL, I likely would work on > it before absurdity like https://github.com/cpan-authors/IPC-Run/issues/175 > NetBSD-10-specific behavior coping. It looks like a recent indri failure [1] revealed one more IPC::Run anomaly. The failure log contains: # Running: pgbench -n -t 1 -Dfoo=bla -Dnull=null -Dtrue=true -Done=1 -Dzero=0.0 -Dbadtrue=trueXXX -Dmaxint=9223372036854775807 -Dminint=-9223372036854775808 -M prepared -f /Users/buildfarm/bf-data/HEAD/pgsql.build/src/bin/pgbench/tmp_check/t_001_pgbench_with_server_main_data/001_pgbench_error_sleep_undefined_variable [22:38:14.887](0.014s) ok 362 - pgbench script error: sleep undefined variable status (got 2 vs expected 2) [22:38:14.887](0.000s) ok 363 - pgbench script error: sleep undefined variable stdout /(?^:processed: 0/1)/ [22:38:14.887](0.000s) not ok 364 - pgbench script error: sleep undefined variable stderr /(?^:sleep: undefined variable)/ [22:38:14.887](0.000s) [22:38:14.887](0.000s) # Failed test 'pgbench script error: sleep undefined variable stderr /(?^:sleep: undefined variable)/' # at t/001_pgbench_with_server.pl line 1242. [22:38:14.887](0.000s) # '' # doesn't match '(?^:sleep: undefined variable)' So the pgbench process exited as expected, stdout contained expected string, but stderr happened to be empty. Maybe such behavior is specific to macOS, and even on indri it's the only failure of that ilk out of 2000+ runs (and I couldn't reproduce this in a Linux VM), but I find this place in IPC::Run suspicious: sub _read { ... my $r = POSIX::read( $_[0], $s, 10_000 ); croak "$!: read( $_[0] )" if not($r) and !$!{EINTR}; That is, EINTR kind of recognized as an expected error, but there is no retry in this case. Thus, with the following modification, which simulates read() failed with EINTR: sub _read { confess 'undef' unless defined $_[0]; my $s = ''; - my $r = POSIX::read( $_[0], $s, 10_000 ); + my $r; +if (int(rand(100)) == 0) +{ + $r = 0; $! = Errno::EINTR; +} +else +{ + $r = POSIX::read( $_[0], $s, 10_000 ); +} croak "$!: read( $_[0] )" if not($r) and !$!{EINTR}; I can see failures like the one in question when running that test. Perhaps, I could reproduce the issue with a program, that sends signals to running (pgbench) processes (and thus interrupts read()), if it makes sense. What do you think? [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2024-10-02%2002%3A34%3A16 Best regards, Alexander
On Fri, Oct 04, 2024 at 02:00:00PM +0300, Alexander Lakhin wrote: > sub _read { > ... > my $r = POSIX::read( $_[0], $s, 10_000 ); > croak "$!: read( $_[0] )" if not($r) and !$!{EINTR}; > > That is, EINTR kind of recognized as an expected error, but there is no > retry in this case. Thus, with the following modification, which simulates > read() failed with EINTR: > sub _read { > confess 'undef' unless defined $_[0]; > my $s = ''; > - my $r = POSIX::read( $_[0], $s, 10_000 ); > + my $r; > +if (int(rand(100)) == 0) > +{ > + $r = 0; $! = Errno::EINTR; > +} > +else > +{ > + $r = POSIX::read( $_[0], $s, 10_000 ); > +} > croak "$!: read( $_[0] )" if not($r) and !$!{EINTR}; > > I can see failures like the one in question when running that test. That makes sense. Would you file this at https://github.com/cpan-authors/IPC-Run/issues? I suppose that code should become roughly: do { $r = POSIX::read(...) } while (!defined($r) && $!{EINTR}); croak ... unless defined($r); > Perhaps, I could reproduce the issue with a program, that sends signals > to running (pgbench) processes (and thus interrupts read()), if it makes > sense. That should work. Best would be an IPC::Run test case, like the existing t/eintr.t that makes select() report EINTR. That said, IPC::Run could adopt the retry without a test. > [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=indri&dt=2024-10-02%2002%3A34%3A16
Hello Noah, 04.10.2024 21:57, Noah Misch wrote: > That makes sense. Would you file this at > https://github.com/cpan-authors/IPC-Run/issues? I suppose that code should > become roughly: > > do { $r = POSIX::read(...) } while (!defined($r) && $!{EINTR}); > croak ... unless defined($r); > Just for reference: I've filed the bug report at: https://github.com/cpan-authors/IPC-Run/issues/176 Best regards, Alexander