Thread: pgsql: Consistently test for in-use shared memory.
Consistently test for in-use shared memory. postmaster startup scrutinizes any shared memory segment recorded in postmaster.pid, exiting if that segment matches the current data directory and has an attached process. When the postmaster.pid file was missing, a starting postmaster used weaker checks. Change to use the same checks in both scenarios. This increases the chance of a startup failure, in lieu of data corruption, if the DBA does "kill -9 `head -n1 postmaster.pid` && rm postmaster.pid && pg_ctl -w start". A postmaster will no longer stop if shmat() of an old segment fails with EACCES. A postmaster will no longer recycle segments pertaining to other data directories. That's good for production, but it's bad for integration tests that crash a postmaster and immediately delete its data directory. Such a test now leaks a segment indefinitely. No "make check-world" test does that. win32_shmem.c already avoided all these problems. In 9.6 and later, enhance PostgresNode to facilitate testing. Back-patch to 9.4 (all supported versions). Reviewed (in earlier versions) by Daniel Gustafsson and Kyotaro HORIGUCHI. Discussion: https://postgr.es/m/20190408064141.GA2016666@rfd.leadboat.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/c098509927f9a49ebceb301a2cb6a477ecd4ac3c Modified Files -------------- src/Makefile.global.in | 4 +- src/backend/port/sysv_shmem.c | 269 +++++++++++++++++++++--------------- src/backend/port/win32_shmem.c | 7 +- src/backend/postmaster/postmaster.c | 12 +- src/backend/storage/ipc/ipci.c | 14 +- src/backend/utils/init/postinit.c | 6 +- src/include/storage/ipc.h | 2 +- src/include/storage/pg_shmem.h | 6 +- src/test/perl/PostgresNode.pm | 184 +++++++++++++++++++----- src/test/recovery/t/017_shm.pl | 200 +++++++++++++++++++++++++++ src/tools/msvc/vcregress.pl | 1 + 11 files changed, 525 insertions(+), 180 deletions(-)
Noah Misch <noah@leadboat.com> writes: > Consistently test for in-use shared memory. Hmm ... conchuela has just found another problem with this test case: ok 2 - detected live backend via shared memory # Running: postgres --single -D /home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/t_017_shm_gnat_data/pgdatatemplate1 ack Broken pipe: write( 16, 'SELECT 1 + 1' ) at /usr/local/lib/perl5/site_perl/IPC/Run/IO.pm line 549. I interpret this as "the single-user backend exited before we could stuff 'SELECT 1 + 1' down the pipe to it, and the Perl script is not expecting to get a write failure there". regards, tom lane
On Mon, Apr 15, 2019 at 08:08:48PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > Consistently test for in-use shared memory. > > Hmm ... conchuela has just found another problem with this test case: > > ok 2 - detected live backend via shared memory > # Running: postgres --single -D /home/pgbf/buildroot/HEAD/pgsql.build/src/test/recovery/tmp_check/t_017_shm_gnat_data/pgdatatemplate1 > ack Broken pipe: write( 16, 'SELECT 1 + 1' ) at /usr/local/lib/perl5/site_perl/IPC/Run/IO.pm line 549. > > I interpret this as "the single-user backend exited before we could stuff > 'SELECT 1 + 1' down the pipe to it, and the Perl script is not expecting > to get a write failure there". Perhaps I should write the 'SELECT 1 + 1' to a regular file and redirect input from that file.
Noah Misch <noah@leadboat.com> writes: > On Mon, Apr 15, 2019 at 08:08:48PM -0400, Tom Lane wrote: >> I interpret this as "the single-user backend exited before we could stuff >> 'SELECT 1 + 1' down the pipe to it, and the Perl script is not expecting >> to get a write failure there". > Perhaps I should write the 'SELECT 1 + 1' to a regular file and redirect input > from that file. Yeah, that was the first idea that occurred to me as well. Kinda grotty, but ... Another idea is to shove the problem off on a sub-shell, that is run echo SELECT 1 + 1 | postgres --single ... regards, tom lane
Noah Misch <noah@leadboat.com> writes: >> Perhaps I should write the 'SELECT 1 + 1' to a regular file and redirect input >> from that file. Actually ... we don't need to run a query at all do we? Although this would only help if Perl will optimize away an attempt to write zero bytes to the pipe, which seems likely but not certain. regards, tom lane
On Mon, Apr 15, 2019 at 09:01:45PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > >> Perhaps I should write the 'SELECT 1 + 1' to a regular file and redirect input > >> from that file. > > Actually ... we don't need to run a query at all do we? No. We expect never to run one; it was there for the unexpected case of "postgres --single" startup succeeding. I pushed a change to close the stdin of "postgres --single" instead of writing to it. I probably worried that "postgres --single" would not tolerate that, but it seems to.
Noah Misch <noah@leadboat.com> writes: > No. We expect never to run one; it was there for the unexpected case of > "postgres --single" startup succeeding. I pushed a change to close the stdin > of "postgres --single" instead of writing to it. Ah, good, that's safer than what I was imagining. > I probably worried that > "postgres --single" would not tolerate that, but it seems to. Sure. EOF-on-stdin is the standard way to end a standalone-backend session. This fix means it'd have zero queries to execute rather than one, but I'd be entirely willing to call it a backend bug if that didn't work. The more important point of course is that we must correctly detect test failure if the backend doesn't complain. But since the test explicitly looks for a particular error message text, that seems fine. regards, tom lane