PGCTLTIMEOUT in pg_regress, or skink versus the clock - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | PGCTLTIMEOUT in pg_regress, or skink versus the clock |
Date | |
Msg-id | 13969.1461191936@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: PGCTLTIMEOUT in pg_regress, or skink versus the clock
|
List | pgsql-hackers |
Buildfarm member skink has failed three times recently like this: http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2016-04-15%2001%3A20%3A44 the relevant part of that being pg_regress: postmaster did not respond within 60 seconds Examine /home/andres/build/buildfarm/REL9_5_STABLE/pgsql.build/src/interfaces/ecpg/test/log/postmaster.log for the reason where the postmaster log shows nothing particularly surprising, it's just not reached the ready state yet: LOG: database system was shut down at 2016-04-15 05:11:18 UTC FATAL: the database system is starting up LOG: MultiXact member wraparound protections are now enabled Now, there are some reasons to suspect that there might be more here than meets the eye; for one thing, it stretches credulity a bit to believe that it's only random chance that all three failures are in the 9.5 branch and all are in the ecpg regression test step. I'm also curious as to why we see only one "FATAL: the database system is starting up" connection rejection and not sixty. However, by far the simplest explanation for this failure is just that the postmaster took more than 60 seconds to start up; and seeing that skink is running Valgrind and is on an AWS instance, that's not that much of a stretch of credulity either. Hence, I am thinking that we missed a bet in commit 2ffa86962077c588 et al, and that pg_regress's hard-wired 60-second start timeout ought to be overridable from an environment variable just as pg_ctl's timeout is. It might as well be the same environment variable, so I propose the attached patch. Note that since the shutdown end of things in pg_regress uses "pg_ctl stop", that end of it already responds to PGCTLTIMEOUT. (I could not find any user-facing documentation for pg_regress, so there's no apparent need for a docs update.) Any objections? regards, tom lane diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c index 2f6f56d..574f5b8 100644 *** a/src/test/regress/pg_regress.c --- b/src/test/regress/pg_regress.c *************** regression_main(int argc, char *argv[], *** 2185,2190 **** --- 2185,2192 ---- if (temp_instance) { FILE *pg_conf; + const char *env_wait; + int wait_seconds; /* * Prepare the temp instance *************** regression_main(int argc, char *argv[], *** 2335,2345 **** } /* ! * Wait till postmaster is able to accept connections (normally only a ! * second or so, but Cygwin is reportedly *much* slower). Don't wait ! * forever, however. */ ! for (i = 0; i < 60; i++) { /* Done if psql succeeds */ if (system(buf2) == 0) --- 2337,2359 ---- } /* ! * Wait till postmaster is able to accept connections; normally this ! * is only a second or so, but Cygwin is reportedly *much* slower, and ! * test builds using Valgrind or similar tools might be too. Hence, ! * allow the default timeout of 60 seconds to be overridden from the ! * PGCTLTIMEOUT environment variable. */ ! env_wait = getenv("PGCTLTIMEOUT"); ! if (env_wait != NULL) ! { ! wait_seconds = atoi(env_wait); ! if (wait_seconds <= 0) ! wait_seconds = 60; ! } ! else ! wait_seconds = 60; ! ! for (i = 0; i < wait_seconds; i++) { /* Done if psql succeeds */ if (system(buf2) == 0) *************** regression_main(int argc, char *argv[], *** 2360,2368 **** pg_usleep(1000000L); } ! if (i >= 60) { ! fprintf(stderr, _("\n%s: postmaster did not respond within 60 seconds\nExamine %s/log/postmaster.log for thereason\n"), progname, outputdir); /* * If we get here, the postmaster is probably wedged somewhere in --- 2374,2383 ---- pg_usleep(1000000L); } ! if (i >= wait_seconds) { ! fprintf(stderr, _("\n%s: postmaster did not respond within %d seconds\nExamine %s/log/postmaster.log for thereason\n"), ! progname, wait_seconds, outputdir); /* * If we get here, the postmaster is probably wedged somewhere in
pgsql-hackers by date: