Re: Regression tests versus the buildfarm environment - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: Regression tests versus the buildfarm environment |
Date | |
Msg-id | 4C6281A1.5030403@dunslane.net |
In response to | Regression tests versus the buildfarm environment (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Regression tests versus the buildfarm environment |
List | pgsql-hackers |
On 08/11/2010 12:42 AM, Tom Lane wrote:
> There's an interesting buildfarm failure here:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=polecat&dt=2010-08-10%2023:46:10
> It appears to me that this was caused by the concurrent run of another
> buildfarm animal on the same physical machine, namely:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=colugos&dt=2010-08-11%2000:02:58
> Both animals are trying to test HEAD, which means that pg_regress
> defaults to the same postmaster port number in both builds:
>
>     if (temp_install && !port_specified_by_user)
>
>         /*
>          * To reduce chances of interference with parallel installations, use
>          * a port number starting in the private range (49152-65535)
>          * calculated from the version number.
>          */
>         port = 0xC000 | (PG_VERSION_NUM & 0x3FFF);
>
> We observe colugos successfully starting on that port:
>
> ============== starting postmaster ==============
> running on port 57332 with pid 47019
> ============== creating database "regression" ==============
> CREATE DATABASE
> ALTER DATABASE
> ... etc etc ...
>
> polecat comes along what must be only moments later, and tries to use
> the same port for its temp install:
>
> ============== starting postmaster ==============
> running on port 57332 with pid 47022
> ============== creating database "regression" ==============
> ERROR: duplicate key value violates unique constraint "pg_database_datname_index"
> DETAIL: Key (datname)=(regression) already exists.
> command failed: "/usr/local/src/build-farm-3.2/builds/HEAD/pgsql.15278/src/test/regress/./tmp_check/install//usr/local/src/build-farm-3.2/builds/HEAD/inst/bin/psql" -X -c "CREATE DATABASE \"regression\" TEMPLATE=template0 ENCODING='SQL_ASCII' LC_COLLATE='C' LC_CTYPE='C'" "postgres"
> pg_ctl: PID file "/usr/local/src/build-farm-3.2/builds/HEAD/pgsql.15278/src/test/regress/./tmp_check/data/postmaster.pid" does not exist
> Is server running?
>
> pg_regress: could not stop postmaster: exit code was 256
>
> Now the postmaster log shows that the second postmaster correctly
> recognized that the port number was already in use, so it bailed out:
>
> ================== pgsql.15278/src/test/regress/log/postmaster.log ===================
> [4c61f2d2.b7ae:1] FATAL: lock file "/tmp/.s.PGSQL.57332.lock" already exists
> [4c61f2d2.b7ae:2] HINT: Is another postmaster (PID 47019) using socket file "/tmp/.s.PGSQL.57332"?
>
> However, pg_regress failed to have a clue about what had happened,
> and bulled ahead trying to run the regression tests (against the
> postmaster started by the other pg_regress instance). A look at the
> code shows that it is merely trying to run psql, and if psql reports
> that it can connect to the specified port, then pg_regress thinks the
> postmaster started OK. Of course, psql was really reporting that it
> could connect to the other instance's postmaster.
>
> I've seen similar multiple-postmaster-interference symptoms before in
> the buildfarm, but never really understood the cause.
>
> I am not sure if there's anything very good we can do about the
> problem of pg_regress misidentifying the postmaster it's managed to
> connect to. A real solution would probably be much more trouble than
> it's worth, anyway. However, it does seem like we ought to be able to
> do something about two buildfarm critters defaulting to the same choice
> of port number. The buildfarm infrastructure goes to great lengths to
> pick nonconflicting port numbers for the "installed" postmasters it
> runs; but we're ignoring all that effort and just using a hardwired
> port number for "make check". This is dumb.
>
> pg_regress does have a --port argument that can be used to override
> that default. I don't know whether the buildfarm script calls
> pg_regress directly or does "make check". If the latter, we'd need to
> twiddle the Makefiles to allow a port number to get passed in. But
> this seems well worthwhile to me.
>
> Comments?
>

The buildfarm calls "make check". Why not just add the configured port
(DEF_PGPORT) into the calculation of the port to run on?

cheers

andrew
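
For illustration, here is a rough C sketch of the default-port arithmetic quoted above, next to one possible reading of the DEF_PGPORT suggestion. The helper names default_temp_port and mixed_temp_port are hypothetical. Folding the configured port in by simple addition before masking is an assumption, not something pg_regress is confirmed to do. The version number 90100 is assumed only because it reproduces port 57332 from the log.

```c
/*
 * Default temp-install port, following the quoted pg_regress snippet:
 * keep the low 14 bits of the version number and set the top two bits,
 * which always yields a port in the private range 49152-65535.
 */
int
default_temp_port(int pg_version_num)
{
    return 0xC000 | (pg_version_num & 0x3FFF);
}

/*
 * Hypothetical variant (an assumption, not the actual fix): add the
 * configured default port to the version number before masking, so that
 * builds configured with different DEF_PGPORT values get different
 * temp-install ports while staying in the same private range.
 */
int
mixed_temp_port(int pg_version_num, int def_pgport)
{
    return 0xC000 | ((pg_version_num + def_pgport) & 0x3FFF);
}
```

With these definitions, default_temp_port(90100) evaluates to 57332 for every animal testing the same branch, whereas mixed_temp_port(90100, 5432) and mixed_temp_port(90100, 5433) evaluate to 62764 and 62765. Two animals configured with the same default port would of course still collide, so this only helps if each animal on a machine is configured with its own port.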