Re: [HACKERS] jacana hung after failing to acquire random number - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: [HACKERS] jacana hung after failing to acquire random number |
Date | |
Msg-id | 50108a9a-72ad-4887-320f-f2d9de149c41@dunslane.net |
In response to | Re: [HACKERS] jacana hung after failing to acquire random number (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: [HACKERS] jacana hung after failing to acquire random number
Re: [HACKERS] jacana hung after failing to acquire random number |
List | pgsql-hackers |
On 12/12/2016 02:32 AM, Heikki Linnakangas wrote:
> On 12/12/2016 05:58 AM, Michael Paquier wrote:
>> On Sun, Dec 11, 2016 at 9:06 AM, Andrew Dunstan <andrew@dunslane.net>
>> wrote:
>>>
>>> jacana (mingw, 64 bit compiler, no openssl) is currently hung on "make
>>> check". After starting the autovacuum launcher there are 120 messages
>>> on its log about "Could not acquire random number". Then nothing.
>>>
>>> So I suspect the problem here is commit
>>> fe0a0b5993dfe24e4b3bcf52fa64ff41a444b8f1, although I haven't looked in
>>> detail.
>>>
>>> Shouldn't we want the postmaster to shut down if it's not going to go
>>> further? Note that frogmouth, also mingw, which builds with openssl,
>>> doesn't have this issue.
>>
>> Did you unlock it in some way at the end? Here is the shape of the
>> report for others:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-12-10%2022%3A00%3A15
>>
>> And here is of course the interesting bit:
>> 2016-12-10 17:25:38.822 EST [584c80e2.ddc:2] LOG: could not acquire random number
>> 2016-12-10 17:25:39.869 EST [584c80e2.ddc:3] LOG: could not acquire random number
>> 2016-12-10 17:25:40.916 EST [584c80e2.ddc:4] LOG: could not acquire random number
>>
>> I am not seeing any problems with MSVC without openssl, so that's a
>> problem proper to MinGW. I am getting to wonder if it is actually a
>> good idea to cache the crypt context and then re-use it. Using a new
>> context all the time is definitely not performance-wise though.
>
> Actually, looking at the config.log on jacana, it's trying to use
> /dev/urandom:
>
> configure:15028: checking for /dev/urandom
> configure:15041: result: yes
> configure:15054: checking which random number source to use
> configure:15073: result: /dev/urandom
>
> And looking closer at configure.in, I can see why:
>
> elif test "$PORTNAME" = x"win32" ; then
>   USE_WIN32_RANDOM=1
>
> That test is broken. It looks like the x"$VAR" = x"constant" idiom,
> but the left side of the comparison doesn't have the 'x'. Oops.
>
> Fixed that, let's see if it made jacana happy again.
>
> This makes me wonder if we should work a bit harder to get a good
> error message, if acquiring a random number fails for any reason. This
> needs to work in the frontend as well as backend, but we could still
> have an elog(LOG, ...) there, inside an #ifndef FRONTEND block.

I see you have now improved the messages in postmaster.c, which is good.

However, the bigger problem (ISTM) is that when this failed I had a system which was running but where every connection immediately failed:

============== creating temporary instance ==============
============== initializing database system ==============
============== starting postmaster ==============

pg_regress: postmaster did not respond within 120 seconds
Examine c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/log/postmaster.log for the reason
make: *** [check] Error 2

Should one or more of these errors be fatal? Or should we at least get pg_regress to try to shut down the postmaster if it can't connect after 120 seconds?

[In answer to Michael's question above, I forcibly shut down the postmaster by hand. Otherwise it would still be running, and we would not have got the report on the buildfarm server.]

cheers

andrew
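
For anyone following along, the configure.in problem Heikki describes is easy to reproduce in a few lines of shell. This is only a sketch: the function names and the echoed strings are made up for illustration, and only the two test expressions correspond to what was quoted from configure.in. Adding the 'x' on the left is one way to repair the comparison; the actual committed fix may be worded differently.

```shell
#!/bin/sh
# Illustration only: broken_check and fixed_check are hypothetical helpers,
# not code from configure.in.

# Broken form: only the right side carries the 'x' prefix, so for
# PORTNAME=win32 the comparison is "win32" = "xwin32", which is false.
broken_check() {
    PORTNAME=$1
    if test "$PORTNAME" = x"win32"; then
        echo "win32 random"
    else
        echo "fallthrough"
    fi
}

# Repaired form: both sides carry the prefix, so the comparison is
# "xwin32" = "xwin32" when PORTNAME=win32.
fixed_check() {
    PORTNAME=$1
    if test x"$PORTNAME" = x"win32"; then
        echo "win32 random"
    else
        echo "fallthrough"
    fi
}

broken_check win32   # prints "fallthrough"
fixed_check win32    # prints "win32 random"
```

With the broken form the win32 branch is skipped and configure carries on to a later branch, which would be consistent with the "/dev/urandom" result in jacana's config.log.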