Thread: Why doesn't src/backend/port/win32/socket.c implement bind()?
Some of the Windows buildfarm members occasionally fail like this:

LOG: could not bind IPv4 socket: No error
HINT: Is another postmaster already running on port 64470? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "127.0.0.1"
FATAL: could not create any TCP/IP sockets

(bowerbird, in particular, has a few recent examples)

I think the reason why we're getting "No error" instead of a useful strerror report is that socket.c doesn't provide an implementation of bind() that includes TranslateSocketError().

Why is that?

regards, tom lane
On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Some of the Windows buildfarm members occasionally fail like this:
>
> LOG: could not bind IPv4 socket: No error
> HINT: Is another postmaster already running on port 64470? If not, wait a few seconds and retry.
> WARNING: could not create listen socket for "127.0.0.1"
> FATAL: could not create any TCP/IP sockets
>
> (bowerbird, in particular, has a few recent examples)
>
> I think the reason why we're getting "No error" instead of a useful
> strerror report is that socket.c doesn't provide an implementation
> of bind() that includes TranslateSocketError().
>
listen also doesn't have such an implementation, and probably a few others don't either.
> Why is that?
>
>
Not sure, but I could see that bind and listen don't have the equivalent
Winsock API (checked in winsock2.h), and while googling the same,
I found that there are reasons [1] why Windows Sockets doesn't have
equivalents of some of the socket APIs.
I think here we should add a win32 wrapper over the bind and listen
APIs which ensures TranslateSocketError() is called for
error cases.
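The wrapper pattern proposed here is simple: call the Winsock function, and if it returns SOCKET_ERROR, copy the Winsock error code into errno before returning, so that strerror() and ereport's %m describe the real failure. The sketch below is portable C that only imitates Winsock so it can be compiled anywhere: fake_wsa_last_error and fake_winsock_bind() are hypothetical stand-ins for WSAGetLastError() and the real bind(), and translate_socket_error() is a simplified analogue of TranslateSocketError(), not its actual mapping table. The numeric WSA codes are the values defined in winsock2.h.

```c
#include <errno.h>
#include <string.h>

/* Winsock error codes as defined in winsock2.h, repeated here only so
 * that this sketch compiles on non-Windows systems. */
#define WSAEACCES        10013
#define WSAEADDRINUSE    10048
#define WSAEADDRNOTAVAIL 10049
#define SOCKET_ERROR     (-1)

/* Hypothetical stand-in for WSAGetLastError(): Winsock keeps its error
 * code here and never sets errno itself. */
int fake_wsa_last_error = 0;

/* Hypothetical stand-in for the raw Winsock bind(); it always fails with
 * the requested code so that the error path can be exercised. */
int fake_winsock_bind(int fail_with)
{
    fake_wsa_last_error = fail_with;
    return SOCKET_ERROR;
}

/* Simplified analogue of TranslateSocketError(): map the Winsock code
 * onto the errno value that strerror() understands. */
void translate_socket_error(void)
{
    switch (fake_wsa_last_error)
    {
        case WSAEACCES:
            errno = EACCES;
            break;
        case WSAEADDRINUSE:
            errno = EADDRINUSE;
            break;
        case WSAEADDRNOTAVAIL:
            errno = EADDRNOTAVAIL;
            break;
        default:
            errno = EINVAL;     /* hypothetical catch-all for this sketch */
            break;
    }
}

/* The proposed pattern: call the raw function and, on failure, translate
 * the Winsock error before the caller looks at errno. */
int wrapped_bind(int fail_with)
{
    int rc = fake_winsock_bind(fail_with);

    if (rc == SOCKET_ERROR)
        translate_socket_error();
    return rc;
}
```

Calling fake_winsock_bind(WSAEACCES) directly leaves errno untouched, so if it was 0 beforehand, strerror(errno) yields the C library's "no error" text (spelled "No error" by the Microsoft C runtime, matching the log above). Going through wrapped_bind() sets errno to EACCES, which strerror() reports as "Permission denied".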
On Mon, Jan 11, 2016 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Some of the Windows buildfarm members occasionally fail like this:
>
> LOG: could not bind IPv4 socket: No error
> HINT: Is another postmaster already running on port 64470? If not, wait a few seconds and retry.
> WARNING: could not create listen socket for "127.0.0.1"
> FATAL: could not create any TCP/IP sockets
>
> (bowerbird, in particular, has a few recent examples)
>
> I think the reason why we're getting "No error" instead of a useful
> strerror report is that socket.c doesn't provide an implementation
> of bind() that includes TranslateSocketError().
>
> listen also doesn't have such an implementation, and probably a few others don't either.
The reason they don't is that when this compatibility layer was written, it was to support the signal emulation. So the calls that were put in there were the ones that we need(ed) to be able to interrupt with a signal. As neither bind() nor listen() is a blocking call (at least not normally), there is no need to interrupt them, and thus there is no function in socket.c for them.
I don't think anybody at the time was even considering the error handling, except insofar as it meant handling the calls that were very clearly not the same as the Unix variants. listen/bind were just missed.
> Why is that?
>
> Not sure, but I could see that bind and listen don't have the equivalent
> Winsock API (checked in winsock2.h), and while googling the same,
> I found that there are reasons [1] why Windows Sockets doesn't have
> equivalents of some of the socket APIs.
> I think here we should add a win32 wrapper over the bind and listen
> APIs which ensures TranslateSocketError() is called for
> error cases.
Yeah, that seems like a good idea.
Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Jan 11, 2016 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>> On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think the reason why we're getting "No error" instead of a useful
>>> strerror report is that socket.c doesn't provide an implementation
>>> of bind() that includes TranslateSocketError().

>> listen also doesn't have such an implementation, and probably a few others don't either.

>> I think here we should add a win32 wrapper over the bind and listen
>> APIs which ensures TranslateSocketError() is called for
>> error cases.

> Yeah, that seems like a good idea.

I finally got around to doing this, after being annoyed by yet another
Windows buildfarm failure with no clear indication as to the cause:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-12%2022%3A30%3A12

While we wait to see if that actually helps give useful errors,
I had a thought about what may be happening here. PostgresNode.pm
picks a random high port number and tests to see if it's free using
pg_isready, with (unless I'm misreading) any non-zero result code
being taken as "it's free". The problem here is that that completely
fails to recognize a port being used by a non-Postgres process as
not-free --- most likely, you'll get PQPING_NO_RESPONSE for that case.
If there's other stuff using high ports on a particular buildfarm machine,
you'd expect occasional random test failures due to this. The observed
fact that some buildfarm critters are much more prone to this type of
failure than others is well explained by this hypothesis.

I think we should forget about pg_isready altogether here, and instead
write some code that either tries to bind() the target port number
itself, or tries a low-level TCP connection request to the target port.
I'm not sure what's the most convenient way to accomplish either in Perl.
The bind() solution would provide a more trustworthy answer, but it might
actually create more problems than it solves if the OS requires a
cooling-off period before giving the port out to a different process.

regards, tom lane
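The second option, a low-level TCP connection request to the target port, can be sketched in C with plain POSIX sockets. This is an illustration under stated assumptions, not the PostgresNode.pm code: it probes 127.0.0.1 only, the enum and function names are hypothetical, and instead of treating every failure as "free" it separates ECONNREFUSED (nothing listening) from other errors.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for a connect()-based port probe. */
enum probe_result
{
    PROBE_LISTENER,     /* connect() succeeded: something accepts on the port */
    PROBE_NO_LISTENER,  /* ECONNREFUSED: nothing is listening there */
    PROBE_UNKNOWN       /* any other failure; errno holds the reason */
};

/* Attempt a plain TCP connection to 127.0.0.1:port and classify the outcome. */
enum probe_result probe_by_connect(unsigned short port)
{
    struct sockaddr_in addr;
    int         fd;
    int         rc;
    int         saved_errno;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return PROBE_UNKNOWN;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    rc = connect(fd, (struct sockaddr *) &addr, sizeof(addr));
    saved_errno = errno;        /* keep connect()'s errno across close() */
    close(fd);

    if (rc == 0)
        return PROBE_LISTENER;
    errno = saved_errno;
    return (saved_errno == ECONNREFUSED) ? PROBE_NO_LISTENER : PROBE_UNKNOWN;
}
```

Note the blind spot: a socket that has called bind() but not listen() makes connect() fail with ECONNREFUSED, so this probe reports PROBE_NO_LISTENER for that port even though a later bind() on it would fail.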
On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While we wait to see if that actually helps give useful errors,
> I had a thought about what may be happening here. PostgresNode.pm
> picks a random high port number and tests to see if it's free using
> pg_isready, with (unless I'm misreading) any non-zero result code
> being taken as "it's free". The problem here is that that completely
> fails to recognize a port being used by a non-Postgres process as
> not-free --- most likely, you'll get PQPING_NO_RESPONSE for that case.
> If there's other stuff using high ports on a particular buildfarm machine,
> you'd expect occasional random test failures due to this. The observed
> fact that some buildfarm critters are much more prone to this type of
> failure than others is well explained by this hypothesis.

Each test run uses its own custom unix_socket_directories, PGHOST is
enforced to use it, and all the port tests go through that as well.
And it seems to me that the same port number can be used as long as
the socket directory is different, no? At least that's how PostgresNode
has been designed to work, and this is useful when running tests in
parallel to avoid port and host collision.
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If there's other stuff using high ports on a particular buildfarm machine,
>> you'd expect occasional random test failures due to this. The observed
>> fact that some buildfarm critters are much more prone to this type of
>> failure than others is well explained by this hypothesis.

> Each test run uses its own custom unix_socket_directories, PGHOST is
> enforced to use it, and all the port tests go through that as well.

By that argument, we don't need the free-port-searching code on Unix at
all. But this discussion is mostly about Windows machines.

regards, tom lane
On Wed, Apr 13, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If there's other stuff using high ports on a particular buildfarm machine,
>>> you'd expect occasional random test failures due to this. The observed
>>> fact that some buildfarm critters are much more prone to this type of
>>> failure than others is well explained by this hypothesis.
>
>> Each test run uses its own custom unix_socket_directories, PGHOST is
>> enforced to use it, and all the port tests go through that as well.
>
> By that argument, we don't need the free-port-searching code on Unix at
> all. But this discussion is mostly about Windows machines.

Well, yes. That's true, we could do without. Even if this could give
an indication about a node running, as long as a port has been
associated to a node once, we just need to be sure that a new port is
not allocated. On Windows, I am not sure that it is worth the
complication to be honest, and the current code gives a small safety
net, which is better than nothing.
--
Michael
Michael Paquier wrote:
> On Wed, Apr 13, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> If there's other stuff using high ports on a particular buildfarm machine,
> >>> you'd expect occasional random test failures due to this. The observed
> >>> fact that some buildfarm critters are much more prone to this type of
> >>> failure than others is well explained by this hypothesis.
> >
> >> Each test run uses its own custom unix_socket_directories, PGHOST is
> >> enforced to use it, and all the port tests go through that as well.
> >
> > By that argument, we don't need the free-port-searching code on Unix at
> > all. But this discussion is mostly about Windows machines.
>
> Well, yes. That's true, we could do without. Even if this could give
> an indication about a node running, as long as a port has been
> associated to a node once, we just need to be sure that a new port is
> not allocated. On Windows, I am not sure that it is worth the
> complication to be honest, and the current code gives a small safety
> net, which is better than nothing.

If we need to fix the test so that it works in a wider environment for
Windows, I don't think it makes sense to remove anything -- rather we
should change the test as Tom suggests to verify that the port is really
free rather than just doing the pg_isready test. Maybe the additional
test will be useless in non-Windows environment, but who cares? It will
work all the same.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Michael Paquier wrote:
>> Well, yes. That's true, we could do without. Even if this could give
>> an indication about a node running, as long as a port has been
>> associated to a node once, we just need to be sure that a new port is
>> not allocated. On Windows, I am not sure that it is worth the
>> complication to be honest, and the current code gives a small safety
>> net, which is better than nothing.

> If we need to fix the test so that it works in a wider environment for
> Windows, I don't think it makes sense to remove anything -- rather we
> should change the test as Tom suggests to verify that the port is really
> free rather than just doing the pg_isready test. Maybe the additional
> test will be useless in non-Windows environment, but who cares? It will
> work all the same.

I think Michael is arguing that it's not worth fixing. He might be right;
it's not like this is the only cause of irreproducible failures on the
Windows critters. Still, it bugs me if we know how to make the regression
tests more reliable and do not do so. Back when I packaged mysql for Red
Hat, I was constantly annoyed by how often their tests failed under load.
Don't want to be like that.

regards, tom lane
On Thu, Apr 14, 2016 at 8:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Michael Paquier wrote:
>>> Well, yes. That's true, we could do without. Even if this could give
>>> an indication about a node running, as long as a port has been
>>> associated to a node once, we just need to be sure that a new port is
>>> not allocated. On Windows, I am not sure that it is worth the
>>> complication to be honest, and the current code gives a small safety
>>> net, which is better than nothing.
>
>> If we need to fix the test so that it works in a wider environment for
>> Windows, I don't think it makes sense to remove anything -- rather we
>> should change the test as Tom suggests to verify that the port is really
>> free rather than just doing the pg_isready test. Maybe the additional
>> test will be useless in non-Windows environment, but who cares? It will
>> work all the same.
>
> I think Michael is arguing that it's not worth fixing. He might be right;
> it's not like this is the only cause of irreproducible failures on the
> Windows critters. Still, it bugs me if we know how to make the regression
> tests more reliable and do not do so. Back when I packaged mysql for Red
> Hat, I was constantly annoyed by how often their tests failed under load.
> Don't want to be like that.

Some experimentation shows that it is actually not that complicated to
make that cross-platform:

    use Socket;

    my $remote = 'localhost';
    my $port = 5432;
    my $iaddr = inet_aton($remote);
    my $paddr = sockaddr_in($port, $iaddr);
    my $proto = getprotobyname("tcp");
    socket(SOCK, PF_INET, SOCK_STREAM, $proto) || die "socket: $!";
    connect(SOCK, $paddr) || die "connect: $!";
    close(SOCK) || die "close: $!";

IO::Socket::INET is another option, but I am not seeing it in perl <
5.12, and that's not part of ActivePerl, which makes life harder on
Windows. Socket is available on both. Does that address your concerns?
--
Michael
On Thu, Apr 14, 2016 at 9:38 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> IO::Socket::INET is another option, but I am not seeing it in perl <
> 5.12, and that's not part of ActivePerl, which makes life harder on
> Windows. Socket is available on both. Does that address your concerns?

And this gives the patch attached, just took the time to hack it.
--
Michael
Attachment
Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, Apr 14, 2016 at 9:38 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> IO::Socket::INET is another option, but I am not seeing it in perl <
>> 5.12, and that's not part of ActivePerl, which makes life harder on
>> Windows. Socket is available on both. Does that address your concerns?

> And this gives the patch attached, just took the time to hack it.

I think this is a good idea, but (1) I'm inclined not to restrict it to
Windows, and (2) I think we should hold off applying it until we've seen
a failure or two more, and can confirm whether d1b7d4877 does anything
useful for the error messages.

regards, tom lane
On Fri, Apr 15, 2016 at 12:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Thu, Apr 14, 2016 at 9:38 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> IO::Socket::INET is another option, but I am not seeing it in perl <
>>> 5.12, and that's not part of ActivePerl, which makes life harder on
>>> Windows. Socket is available on both. Does that address your concerns?
>
>> And this gives the patch attached, just took the time to hack it.
>
> I think this is a good idea, but (1) I'm inclined not to restrict it to
> Windows, and (2) I think we should hold off applying it until we've seen
> a failure or two more, and can confirm whether d1b7d4877 does anything
> useful for the error messages.

Both arguments are fine with me.
--
Michael
I wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> And this gives the patch attached, just took the time to hack it.

> I think this is a good idea, but (1) I'm inclined not to restrict it to
> Windows, and (2) I think we should hold off applying it until we've seen
> a failure or two more, and can confirm whether d1b7d4877 does anything
> useful for the error messages.

OK, we now have failures from both bowerbird and jacana with the error
reporting patch applied:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39

and they both boil down to this:

pg_ctl: could not start server
Examine the log output.
# pg_ctl failed; logfile:
LOG: could not bind IPv4 socket: Permission denied
HINT: Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "127.0.0.1"
FATAL: could not create any TCP/IP sockets
LOG: database system is shut down

So "permission denied" is certainly more useful than "no error", which
makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
and should get back-patched --- any objections?

However, it's still not entirely clear what is the root cause of the
failure and whether a patch along the discussed lines would prevent its
recurrence. Looking at TranslateSocketError, it seems we must be seeing
an underlying error code of WSAEACCES. A little googling says that
Windows might indeed return that, rather than the more expected
WSAEADDRINUSE, if someone else has the port open with SO_EXCLUSIVEADDRUSE:

    Another possible reason for the WSAEACCES error is that when the
    bind function is called (on Windows NT 4.0 with SP4 and later),
    another application, service, or kernel mode driver is bound to
    the same address with exclusive access. Such exclusive access is a
    new feature of Windows NT 4.0 with SP4 and later, and is
    implemented by using the SO_EXCLUSIVEADDRUSE option.

So theory A is that some other program is binding random high port numbers
with SO_EXCLUSIVEADDRUSE. Theory B is that this is the handiwork of
Windows antivirus software doing what Windows antivirus software typically
does, ie inject random permissions failures depending on the phase of the
moon. It's not very clear that a test along the lines described (that is,
attempt to connect to, not bind to, the target port) would pre-detect
either type of error. Under theory A, a connect() test would recognize
the problem only if the other program were using the port to listen rather
than make an outbound connection; and the latter seems much more likely.
(Possibly we could detect the latter case by checking the error code
returned by connect(), but Michael's proposed patch does no such thing.)
Under theory B, we're pretty much screwed, we don't know what will happen.

I wonder what Andrew can tell us about what else is running on that
machine and whether either theory has any credibility.

BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
have failed to translate it --- surely that's an oversight?

regards, tom lane
On Thu, Apr 21, 2016 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>>> And this gives the patch attached, just took the time to hack it.
>
>> I think this is a good idea, but (1) I'm inclined not to restrict it to
>> Windows, and (2) I think we should hold off applying it until we've seen
>> a failure or two more, and can confirm whether d1b7d4877 does anything
>> useful for the error messages.
>
> OK, we now have failures from both bowerbird and jacana with the error
> reporting patch applied:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39
>
> and they both boil down to this:
>
> pg_ctl: could not start server
> Examine the log output.
> # pg_ctl failed; logfile:
> LOG: could not bind IPv4 socket: Permission denied
> HINT: Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
> WARNING: could not create listen socket for "127.0.0.1"
> FATAL: could not create any TCP/IP sockets
> LOG: database system is shut down
>
> So "permission denied" is certainly more useful than "no error", which
> makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
> and should get back-patched --- any objections?

+1. That's useful in itself.

> However,
> [...]
>
> So theory A is that some other program is binding random high port numbers
> with SO_EXCLUSIVEADDRUSE. Theory B is that this is the handiwork of
> Windows antivirus software doing what Windows antivirus software typically
> does, ie inject random permissions failures depending on the phase of the
> moon. It's not very clear that a test along the lines described (that is,
> attempt to connect to, not bind to, the target port) would pre-detect
> either type of error. Under theory A, a connect() test would recognize
> the problem only if the other program were using the port to listen rather
> than make an outbound connection; and the latter seems much more likely.
> (Possibly we could detect the latter case by checking the error code
> returned by connect(), but Michael's proposed patch does no such thing.)

Perl's connect() can be made more chatty: $! returns the error string,
and $!+0 the errno. With the patch I sent previously, we'd need to
change this portion:

+ socket(SOCK, PF_INET, SOCK_STREAM, $proto) or die;
+ $found = 0 if connect(SOCK, $paddr);
+ close(SOCK);

Basically, it would be something like this, which would still be better
than nothing I think:

    if (!connect(SOCK, $paddr))
    {
        print "connect error = $! (errno ", $! + 0, ")\n";
    }

Honestly, I think that even if we never reach perfection here, something
like my previous patch would still allow us to make the tests more
reliable on a platform where services listen on localhost.

> Under theory B, we're pretty much screwed, we don't know what will happen.

Indeed. If things are completely random, nothing guarantees that a port
for which connect() fails at instant T, meaning that it is available at
that moment, is not going to be taken at instant T+1, because of the
window between the moment the free port is checked and the moment
postgres binds it. If we free up the port just before starting Postgres,
the failure window would be reduced, but it still cannot be reduced to
zero.

> BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
> have failed to translate it --- surely that's an oversight?

Yes, and I can see you fixed that with 125ad53 already.
--
Michael
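The window between checking a port and binding it is a classic check-then-use race. The sketch below illustrates it in C on a POSIX system; bind_loopback() and find_free_port_now() are hypothetical helpers, and the "check" here binds to port 0 and reads back the kernel-chosen port, which is not how PostgresNode.pm chooses ports (it picks a random high port). Whatever binds the port after the check and before the server wins.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket bound to 127.0.0.1:port (0 lets the kernel choose).
 * Returns the descriptor, or -1 with errno set. */
int bind_loopback(unsigned short port)
{
    struct sockaddr_in addr;
    int         fd;
    int         saved_errno;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
    {
        saved_errno = errno;    /* report bind()'s error, not close()'s */
        close(fd);
        errno = saved_errno;
        return -1;
    }
    return fd;
}

/* The "check" step: find a port that is free at this instant, then release
 * it.  Returns 0 on failure.  Nothing reserves the port after this returns. */
unsigned short find_free_port_now(void)
{
    struct sockaddr_in addr;
    socklen_t   len = sizeof(addr);
    unsigned short port = 0;
    int         fd = bind_loopback(0);

    if (fd < 0)
        return 0;
    if (getsockname(fd, (struct sockaddr *) &addr, &len) == 0)
        port = ntohs(addr.sin_port);
    close(fd);
    return port;
}
```

In a simulated run, a second bind_loopback() on the returned port (the "intruder") succeeds, and a third one (the "server") then fails with EADDRINUSE. As noted above, releasing the probe socket just before starting the server narrows this window but cannot close it.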
I wrote:
> However, it's still not entirely clear what is the root cause of the
> failure and whether a patch along the discussed lines would prevent its
> recurrence. Looking at TranslateSocketError, it seems we must be seeing
> an underlying error code of WSAEACCES. A little googling says that
> Windows might indeed return that, rather than the more expected
> WSAEADDRINUSE, if someone else has the port open with SO_EXCLUSIVEADDRUSE:

>     Another possible reason for the WSAEACCES error is that when the
>     bind function is called (on Windows NT 4.0 with SP4 and later),
>     another application, service, or kernel mode driver is bound to
>     the same address with exclusive access. Such exclusive access is a
>     new feature of Windows NT 4.0 with SP4 and later, and is
>     implemented by using the SO_EXCLUSIVEADDRUSE option.

> So theory A is that some other program is binding random high port numbers
> with SO_EXCLUSIVEADDRUSE. Theory B is that this is the handiwork of
> Windows antivirus software doing what Windows antivirus software typically
> does, ie inject random permissions failures depending on the phase of the
> moon. It's not very clear that a test along the lines described (that is,
> attempt to connect to, not bind to, the target port) would pre-detect
> either type of error. Under theory A, a connect() test would recognize
> the problem only if the other program were using the port to listen rather
> than make an outbound connection; and the latter seems much more likely.

I took a second look at the above-quoted Microsoft documentation, and
noticed that it specifies that this error occurs when another application
is *bound* to the target address. If by that they mean that the other
app did a bind(), then indeed what we're seeing here is a conflict with
a listening app, so that the proposed patch would detect it. So I went
ahead and pushed the patch --- in any case, it shouldn't make things
any worse.

Also, I did a bit of digging in the buildfarm logs, and noticed that
bowerbird and jacana together have reported 34 "could not bind socket"
failures in BinInstallCheck since 2015-12-07 (when the current logic for
selecting a random port went in). Between 2015-01-01 and 2015-12-07,
they reported only *one* such failure. So whatever the exact explanation
is, we've greatly increased the probability of such failures by using a
random port rather than the fixed port 65432 that was used before. I'm
not entirely sure what to make of this observation, but the statistics
seem pretty clear.

regards, tom lane
On Mon, Apr 25, 2016 at 4:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I took a second look at the above-quoted Microsoft documentation, and
> noticed that it specifies that this error occurs when another application
> is *bound* to the target address. If by that they mean that the other
> app did a bind(), then indeed what we're seeing here is a conflict with
> a listening app, so that the proposed patch would detect it. So I went
> ahead and pushed the patch --- in any case, it shouldn't make things
> any worse.

Not worse, and still not enough... bowerbird complained again:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-25%2002%3A13%3A54
--
Michael
Michael Paquier <michael.paquier@gmail.com> writes:
> Not worse, and still not enough... bowerbird complained again:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-25%2002%3A13%3A54

That's a different symptom that seems unrelated:

cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata\global: Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.
cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata\pg_xlog: Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.
cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata: Permission denied at C:/Perl64/lib/File/Temp.pm line 902.
cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw: Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.
### Signalling QUIT to 12200 for node "main"
# Running: pg_ctl kill QUIT 12200

We've seen that one before, though less often than the port-in-use
errors. Maybe it's failing to wait long enough for server shutdown?

regards, tom lane
I wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Not worse, and still not enough... bowerbird complained again:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-25%2002%3A13%3A54

> That's a different symptom that seems unrelated:
> cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata\global: Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.

Ah, scratch that, I was taking that as being the cause of the reported
failure but it's just noise, cf <31417.1461595864@sss.pgh.pa.us>.
You're right, we're still getting

# pg_ctl failed; logfile:
LOG: could not bind IPv4 socket: Permission denied
HINT: Is another postmaster already running on port 60208? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "127.0.0.1"
FATAL: could not create any TCP/IP sockets
LOG: database system is shut down
Bail out! pg_ctl failed

So the connect() test is inadequate. Let's try bind() with SO_REUSEADDR
and see whether that makes things better or worse.

regards, tom lane
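A bind()-based probe with SO_REUSEADDR, as proposed here, might look roughly like this in C on a POSIX system. Assumptions: port_is_bindable() is a hypothetical name, only 127.0.0.1 is probed, and SO_REUSEADDR is set with the intent of keeping a port that merely has connections lingering in TIME_WAIT from being reported as busy (the cooling-off concern raised earlier); exact semantics vary by OS. Windows gives SO_REUSEADDR different semantics (it can permit binding to a port another socket is actively using, unless that socket set SO_EXCLUSIVEADDRUSE), so a Windows version of this probe would likely need to leave the option off.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if a TCP socket with SO_REUSEADDR set can
 * bind to 127.0.0.1:port right now, 0 if bind() fails (errno says why),
 * and -1 if the probe socket itself could not be set up.  The probe socket
 * is closed before returning, so nothing stays reserved.
 */
int port_is_bindable(unsigned short port)
{
    struct sockaddr_in addr;
    int         fd;
    int         one = 1;
    int         rc;
    int         saved_errno;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* POSIX semantics assumed: tolerate TIME_WAIT leftovers on the port. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
    {
        saved_errno = errno;
        close(fd);
        errno = saved_errno;
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    rc = bind(fd, (struct sockaddr *) &addr, sizeof(addr));
    saved_errno = errno;        /* close() must not clobber bind()'s errno */
    close(fd);
    errno = saved_errno;

    return (rc == 0) ? 1 : 0;
}
```

Unlike the connect() probe, this also catches a port held by a socket that is bound but not listening: on Linux, bind() fails with EADDRINUSE when the holder did not set SO_REUSEADDR itself, and it always fails while the holder is listening.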