Thread: Why doesn't src/backend/port/win32/socket.c implement bind()?

Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 10 January 2016, 18:25:48

Some of the Windows buildfarm members occasionally fail like this:

LOG:  could not bind IPv4 socket: No error
HINT:  Is another postmaster already running on port 64470? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for "127.0.0.1"
FATAL:  could not create any TCP/IP sockets

(bowerbird, in particular, has a few recent examples)

I think the reason why we're getting "No error" instead of a useful
strerror report is that socket.c doesn't provide an implementation
of bind() that includes TranslateSocketError().  Why is that?
        regards, tom lane

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Amit Kapila
Date: 11 January 2016, 05:19:11

On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Some of the Windows buildfarm members occasionally fail like this:
>
> LOG: could not bind IPv4 socket: No error
> HINT: Is another postmaster already running on port 64470? If not, wait a few seconds and retry.
> WARNING: could not create listen socket for "127.0.0.1"
> FATAL: could not create any TCP/IP sockets
>
> (bowerbird, in particular, has a few recent examples)
>
> I think the reason why we're getting "No error" instead of a useful
> strerror report is that socket.c doesn't provide an implementation
> of bind() that includes TranslateSocketError().

listen also doesn't have such an implementation and probably few others.

> Why is that?
>

Not sure, but I could see that bind and listen doesn't have the equivalent
Win sock API (checked in winsock2.h) and while googling on same,
I found that there are reasons [1] why Win Sockets doesn't have the
equivalent of some of the socket API's.

I think here we should add a win32 wrapper over bind and listen
API's which ensures TranslateSocketError() should be called for
error cases.

[1] - http://stackoverflow.com/questions/3255899/why-are-there-wsa-pendants-for-socket-connect-send-and-so-on-but-not-fo

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Magnus Hagander
Date: 11 January 2016, 11:36:50

On Mon, Jan 11, 2016 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:

> On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Some of the Windows buildfarm members occasionally fail like this:
>>
>> LOG: could not bind IPv4 socket: No error
>> HINT: Is another postmaster already running on port 64470? If not, wait a few seconds and retry.
>> WARNING: could not create listen socket for "127.0.0.1"
>> FATAL: could not create any TCP/IP sockets
>>
>> (bowerbird, in particular, has a few recent examples)
>>
>> I think the reason why we're getting "No error" instead of a useful
>> strerror report is that socket.c doesn't provide an implementation
>> of bind() that includes TranslateSocketError().
>
> listen also doesn't have such an implementation and probably few others.

The reason they don't is that when this compatibility layer was written, it was to support the signal emulation. So the calls that were put in there were the ones that we need(ed) to be able to interrupt with a signal. As both bind() and listen() are not blocking commands (at least not normally), there is no need to interrupt them, and thus there is no function in socket.c for them.

I don't think anybody at the time was even considering the error handling, except insofar as handling the calls that were very clearly not the same as the Unix variants. listen/bind were just missed.

>> Why is that?
>
> Not sure, but I could see that bind and listen doesn't have the equivalent
> Win sock API (checked in winsock2.h) and while googling on same,
> I found that there are reasons [1] why Win Sockets doesn't have the
> equivalent of some of the socket API's.
>
> I think here we should add a win32 wrapper over bind and listen
> API's which ensures TranslateSocketError() should be called for
> error cases.

Yeah, that seems like a good idea.

Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 13 April 2016, 00:06:23

Magnus Hagander <magnus@hagander.net> writes:
> On Mon, Jan 11, 2016 at 6:19 AM, Amit Kapila <amit.kapila16@gmail.com>
> wrote:
>> On Sun, Jan 10, 2016 at 11:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think the reason why we're getting "No error" instead of a useful
>>> strerror report is that socket.c doesn't provide an implementation
>>> of bind() that includes TranslateSocketError().

>> listen also doesn't have such an implementation and probably few others.
>> I think here we should add a win32 wrapper over bind and listen
>> API's which ensures TranslateSocketError() should be called for
>> error cases.

> Yeah, that seems like a good idea.

I finally got around to doing this, after being annoyed by yet another
Windows buildfarm failure with no clear indication as to the cause:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-12%2022%3A30%3A12

While we wait to see if that actually helps give useful errors,
I had a thought about what may be happening here.  PostgresNode.pm
picks a random high port number and tests to see if it's free using
pg_isready, with (unless I'm misreading) any non-zero result code
being taken as "it's free".  The problem here is that that completely
fails to recognize a port being used by a non-Postgres process as
not-free --- most likely, you'll get PQPING_NO_RESPONSE for that case.
If there's other stuff using high ports on a particular buildfarm machine,
you'd expect occasional random test failures due to this.  The observed
fact that some buildfarm critters are much more prone to this type of
failure than others is well explained by this hypothesis.

I think we should forget about pg_isready altogether here, and instead
write some code that either tries to bind() the target port number itself,
or tries a low-level TCP connection request to the target port.  I'm
not sure what's the most convenient way to accomplish either in Perl.

The bind() solution would provide a more trustworthy answer, but it
might actually create more problems than it solves if the OS requires a
cooling-off period before giving the port out to a different process.
        regards, tom lane

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Michael Paquier
Date: 13 April 2016, 12:21:01

On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> While we wait to see if that actually helps give useful errors,
> I had a thought about what may be happening here.  PostgresNode.pm
> picks a random high port number and tests to see if it's free using
> pg_isready, with (unless I'm misreading) any non-zero result code
> being taken as "it's free".  The problem here is that that completely
> fails to recognize a port being used by a non-Postgres process as
> not-free --- most likely, you'll get PQPING_NO_RESPONSE for that case.
> If there's other stuff using high ports on a particular buildfarm machine,
> you'd expect occasional random test failures due to this.  The observed
> fact that some buildfarm critters are much more prone to this type of
> failure than others is well explained by this hypothesis.

Each test run uses its own custom unix_socket_directories, PGHOST is
enforced to use it, and all the port tests go through that as well.
And it seems to me that the same port number can be used as long as
the socket directory is different, no? At least that's how
PostgresNode has been designed to work, and this is useful when
running tests in parallel to avoid port and host collision.
-- 
Michael

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 13 April 2016, 13:33:21

Michael Paquier <michael.paquier@gmail.com> writes:
> On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If there's other stuff using high ports on a particular buildfarm machine,
>> you'd expect occasional random test failures due to this.  The observed
>> fact that some buildfarm critters are much more prone to this type of
>> failure than others is well explained by this hypothesis.

> Each test run uses its own custom unix_socket_directories, PGHOST is
> enforced to use it, and all the port tests go through that as well.

By that argument, we don't need the free-port-searching code on Unix at
all.  But this discussion is mostly about Windows machines.
        regards, tom lane

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Michael Paquier
Date: 13 April 2016, 22:56:45

On Wed, Apr 13, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> If there's other stuff using high ports on a particular buildfarm machine,
>>> you'd expect occasional random test failures due to this.  The observed
>>> fact that some buildfarm critters are much more prone to this type of
>>> failure than others is well explained by this hypothesis.
>
>> Each test run uses its own custom unix_socket_directories, PGHOST is
>> enforced to use it, and all the port tests go through that as well.
>
> By that argument, we don't need the free-port-searching code on Unix at
> all.  But this discussion is mostly about Windows machines.

Well, yes. That's true, we could do without. Even if this could give
an indication about a node running, as long as a port has been
associated to a node once, we just need to be sure that a new port is
not allocated. On Windows, I am not sure that it is worth the
complication to be honest, and the current code gives a small safety
net, which is better than nothing.
-- 
Michael

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Alvaro Herrera
Date: 13 April 2016, 23:15:35

Michael Paquier wrote:
> On Wed, Apr 13, 2016 at 10:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Michael Paquier <michael.paquier@gmail.com> writes:
> >> On Wed, Apr 13, 2016 at 9:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> If there's other stuff using high ports on a particular buildfarm machine,
> >>> you'd expect occasional random test failures due to this.  The observed
> >>> fact that some buildfarm critters are much more prone to this type of
> >>> failure than others is well explained by this hypothesis.
> >
> >> Each test run uses its own custom unix_socket_directories, PGHOST is
> >> enforced to use it, and all the port tests go through that as well.
> >
> > By that argument, we don't need the free-port-searching code on Unix at
> > all.  But this discussion is mostly about Windows machines.
> 
> Well, yes. That's true, we could do without. Even if this could give
> an indication about a node running, as long as a port has been
> associated to a node once, we just need to be sure that a new port is
> not allocated. On Windows, I am not sure that it is worth the
> complication to be honest, and the current code gives a small safety
> net, which is better than nothing.

If we need to fix the test so that it works in a wider environment for
Windows, I don't think it makes sense to remove anything -- rather we
should change the test as Tom suggests to verify that the port is really
free rather than just doing the pg_isready test.  Maybe the additional
test will be useless in non-Windows environment, but who cares?  It will
work all the same.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 13 April 2016, 23:47:01

Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Michael Paquier wrote:
>> Well, yes. That's true, we could do without. Even if this could give
>> an indication about a node running, as long as a port has been
>> associated to a node once, we just need to be sure that a new port is
>> not allocated. On Windows, I am not sure that it is worth the
>> complication to be honest, and the current code gives a small safety
>> net, which is better than nothing.

> If we need to fix the test so that it works in a wider environment for
> Windows, I don't think it makes sense to remove anything -- rather we
> should change the test as Tom suggests to verify that the port is really
> free rather than just doing the pg_isready test.  Maybe the additional
> test will be useless in non-Windows environment, but who cares?  It will
> work all the same.

I think Michael is arguing that it's not worth fixing.  He might be right;
it's not like this is the only cause of irreproducible failures on the
Windows critters.  Still, it bugs me if we know how to make the regression
tests more reliable and do not do so.  Back when I packaged mysql for Red
Hat, I was constantly annoyed by how often their tests failed under load.
Don't want to be like that.
        regards, tom lane

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Michael Paquier
Date: 14 April 2016, 00:38:46

On Thu, Apr 14, 2016 at 8:46 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Michael Paquier wrote:
>>> Well, yes. That's true, we could do without. Even if this could give
>>> an indication about a node running, as long as a port has been
>>> associated to a node once, we just need to be sure that a new port is
>>> not allocated. On Windows, I am not sure that it is worth the
>>> complication to be honest, and the current code gives a small safety
>>> net, which is better than nothing.
>
>> If we need to fix the test so that it works in a wider environment for
>> Windows, I don't think it makes sense to remove anything -- rather we
>> should change the test as Tom suggests to verify that the port is really
>> free rather than just doing the pg_isready test.  Maybe the additional
>> test will be useless in non-Windows environment, but who cares?  It will
>> work all the same.
>
> I think Michael is arguing that it's not worth fixing.  He might be right;
> it's not like this is the only cause of irreproducible failures on the
> Windows critters.  Still, it bugs me if we know how to make the regression
> tests more reliable and do not do so.  Back when I packaged mysql for Red
> Hat, I was constantly annoyed by how often their tests failed under load.
> Don't want to be like that.

Some experimentation shows that it is actually not that complicated to
make that cross-platform:

use strict;
use warnings;
use Socket;

# Try a plain TCP connection to the candidate port; if connect()
# succeeds, something is already listening there.
my $remote = 'localhost';
my $port   = 5432;
my $iaddr  = inet_aton($remote)     || die "no host: $remote";
my $paddr  = sockaddr_in($port, $iaddr);
my $proto  = getprotobyname("tcp");
socket(SOCK, PF_INET, SOCK_STREAM, $proto)  || die "socket: $!";
connect(SOCK, $paddr)               || die "connect: $!";
close(SOCK)                         || die "close: $!";

IO::Socket::INET is another option, but I am not seeing it in perl <
5.12, and that's not part of ActivePerl, which makes life harder on
Windows. Socket is available on both. Does that address your concerns?
-- 
Michael

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Michael Paquier
Date: 14 April 2016, 11:30:39

On Thu, Apr 14, 2016 at 9:38 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> IO::Socket::INET is another option, but I am not seeing it in perl <
> 5.12, and that's not part of ActivePerl, which makes life harder on
> Windows. Socket is available on both. Does that address your concerns?

And this gives the patch attached, just took the time to hack it.
--
Michael

Attachment: tap-fix-socket.patch

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 14 April 2016, 15:13:14

Michael Paquier <michael.paquier@gmail.com> writes:
> On Thu, Apr 14, 2016 at 9:38 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> IO::Socket::INET is another option, but I am not seeing it in perl <
>> 5.12, and that's not part of ActivePerl, which makes life harder on
>> Windows. Socket is available on both. Does that address your concerns?

> And this gives the patch attached, just took the time to hack it.

I think this is a good idea, but (1) I'm inclined not to restrict it to
Windows, and (2) I think we should hold off applying it until we've seen
a failure or two more, and can confirm whether d1b7d4877 does anything
useful for the error messages.
        regards, tom lane

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Michael Paquier
Date: 14 April 2016, 23:13:12

On Fri, Apr 15, 2016 at 12:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> On Thu, Apr 14, 2016 at 9:38 AM, Michael Paquier
>> <michael.paquier@gmail.com> wrote:
>>> IO::Socket::INET is another option, but I am not seeing it in perl <
>>> 5.12, and that's not part of ActivePerl, which makes life harder on
>>> Windows. Socket is available on both. Does that address your concerns?
>
>> And this gives the patch attached, just took the time to hack it.
>
> I think this is a good idea, but (1) I'm inclined not to restrict it to
> Windows, and (2) I think we should hold off applying it until we've seen
> a failure or two more, and can confirm whether d1b7d4877 does anything
> useful for the error messages.

Both arguments are fine for me.
-- 
Michael

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 21 April 2016, 14:46:33

I wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> And this gives the patch attached, just took the time to hack it.

> I think this is a good idea, but (1) I'm inclined not to restrict it to
> Windows, and (2) I think we should hold off applying it until we've seen
> a failure or two more, and can confirm whether d1b7d4877 does anything
> useful for the error messages.

OK, we now have failures from both bowerbird and jacana with the error
reporting patch applied:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39

and they both boil down to this:

pg_ctl: could not start server
Examine the log output.
# pg_ctl failed; logfile:
LOG:  could not bind IPv4 socket: Permission denied
HINT:  Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for "127.0.0.1"
FATAL:  could not create any TCP/IP sockets
LOG:  database system is shut down

So "permission denied" is certainly more useful than "no error", which
makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
and should get back-patched --- any objections?

However, it's still not entirely clear what is the root cause of the
failure and whether a patch along the discussed lines would prevent its
recurrence.  Looking at TranslateSocketError, it seems we must be seeing
an underlying error code of WSAEACCES.  A little googling says that
Windows might indeed return that, rather than the more expected
WSAEADDRINUSE, if someone else has the port open with SO_EXCLUSIVEADDRUSE:
    Another possible reason for the WSAEACCES error is that when the
    bind function is called (on Windows NT 4.0 with SP4 and later),
    another application, service, or kernel mode driver is bound to
    the same address with exclusive access. Such exclusive access is a
    new feature of Windows NT 4.0 with SP4 and later, and is
    implemented by using the SO_EXCLUSIVEADDRUSE option.

So theory A is that some other program is binding random high port numbers
with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
Windows antivirus software doing what Windows antivirus software typically
does, ie inject random permissions failures depending on the phase of the
moon.  It's not very clear that a test along the lines described (that is,
attempt to connect to, not bind to, the target port) would pre-detect
either type of error.  Under theory A, a connect() test would recognize
the problem only if the other program were using the port to listen rather
than make an outbound connection; and the latter seems much more likely.
(Possibly we could detect the latter case by checking the error code
returned by connect(), but Michael's proposed patch does no such thing.)
Under theory B, we're pretty much screwed, we don't know what will happen.

I wonder what Andrew can tell us about what else is running on that
machine and whether either theory has any credibility.

BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
have failed to translate it --- surely that's an oversight?
        regards, tom lane
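
The parenthetical above about checking the error code returned by connect() could look roughly like the following. This is only a sketch under stated assumptions: probe_port_by_connect is a hypothetical helper, not part of the proposed patch, and the thread does not establish which error (if any) Windows reports for a port held by an outbound connection or with SO_EXCLUSIVEADDRUSE. The sketch merely separates "connection refused" (nothing listening) from every other outcome, so that an unexpected errno can be treated as "do not use this port". It also assumes a Perl recent enough that Errno exposes ECONNREFUSED on the platform in question.

```perl
use strict;
use warnings;
use Socket;
use Errno qw(ECONNREFUSED);

# Hypothetical helper (not from the patch under discussion): classify a
# TCP port on 127.0.0.1 by attempting a connection and looking at errno.
# Returns 'listening', 'refused', or 'error: <message>'.
sub probe_port_by_connect
{
	my ($port) = @_;
	my $paddr = sockaddr_in($port, inet_aton('127.0.0.1'));
	my $proto = getprotobyname('tcp');

	socket(my $sock, PF_INET, SOCK_STREAM, $proto)
	  or die "socket: $!";

	my $result;
	if (connect($sock, $paddr))
	{
		# Something accepted the connection: the port is in use.
		$result = 'listening';
	}
	elsif ($! == ECONNREFUSED)
	{
		# Nothing is listening; the port may still be unavailable
		# for bind(), which this test cannot see.
		$result = 'refused';
	}
	else
	{
		# Any other errno is unexpected; report it rather than guess.
		$result = "error: $!";
	}
	close($sock);
	return $result;
}
```

A caller would treat only 'refused' as "possibly free" and skip the port on any other result. As noted above, even 'refused' does not prove that a later bind() will succeed.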

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Michael Paquier
Date: 22 April 2016, 07:34:30

On Thu, Apr 21, 2016 at 11:46 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Michael Paquier <michael.paquier@gmail.com> writes:
>>> And this gives the patch attached, just took the time to hack it.
>
>> I think this is a good idea, but (1) I'm inclined not to restrict it to
>> Windows, and (2) I think we should hold off applying it until we've seen
>> a failure or two more, and can confirm whether d1b7d4877 does anything
>> useful for the error messages.
>
> OK, we now have failures from both bowerbird and jacana with the error
> reporting patch applied:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-21%2012%3A03%3A02
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2016-04-19%2021%3A00%3A39
>
> and they both boil down to this:
>
> pg_ctl: could not start server
> Examine the log output.
> # pg_ctl failed; logfile:
> LOG:  could not bind IPv4 socket: Permission denied
> HINT:  Is another postmaster already running on port 60200? If not, wait a few seconds and retry.
> WARNING:  could not create listen socket for "127.0.0.1"
> FATAL:  could not create any TCP/IP sockets
> LOG:  database system is shut down
>
> So "permission denied" is certainly more useful than "no error", which
> makes me feel that d1b7d4877+22989a8e3 are doing what they intended to
> and should get back-patched --- any objections?

+1. That's useful in itself.

> However,
> [...]
>
> So theory A is that some other program is binding random high port numbers
> with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
> Windows antivirus software doing what Windows antivirus software typically
> does, ie inject random permissions failures depending on the phase of the
> moon.  It's not very clear that a test along the lines described (that is,
> attempt to connect to, not bind to, the target port) would pre-detect
> either type of error.  Under theory A, a connect() test would recognize
> the problem only if the other program were using the port to listen rather
> than make an outbound connection; and the latter seems much more likely.
> (Possibly we could detect the latter case by checking the error code
> returned by connect(), but Michael's proposed patch does no such thing.)

Perl's connect() can be made more chatty. $! returns the error string,
$!+0 the errno. With the patch I sent previously, we'd need to change
this portion:
+           socket(SOCK, PF_INET, SOCK_STREAM, $proto) or die;
+           $found = 0 if connect(SOCK, $paddr);
+           close(SOCK);
Basically, that would be something like this, which would still be better
than nothing I think:
if (!connect(SOCK, $paddr))
{
    print "connect error = $!\n";
}
Honestly, I think even if we will never reach perfection here,
something like my previous patch would still allow us to make the
tests more reliable on a platform where services listen to localhost.

> Under theory B, we're pretty much screwed, we don't know what will happen.

Indeed. If things are completely random, there is no guarantee that a
port found free at instant T (that is, connect() failed) will not be
taken at instant T+1, because of the window between the moment the
free port is checked and the moment postgres binds it. If we free up
the port just before starting Postgres the failure window would be
reduced, but it still cannot be reduced to 0.

> BTW, if Windows *had* returned WSAEADDRINUSE, TranslateSocketError would
> have failed to translate it --- surely that's an oversight?

Yes, and I can see you fixed that with 125ad53 already.
-- 
Michael

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 24 April 2016, 19:44:06

I wrote:
> However, it's still not entirely clear what is the root cause of the
> failure and whether a patch along the discussed lines would prevent its
> recurrence.  Looking at TranslateSocketError, it seems we must be seeing
> an underlying error code of WSAEACCES.  A little googling says that
> Windows might indeed return that, rather than the more expected
> WSAEADDRINUSE, if someone else has the port open with SO_EXCLUSIVEADDRUSE:

>     Another possible reason for the WSAEACCES error is that when the
>     bind function is called (on Windows NT 4.0 with SP4 and later),
>     another application, service, or kernel mode driver is bound to
>     the same address with exclusive access. Such exclusive access is a
>     new feature of Windows NT 4.0 with SP4 and later, and is
>     implemented by using the SO_EXCLUSIVEADDRUSE option.

> So theory A is that some other program is binding random high port numbers
> with SO_EXCLUSIVEADDRUSE.  Theory B is that this is the handiwork of
> Windows antivirus software doing what Windows antivirus software typically
> does, ie inject random permissions failures depending on the phase of the
> moon.  It's not very clear that a test along the lines described (that is,
> attempt to connect to, not bind to, the target port) would pre-detect
> either type of error.  Under theory A, a connect() test would recognize
> the problem only if the other program were using the port to listen rather
> than make an outbound connection; and the latter seems much more likely.

I took a second look at the above-quoted Microsoft documentation, and
noticed that it specifies that this error occurs when another application
is *bound* to the target address.  If by that they mean that the other
app did a bind(), then indeed what we're seeing here is a conflict with
a listening app, so that the proposed patch would detect it.  So I went
ahead and pushed the patch --- in any case, it shouldn't make things
any worse.

Also, I did a bit of digging in the buildfarm logs, and noticed that
bowerbird and jacana together have reported 34 "could not bind socket"
failures in BinInstallCheck since 2015-12-07 (when the current logic for
selecting a random port went in).  Between 2015-01-01 and 2015-12-07,
they reported only *one* such failure.  So whatever the exact explanation
is, we've greatly increased the probability of such failures by using a
random port rather than the fixed port 65432 that was used before.
I'm not entirely sure what to make of this observation, but the statistics
seem pretty clear.
        regards, tom lane

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Michael Paquier
Date: 25 April 2016, 03:26:43

On Mon, Apr 25, 2016 at 4:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I took a second look at the above-quoted Microsoft documentation, and
> noticed that it specifies that this error occurs when another application
> is *bound* to the target address.  If by that they mean that the other
> app did a bind(), then indeed what we're seeing here is a conflict with
> a listening app, so that the proposed patch would detect it.  So I went
> ahead and pushed the patch --- in any case, it shouldn't make things
> any worse.

Not worse, and still not enough... bowerbird complained again:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-25%2002%3A13%3A54
--
Michael

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 25 April 2016, 03:31:29

Michael Paquier <michael.paquier@gmail.com> writes:
> Not worse, and still not enough... bowerbird complained again:
> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-25%2002%3A13%3A54

That's a different symptom that seems unrelated:

cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata\global: Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.

cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata\pg_xlog: Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.

cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata: Permission denied at C:/Perl64/lib/File/Temp.pm line 902.

cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw: Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.

### Signalling QUIT to 12200 for node "main"
# Running: pg_ctl kill QUIT 12200

We've seen that one before, though less often than the port-in-use errors.
Maybe it's failing to wait long enough for server shutdown?
        regards, tom lane

Re: Why doesn't src/backend/port/win32/socket.c implement bind()?

From: Tom Lane
Date: 25 April 2016, 15:14:24

I wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Not worse, and still not enough... bowerbird complained again:
>> http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2016-04-25%2002%3A13%3A54

> That's a different symptom that seems unrelated:

> cannot remove directory for C:\prog\bf\root\HEAD\pgsql.build\src\bin\scripts\tmp_check\data_main_21Nw\pgdata\global:
> Directory not empty at C:/Perl64/lib/File/Temp.pm line 902.

Ah, scratch that, I was taking that as being the cause of the reported
failure but it's just noise, cf <31417.1461595864@sss.pgh.pa.us>.

You're right, we're still getting

# pg_ctl failed; logfile:
LOG:  could not bind IPv4 socket: Permission denied
HINT:  Is another postmaster already running on port 60208? If not, wait a few seconds and retry.
WARNING:  could not create listen socket for "127.0.0.1"
FATAL:  could not create any TCP/IP sockets
LOG:  database system is shut down
Bail out!  pg_ctl failed

So the connect() test is inadequate.  Let's try bind() with SO_REUSEADDR
and see whether that makes things better or worse.
        regards, tom lane
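
For illustration, a bind()-based test with SO_REUSEADDR along the lines suggested above might be sketched as follows. The helper name port_is_bindable is hypothetical and this is not the code that was committed; it only shows the idea of asking the kernel the same question the postmaster will ask, instead of inferring the answer from connect(). SO_REUSEADDR is set so that the probe itself does not leave the port in a state that blocks the server from binding it a moment later. Its semantics differ between platforms, so behaviour on Windows in particular should be treated as an open question here.

```perl
use strict;
use warnings;
use Socket;

# Hypothetical helper: return 1 if this process can bind() a TCP socket
# to 127.0.0.1:$port, 0 otherwise.
sub port_is_bindable
{
	my ($port) = @_;
	my $paddr = sockaddr_in($port, inet_aton('127.0.0.1'));
	my $proto = getprotobyname('tcp');

	socket(my $sock, PF_INET, SOCK_STREAM, $proto)
	  or die "socket: $!";

	# Ask for address reuse before binding, as the message above proposes.
	setsockopt($sock, SOL_SOCKET, SO_REUSEADDR, pack("l", 1))
	  or die "setsockopt: $!";

	# The probe socket is closed right away; only the outcome matters.
	my $ok = bind($sock, $paddr) ? 1 : 0;
	close($sock);
	return $ok;
}
```

A port-selection loop would keep drawing candidate ports until port_is_bindable returns 1. The window between this check and the server's own bind() remains, so the check reduces collisions rather than ruling them out.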