Re: Cygwin PostgreSQL Regression Test Problems (Revisited) - Mailing list pgsql-ports
| From | Jason Tishler | 
|---|---|
| Subject | Re: Cygwin PostgreSQL Regression Test Problems (Revisited) | 
| Date | |
| Msg-id | 20010402131917.C798@dothill.com |
		
| In response to | Re: Cygwin PostgreSQL Regression Test Problems (Revisited) (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Responses | Re: Cygwin PostgreSQL Regression Test Problems (Revisited) |
		
| List | pgsql-ports | 
Tom,
On Sun, Apr 01, 2001 at 01:57:35PM -0400, Tom Lane wrote:
> Jason Tishler <Jason.Tishler@dothill.com> writes:
> > I'm glad that you agree.  Please post to the list when the change is in
> > CVS and I will test that this solves the Cygwin regression test (i.e.,
> > psql) hangs.
>
> Done as of yesterday; should be in this morning's snapshot.
Thanks.
> > Actually, the blocking connect() change for Cygwin is obviated by the
> > pqWait() fix.  So, I am now no longer recommending making the blocking
> > connect() change for Cygwin.  Unless, you do so for other Unixes too.
>
> I made both changes in the hope that the blocking connect change would
> suppress your problem with connection-refused failures.  If it does not,
> then we may as well reverse out the fe-connect.c change.  Let me know.
With both changes or only the fe-connect.c one, psql does not hang and
displays the following error message when the connection is refused:
psql: connectDBStart() -- connect() failed: Connection refused
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.65432'?
With only the fe-misc.c change, psql does not hang and displays the
following error message when the connection is refused:
psql: PQconnectPoll() -- connect() failed: error 10061
        Is the postmaster running locally
        and accepting connections on Unix socket '/tmp/.s.PGSQL.65432'?
In both cases there are no hangs; only the error messages differ.
Unfortunately, in the non-blocking case the error message is cryptic.
I tried tracking down error "10061", which comes from getsockopt(), but
was unsuccessful.  Is there any way to improve the readability of this
error message?
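For reference, 10061 is the value Winsock documents for WSAECONNREFUSED, so
the number above is most likely a raw Winsock code passed through rather than
a Cygwin errno. A minimal sketch of how such codes could be turned into
readable text follows. The helper names and the SKETCH_ constants are
hypothetical and are not part of libpq; only the numeric values come from the
Winsock documentation.

```c
#include <stddef.h>
#include <stdio.h>

/* Winsock error values, defined locally so the sketch also compiles on
 * systems without winsock.h.  The SKETCH_ names are hypothetical. */
#define SKETCH_WSAECONNRESET   10054
#define SKETCH_WSAETIMEDOUT    10060
#define SKETCH_WSAECONNREFUSED 10061

/* Hypothetical helper (not a libpq function): returns readable text for a
 * few Winsock codes, or NULL when the code is not in the table. */
static const char *
sketch_winsock_strerror(int code)
{
    switch (code)
    {
        case SKETCH_WSAECONNRESET:
            return "Connection reset by peer";
        case SKETCH_WSAETIMEDOUT:
            return "Connection timed out";
        case SKETCH_WSAECONNREFUSED:
            return "Connection refused";
        default:
            return NULL;
    }
}

/* Formats a connect() failure into buf, falling back to the raw number
 * when no readable text is known for the code. */
static void
sketch_format_connect_error(int code, char *buf, size_t buflen)
{
    const char *text = sketch_winsock_strerror(code);

    if (text != NULL)
        snprintf(buf, buflen, "connect() failed: %s", text);
    else
        snprintf(buf, buflen, "connect() failed: error %d", code);
}
```

With a table like this, the non-blocking path could report "Connection
refused" just as the blocking path does, while unknown codes would still show
the number.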
Also, the blocking connect change did *not* fix the spurious
connection-refused regression test failures.  So this change should
probably be backed out.
> > I'm wondering whether it makes sense to add a simple connection retry
> > policy as suggested above by Hiroshi?
>
> I do not think it is appropriate for libpq to do that.
When I made my suggestion above, I was concerned that maybe libpq was not
the right layer to be implementing connection policies and that possibly
psql was the better place.
> For one thing, where would you stop --- why exactly two tries?
This was another of my concerns.
> > 2. Change the backlog parameter to listen() in src/backend/libpq/pqcomm.c
> > to a number that will "ensure" that the parallel_schedule version of the
> > regression test does not generate connection refused conditions.  Note
> > that I'm not even sure this will really work on all (or any) platforms.
>
> We already use SOMAXCONN which is supposed to be defined by the system
> as the maximum allowed queue depth.  If Cygwin fails to define it, or
> defines it as something less than it should be, then we might consider
> installing a Cygwin-specific hack to redefine SOMAXCONN.
Cygwin defines SOMAXCONN to be 5.  On the Windows side, winsock.h also
defines it to be 5, while winsock2.h defines it to be 0x7fffffff.  So, I'm
not sure what the real Cygwin (i.e., Windows) maximum is.
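To make the "Cygwin-specific hack" idea concrete, here is a rough sketch of
raising a too-small header value before it reaches listen(). The floor of 64
and the helper name are assumptions chosen only for illustration; nothing
here is taken from pqcomm.c, and the operating system may still cap whatever
backlog it is given.

```c
#include <sys/socket.h>

/* Fallback in case the platform headers do not define SOMAXCONN at all. */
#ifndef SOMAXCONN
#define SOMAXCONN 5
#endif

/* Hypothetical floor, picked only for illustration. */
#define SKETCH_MIN_BACKLOG 64

/* Hypothetical helper: returns the backlog to pass to listen(), raising
 * small header values (such as 5) to the floor and leaving larger values
 * (such as 0x7fffffff) unchanged. */
static int
sketch_listen_backlog(int header_value)
{
    if (header_value < SKETCH_MIN_BACKLOG)
        return SKETCH_MIN_BACKLOG;
    return header_value;
}
```

A call such as listen(fd, sketch_listen_backlog(SOMAXCONN)) would then ask
for at least 64 pending connections on a platform whose header says 5.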
> However Hiroshi says later that he already tried this.
Even if it worked, this would have just postponed the problem instead of
really fixing it.
> I'm inclined to think
> that Cygwin simply has a problem with servicing concurrent connection
> requests, perhaps even before the alleged SOMAXCONN value is reached.
You meant Windows.  Right? :,)
In summary, I feel that the fe-connect.c change should be backed out so
that Cygwin will be consistent with other UNIXes.  I also hope that the
non-blocking connection failure message can be made more readable and
that make check will not generate spurious failure messages under Cygwin
on slow machines.
Thanks,
Jason
--
Jason Tishler
Director, Software Engineering       Phone: +1 (732) 264-8770 x235
Dot Hill Systems Corp.               Fax:   +1 (732) 264-8798
82 Bethany Road, Suite 7             Email: Jason.Tishler@dothill.com
Hazlet, NJ 07730 USA                 WWW:   http://www.dothill.com
		