Re: Why is src/test/modules/committs/t/002_standby.pl flaky? - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Msg-id: 20220201173830.3mzesliogjlpmnkc@alap3.anarazel.de
In response to: Re: Why is src/test/modules/committs/t/002_standby.pl flaky? (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers
Hi,

On 2022-02-01 18:02:34 +1300, Thomas Munro wrote:
> 1. It sounds like no one really loves the WSAPoll() kludge, even
> though it apparently works for simple cases.

Yea, at least I don't :)

> 2. The long-lived-WaitEventSets-everywhere concept was initially
> appealing to me and solves the walreceiver problem (when combined
> with a sticky seen_fd_close flag), and I've managed to get that
> working correctly across libpq reconnects.  As mentioned, I also have
> some toy patches along those lines for the equivalent but more complex
> problem in postgres_fdw, because I've been studying how to make
> parallel append generate a tidy stream of epoll_wait()/kevent() calls,
> instead of a quadratic explosion of setup/teardown spam.  I'll write
> some more about those patches and hopefully propose them soon anyway,
> but on reflection I don't really want that Unix efficiency problem to
> be tangled up with this Windows correctness problem.  That'd require a
> programming rule that I don't want to burden us with forever: you'd
> *never* be able to use a socket in more than one WaitEventSet, and
> WaitLatchOrSocket() would have to be removed.

Which seems like a bad direction to go in.

> 3. The real solution to this problem is to recognise that we just
> have the event objects in the wrong place.  WaitEventSets shouldn't
> own them: they need to be 1:1 with sockets, or Winsock will eat
> events.  Likewise for the flag you need for edge->level conversion, or
> *we'll* eat events.  Having now tried that, it's starting to feel like
> the best way forward, even though my initial prototype (see attached)
> is maybe a tad cumbersome with bookkeeping.  I believe it means that
> all existing coding patterns *should* now be safe (not yet confirmed
> by testing), and we're free to put sockets in multiple WESes even at
> the same time if the need arises.

Agreed.

> The basic question is: how should a socket user find the associated
> event handle and flags?  Some answers:
>
> 1. "pgsocket" could become a pointer to a heap-allocated wrapper
> object containing { socket, event, flags } on Windows, or something
> like that, but that seems a bit invasive and tangled up with public
> APIs like libpq, which put me off trying that.  I'm willing to explore
> it if people object to my other idea.

I'm not sure the libpq aspect really is a problem. We're not going to
have to do that conversion repeatedly, I think.

> 2. "pgsocket" could stay unchanged, but we could have a parallel
> array with extra socket state, indexed by file descriptor.  We could
> use new socket()/close() libpq events so that libpq's sockets could be
> registered in this scheme without libpq itself having to know anything
> about that.  That worked pretty nicely when I developed it on my
> FreeBSD box, but on Windows I soon learned that SOCKET is really yet
> another name for HANDLE, so it's not a dense number space anchored at
> 0 like Unix file descriptors.  The array could be prohibitively big.

Yes, that seems like a no-go. It also doesn't seem like it'd gain much
in the robustness department over 1) - you'd not know if a socket had
been closed and a new one with the same integer value had been created.

> 3. I tried the same as #2 but with a hash table, and ran into another
> small problem when putting it all together: we probably don't want to
> longjump out of libpq callbacks on allocation failure.  So, I modified
> simplehash to add a no-OOM behaviour.  That's the POC patch set I'm
> attaching for show-and-tell.  Some notes and TODOs in the commit
> messages and comments.

1) seems more plausible to me, but I can see this working as well.

> From bdd90aeb65d82ecae8fe58b441d25a1e1b129bf3 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Sat, 29 Jan 2022 02:15:10 +1300
> Subject: [PATCH 1/3] Add low level socket events for libpq.
>
> Provide a way to get a callback when a socket is created or closed.
>
> XXX TODO handle callback failure
> XXX TODO investigate overheads/other implications of having a callback
> installed

What do we need this for? I still don't understand what kind of
reconnects we need to detect automatically.

> +#ifdef SH_RAW_ALLOCATOR_NOZERO
> +	memset(tb, 0, sizeof(SH_TYPE));
> +#endif
...
> +#ifdef SH_RAW_ALLOCATOR_NOZERO
> +	memset(newdata, 0, sizeof(SH_ELEMENT_TYPE) * newsize);
> +#endif

Seems like this should be handled in an allocator wrapper, rather than
in multiple places in the simplehash code?

> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> +static socktab_hash *socket_table;
> +#endif

Perhaps a separate #define for this would be appropriate? So we don't
have to spell the exact condition out every time.

> +ExtraSocketState *
> +SocketTableAdd(pgsocket sock, bool no_oom)
> +{
> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> +	SocketTableEntry *ste;
> +	ExtraSocketState *ess;

Given there are goto targets that test for ess != NULL, it seems nicer
to initialize it to NULL. I don't think there are problematic paths
right now, but it seems unnecessary to "risk" that changing over time.

> +#if !defined(FRONTEND)
> +struct ExtraSocketState
> +{
> +#ifdef WIN32
> +	HANDLE		event_handle;	/* one event for the life of the socket */
> +	int			flags;			/* most recent WSAEventSelect() flags */
> +	bool		seen_fd_close;	/* has FD_CLOSE been received? */
> +#else
> +	int			dummy;			/* none of this is needed for Unix */
> +#endif
> +};

Seems like we might want to track more readiness events than just
close? If we e.g. started tracking whether we've seen writes blocking /
write readiness, we could get rid of cruft like

	/*
	 * Windows does not guarantee to log an FD_WRITE network event
	 * indicating that more data can be sent unless the previous send()
	 * failed with WSAEWOULDBLOCK.  While our caller might well have made
	 * such a call, we cannot assume that here.  Therefore, if waiting for
	 * write-ready, force the issue by doing a dummy send().  If the dummy
	 * send() succeeds, assume that the socket is in fact write-ready, and
	 * return immediately.  Also, if it fails with something other than
	 * WSAEWOULDBLOCK, return a write-ready indication to let our caller
	 * deal with the error condition.
	 */

that seems likely to just make bugs less likely, rather than actually
fix them...

Greetings,

Andres Freund