Re: Why is src/test/modules/committs/t/002_standby.pl flaky? - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: Why is src/test/modules/committs/t/002_standby.pl flaky?
Msg-id: 20220201173830.3mzesliogjlpmnkc@alap3.anarazel.de
In response to: Re: Why is src/test/modules/committs/t/002_standby.pl flaky? (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-hackers
Hi,

On 2022-02-01 18:02:34 +1300, Thomas Munro wrote:
> 1. It sounds like no one really loves the WSAPoll() kludge, even
> though it apparently works for simple cases.

Yea, at least I don't :)

> 2. The long-lived-WaitEventSets-everywhere concept was initially
> appealing to me and solves the walreceiver problem (when combined
> with a sticky seen_fd_close flag), and I've managed to get that
> working correctly across libpq reconnects.  As mentioned, I also have
> some toy patches along those lines for the equivalent but more complex
> problem in postgres_fdw, because I've been studying how to make
> parallel append generate a tidy stream of epoll_wait()/kevent() calls,
> instead of a quadratic explosion of setup/teardown spam.  I'll write
> some more about those patches and hopefully propose them soon anyway,
> but on reflection I don't really want that Unix efficiency problem to
> be tangled up with this Windows correctness problem.  That'd require a
> programming rule that I don't want to burden us with forever: you'd
> *never* be able to use a socket in more than one WaitEventSet, and
> WaitLatchOrSocket() would have to be removed.

Which seems like a bad direction to go in.

> 3. The real solution to this problem is to recognise that we just
> have the event objects in the wrong place.  WaitEventSets shouldn't
> own them: they need to be 1:1 with sockets, or Winsock will eat
> events.  Likewise for the flag you need for edge->level conversion, or
> *we'll* eat events.  Having now tried that, it's starting to feel like
> the best way forward, even though my initial prototype (see attached)
> is maybe a tad cumbersome with bookkeeping.  I believe it means that
> all existing coding patterns *should* now be safe (not yet confirmed
> by testing), and we're free to put sockets in multiple WESes even at
> the same time if the need arises.

Agreed.

> The basic question is: how should a socket user find the associated
> event handle and flags?  Some answers:
>
> 1. "pgsocket" could become a pointer to a heap-allocated wrapper
> object containing { socket, event, flags } on Windows, or something
> like that, but that seems a bit invasive and tangled up with public
> APIs like libpq, which put me off trying that.  I'm willing to explore
> it if people object to my other idea.

I'm not sure the libpq aspect really is a problem. We're not going to
have to do that conversion repeatedly, I think.

> 2. "pgsocket" could stay unchanged, but we could have a parallel
> array with extra socket state, indexed by file descriptor.  We could
> use new socket()/close() libpq events so that libpq's sockets could be
> registered in this scheme without libpq itself having to know anything
> about that.  That worked pretty nicely when I developed it on my
> FreeBSD box, but on Windows I soon learned that SOCKET is really yet
> another name for HANDLE, so it's not a dense number space anchored at
> 0 like Unix file descriptors.  The array could be prohibitively big.

Yes, that seems like a no-go. It also doesn't seem like it'd gain much
in the robustness department over 1) - you'd not know if a socket had
been closed and a new one with the same integer value had been created.

> 3. I tried the same as #2 but with a hash table, and ran into another
> small problem when putting it all together: we probably don't want to
> longjump out of libpq callbacks on allocation failure.  So, I modified
> simplehash to add a no-OOM behaviour.  That's the POC patch set I'm
> attaching for show-and-tell.  Some notes and TODOs in the commit
> messages and comments.

1) seems more plausible to me, but I can see this working as well.

> From bdd90aeb65d82ecae8fe58b441d25a1e1b129bf3 Mon Sep 17 00:00:00 2001
> From: Thomas Munro <thomas.munro@gmail.com>
> Date: Sat, 29 Jan 2022 02:15:10 +1300
> Subject: [PATCH 1/3] Add low level socket events for libpq.
>
> Provide a way to get a callback when a socket is created or closed.
>
> XXX TODO handle callback failure
> XXX TODO investigate overheads/other implications of having a callback
> installed

What do we need this for? I still don't understand what kind of
reconnects we need to detect automatically.

> +#ifdef SH_RAW_ALLOCATOR_NOZERO
> +	memset(tb, 0, sizeof(SH_TYPE));
> +#endif
...
> +#ifdef SH_RAW_ALLOCATOR_NOZERO
> +	memset(newdata, 0, sizeof(SH_ELEMENT_TYPE) * newsize);
> +#endif

Seems like this should be handled in an allocator wrapper, rather than
in multiple places in the simplehash code?

> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> +static socktab_hash *socket_table;
> +#endif

Perhaps a separate #define for this would be appropriate? So we don't
have to spell the exact condition out every time.

> +ExtraSocketState *
> +SocketTableAdd(pgsocket sock, bool no_oom)
> +{
> +#if defined(WIN32) || defined(USE_ASSERT_CHECKING)
> +	SocketTableEntry *ste;
> +	ExtraSocketState *ess;

Given there are goto targets that test for ess != NULL, it seems nicer
to initialize it to NULL. I don't think there are problematic paths
right now, but it seems unnecessary to "risk" that changing over time.

> +#if !defined(FRONTEND)
> +struct ExtraSocketState
> +{
> +#ifdef WIN32
> +	HANDLE		event_handle;	/* one event for the life of the socket */
> +	int			flags;			/* most recent WSAEventSelect() flags */
> +	bool		seen_fd_close;	/* has FD_CLOSE been received? */
> +#else
> +	int			dummy;			/* none of this is needed for Unix */
> +#endif
> +};

Seems like we might want to track more readiness events than just
close? If we e.g. started tracking whether we've seen writes blocking /
write readiness, we could get rid of cruft like

	/*
	 * Windows does not guarantee to log an FD_WRITE network event
	 * indicating that more data can be sent unless the previous send()
	 * failed with WSAEWOULDBLOCK.  While our caller might well have made
	 * such a call, we cannot assume that here.  Therefore, if waiting for
	 * write-ready, force the issue by doing a dummy send().  If the dummy
	 * send() succeeds, assume that the socket is in fact write-ready, and
	 * return immediately.  Also, if it fails with something other than
	 * WSAEWOULDBLOCK, return a write-ready indication to let our caller
	 * deal with the error condition.
	 */

that seems likely to just make bugs less likely, rather than actually
fix them...

Greetings,

Andres Freund