Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events |
Date | |
Msg-id | CA+hUKGKV5aM1K3gc_kAMtUq-BkD5AugLUCDVa-RMnbfFypALwQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events (Jacob Champion <jacob.champion@enterprisedb.com>) |
Responses |
Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events
Re: [PATCH] OAuth: fix performance bug with stuck multiplexer events |
List | pgsql-hackers |
On Thu, Aug 7, 2025 at 11:55 AM Jacob Champion <jacob.champion@enterprisedb.com> wrote: > On Wed, Aug 6, 2025 at 9:13 AM Jacob Champion > <jacob.champion@enterprisedb.com> wrote: > > Maybe "drain" would no longer be the > > verb to use there. > > I keep describing this as "combing" the queue when I talk about it in > person, so v3-0001 renames this new operation to comb_multiplexer(). > And the CI (plus the more strenuous TLS tests) confirms that the > callback count is still stable with this weaker guarantee, so I've > gotten rid of the event-counting code. I was about to hit send on an email suggesting "reset_multiplexer()", and an attempt at distilling the explanation to a short paragraph, but your names and explanations are also good so please feel 100% free to ignore these suggestions. "Unlike epoll descriptors, kqueue descriptors only transition from readable to unreadable when kevent() is called and finds nothing, after removing level-triggered conditions that have gone away. We therefore need a dummy kevent() call after operations might have been performed on the monitored sockets or timer_fd. Any event returned is ignored here, but it also remains queued (being level-triggered) and leaves the descriptor readable. This is a no-op for epoll descriptors." Reviewing the timer stuff, it is again tempting to try to use an EVFILT_TIMER directly, but I like your approach: it's nice to be able to mirror the Linux coding, with this minor kink ironed out. FWIW I re-read the kqueue paper's discussion of the goals of making kqueue descriptors themselves monitorable/pollable, and it seems it was mainly intended for hierarchies of kqueues, like your timer_fd, with the specific aim of expressing priorities. It doesn't talk about giving them to code that doesn't know it has a kqueue fd (the client) and never calls kevent() and infers the events instead (libcurl). That said, the fact that the filter function for kqueue fds just checks queue size > 0 without running the filter functions for the queued events does seem like a bit of an abstraction leak from this vantage point. At least it's easy enough to work around in the kqueue-managing middleman code once you understand it. > Now that I'm no longer counting events, I can collapse the changes to > register_socket(). I can't revert those changes entirely, because then > we regress the case where Curl switches a socket from IN to OUT (this > is enforced by the new unit tests). But I'm not sure that the existing > comment adequately explained that fix anyway, and I didn't remember to > call it out in my initial email, so I've split it out into v3-0002. > It's much smaller. Much nicer! Yeah, that all makes sense. > The tests (now in 0005) have been adjusted for the new "combing" > behavior, and I've added a case to ensure that multiple stale events > are swept up by a single call to comb_multiplexer(). This all looks pretty good to me. I like the C TAP test. PostgreSQL needs more of this. s/signalled/signaled/ (= US spelling) in a couple of places.
pgsql-hackers by date: