Thread: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests
Hi,

I was wondering in [1] what we could do about the slowest tests on windows.

On 2021-12-31 11:25:28 -0800, Andres Freund wrote:
> Picking a random successful cfbot run [1] I see the following tap tests taking
> more than 20 seconds:
>
> 67188 ms pg_basebackup t/010_pg_basebackup.pl
> 59710 ms recovery t/001_stream_rep.pl

Comparing these times to measurements taken on my normal linux workstation,
something seemed just *very* off, even with a slow CI instance and windows in
the mix.

A bunch of printf debugging later, I realized the problem is that several of
the pg_basebackups in the tests take a *long* time. E.g. for
t/001_stream_rep.pl the backups from the standby each take just over 10s.
That's awfully specific...

# Taking pg_basebackup my_backup from node "standby_1"
# Running: pg_basebackup -D C:/dev/postgres/./tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup -h C:/Users/myadmin/AppData/Local/Temp/yba26PBYX1 -p 59181 --checkpoint fast --no-sync --label my_backup -v
# ran in 10.145s
# Backup finished

This reproducibly happens, and it's *not* related to the socket shutdown()
changes we've been debugging lately - even after a revert the problem
persists.

Because our logging for basebackups is quite weak, both on the server and the
client side, I needed to add a fair bit more debugging to figure it out:

pg_basebackup: wait to finish at 0.492
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: stream poll timeout 10.112

The problem is that there's just no implemented way to shut down the WAL
streaming thread in pg_basebackup in a timely manner. The code in
pg_basebackup.c says:

	if (verbose)
		pg_log_info("waiting for background process to finish streaming ...");
...
	/*
	 * On Windows, since we are in the same process, we can just store the
	 * value directly in the variable, and then set the flag that says
	 * it's there.
	 */
...
	xlogendptr = ((uint64) hi) << 32 | lo;
	InterlockedIncrement(&has_xlogendptr);

But just setting a variable doesn't do much if the thread is in
HandleCopyStream()->CopyStreamPoll()->select().

The only reason we ever succeed in shutting down, without more WAL coming in,
is that pg_basebackup defaults to sending a status message every 10 seconds.
At that point the thread sees has_xlogendptr = true, and shuts down.

A test-specific workaround would be to just add --status-interval=1 to
Cluster.pm::backup(). But that seems very unsatisfying.

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().

Any more straightforward ideas?

From a cursory look at the history, pg_basebackup used to have this behaviour
on all platforms, but it got fixed for the other platforms in 7834d20b57a by
Tom (assuming the problem wasn't present there).

Greetings,

Andres Freund

[1] https://postgr.es/m/20211231192528.wirwj4qaaw3ted5g%40alap3.anarazel.de
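[For readers less familiar with the Windows APIs involved, here is a minimal,
hypothetical sketch of the kind of wait being proposed - a stop event waited
on alongside the socket via WaitForMultipleObjects() - assuming plain
Winsock2. The function and parameter names are illustrative, not from any
actual patch. Note that the per-call event creation shown here is exactly
what later messages in this thread identify as risky (a pending FD_CLOSE can
be lost), so treat it as the shape of the wait, not a complete solution.]

#include <winsock2.h>

/*
 * Illustrative only: wait until the socket has data / was closed, or the
 * stop event is signalled, whichever comes first.
 */
static int
wait_for_stream_event(SOCKET sock, HANDLE stop_event, DWORD timeout_ms)
{
	HANDLE		events[2];
	DWORD		rc;
	int			result;

	events[0] = WSACreateEvent();	/* manual-reset, nonsignaled */
	events[1] = stop_event;
	if (events[0] == WSA_INVALID_EVENT)
		return -1;

	/* have the socket signal events[0] on readability / close */
	if (WSAEventSelect(sock, events[0], FD_READ | FD_CLOSE) == SOCKET_ERROR)
	{
		WSACloseEvent(events[0]);
		return -1;
	}

	rc = WaitForMultipleObjects(2, events, FALSE, timeout_ms);
	if (rc == WAIT_OBJECT_0)
		result = 1;				/* socket activity */
	else if (rc == WAIT_OBJECT_0 + 1)
		result = 0;				/* stop requested */
	else
		result = -1;			/* timeout or error */

	/* cancel the association again; note the socket stays non-blocking */
	WSAEventSelect(sock, NULL, 0);
	WSACloseEvent(events[0]);
	return result;
}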
On Sun, Jan 16, 2022 at 10:22 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I was wondering in [1] what we could do about the slowest tests on
> windows.
>
> On 2021-12-31 11:25:28 -0800, Andres Freund wrote:
> > Picking a random successful cfbot run [1] I see the following tap tests taking
> > more than 20 seconds:
> >
> > 67188 ms pg_basebackup t/010_pg_basebackup.pl
> > 59710 ms recovery t/001_stream_rep.pl
>
> Comparing these times to measurements taken on my normal linux workstation,
> something seemed just *very* off, even with a slow CI instance and windows in
> the mix.
>
> A bunch of printf debugging later, I realized the problem is that several of
> the pg_basebackups in the tests take a *long* time. E.g. for
> t/001_stream_rep.pl the backups from the standby each take just over 10s.
> That's awfully specific...
>
> # Taking pg_basebackup my_backup from node "standby_1"
> # Running: pg_basebackup -D C:/dev/postgres/./tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup -h C:/Users/myadmin/AppData/Local/Temp/yba26PBYX1 -p 59181 --checkpoint fast --no-sync --label my_backup -v
> # ran in 10.145s
> # Backup finished
>
> This reproducibly happens, and it's *not* related to the socket shutdown()
> changes we've been debugging lately - even after a revert the problem
> persists.
>
> Because our logging for basebackups is quite weak, both on the server and the
> client side, I needed to add a fair bit more debugging to figure it out:
>
> pg_basebackup: wait to finish at 0.492
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: stream poll timeout 10.112
>
> The problem is that there's just no implemented way to shut down the WAL
> streaming thread in pg_basebackup in a timely manner. The code in
> pg_basebackup.c says:
>
>         if (verbose)
>                 pg_log_info("waiting for background process to finish streaming ...");
> ...
>         /*
>          * On Windows, since we are in the same process, we can just store the
>          * value directly in the variable, and then set the flag that says
>          * it's there.
>          */
> ...
>         xlogendptr = ((uint64) hi) << 32 | lo;
>         InterlockedIncrement(&has_xlogendptr);
>
> But just setting a variable doesn't do much if the thread is in
> HandleCopyStream()->CopyStreamPoll()->select().
>
> The only reason we ever succeed in shutting down, without more WAL coming in,
> is that pg_basebackup defaults to sending a status message every 10 seconds.
> At that point the thread sees has_xlogendptr = true, and shuts down.
>
> A test-specific workaround would be to just add --status-interval=1 to
> Cluster.pm::backup(). But that seems very unsatisfying.
>
> I don't immediately see a solution for this, other than to add
> StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> doesn't support pipes and there's no socketpair().
>
> Any more straightforward ideas?
>
> From a cursory look at the history, pg_basebackup used to have this behaviour
> on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> Tom (assuming the problem wasn't present there).

Ugh, yeah, that sounds like a correct analysis to me, and ugh, yeah,
that's not very nice.

And yes, I think we have to create an event and then use WSAEventSelect()
+ WaitForSingleObjectEx(). It should be enough to just use one event, I
think, plus the timeout -- but it might be more readable to have a
separate event for the socket and the stop? But we can have just one
event that is both used for the stop signal and, via WSAEventSelect(),
associated with the socket as needed.

(And yes, I agree that it's a lot better to fix it properly than to just
reduce the timeout.)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
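[A sketch of the single-event variant Magnus describes - all names
hypothetical, assuming Winsock2: one manual-reset event is associated with
the socket and doubles as the stop signal. The cost is that the waiter
cannot tell from the wait result alone why it woke, and must check both the
stop flag and WSAEnumNetworkEvents() - the readability trade-off mentioned
above.]

#include <winsock2.h>
#include <stdbool.h>

/* illustrative global; in reality this would live in shared state */
static HANDLE stream_event;

/* streaming thread: one manual-reset event serving both purposes */
static bool
setup_stream_event(SOCKET sock)
{
	stream_event = CreateEvent(NULL, TRUE /* manual reset */, FALSE, NULL);
	if (stream_event == NULL)
		return false;
	/* the same event is signalled on socket readability / close */
	return WSAEventSelect(sock, stream_event, FD_READ | FD_CLOSE) == 0;
}

/* main thread: wake the streaming thread out of its wait */
static void
request_stop(void)
{
	SetEvent(stream_event);
}

[The waiter would then block in WaitForSingleObjectEx(stream_event, ...,
FALSE) and, on wakeup, check both the end-position flag and the pending
network events.]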
Andres Freund <andres@anarazel.de> writes:
> I don't immediately see a solution for this, other than to add
> StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> doesn't support pipes and there's no socketpair().

> Any more straightforward ideas?

> From a cursory look at the history, pg_basebackup used to have this behaviour
> on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> Tom (assuming the problem wasn't present there).

Hmm --- I see that I thought Windows was unaffected, but I didn't
consider this angle.

Can we send the child process a signal to kick it off its wait?

			regards, tom lane
On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > I don't immediately see a solution for this, other than to add
> > StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> > CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> > doesn't support pipes and there's no socketpair().
> > Any more straightforward ideas?
> > From a cursory look at the history, pg_basebackup used to have this behaviour
> > on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> > Tom (assuming the problem wasn't present there).
>
> Hmm --- I see that I thought Windows was unaffected, but I didn't
> consider this angle.
>
> Can we send the child process a signal to kick it off its wait?

No. (1) On Windows it's not a child process, it's a thread. And (2)
Windows doesn't have signals. We emulate those *in the backend* for
win32, but this problem is in the frontend, where that emulation layer
doesn't exist.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
On Sun, Jan 16, 2022 at 5:36 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Andres Freund <andres@anarazel.de> writes:
> > > I don't immediately see a solution for this, other than to add
> > > StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> > > CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> > > doesn't support pipes and there's no socketpair().
> > > Any more straightforward ideas?
> > > From a cursory look at the history, pg_basebackup used to have this behaviour
> > > on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> > > Tom (assuming the problem wasn't present there).
> >
> > Hmm --- I see that I thought Windows was unaffected, but I didn't
> > consider this angle.
> >
> > Can we send the child process a signal to kick it off its wait?
>
> No. (1) On Windows it's not a child process, it's a thread. And (2)
> Windows doesn't have signals. We emulate those *in the backend* for
> win32, but this problem is in the frontend, where that emulation layer
> doesn't exist.

Actually, just after sending that... What we could do is call
WSACancelBlockingCall(), which would cancel the select() and thereby make
us do the check. However, per the docs
(https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsacancelblockingcall)
this function is no longer exported in Winsock 2, so this does not seem to
be the right way forward. There is no replacement function for it -- the
suggestion is basically "don't do that, use multithreading instead", which
I think brings us back to the original suggestion of WSAEventSelect().

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
Hi,

On 2022-01-16 17:39:11 +0100, Magnus Hagander wrote:
> On Sun, Jan 16, 2022 at 5:36 PM Magnus Hagander <magnus@hagander.net> wrote:
> > On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I don't immediately see a solution for this, other than to add
> > > > StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> > > > CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> > > > doesn't support pipes and there's no socketpair().
> > > > Any more straightforward ideas?
> > > > From a cursory look at the history, pg_basebackup used to have this behaviour
> > > > on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> > > > Tom (assuming the problem wasn't present there).
> > >
> > > Hmm --- I see that I thought Windows was unaffected, but I didn't
> > > consider this angle.
> > >
> > > Can we send the child process a signal to kick it off its wait?
> >
> > No. (1) On Windows it's not a child process, it's a thread. And (2)
> > Windows doesn't have signals. We emulate those *in the backend* for
> > win32, but this problem is in the frontend, where that emulation layer
> > doesn't exist.
>
> [...] which I think brings us back to the original suggestion of
> WSAEventSelect().

I hacked that up last night. And a fix or two later, it seems to be
working. What I'd missed at first is that the event needs to be reset in
reached_end_position(), otherwise we'll busy loop.

I wonder if using a short-lived event handle would have dangers of missing
FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
the event just once.

I just wasn't immediately sure where to stash it. Probably just by adding a
field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
constant once passed to ReceiveXlogStream(), but I don't really see a reason
why it'd need to stay that way?

Greetings,

Andres Freund
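[A sketch of the busy-loop fix Andres describes, with assumed names loosely
mirroring pg_basebackup.c's Windows path; the real function takes more
arguments and uses PostgreSQL types. The point: when the end position is
known but not yet reached, the manual-reset event must be cleared before
waiting again, or the wait returns immediately every iteration.]

#include <windows.h>
#include <stdbool.h>
#include <stdint.h>

/* assumed names; set by the main thread before it signals stop_event */
static volatile LONG has_xlogendptr = 0;
static uint64_t xlogendptr = 0;
static HANDLE stop_event;		/* hypothetical stop event */

static bool
reached_end_position(uint64_t segendpos)
{
	if (!has_xlogendptr)
		return false;
	if (segendpos >= xlogendptr)
		return true;			/* end reached, stop streaming */

	/*
	 * End position known but not reached yet: reset the manual-reset
	 * event before waiting again, otherwise the wait returns
	 * immediately and the loop spins.
	 */
	ResetEvent(stop_event);
	return false;
}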
Hi,

On 2022-01-16 15:28:00 -0800, Andres Freund wrote:
> I hacked that up last night. And a fix or two later, it seems to be
> working. What I'd missed at first is that the event needs to be reset in
> reached_end_position(), otherwise we'll busy loop.
>
> I wonder if using a short-lived event handle would have dangers of missing
> FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
> the event just once.
>
> I just wasn't immediately sure where to stash it. Probably just by adding a
> field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
> constant once passed to ReceiveXlogStream(), but I don't really see a reason
> why it'd need to stay that way?

Oops, attached the patch this time.

Greetings,

Andres Freund
Attachment
On Mon, Jan 17, 2022 at 12:31 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-01-16 15:28:00 -0800, Andres Freund wrote:
> > I hacked that up last night. And a fix or two later, it seems to be
> > working. What I'd missed at first is that the event needs to be reset in
> > reached_end_position(), otherwise we'll busy loop.

You can create the event with bManualReset set to False to avoid that, no?
For this use case, I don't really see a reason not to do that instead.

> > I wonder if using a short-lived event handle would have dangers of missing
> > FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
> > the event just once.
> >
> > I just wasn't immediately sure where to stash it. Probably just by adding a
> > field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
> > constant once passed to ReceiveXlogStream(), but I don't really see a reason
> > why it'd need to stay that way?
>
> Oops, attached the patch this time.

Do we really want to create a new event every time? Those are kernel
objects, so they're not entirely free, but that part maybe doesn't
matter. Wouldn't it be cleaner to do it like we do in
pgwin32_waitforsinglesocket(), which is to create it once and store it in
a static variable? Or is that what you're suggesting above in the "I
wonder if" part?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
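[For reference, bManualReset is the second argument to CreateEvent(); a
trivial sketch of the difference Magnus is pointing at. Andres' reply below
explains why this alone doesn't help in the WSAEventSelect() case.]

#include <windows.h>

void
create_event_examples(void)
{
	/* auto-reset: a successful wait atomically resets the event,
	 * so no explicit ResetEvent() call is needed */
	HANDLE auto_evt = CreateEvent(NULL, FALSE, FALSE, NULL);

	/* manual-reset: stays signalled until ResetEvent() is called */
	HANDLE manual_evt = CreateEvent(NULL, TRUE, FALSE, NULL);

	CloseHandle(auto_evt);
	CloseHandle(manual_evt);
}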
Hi,

On 2022-01-17 14:50:27 +0100, Magnus Hagander wrote:
> On Mon, Jan 17, 2022 at 12:31 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-01-16 15:28:00 -0800, Andres Freund wrote:
> > > I hacked that up last night. And a fix or two later, it seems to be
> > > working. What I'd missed at first is that the event needs to be reset in
> > > reached_end_position(), otherwise we'll busy loop.
>
> You can create the event with bManualReset set to False to avoid that, no?
> For this use case, I don't really see a reason not to do that instead.

The problem I'm referring to is that some types of events are
edge-triggered. Which we've been painfully reminded of recently:
https://www.postgresql.org/message-id/CA%2BhUKG%2BOeoETZQ%3DQw5Ub5h3tmwQhBmDA%3DnuNO3KG%3DzWfUypFAw%40mail.gmail.com

It appears there's no guarantee that you'll see e.g. FD_CLOSE if you use
short-lived events (the FD_CLOSE is recorded internally, but not signalled
immediately if there's still readable data, and the internal record is
reset by WSAEventSelect()).

> > > I wonder if using a short-lived event handle would have dangers of missing
> > > FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
> > > the event just once.
> > >
> > > I just wasn't immediately sure where to stash it. Probably just by adding a
> > > field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
> > > constant once passed to ReceiveXlogStream(), but I don't really see a reason
> > > why it'd need to stay that way?
> >
> > Oops, attached the patch this time.
>
> Do we really want to create a new event every time? Those are kernel
> objects, so they're not entirely free, but that part maybe doesn't
> matter. Wouldn't it be cleaner to do it like we do in
> pgwin32_waitforsinglesocket(), which is to create it once and store it in
> a static variable? Or is that what you're suggesting above in the "I
> wonder if" part?

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because its
event is still reset by WSAEventSelect().

Greetings,

Andres Freund
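[A sketch of the "create it once" shape under discussion; the struct and
field names here are illustrative, not the real StreamCtl. The event is
associated with the socket once per session, and WSAEnumNetworkEvents() -
which reports and clears the recorded network events and resets the event
object - is used after each wakeup instead of a fresh WSAEventSelect(), so
an internally recorded FD_CLOSE is not wiped by a re-association.]

#include <winsock2.h>
#include <stdbool.h>

typedef struct StreamCtlSketch
{
	/* ... other fields ... */
	WSAEVENT	net_event;		/* created once, lives for the session;
								 * zero-initialize: WSA_INVALID_EVENT is NULL */
} StreamCtlSketch;

static bool
setup_net_event(StreamCtlSketch *stream, SOCKET sock)
{
	if (stream->net_event == WSA_INVALID_EVENT)
		stream->net_event = WSACreateEvent();
	if (stream->net_event == WSA_INVALID_EVENT)
		return false;

	/* one association for the whole session, not one per wait */
	return WSAEventSelect(sock, stream->net_event, FD_READ | FD_CLOSE) == 0;
}

/* after a wait wakes us: fetch which events fired; this also resets
 * the event object */
static long
consume_net_events(StreamCtlSketch *stream, SOCKET sock)
{
	WSANETWORKEVENTS ev;

	if (WSAEnumNetworkEvents(sock, stream->net_event, &ev) != 0)
		return -1;
	return ev.lNetworkEvents;	/* FD_READ / FD_CLOSE bits */
}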
On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> is doing doesn't protect against the problem referenced above, because its
> event is still reset by WSAEventSelect().

Do we care about breaking the StreamCtl ABI? I don't think so?
Hi,

On 2022-01-29 12:44:22 -0800, Andres Freund wrote:
> On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > is doing doesn't protect against the problem referenced above, because its
> > event is still reset by WSAEventSelect().
>
> Do we care about breaking the StreamCtl ABI? I don't think so?

Here's a version of the patch that only creates the event once. It needs a
small bit of comment polishing, but otherwise I think it's sane?

Greetings,

Andres Freund
Attachment
On Sat, Jan 29, 2022 at 9:44 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > is doing doesn't protect against the problem referenced above, because its
> > event is still reset by WSAEventSelect().
>
> Do we care about breaking the StreamCtl ABI? I don't think so?

I would say no. It's an internal API, and it's not like pg_basebackup can
load plugins.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
On Sat, Jan 29, 2022 at 10:47 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-01-29 12:44:22 -0800, Andres Freund wrote:
> > On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > > is doing doesn't protect against the problem referenced above, because its
> > > event is still reset by WSAEventSelect().
> >
> > Do we care about breaking the StreamCtl ABI? I don't think so?
>
> Here's a version of the patch that only creates the event once. It needs a
> small bit of comment polishing, but otherwise I think it's sane?

LGTM in general, yes.

I'm wondering about the part that does:

+		events[0] = stream->net_event;
+		nevents++;
+
+		if (stream->stop_event != NULL)
+		{
+			events[1] = stream->stop_event;
+			nevents++;
+		}
+

Using a combination of nevents and hardcoded indexes does work -- but only
as long as there is only one optional entry. Should they perhaps be written
as

+		events[nevents++] = stream->net_event;

instead, for future-proofing? But then you'd also have to change the if()
statement on the return side, I guess. This can of course also be deferred
to the point where a third event might be added.

Not important, but it poked me in the eye when I was reading it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
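[For concreteness, the future-proofed shape Magnus suggests might look
something like this hypothetical sketch - stream->net_event and
stream->stop_event are taken from the patch excerpt above, everything else
is assumed - with the wait result mapped back via a recorded index instead
of hardcoded positions.]

#include <winsock2.h>

typedef struct StreamCtlSketch
{
	HANDLE		net_event;
	HANDLE		stop_event;		/* may be NULL */
} StreamCtlSketch;

static int
wait_on_stream_events(StreamCtlSketch *stream, DWORD timeout)
{
	HANDLE		events[2];
	DWORD		nevents = 0;
	DWORD		stop_idx = 0;	/* 0 means "no stop event registered",
								 * since net_event always occupies slot 0 */
	DWORD		rc;

	events[nevents++] = stream->net_event;
	if (stream->stop_event != NULL)
	{
		stop_idx = nevents;
		events[nevents++] = stream->stop_event;
	}

	rc = WaitForMultipleObjects(nevents, events, FALSE, timeout);
	if (stop_idx != 0 && rc == WAIT_OBJECT_0 + stop_idx)
		return 0;				/* stop requested */
	if (rc == WAIT_OBJECT_0)
		return 1;				/* socket activity */
	return -1;					/* timeout or error */
}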
Hi,

On 2022-01-30 16:51:12 +0100, Magnus Hagander wrote:
> On Sat, Jan 29, 2022 at 10:47 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2022-01-29 12:44:22 -0800, Andres Freund wrote:
> > > On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > > > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > > > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > > > is doing doesn't protect against the problem referenced above, because its
> > > > event is still reset by WSAEventSelect().
> > >
> > > Do we care about breaking the StreamCtl ABI? I don't think so?
> >
> > Here's a version of the patch that only creates the event once. It needs a
> > small bit of comment polishing, but otherwise I think it's sane?
>
> LGTM in general, yes.

Thanks for checking.

> I'm wondering about the part that does:
>
> +		events[0] = stream->net_event;
> +		nevents++;
> +
> +		if (stream->stop_event != NULL)
> +		{
> +			events[1] = stream->stop_event;
> +			nevents++;
> +		}
> +
>
> Using a combination of nevents and hardcoded indexes does work -- but only
> as long as there is only one optional entry. Should they perhaps be written
> as
>
> +		events[nevents++] = stream->net_event;
>
> instead, for future-proofing? But then you'd also have to change the if()
> statement on the return side, I guess.

I did wonder about it, but the index checks get sufficiently more
complicated that it didn't quite seem worth it. It didn't seem that likely
that these would get a third event to check...

I think we're going to have to generalize something like our wait events to
be frontend-usable at some point. The proportion and complexity of frontend
code is increasing...

Greetings,

Andres Freund
Hi,

On 2022-01-29 13:47:13 -0800, Andres Freund wrote:
> Here's a version of the patch that only creates the event once. It needs a
> small bit of comment polishing, but otherwise I think it's sane?

Ah, it needs a bit more than that. I was not cleaning up the event at the
exit of ReceiveXlogStream(). For pg_basebackup that perhaps wouldn't
matter, but pg_receivewal loops...

We don't have a good spot for cleaning up right now; ReceiveXlogStream()
has plenty of returns. The attached changes those to a "goto done" - pretty
it is not, but probably still the best way for the back branches?

I think the receivelog.c interface could probably do with a bit of
cleanup... The control flow is quite complicated, with repeated checks all
over, etc. :( And the whole thing of giving the appearance of being
instantiable multiple times, but then using global variables for state,
is ...

Attached a revised version.

Greetings,

Andres Freund
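[A highly simplified sketch of the single-exit shape Andres describes; the
helper functions are stubs, not the real receivelog.c routines. Every early
return becomes a "goto done" so the event handle is always closed - which
matters for callers that loop, like pg_receivewal.]

#include <windows.h>
#include <stdbool.h>

typedef struct StreamCtlSketch
{
	HANDLE		stop_event;
} StreamCtlSketch;

/* hypothetical stand-ins for the real streaming logic */
static bool start_streaming(StreamCtlSketch *s) { (void) s; return true; }
static bool stream_until_end(StreamCtlSketch *s) { (void) s; return true; }

static bool
receive_xlog_stream_sketch(StreamCtlSketch *stream)
{
	bool		result = false;

	stream->stop_event = CreateEvent(NULL, TRUE, FALSE, NULL);
	if (stream->stop_event == NULL)
		return false;

	if (!start_streaming(stream))
		goto done;
	if (!stream_until_end(stream))
		goto done;

	result = true;

done:
	/* single exit: the event is always closed, so looping callers
	 * don't leak one kernel handle per iteration */
	CloseHandle(stream->stop_event);
	stream->stop_event = NULL;
	return result;
}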