Thread: pg_basebackup WAL streamer shutdown is bogus - leading to slow tests
Hi,

I was wondering in [1] what we could do about the slowest tests on windows.

On 2021-12-31 11:25:28 -0800, Andres Freund wrote:
> Picking a random successful cfbot run [1] I see the following tap tests taking
> more than 20 seconds:
>
> 67188 ms pg_basebackup t/010_pg_basebackup.pl
> 59710 ms recovery t/001_stream_rep.pl

Comparing these times to measurements taken on my normal linux workstation,
something seemed just *very* off, even with a slow CI instance and windows in
the mix.

A bunch of printf debugging later, I realized the problem is that several of
the pg_basebackups in the tests take a *long* time. E.g. for
t/001_stream_rep.pl the backups from the standby each take just over 10s.
That's awfully specific...

# Taking pg_basebackup my_backup from node "standby_1"
# Running: pg_basebackup -D C:/dev/postgres/./tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup -h C:/Users/myadmin/AppData/Local/Temp/yba26PBYX1 -p 59181 --checkpoint fast --no-sync --label my_backup -v
# ran in 10.145s
# Backup finished

This reproducibly happens, and it's *not* related to the socket shutdown()
changes we've been debugging lately - even after a revert the problem
persists.

Because our logging for basebackups is quite weak, both on the server and the
client side, I needed to add a fair bit more debugging to figure it out:

pg_basebackup: wait to finish at 0.492
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: stream poll timeout 10.112

The problem is that there's just no implemented way to shut down the WAL
streaming thread in pg_basebackup in a timely manner. The code in
pg_basebackup.c says:

	if (verbose)
		pg_log_info("waiting for background process to finish streaming ...");
...
	/*
	 * On Windows, since we are in the same process, we can just store the
	 * value directly in the variable, and then set the flag that says
	 * it's there.
	 */
...
	xlogendptr = ((uint64) hi) << 32 | lo;
	InterlockedIncrement(&has_xlogendptr);

But just setting a variable doesn't do much if the thread is in
HandleCopyStream()->CopyStreamPoll()->select().

The only reason we ever succeed in shutting down, without more WAL coming in,
is that pg_basebackup defaults to sending a status message every 10 seconds.
At that point the thread sees has_xlogendptr = true, and shuts down.

A test-specific workaround would be to just add --status-interval=1 to
Cluster.pm::backup(). But that seems very unsatisfying.

I don't immediately see a solution for this, other than to add
StreamCtl->stop_event (mirroring ->stop_socket) and then convert
CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
doesn't support pipes and there's no socketpair().

Any more straightforward ideas?

From a cursory look at the history, pg_basebackup used to have this behaviour
on all platforms, but it got fixed for the other platforms in 7834d20b57a by
Tom (assuming the problem wasn't present there).

Greetings,

Andres Freund

[1] https://postgr.es/m/20211231192528.wirwj4qaaw3ted5g%40alap3.anarazel.de
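[For readers less familiar with the Windows APIs involved, here is a minimal,
hypothetical sketch of the kind of wait being proposed - a stop event waited
on alongside the socket via WaitForMultipleObjects() - assuming plain
Winsock2. The function and parameter names are illustrative, not from any
actual patch. Note that the per-call event creation shown here is exactly
what later messages in this thread identify as risky (a pending FD_CLOSE can
be lost), so treat it as the shape of the wait, not a complete solution.]

#include <winsock2.h>

/*
 * Illustrative only: wait until the socket has data / was closed, or the
 * stop event is signalled, whichever comes first.
 */
static int
wait_for_stream_event(SOCKET sock, HANDLE stop_event, DWORD timeout_ms)
{
	HANDLE		events[2];
	DWORD		rc;
	int			result;

	events[0] = WSACreateEvent();	/* manual-reset, nonsignaled */
	events[1] = stop_event;
	if (events[0] == WSA_INVALID_EVENT)
		return -1;

	/* have the socket signal events[0] on readability / close */
	if (WSAEventSelect(sock, events[0], FD_READ | FD_CLOSE) == SOCKET_ERROR)
	{
		WSACloseEvent(events[0]);
		return -1;
	}

	rc = WaitForMultipleObjects(2, events, FALSE, timeout_ms);
	if (rc == WAIT_OBJECT_0)
		result = 1;				/* socket activity */
	else if (rc == WAIT_OBJECT_0 + 1)
		result = 0;				/* stop requested */
	else
		result = -1;			/* timeout or error */

	/* cancel the association again; note the socket stays non-blocking */
	WSAEventSelect(sock, NULL, 0);
	WSACloseEvent(events[0]);
	return result;
}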
On Sun, Jan 16, 2022 at 10:22 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> I was wondering in [1] what we could do about the slowest tests on
> windows.
>
> On 2021-12-31 11:25:28 -0800, Andres Freund wrote:
> > Picking a random successful cfbot run [1] I see the following tap tests taking
> > more than 20 seconds:
> >
> > 67188 ms pg_basebackup t/010_pg_basebackup.pl
> > 59710 ms recovery t/001_stream_rep.pl
>
> Comparing these times to measurements taken on my normal linux workstation,
> something seemed just *very* off, even with a slow CI instance and windows in
> the mix.
>
> A bunch of printf debugging later, I realized the problem is that several of
> the pg_basebackups in the tests take a *long* time. E.g. for
> t/001_stream_rep.pl the backups from the standby each take just over 10s.
> That's awfully specific...
>
> # Taking pg_basebackup my_backup from node "standby_1"
> # Running: pg_basebackup -D C:/dev/postgres/./tmp_check/t_001_stream_rep_standby_1_data/backup/my_backup -h C:/Users/myadmin/AppData/Local/Temp/yba26PBYX1 -p 59181 --checkpoint fast --no-sync --label my_backup -v
> # ran in 10.145s
> # Backup finished
>
> This reproducibly happens, and it's *not* related to the socket shutdown()
> changes we've been debugging lately - even after a revert the problem
> persists.
>
> Because our logging for basebackups is quite weak, both on the server and the
> client side, I needed to add a fair bit more debugging to figure it out:
>
> pg_basebackup: wait to finish at 0.492
> pg_basebackup: waiting for background process to finish streaming ...
> pg_basebackup: stream poll timeout 10.112
>
> The problem is that there's just no implemented way to shut down the WAL
> streaming thread in pg_basebackup in a timely manner. The code in
> pg_basebackup.c says:
>
>         if (verbose)
>                 pg_log_info("waiting for background process to finish streaming ...");
> ...
>         /*
>          * On Windows, since we are in the same process, we can just store the
>          * value directly in the variable, and then set the flag that says
>          * it's there.
>          */
> ...
>         xlogendptr = ((uint64) hi) << 32 | lo;
>         InterlockedIncrement(&has_xlogendptr);
>
> But just setting a variable doesn't do much if the thread is in
> HandleCopyStream()->CopyStreamPoll()->select().
>
> The only reason we ever succeed in shutting down, without more WAL coming in,
> is that pg_basebackup defaults to sending a status message every 10 seconds.
> At that point the thread sees has_xlogendptr = true, and shuts down.
>
> A test-specific workaround would be to just add --status-interval=1 to
> Cluster.pm::backup(). But that seems very unsatisfying.
>
> I don't immediately see a solution for this, other than to add
> StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> doesn't support pipes and there's no socketpair().
>
> Any more straightforward ideas?
>
> From a cursory look at the history, pg_basebackup used to have this behaviour
> on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> Tom (assuming the problem wasn't present there).

Ugh, yeah, that sounds like a correct analysis to me, and ugh, yeah,
that's not very nice.

And yes, I think we have to create an event and then use WSAEventSelect()
+ WaitForSingleObjectEx(). It should be enough to just use one event, I
think, plus the timeout -- but it might be more readable to have a
separate event for the socket and the stop? But we can have just one
event that is both used for the stop signal and, via WSAEventSelect(),
associated with the socket as needed.

(And yes, I agree that it's a lot better to fix it properly than to just
reduce the timeout.)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
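[A sketch of the single-event variant Magnus describes - all names
hypothetical, assuming Winsock2: one manual-reset event is associated with
the socket and doubles as the stop signal. The cost is that the waiter
cannot tell from the wait result alone why it woke, and must check both the
stop flag and WSAEnumNetworkEvents() - the readability trade-off mentioned
above.]

#include <winsock2.h>
#include <stdbool.h>

/* illustrative global; in reality this would live in shared state */
static HANDLE stream_event;

/* streaming thread: one manual-reset event serving both purposes */
static bool
setup_stream_event(SOCKET sock)
{
	stream_event = CreateEvent(NULL, TRUE /* manual reset */, FALSE, NULL);
	if (stream_event == NULL)
		return false;
	/* the same event is signalled on socket readability / close */
	return WSAEventSelect(sock, stream_event, FD_READ | FD_CLOSE) == 0;
}

/* main thread: wake the streaming thread out of its wait */
static void
request_stop(void)
{
	SetEvent(stream_event);
}

[The waiter would then block in WaitForSingleObjectEx(stream_event, ...,
FALSE) and, on wakeup, check both the end-position flag and the pending
network events.]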
Andres Freund <andres@anarazel.de> writes:
> I don't immediately see a solution for this, other than to add
> StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> doesn't support pipes and there's no socketpair().

> Any more straightforward ideas?

> From a cursory look at the history, pg_basebackup used to have this behaviour
> on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> Tom (assuming the problem wasn't present there).

Hmm --- I see that I thought Windows was unaffected, but I didn't
consider this angle.

Can we send the child process a signal to kick it off its wait?

			regards, tom lane
On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Andres Freund <andres@anarazel.de> writes:
> > I don't immediately see a solution for this, other than to add
> > StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> > CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> > doesn't support pipes and there's no socketpair().
> > Any more straightforward ideas?
> > From a cursory look at the history, pg_basebackup used to have this behaviour
> > on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> > Tom (assuming the problem wasn't present there).
>
> Hmm --- I see that I thought Windows was unaffected, but I didn't
> consider this angle.
>
> Can we send the child process a signal to kick it off its wait?

No. (1) On Windows it's not a child process, it's a thread. And (2)
Windows doesn't have signals. We emulate those *in the backend* for
win32, but this problem is in the frontend, where that emulation layer
doesn't exist.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
On Sun, Jan 16, 2022 at 5:36 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Andres Freund <andres@anarazel.de> writes:
> > > I don't immediately see a solution for this, other than to add
> > > StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> > > CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> > > doesn't support pipes and there's no socketpair().
> > > Any more straightforward ideas?
> > > From a cursory look at the history, pg_basebackup used to have this behaviour
> > > on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> > > Tom (assuming the problem wasn't present there).
> >
> > Hmm --- I see that I thought Windows was unaffected, but I didn't
> > consider this angle.
> >
> > Can we send the child process a signal to kick it off its wait?
>
> No. (1) On Windows it's not a child process, it's a thread. And (2)
> Windows doesn't have signals. We emulate those *in the backend* for
> win32, but this problem is in the frontend, where that emulation layer
> doesn't exist.

Actually, just after sending that... What we could do is call
WSACancelBlockingCall(), which would cancel the select() and thereby make
us do the check. However, per the docs
(https://docs.microsoft.com/en-us/windows/win32/api/winsock2/nf-winsock2-wsacancelblockingcall)
this function is no longer exported in Winsock 2, so this does not seem to
be the right way forward. There is no replacement function for it -- the
suggestion is basically "don't do that, use multithreading instead", which
I think brings us back to the original suggestion of WSAEventSelect().

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
Hi,

On 2022-01-16 17:39:11 +0100, Magnus Hagander wrote:
> On Sun, Jan 16, 2022 at 5:36 PM Magnus Hagander <magnus@hagander.net> wrote:
> > On Sun, Jan 16, 2022 at 5:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I don't immediately see a solution for this, other than to add
> > > > StreamCtl->stop_event (mirroring ->stop_socket) and then convert
> > > > CopyStreamPoll() to use WaitForMultipleObjects(). Microsoft's select()
> > > > doesn't support pipes and there's no socketpair().
> > > > Any more straightforward ideas?
> > > > From a cursory look at the history, pg_basebackup used to have this behaviour
> > > > on all platforms, but it got fixed for the other platforms in 7834d20b57a by
> > > > Tom (assuming the problem wasn't present there).
> > >
> > > Hmm --- I see that I thought Windows was unaffected, but I didn't
> > > consider this angle.
> > >
> > > Can we send the child process a signal to kick it off its wait?
> >
> > No. (1) On Windows it's not a child process, it's a thread. And (2)
> > Windows doesn't have signals. We emulate those *in the backend* for
> > win32, but this problem is in the frontend, where that emulation layer
> > doesn't exist.
>
> [...] which I think brings us back to the original suggestion of
> WSAEventSelect().

I hacked that up last night. And a fix or two later, it seems to be
working. What I'd missed at first is that the event needs to be reset in
reached_end_position(), otherwise we'll busy loop.

I wonder if using a short-lived event handle would have dangers of missing
FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
the event just once.

I just wasn't immediately sure where to stash it. Probably just by adding a
field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
constant once passed to ReceiveXlogStream(), but I don't really see a reason
why it'd need to stay that way?

Greetings,

Andres Freund
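[A sketch of the busy-loop fix Andres describes, with assumed names loosely
mirroring pg_basebackup.c's Windows path; the real function takes more
arguments and uses PostgreSQL types. The point: when the end position is
known but not yet reached, the manual-reset event must be cleared before
waiting again, or the wait returns immediately every iteration.]

#include <windows.h>
#include <stdbool.h>
#include <stdint.h>

/* assumed names; set by the main thread before it signals stop_event */
static volatile LONG has_xlogendptr = 0;
static uint64_t xlogendptr = 0;
static HANDLE stop_event;		/* hypothetical stop event */

static bool
reached_end_position(uint64_t segendpos)
{
	if (!has_xlogendptr)
		return false;
	if (segendpos >= xlogendptr)
		return true;			/* end reached, stop streaming */

	/*
	 * End position known but not reached yet: reset the manual-reset
	 * event before waiting again, otherwise the wait returns
	 * immediately and the loop spins.
	 */
	ResetEvent(stop_event);
	return false;
}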
Hi,

On 2022-01-16 15:28:00 -0800, Andres Freund wrote:
> I hacked that up last night. And a fix or two later, it seems to be
> working. What I'd missed at first is that the event needs to be reset in
> reached_end_position(), otherwise we'll busy loop.
>
> I wonder if using a short-lived event handle would have dangers of missing
> FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
> the event just once.
>
> I just wasn't immediately sure where to stash it. Probably just by adding a
> field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
> constant once passed to ReceiveXlogStream(), but I don't really see a reason
> why it'd need to stay that way?

Oops, attached the patch this time.

Greetings,

Andres Freund
Attachment
On Mon, Jan 17, 2022 at 12:31 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-01-16 15:28:00 -0800, Andres Freund wrote:
> > I hacked that up last night. And a fix or two later, it seems to be
> > working. What I'd missed at first is that the event needs to be reset in
> > reached_end_position(), otherwise we'll busy loop.

You can create the event with bManualReset set to False to avoid that, no?
For this use case, I don't really see a reason not to do that instead.

> > I wonder if using a short-lived event handle would have dangers of missing
> > FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
> > the event just once.
> >
> > I just wasn't immediately sure where to stash it. Probably just by adding a
> > field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
> > constant once passed to ReceiveXlogStream(), but I don't really see a reason
> > why it'd need to stay that way?
>
> Oops, attached the patch this time.

Do we really want to create a new event every time? Those are kernel
objects, so they're not entirely free, but that part maybe doesn't
matter. Wouldn't it be cleaner to do it like we do in
pgwin32_waitforsinglesocket(), which is to create it once and store it in
a static variable? Or is that what you're suggesting above in the "I
wonder if" part?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
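[For reference, bManualReset is the second argument to CreateEvent(); a
trivial sketch of the difference Magnus is pointing at. Andres' reply below
explains why this alone doesn't help in the WSAEventSelect() case.]

#include <windows.h>

void
create_event_examples(void)
{
	/* auto-reset: a successful wait atomically resets the event,
	 * so no explicit ResetEvent() call is needed */
	HANDLE auto_evt = CreateEvent(NULL, FALSE, FALSE, NULL);

	/* manual-reset: stays signalled until ResetEvent() is called */
	HANDLE manual_evt = CreateEvent(NULL, TRUE, FALSE, NULL);

	CloseHandle(auto_evt);
	CloseHandle(manual_evt);
}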
Hi,

On 2022-01-17 14:50:27 +0100, Magnus Hagander wrote:
> On Mon, Jan 17, 2022 at 12:31 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-01-16 15:28:00 -0800, Andres Freund wrote:
> > > I hacked that up last night. And a fix or two later, it seems to be
> > > working. What I'd missed at first is that the event needs to be reset in
> > > reached_end_position(), otherwise we'll busy loop.
>
> You can create the event with bManualReset set to False to avoid that, no?
> For this use case, I don't really see a reason not to do that instead.

The problem I'm referring to is that some types of events are
edge-triggered. Which we've been painfully reminded of recently:
https://www.postgresql.org/message-id/CA%2BhUKG%2BOeoETZQ%3DQw5Ub5h3tmwQhBmDA%3DnuNO3KG%3DzWfUypFAw%40mail.gmail.com

It appears there's no guarantee that you'll see e.g. FD_CLOSE if you use
short-lived events (the FD_CLOSE is recorded internally, but not signalled
immediately if there's still readable data, and the internal record is
reset by WSAEventSelect()).

> > > I wonder if using a short-lived event handle would have dangers of missing
> > > FD_CLOSE here as well? It'd probably be worth avoiding the risk by creating
> > > the event just once.
> > >
> > > I just wasn't immediately sure where to stash it. Probably just by adding a
> > > field in StreamCtl that ReceiveXlogStream() then sets? So far StreamCtl is
> > > constant once passed to ReceiveXlogStream(), but I don't really see a reason
> > > why it'd need to stay that way?
> >
> > Oops, attached the patch this time.
>
> Do we really want to create a new event every time? Those are kernel
> objects, so they're not entirely free, but that part maybe doesn't
> matter. Wouldn't it be cleaner to do it like we do in
> pgwin32_waitforsinglesocket(), which is to create it once and store it in
> a static variable? Or is that what you're suggesting above in the "I
> wonder if" part?

Yes, that's what I was suggesting. I wasn't thinking of using a static var,
but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
is doing doesn't protect against the problem referenced above, because its
event is still reset by WSAEventSelect().

Greetings,

Andres Freund
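[A sketch of the "create it once" shape under discussion; the struct and
field names here are illustrative, not the real StreamCtl. The event is
associated with the socket once per session, and WSAEnumNetworkEvents() -
which reports and clears the recorded network events and resets the event
object - is used after each wakeup instead of a fresh WSAEventSelect(), so
an internally recorded FD_CLOSE is not wiped by a re-association.]

#include <winsock2.h>
#include <stdbool.h>

typedef struct StreamCtlSketch
{
	/* ... other fields ... */
	WSAEVENT	net_event;		/* created once, lives for the session;
								 * zero-initialize: WSA_INVALID_EVENT is NULL */
} StreamCtlSketch;

static bool
setup_net_event(StreamCtlSketch *stream, SOCKET sock)
{
	if (stream->net_event == WSA_INVALID_EVENT)
		stream->net_event = WSACreateEvent();
	if (stream->net_event == WSA_INVALID_EVENT)
		return false;

	/* one association for the whole session, not one per wait */
	return WSAEventSelect(sock, stream->net_event, FD_READ | FD_CLOSE) == 0;
}

/* after a wait wakes us: fetch which events fired; this also resets
 * the event object */
static long
consume_net_events(StreamCtlSketch *stream, SOCKET sock)
{
	WSANETWORKEVENTS ev;

	if (WSAEnumNetworkEvents(sock, stream->net_event, &ev) != 0)
		return -1;
	return ev.lNetworkEvents;	/* FD_READ / FD_CLOSE bits */
}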
On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> is doing doesn't protect against the problem referenced above, because its
> event is still reset by WSAEventSelect().

Do we care about breaking the StreamCtl ABI? I don't think so?
Hi,

On 2022-01-29 12:44:22 -0800, Andres Freund wrote:
> On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > is doing doesn't protect against the problem referenced above, because its
> > event is still reset by WSAEventSelect().
>
> Do we care about breaking the StreamCtl ABI? I don't think so?

Here's a version of the patch that only creates the event once. It needs a
small bit of comment polishing, but otherwise I think it's sane?

Greetings,

Andres Freund
Attachment
On Sat, Jan 29, 2022 at 9:44 PM Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > is doing doesn't protect against the problem referenced above, because its
> > event is still reset by WSAEventSelect().
>
> Do we care about breaking the StreamCtl ABI? I don't think so?

I would say no. It's an internal API, and it's not like pg_basebackup can
load plugins.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
On Sat, Jan 29, 2022 at 10:47 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2022-01-29 12:44:22 -0800, Andres Freund wrote:
> > On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > > is doing doesn't protect against the problem referenced above, because its
> > > event is still reset by WSAEventSelect().
> >
> > Do we care about breaking the StreamCtl ABI? I don't think so?
>
> Here's a version of the patch that only creates the event once. It needs a
> small bit of comment polishing, but otherwise I think it's sane?

LGTM in general, yes.

I'm wondering about the part that does:

+		events[0] = stream->net_event;
+		nevents++;
+
+		if (stream->stop_event != NULL)
+		{
+			events[1] = stream->stop_event;
+			nevents++;
+		}
+

Using a combination of nevents and hardcoded indexes does work -- but only
as long as there is only one optional entry. Should they perhaps be written
as

+		events[nevents++] = stream->net_event;

instead, for future-proofing? But then you'd also have to change the if()
statement on the return side, I guess. This can of course also be deferred
to the point where a third event might be added.

Not important, but it poked me in the eye when I was reading it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/
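[For concreteness, the future-proofed shape Magnus suggests might look
something like this hypothetical sketch - stream->net_event and
stream->stop_event are taken from the patch excerpt above, everything else
is assumed - with the wait result mapped back via a recorded index instead
of hardcoded positions.]

#include <winsock2.h>

typedef struct StreamCtlSketch
{
	HANDLE		net_event;
	HANDLE		stop_event;		/* may be NULL */
} StreamCtlSketch;

static int
wait_on_stream_events(StreamCtlSketch *stream, DWORD timeout)
{
	HANDLE		events[2];
	DWORD		nevents = 0;
	DWORD		stop_idx = 0;	/* 0 means "no stop event registered",
								 * since net_event always occupies slot 0 */
	DWORD		rc;

	events[nevents++] = stream->net_event;
	if (stream->stop_event != NULL)
	{
		stop_idx = nevents;
		events[nevents++] = stream->stop_event;
	}

	rc = WaitForMultipleObjects(nevents, events, FALSE, timeout);
	if (stop_idx != 0 && rc == WAIT_OBJECT_0 + stop_idx)
		return 0;				/* stop requested */
	if (rc == WAIT_OBJECT_0)
		return 1;				/* socket activity */
	return -1;					/* timeout or error */
}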
Hi,

On 2022-01-30 16:51:12 +0100, Magnus Hagander wrote:
> On Sat, Jan 29, 2022 at 10:47 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > Hi,
> >
> > On 2022-01-29 12:44:22 -0800, Andres Freund wrote:
> > > On 2022-01-17 10:06:56 -0800, Andres Freund wrote:
> > > > Yes, that's what I was suggesting. I wasn't thinking of using a static var,
> > > > but putting it in StreamCtl. Note that what pgwin32_waitforsinglesocket()
> > > > is doing doesn't protect against the problem referenced above, because its
> > > > event is still reset by WSAEventSelect().
> > >
> > > Do we care about breaking the StreamCtl ABI? I don't think so?
> >
> > Here's a version of the patch that only creates the event once. It needs a
> > small bit of comment polishing, but otherwise I think it's sane?
>
> LGTM in general, yes.

Thanks for checking.

> I'm wondering about the part that does:
>
> +		events[0] = stream->net_event;
> +		nevents++;
> +
> +		if (stream->stop_event != NULL)
> +		{
> +			events[1] = stream->stop_event;
> +			nevents++;
> +		}
> +
>
> Using a combination of nevents and hardcoded indexes does work -- but only
> as long as there is only one optional entry. Should they perhaps be written
> as
>
> +		events[nevents++] = stream->net_event;
>
> instead, for future-proofing? But then you'd also have to change the if()
> statement on the return side, I guess.

I did wonder about it, but the index checks get sufficiently more
complicated that it didn't quite seem worth it. It didn't seem that likely
that these would get a third event to check...

I think we're going to have to generalize something like our wait events to
be frontend-usable at some point. The proportion and complexity of frontend
code is increasing...

Greetings,

Andres Freund
Hi,

On 2022-01-29 13:47:13 -0800, Andres Freund wrote:
> Here's a version of the patch that only creates the event once. It needs a
> small bit of comment polishing, but otherwise I think it's sane?

Ah, it needs a bit more than that. I was not cleaning up the event at the
exit of ReceiveXlogStream(). For pg_basebackup that perhaps wouldn't
matter, but pg_receivewal loops...

We don't have a good spot for cleaning up right now; ReceiveXlogStream()
has plenty of returns. The attached changes those to a "goto done" - pretty
it is not, but probably still the best way for the back branches?

I think the receivelog.c interface could probably do with a bit of
cleanup... The control flow is quite complicated, with repeated checks all
over, etc. :( And the whole thing of giving the appearance of being
instantiable multiple times, but then using global variables for state,
is ...

Attached a revised version.

Greetings,

Andres Freund
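[A highly simplified sketch of the single-exit shape Andres describes; the
helper functions are stubs, not the real receivelog.c routines. Every early
return becomes a "goto done" so the event handle is always closed - which
matters for callers that loop, like pg_receivewal.]

#include <windows.h>
#include <stdbool.h>

typedef struct StreamCtlSketch
{
	HANDLE		stop_event;
} StreamCtlSketch;

/* hypothetical stand-ins for the real streaming logic */
static bool start_streaming(StreamCtlSketch *s) { (void) s; return true; }
static bool stream_until_end(StreamCtlSketch *s) { (void) s; return true; }

static bool
receive_xlog_stream_sketch(StreamCtlSketch *stream)
{
	bool		result = false;

	stream->stop_event = CreateEvent(NULL, TRUE, FALSE, NULL);
	if (stream->stop_event == NULL)
		return false;

	if (!start_streaming(stream))
		goto done;
	if (!stream_until_end(stream))
		goto done;

	result = true;

done:
	/* single exit: the event is always closed, so looping callers
	 * don't leak one kernel handle per iteration */
	CloseHandle(stream->stop_event);
	stream->stop_event = NULL;
	return result;
}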