Thread: parallelizing the archiver
Hi hackers, I'd like to gauge interest in parallelizing the archiver process. From a quick scan, I was only able to find one recent thread [0] that brought up this topic, and ISTM the conventional wisdom is to use a backup utility like pgBackRest that does things in parallel behind-the-scenes. My experience is that the generating-more-WAL-than-we-can-archive problem is pretty common, and parallelization seems to help quite a bit, so perhaps it's a good time to consider directly supporting parallel archiving in PostgreSQL. Based on previous threads I've seen, I believe many in the community would like to replace archive_command entirely, but what I'm proposing here would build on the existing tools. I'm currently thinking of something a bit like autovacuum_max_workers, but the archive workers would be created once and would follow a competing consumers model. Another approach I'm looking at is to use background worker processes, although I'm not sure if linking such a critical piece of functionality to max_worker_processes is a good idea. However, I do see that logical replication uses background workers. Anyway, I'm curious what folks think about this. I think it'd help simplify server administration for many users. Nathan [0] https://www.postgresql.org/message-id/flat/20180828060221.x33gokifqi3csjj4%40depesz.com
On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > I'd like to gauge interest in parallelizing the archiver process. > [...] > Based on previous threads I've seen, I believe many in the community > would like to replace archive_command entirely, but what I'm proposing > here would build on the existing tools. Having a new implementation that would remove the archive_command is probably a better long term solution, but I don't know of anyone working on that and it's probably gonna take some time. Right now we have a lot of users that face archiving bottlenecks, so I think it would be a good thing to implement parallel archiving, fully compatible with the current archive_command, as a short term solution.
On 9/7/21, 11:38 PM, "Julien Rouhaud" <rjuju123@gmail.com> wrote: > On Wed, Sep 8, 2021 at 6:36 AM Bossart, Nathan <bossartn@amazon.com> wrote: >> >> I'd like to gauge interest in parallelizing the archiver process. >> [...] >> Based on previous threads I've seen, I believe many in the community >> would like to replace archive_command entirely, but what I'm proposing >> here would build on the existing tools. > > Having a new implementation that would remove the archive_command is > probably a better long term solution, but I don't know of anyone > working on that and it's probably gonna take some time. Right now we > have a lot of users that face archiving bottleneck so I think it would > be a good thing to implement parallel archiving, fully compatible with > current archive_command, as a short term solution. Thanks for chiming in. I'm planning to work on a patch next week. Nathan
On Fri, Sep 10, 2021 at 6:30 AM Bossart, Nathan <bossartn@amazon.com> wrote: > > Thanks for chiming in. I'm planning to work on a patch next week. Great news! About the technical concerns: > I'm currently thinking of > something a bit like autovacuum_max_workers, but the archive workers > would be created once and would follow a competing consumers model. In this approach, would the launched archiver workers be kept as long as the instance is up, or should they be stopped if they're not required anymore, e.g. if there was a temporary write activity spike? I think we should make sure that at least one worker is always up. > Another approach I'm looking at is to use background worker processes, > although I'm not sure if linking such a critical piece of > functionality to max_worker_processes is a good idea. However, I do > see that logical replication uses background workers. I think that using background workers is a good approach, and the various GUCs in that area should allow users to properly configure archiving too. If that's not the case, it might be an opportunity to add some new infrastructure that could benefit all bgworker users.
> On 8 Sept. 2021, at 03:36, Bossart, Nathan <bossartn@amazon.com> wrote: > > Anyway, I'm curious what folks think about this. I think it'd help > simplify server administration for many users. BTW this thread is also related [0]. My 2 cents. It's OK if external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all. Maybe just add parallelism API for external tool? It's much easier to control concurrency in external tool than in PostgreSQL core. Maintaining parallel worker is tremendously harder than spawning goroutine, thread, task or whatever. External tool needs to know when xlog segment is ready and needs to report when it's done. Postgres should just ensure that external archiver\restorer is running. For example external tool could read xlog names from stdin and report finished files from stdout. I can prototype such tool swiftly :) E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict next WAL name and completion time! Thoughts? Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/flat/CA%2BTgmobhAbs2yabTuTRkJTq_kkC80-%2Bjw%3DpfpypdOJ7%2BgAbQbw%40mail.gmail.com
On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > It's OK if external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all. > Maybe just add parallelism API for external tool? > It's much easier to control concurrency in external tool than in PostgreSQL core. Maintaining parallel worker is tremendously harder than spawning goroutine, thread, task or whatever. Yes, but it also means that it's up to every single archiving tool to implement a somewhat hackish parallel version of an archive_command, hoping that core won't break it. If this problem is solved in postgres core without API change, then all existing tools will automatically benefit from it (maybe not the ones that already have hacks to make it parallel, though it seems easier to disable that than to implement it). > External tool needs to know when xlog segment is ready and needs to report when it's done. Postgres should just ensure that external archiver\restorer is running. > For example external tool could read xlog names from stdin and report finished files from stdout. I can prototype such tool swiftly :) > E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict next WAL name and completion time! Yes, but that requires fundamental design changes for the archive commands, right? So while I agree it could be a better approach overall, it seems like a longer term option. As far as I understand, what Nathan suggested seems more likely to be achieved in pg15 and could benefit from a larger set of backup solutions. This would give us enough time to properly design a new archiving approach.
> On 10 Sept. 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Fri, Sep 10, 2021 at 1:28 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: >> >> It's OK if external tool is responsible for concurrency. Do we want this complexity in core? Many users do not enable archiving at all. >> Maybe just add parallelism API for external tool? >> It's much easier to control concurrency in external tool than in PostgreSQL core. Maintaining parallel worker is tremendously harder than spawning goroutine, thread, task or whatever. > > Yes, but it also means that it's up to every single archiving tool to > implement a somewhat hackish parallel version of an archive_command, > hoping that core won't break it. I'm not proposing to remove existing archive_command. Just deprecate its one-WAL-per-call form. > If this problem is solved in > postgres core without API change, then all existing tools will > automatically benefit from it (maybe not the ones that already have > hacks to make it parallel, though it seems easier to disable that > than to implement it). True, hacky tools can already coordinate a swarm of their processes and are prepared to be called multiple times concurrently :) >> External tool needs to know when xlog segment is ready and needs to report when it's done. Postgres should just ensure that external archiver\restorer is running. >> For example external tool could read xlog names from stdin and report finished files from stdout. I can prototype such tool swiftly :) >> E.g. postgres runs ```wal-g wal-archiver``` and pushes ready segment filenames on stdin. And no more listing of archive_status and hacky algorithms to predict next WAL name and completion time! > > Yes, but that requires fundamental design changes for the archive > commands, right? So while I agree it could be a better approach > overall, it seems like a longer term option. As far as I understand, > what Nathan suggested seems more likely to be achieved in pg15 and > could benefit from a larger set of backup solutions. This would give us > enough time to properly design a new archiving approach. It's a very simplistic approach. If some GUC is set - archiver will just feed ready files to stdin of archive command. What fundamental design changes do we need? Best regards, Andrey Borodin.
On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > On 10 Sept. 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > > Yes, but it also means that it's up to every single archiving tool to > > implement a somewhat hackish parallel version of an archive_command, > > hoping that core won't break it. > I'm not proposing to remove existing archive_command. Just deprecate its one-WAL-per-call form. Which is a big API break. > It's a very simplistic approach. If some GUC is set - archiver will just feed ready files to stdin of archive command. What fundamental design changes do we need? I'm talking about the commands themselves. Your suggestion is to change archive_command to be able to spawn a daemon, and it looks like a totally different approach. I'm not saying that having a daemon-based approach to take care of archiving is a bad idea, I'm saying that trying to fit that with the current archive_command + some new GUC looks like a bad idea.
> On 10 Sept. 2021, at 11:11, Julien Rouhaud <rjuju123@gmail.com> wrote: > > On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: >> >>> On 10 Sept. 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote: >>> >>> Yes, but it also means that it's up to every single archiving tool to >>> implement a somewhat hackish parallel version of an archive_command, >>> hoping that core won't break it. >> I'm not proposing to remove existing archive_command. Just deprecate its one-WAL-per-call form. > > Which is a big API break. Huge extension, not a break. >> It's a very simplistic approach. If some GUC is set - archiver will just feed ready files to stdin of archive command. What fundamental design changes do we need? > > I'm talking about the commands themselves. Your suggestion is to > change archive_command to be able to spawn a daemon, and it looks like > a totally different approach. I'm not saying that having a daemon-based > approach to take care of archiving is a bad idea, I'm saying > that trying to fit that with the current archive_command + some new > GUC looks like a bad idea. It fits nicely, even in corner cases. E.g. restore_command run from pg_rewind seems compatible with this approach. One more example: after failover a DBA can just run ```ls | wal-g wal-push``` to archive all WALs left unarchived before the network partition. This is a simple yet powerful approach, without any contradiction to the existing archive_command API. Why is it a bad idea? Best regards, Andrey Borodin.
On Tue, Sep 7, 2021 at 6:36 PM Bossart, Nathan <bossartn@amazon.com> wrote: > Based on previous threads I've seen, I believe many in the community > would like to replace archive_command entirely, but what I'm proposing > here would build on the existing tools. I'm currently thinking of > something a bit like autovacuum_max_workers, but the archive workers > would be created once and would follow a competing consumers model. To me, it seems way more beneficial to think about being able to invoke archive_command with many files at a time instead of just one. I think for most plausible archive commands that would be way more efficient than what you propose here. It's *possible* that if we had that, we'd still want this, but I'm not even convinced. -- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 10, 2021 at 9:13 PM Robert Haas <robertmhaas@gmail.com> wrote: > > To me, it seems way more beneficial to think about being able to > invoke archive_command with many files at a time instead of just one. > I think for most plausible archive commands that would be way more > efficient than what you propose here. It's *possible* that if we had > that, we'd still want this, but I'm not even convinced. Those approaches don't really seem mutually exclusive? In both cases you will need to internally track the status of each WAL file and handle non-contiguous file sequences. In the case of parallel commands you only need the additional knowledge that some command is already working on a file. Wouldn't it be even better to eventually be able to launch multiple batches of multiple files rather than a single batch? If we start with parallelism first, the whole ecosystem could immediately benefit from it as is. To be able to handle multiple files in a single command, we would need some way to let the server know which files were successfully archived and which files weren't, so it requires a different communication approach than the command return code. But as I said, I'm not convinced that using the archive_command approach for that is the best one. If I understand correctly, most of the backup solutions would prefer to have a daemon being launched and use it as a queuing system. Wouldn't it be better to have a new archive_mode, e.g. "daemon", and have postgres responsible for (re)starting it, and pass information through the daemon's stdin/stdout or something like that?
On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote:
> Those approaches don't really seem mutually exclusive? In both cases you will need to internally track the status of each WAL file and handle non-contiguous file sequences. In the case of parallel commands you only need the additional knowledge that some command is already working on a file. Wouldn't it be even better to eventually be able to launch multiple batches of multiple files rather than a single batch?

Well, I guess I'm not convinced. Perhaps people with more knowledge of this than I may already know why it's beneficial, but in my experience commands like 'cp' and 'scp' are usually limited by the speed of I/O, not the fact that you only have one of them running at once. Running several at once, again in my experience, is typically not much faster. On the other hand, scp has a LOT of startup overhead, so it's easy to see the benefits of batching.

[rhaas pgsql]$ touch x y z
[rhaas pgsql]$ time sh -c 'scp x cthulhu: && scp y cthulhu: && scp z cthulhu:'
x 100% 207KB 78.8KB/s 00:02
y 100% 0 0.0KB/s 00:00
z 100% 0 0.0KB/s 00:00
real 0m9.418s
user 0m0.045s
sys 0m0.071s
[rhaas pgsql]$ time sh -c 'scp x y z cthulhu:'
x 100% 207KB 273.1KB/s 00:00
y 100% 0 0.0KB/s 00:00
z 100% 0 0.0KB/s 00:00
real 0m3.216s
user 0m0.017s
sys 0m0.020s

> If we start with parallelism first, the whole ecosystem could immediately benefit from it as is. To be able to handle multiple files in a single command, we would need some way to let the server know which files were successfully archived and which files weren't, so it requires a different communication approach than the command return code.

That is possibly true. I think it might work to just assume that you have to retry everything if it exits non-zero, but that requires the archive command to be smart enough to do something sensible if an identical file is already present in the archive.

> But as I said, I'm not convinced that using the archive_command approach for that is the best one. If I understand correctly, most of the backup solutions would prefer to have a daemon being launched and use it as a queuing system. Wouldn't it be better to have a new archive_mode, e.g. "daemon", and have postgres responsible for (re)starting it, and pass information through the daemon's stdin/stdout or something like that?

Sure. Actually, I think a background worker would be better than a separate daemon. Then it could just talk to shared memory directly.

-- Robert Haas EDB: http://www.enterprisedb.com
On Fri, Sep 10, 2021 at 11:22 PM Robert Haas <robertmhaas@gmail.com> wrote: > > Well, I guess I'm not convinced. Perhaps people with more knowledge of > this than I may already know why it's beneficial, but in my experience > commands like 'cp' and 'scp' are usually limited by the speed of I/O, > not the fact that you only have one of them running at once. Running > several at once, again in my experience, is typically not much faster. > On the other hand, scp has a LOT of startup overhead, so it's easy to > see the benefits of batching. I totally agree that batching as many files as possible in a single command is probably what's gonna achieve the best performance. But if the archiver only gets an answer from the archive_command once it has tried to process all of the files, it also means that postgres won't be able to remove any WAL file until all of them could be processed. It means that users will likely have to limit the batch size and therefore pay more startup overhead than they would like. In the case of archiving on a server with high latency / connection overhead it may be better to be able to run multiple commands in parallel. I may be overthinking here, and feedback from people with more experience around that would definitely be welcome. > That is possibly true. I think it might work to just assume that you > have to retry everything if it exits non-zero, but that requires the > archive command to be smart enough to do something sensible if an > identical file is already present in the archive. Yes, it could be. I think that we need more feedback for that too. > Sure. Actually, I think a background worker would be better than a > separate daemon. Then it could just talk to shared memory directly. I thought about it too, but I was under the impression that most people would want to implement a custom daemon (or already have one) in some more parallel/thread-friendly language.
> On 10 Sept. 2021, at 19:19, Julien Rouhaud <rjuju123@gmail.com> wrote: > Wouldn't it be better to > have a new archive_mode, e.g. "daemon", and have postgres responsible > for (re)starting it, and pass information through the daemon's > stdin/stdout or something like that? We don't even need to introduce a new archive_mode. Currently archive_command has no expectations regarding stdin/stdout. Let's just say that we will push new WAL names to stdin until the archive_command exits. And if the archive_command prints something to stdout we will interpret it as archived WAL names. That's it. Existing archive_commands will continue as is. Currently, information about what is archived is stored on the filesystem in the archive_status dir. We do not need to change anything. If the archive_command exits (with any exit code) we will restart it if there are WAL files that still were not archived. Best regards, Andrey Borodin.
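To make the proposed protocol concrete, a minimal external archiver loop might look roughly like the sketch below: read one ready WAL file name per line from stdin, archive it, and print the name back on stdout once it is safely stored. This is only an illustration of the idea; archive_one_file() is a hypothetical placeholder for the real upload logic, and none of this comes from an actual patch.

```c
/*
 * Hypothetical external archiver speaking the protocol sketched above:
 * the server writes one ready WAL file name per line to our stdin, and we
 * write each name back to stdout once it has been durably archived.
 * archive_one_file() is a placeholder for the real upload logic.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool
archive_one_file(const char *walname)
{
    /* placeholder: copy/upload pg_wal/<walname> somewhere durable */
    return true;
}

int
main(void)
{
    char        line[1024];

    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        /* strip the trailing newline */
        line[strcspn(line, "\r\n")] = '\0';
        if (line[0] == '\0')
            continue;

        if (!archive_one_file(line))
            return 1;       /* non-zero exit: server restarts us and retries */

        /* report completion so the server can mark the segment as done */
        printf("%s\n", line);
        fflush(stdout);
    }
    return 0;
}
```

Exiting non-zero on failure would fit the restart behavior described above: the server simply relaunches the tool and re-feeds whatever is still marked ready in archive_status.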
On 9/10/21, 8:22 AM, "Robert Haas" <robertmhaas@gmail.com> wrote: > On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud <rjuju123@gmail.com> wrote: >> Those approaches don't really seem mutually exclusive? In both cases >> you will need to internally track the status of each WAL file and >> handle non-contiguous file sequences. In the case of parallel commands >> you only need the additional knowledge that some command is already >> working on a file. Wouldn't it be even better to eventually be able >> to launch multiple batches of multiple files rather than a single batch? > > Well, I guess I'm not convinced. Perhaps people with more knowledge of > this than I may already know why it's beneficial, but in my experience > commands like 'cp' and 'scp' are usually limited by the speed of I/O, > not the fact that you only have one of them running at once. Running > several at once, again in my experience, is typically not much faster. > On the other hand, scp has a LOT of startup overhead, so it's easy to > see the benefits of batching. > > [...] > >> If we start with parallelism first, the whole ecosystem could >> immediately benefit from it as is. To be able to handle multiple >> files in a single command, we would need some way to let the server >> know which files were successfully archived and which files weren't, >> so it requires a different communication approach than the command >> return code. > > That is possibly true. I think it might work to just assume that you > have to retry everything if it exits non-zero, but that requires the > archive command to be smart enough to do something sensible if an > identical file is already present in the archive. My initial thinking was similar to Julien's. Assuming I have an archive_command that handles one file, I can just set archive_max_workers to 3 and reap the benefits. If I'm using an existing utility that implements its own parallelism, I can keep archive_max_workers at 1 and continue using it. This would be a simple incremental improvement. That being said, I think the discussion about batching is a good one to have. If the overhead described in your SCP example is representative of a typical archive_command, then parallelism does seem a bit silly. We'd essentially be using a ton more resources when there's obvious room for improvement via reducing the amount of overhead per archive. I think we could easily make the batch size configurable so that existing archive commands would work (e.g., archive_batch_size=1). However, unlike the simple parallel approach, you'd likely have to adjust your archive_command if you wanted to make use of batching. That doesn't seem terrible to me, though. As discussed above, there are some implementation details to work out for archive failures, but nothing about that seems intractable to me. Plus, if you still wanted to parallelize things, feeding your archive_command several files at a time could still be helpful. I'm currently leaning toward exploring the batching approach first. I suppose we could always make a prototype of both solutions for comparison with some "typical" archive commands if that would help with the discussion. Nathan
On Fri, 2021-09-10 at 23:48 +0800, Julien Rouhaud wrote: > I totally agree that batching as many file as possible in a single > command is probably what's gonna achieve the best performance. But if > the archiver only gets an answer from the archive_command once it > tried to process all of the file, it also means that postgres won't be > able to remove any WAL file until all of them could be processed. It > means that users will likely have to limit the batch size and > therefore pay more startup overhead than they would like. In case of > archiving on server with high latency / connection overhead it may be > better to be able to run multiple commands in parallel. Well, users would also have to limit the parallelism, right? If connections are high-overhead, I wouldn't imagine that running hundreds of them simultaneously would work very well in practice. (The proof would be in an actual benchmark, obviously, but usually I would rather have one process handling a hundred items than a hundred processes handling one item each.) For a batching scheme, would it be that big a deal to wait for all of them to be archived before removal? > > That is possibly true. I think it might work to just assume that you > > have to retry everything if it exits non-zero, but that requires the > > archive command to be smart enough to do something sensible if an > > identical file is already present in the archive. > > Yes, it could be. I think that we need more feedback for that too. Seems like this is the sticking point. What would be the smartest thing for the command to do? If there's a destination file already, checksum it and make sure it matches the source before continuing? --Jacob
On Fri, Sep 10, 2021 at 11:49 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > I totally agree that batching as many files as possible in a single > command is probably what's gonna achieve the best performance. But if > the archiver only gets an answer from the archive_command once it has > tried to process all of the files, it also means that postgres won't be > able to remove any WAL file until all of them could be processed. It > means that users will likely have to limit the batch size and > therefore pay more startup overhead than they would like. In the case of > archiving on a server with high latency / connection overhead it may be > better to be able to run multiple commands in parallel. I may be > overthinking here, and feedback from people with more > experience around that would definitely be welcome. That's a fair point. I'm not sure how much it matters, though. I think you want to imagine a system where there are let's say 10 WAL files being archived per second. Using fork() + exec() to spawn a shell command 10 times per second is a bit expensive, whether you do it serially or in parallel, and even if the command is something with a less-insane startup overhead than scp. If we start a shell command say every 3 seconds and give it 30 files each time, we can reduce the startup costs we're paying by ~97% at the price of having to wait up to 3 additional seconds to know that archiving succeeded for any particular file. That sounds like a pretty good trade-off, because the main benefit of removing old files is that it keeps us from running out of disk space, and you should not be running a busy system in such a way that it is ever within 3 seconds of running out of disk space, so whatever. If on the other hand you imagine a system that's not very busy, say 1 WAL file being archived every 10 seconds, then using a batch size of 30 would very significantly delay removal of old files. However, on this system, batching probably isn't really needed. The rate of WAL file generation is low enough that if you pay the startup cost of your archive_command for every file, you're probably still doing just fine. Probably, any kind of parallelism or batching needs to take this kind of time-based thinking into account. For batching, the rate at which files are generated should affect the batch size. For parallelism, it should affect the number of processes used. -- Robert Haas EDB: http://www.enterprisedb.com
On 9/10/21, 10:12 AM, "Robert Haas" <robertmhaas@gmail.com> wrote: > If on the other hand you imagine a system that's not very busy, say 1 > WAL file being archived every 10 seconds, then using a batch size of > 30 would very significantly delay removal of old files. However, on > this system, batching probably isn't really needed. The rate of WAL > file generation is low enough that if you pay the startup cost of your > archive_command for every file, you're probably still doing just fine. > > Probably, any kind of parallelism or batching needs to take this kind > of time-based thinking into account. For batching, the rate at which > files are generated should affect the batch size. For parallelism, it > should affect the number of processes used. I was thinking that archive_batch_size would be the maximum batch size. If the archiver only finds a single file to archive, that's all it'd send to the archive command. If it finds more, it'd send up to archive_batch_size to the command. Nathan
On Fri, Sep 10, 2021 at 1:07 PM Bossart, Nathan <bossartn@amazon.com> wrote: > That being said, I think the discussion about batching is a good one > to have. If the overhead described in your SCP example is > representative of a typical archive_command, then parallelism does > seem a bit silly. I think that's pretty realistic, because a lot of people's archive commands are going to actually be, or need to use, scp specifically. However, there are also cases where people are using commands that just put the file in some local directory (maybe on a remote mount point) and I would expect the startup overhead to be much less in those cases. Maybe people are archiving via HTTPS or similar as well, and then you again have some connection overhead though, I suspect, not as much as scp, since web pages do not take 3 seconds to get an https connection going. I don't know why scp is so crazy slow. Even in the relatively low-overhead cases, though, I think we would want to do some real testing to see if the benefits are as we expect. See http://postgr.es/m/20200420211018.w2qphw4yybcbxksl@alap3.anarazel.de and downthread for context. I was *convinced* that parallel backup was a win. Benchmarking was a tad underwhelming, but there was a clear if modest benefit by running a synthetic test of copying a lot of files serially or in parallel, with the files spread across multiple filesystems on the same physical box. However, when Andres modified my test program to use posix_fadvise(), posix_fallocate(), and sync_file_range() while doing the copies, the benefits of parallelism largely evaporated, and in fact in some cases enabling parallelism caused major regressions. In other words, the apparent benefits of parallelism were really due to suboptimal behaviors in the Linux page cache and some NUMA effects that were in fact avoidable. So I'm suspicious that the same things might end up being true here. It's not exactly the same, because the goal of WAL archiving is to keep up with the rate of WAL generation, and the goal of a backup is (unless max-rate is used) to finish as fast as possible, and that difference in goals might end up being significant. Also, you can make an argument that some people will benefit from a parallelism feature even if a perfectly-implemented archive_command doesn't, because many people use really terrible archive_commands. But all that said, I think the parallel backup discussion is still a cautionary tale to which some attention ought to be paid. > We'd essentially be using a ton more resources when > there's obvious room for improvement via reducing the amount of overhead > per archive. I think we could easily make the batch size configurable > so that existing archive commands would work (e.g., > archive_batch_size=1). However, unlike the simple parallel approach, > you'd likely have to adjust your archive_command if you wanted to make > use of batching. That doesn't seem terrible to me, though. As > discussed above, there are some implementation details to work out for > archive failures, but nothing about that seems intractable to me. > Plus, if you still wanted to parallelize things, feeding your > archive_command several files at a time could still be helpful. Yep. > I'm currently leaning toward exploring the batching approach first. I > suppose we could always make a prototype of both solutions for > comparison with some "typical" archive commands if that would help > with the discussion. Yeah, I think the concerns here are more pragmatic than philosophical, at least for me.
I had kind of been thinking that the way to attack this problem is to go straight to allowing for a background worker, because the other problem with archive_command is that running a shell command like cp, scp, or rsync is not really safe. It won't fsync your data, it might not fail if the file is in the archive already, and it definitely won't succeed without doing anything if there's a byte for byte identical file in the archive and fail if there's a file with different contents already in the archive. Fixing that stuff by running different shell commands is hard, but it wouldn't be that hard to do it in C code, and you could then also extend whatever code you wrote to do batching and parallelism; starting more workers isn't hard. However, I can't see the idea of running a shell command going away any time soon, in spite of its numerous and severe drawbacks. Such an interface provides a huge degree of flexibility and allows system admins to whack around behavior easily, which you don't get if you have to code every change in C. So I think command-based enhancements are fine to pursue also, even though I don't think it's the ideal place for most users to end up. -- Robert Haas EDB: http://www.enterprisedb.com
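As a rough illustration of what "doing it in C code" could buy here, the sketch below copies a WAL segment into the archive by writing to a temporary name, fsyncing the file, renaming it into place, and fsyncing the directory. Error cleanup and the check for a pre-existing identical file are omitted, and nothing here comes from an actual patch; it is only meant to show the kind of guarantees a shell cp does not give you.

```c
/*
 * Illustration only: the sort of "safe copy" a C implementation could do
 * that a plain cp/scp archive_command does not.  Write to a temporary
 * name, fsync the file, rename it into place, then fsync the directory so
 * the rename itself survives a crash.  Error cleanup and the check for a
 * pre-existing identical file in the archive are omitted.
 */
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <unistd.h>

static int
durable_copy(const char *src, const char *dst)
{
    char        tmp[4096];
    char        dstdir[4096];
    char        buf[65536];
    ssize_t     nread;
    int         in, out, dirfd;

    snprintf(tmp, sizeof(tmp), "%s.tmp", dst);

    if ((in = open(src, O_RDONLY)) < 0)
        return -1;
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600)) < 0)
    {
        close(in);
        return -1;
    }

    while ((nread = read(in, buf, sizeof(buf))) > 0)
    {
        if (write(out, buf, nread) != nread)
        {
            nread = -1;
            break;
        }
    }

    /* make the file contents durable before exposing them under "dst" */
    if (nread < 0 || fsync(out) != 0)
    {
        close(in);
        close(out);
        return -1;
    }
    close(in);
    close(out);

    if (rename(tmp, dst) != 0)
        return -1;

    /* fsync the containing directory so the rename is durable too */
    snprintf(dstdir, sizeof(dstdir), "%s", dst);
    if ((dirfd = open(dirname(dstdir), O_RDONLY)) < 0)
        return -1;
    if (fsync(dirfd) != 0)
    {
        close(dirfd);
        return -1;
    }
    close(dirfd);
    return 0;
}
```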
> On 10 Sept. 2021, at 22:18, Bossart, Nathan <bossartn@amazon.com> wrote: > > I was thinking that archive_batch_size would be the maximum batch > size. If the archiver only finds a single file to archive, that's all > it'd send to the archive command. If it finds more, it'd send up to > archive_batch_size to the command. I think that the concept of a "batch" is misleading. If you pass filenames via stdin you don't need to know all names upfront. Just send more names to the pipe if the archive command is still running and one more segment just became available. This way the level of parallelism will adapt to the workload. Best regards, Andrey Borodin.
Greetings, * Julien Rouhaud (rjuju123@gmail.com) wrote: > On Fri, Sep 10, 2021 at 2:03 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > > On 10 Sept. 2021, at 10:52, Julien Rouhaud <rjuju123@gmail.com> wrote: > > > Yes, but it also means that it's up to every single archiving tool to > > > implement a somewhat hackish parallel version of an archive_command, > > > hoping that core won't break it. We've got too many archiving tools as it is, if you want my 2c on that. > > I'm not proposing to remove existing archive_command. Just deprecate its one-WAL-per-call form. > > Which is a big API break. We definitely need to stop being afraid of this. We completely changed around how restores work and made pretty much all of the backup/restore tools have to make serious changes when we released v12. I definitely don't think that we should be making assumptions that changing archive command to start running things in parallel isn't *also* an API break too, in any case. It is also a change and there's definitely a good chance that it'd break some of the archivers out there. If we're going to make a change here, let's make a sensible one. > > It's a very simplistic approach. If some GUC is set - archiver will just feed ready files to stdin of archive command. What fundamental design changes do we need? Haven't really thought about this proposal but it does sound interesting. Thanks, Stephen
On Wed, Sep 15, 2021 at 4:14 AM Stephen Frost <sfrost@snowman.net> wrote: > > > > I'm not proposing to remove existing archive_command. Just deprecate its one-WAL-per-call form. > > > > Which is a big API break. > > We definitely need to stop being afraid of this. We completely changed > around how restores work and made pretty much all of the backup/restore > tools have to make serious changes when we released v12. I never said that we should avoid an API break at all costs, I said that if we break the API we should introduce something better. The proposal to pass multiple file names to the archive command said nothing about how to tell which ones were successfully archived and which ones weren't, which is a big problem in my opinion. But I think we should also consider a different approach, such as maintaining some kind of daemon and asynchronously passing all the WAL file names, waiting for answers. Or maybe something else. It's just that simply "passing multiple file names" doesn't seem like a big enough win to justify an API break to me. > I definitely don't think that we should be making assumptions that > changing archive command to start running things in parallel isn't > *also* an API break too, in any case. It is also a change and there's > definitely a good chance that it'd break some of the archivers out > there. If we're going to make a change here, let's make a sensible one. But doing parallel archiving can and should be controlled with a GUC, so if your archive_command isn't compatible you can simply just not use it (on top of having a default of not using parallel archiving, at least for some time). It doesn't seem like a big problem.
On 9/10/21, 10:42 AM, "Robert Haas" <robertmhaas@gmail.com> wrote: > I had kind of been thinking that the way to attack this problem is to > go straight to allowing for a background worker, because the other > problem with archive_command is that running a shell command like cp, > scp, or rsync is not really safe. It won't fsync your data, it might > not fail if the file is in the archive already, and it definitely > won't succeed without doing anything if there's a byte for byte > identical file in the archive and fail if there's a file with > different contents already in the archive. Fixing that stuff by > running different shell commands is hard, but it wouldn't be that hard > to do it in C code, and you could then also extend whatever code you > wrote to do batching and parallelism; starting more workers isn't > hard. > > However, I can't see the idea of running a shell command going away > any time soon, in spite of its numerous and severe drawbacks. Such an > interface provides a huge degree of flexibility and allows system > admins to whack around behavior easily, which you don't get if you > have to code every change in C. So I think command-based enhancements > are fine to pursue also, even though I don't think it's the ideal > place for most users to end up. I've given this quite a bit of thought. I hacked together a batching approach for benchmarking, and it seemed to be a decent improvement, but you're still shelling out every N files, and all the stuff about shell commands not being ideal that you mentioned still applies. Perhaps it's still a good improvement, and maybe we should still do it, but I get the idea that many believe we can still do better. So, I looked into adding support for setting up archiving via an extension. The attached patch is a first try at adding alternatives for archive_command, restore_command, archive_cleanup_command, and recovery_end_command. It adds the GUCs archive_library, restore_library, archive_cleanup_library, and recovery_end_library. Each of these accepts a library name that is loaded at startup, similar to shared_preload_libraries. _PG_init() is still used for initialization, and you can use the same library for multiple purposes by checking the new exported variables (e.g., process_archive_library_in_progress). The library is then responsible for implementing the relevant function, such as _PG_archive() or _PG_restore(). The attached patch also demonstrates a simple implementation for an archive_library that is similar to the sample archive_command in the documentation. I tested the sample archive_command in the docs against the sample archive_library implementation in the patch, and I saw about a 50% speedup. (The archive_library actually syncs the files to disk, too.) This is similar to the improvement from batching. Of course, there are drawbacks to using an extension. Besides the obvious added complexity of building an extension in C versus writing a shell command, the patch disallows changing the libraries without restarting the server. Also, the patch makes no effort to simplify error handling, memory management, etc. This is left as an exercise for the extension author. I'm sure there are other ways to approach this, but I thought I'd give it a try to see what was possible and to get the conversation started. Nathan
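For readers who want a feel for what such an archive_library might look like, here is a minimal sketch. Only _PG_init(), the _PG_archive()/_PG_restore() entry points, and the process_archive_library_in_progress flag are taken from the description above; the exact _PG_archive() signature (WAL file name plus path, returning success) is an assumption made purely for illustration and may not match the actual patch.

```c
/*
 * Minimal sketch of an archive_library under the proposed interface.
 * _PG_init(), _PG_archive()/_PG_restore() and the
 * process_archive_library_in_progress flag come from the patch
 * description; the _PG_archive() signature below is an assumption.
 */
#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

void        _PG_init(void);
bool        _PG_archive(const char *file, const char *path);

void
_PG_init(void)
{
    /*
     * One library can serve several purposes; flags such as
     * process_archive_library_in_progress tell it which role it is
     * currently being loaded for, so any setup could branch on that.
     */
}

bool
_PG_archive(const char *file, const char *path)
{
    /*
     * "path" points at the WAL segment to archive (e.g. pg_wal/0000...).
     * Copy it somewhere durable, fsync it, and return true only once it
     * is safely stored; returning false would presumably make the
     * archiver retry later.
     */
    return true;
}
```

The real work of copying and fsyncing the segment, plus all error handling and memory management, would live inside _PG_archive(), which is exactly the "exercise for the extension author" mentioned above.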
On 9/29/21, 9:49 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote: > I'm sure there are other ways to approach this, but I thought I'd give > it a try to see what was possible and to get the conversation started. BTW I am also considering the background worker approach that was mentioned upthread. My current thinking is that the backup extension would define a special background worker that communicates with the archiver via shared memory. As noted upthread, this would enable extension authors to do whatever batching, parallelism, etc. that they want, and it should also prevent failures from taking down the archiver process. However, this approach might not make sense for things like recovery_end_command that are only executed once. Maybe it's okay to leave that one alone for now. Nathan
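If the background-worker route is taken, the extension's _PG_init() could register its worker through the existing bgworker machinery, roughly as sketched below. The library and function names are hypothetical, and the shared-memory handshake with the archiver is left out entirely.

```c
/*
 * Sketch of how an archiving extension could register such a background
 * worker from _PG_init() using the existing bgworker API.  The library
 * and entry-point names are made up, and the shared-memory handshake with
 * the archiver is not shown.
 */
#include "postgres.h"
#include "fmgr.h"
#include "postmaster/bgworker.h"

PG_MODULE_MAGIC;

void        _PG_init(void);

void
_PG_init(void)
{
    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = 5;    /* seconds; restart it if it dies */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_archive_ext");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_archive_worker_main");
    snprintf(worker.bgw_name, BGW_MAXLEN, "archive worker");
    snprintf(worker.bgw_type, BGW_MAXLEN, "archive worker");

    /* static registration, so the extension must be preloaded at startup */
    RegisterBackgroundWorker(&worker);
}
```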
> On 30 Sept. 2021, at 09:47, Bossart, Nathan <bossartn@amazon.com> wrote: > > The attached patch is a first try at adding alternatives for > archive_command Looks like an interesting alternative design. > I tested the sample archive_command in the docs against the sample > archive_library implementation in the patch, and I saw about a 50% > speedup. (The archive_library actually syncs the files to disk, too.) > This is similar to the improvement from batching. Why test sample against sample? I think if one tests this against a real archive tool doing archive_status lookups and ready->done renaming, the results will be much different. > Of course, there are drawbacks to using an extension. Besides the > obvious added complexity of building an extension in C versus writing > a shell command, the patch disallows changing the libraries without > restarting the server. Also, the patch makes no effort to simplify > error handling, memory management, etc. This is left as an exercise > for the extension author. I think the real problem with an extension is quite different from the ones mentioned above. There are many archive tools that already feature parallel archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman and others. These tools by far outweigh tools that don't look into archive_status to parallelize archival. And we are going to ask them: also add a C extension, without any feasible benefit to the user. You only get some restrictions, like a system restart to enable the shared library. I think we need a design that legalises already existing de-facto standard features in archive tools. Or even better - enables these tools to be more efficient, reliable etc. Either way we will create legacy code from scratch. Thanks! Best regards, Andrey Borodin.
On 10/1/21, 12:08 PM, "Andrey Borodin" <x4mmm@yandex-team.ru> wrote: > On 30 Sept. 2021, at 09:47, Bossart, Nathan <bossartn@amazon.com> wrote: >> I tested the sample archive_command in the docs against the sample >> archive_library implementation in the patch, and I saw about a 50% >> speedup. (The archive_library actually syncs the files to disk, too.) >> This is similar to the improvement from batching. > Why test sample against sample? I think if one tests this against a real archive tool doing archive_status lookups and ready->done renaming, the results will be much different. My intent was to demonstrate the impact of reducing the amount of overhead when archiving. I don't doubt that third party archive tools can show improvements by doing batching/parallelism behind the scenes. >> Of course, there are drawbacks to using an extension. Besides the >> obvious added complexity of building an extension in C versus writing >> a shell command, the patch disallows changing the libraries without >> restarting the server. Also, the patch makes no effort to simplify >> error handling, memory management, etc. This is left as an exercise >> for the extension author. > I think the real problem with an extension is quite different from the ones mentioned above. > There are many archive tools that already feature parallel archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman and others. These tools by far outweigh tools that don't look into archive_status to parallelize archival. > And we are going to ask them: also add a C extension, without any feasible benefit to the user. You only get some restrictions, like a system restart to enable the shared library. > > I think we need a design that legalises already existing de-facto standard features in archive tools. Or even better - enables these tools to be more efficient, reliable etc. Either way we will create legacy code from scratch. My proposal wouldn't require any changes to any of these utilities. This design just adds a new mechanism that would allow end users to set up archiving a different way with less overhead in hopes that it will help them keep up. I suspect a lot of work has been put into the archive tools you mentioned to make sure they can keep up with high rates of WAL generation, so I'm skeptical that anything we do here will really benefit them all that much. Ideally, we'd do something that improves matters for everyone, though. I'm open to suggestions. Nathan
Greetings, * Bossart, Nathan (bossartn@amazon.com) wrote: > On 10/1/21, 12:08 PM, "Andrey Borodin" <x4mmm@yandex-team.ru> wrote: > > On 30 Sept. 2021, at 09:47, Bossart, Nathan <bossartn@amazon.com> wrote: > >> Of course, there are drawbacks to using an extension. Besides the > >> obvious added complexity of building an extension in C versus writing > >> a shell command, the patch disallows changing the libraries without > >> restarting the server. Also, the patch makes no effort to simplify > >> error handling, memory management, etc. This is left as an exercise > >> for the extension author. > > I think the real problem with an extension is quite different from the ones mentioned above. > > There are many archive tools that already feature parallel archiving: PgBackrest, wal-e, wal-g, pg_probackup, pghoard, pgbarman and others. These tools by far outweigh tools that don't look into archive_status to parallelize archival. > > And we are going to ask them: also add a C extension, without any feasible benefit to the user. You only get some restrictions, like a system restart to enable the shared library. > > > > I think we need a design that legalises already existing de-facto standard features in archive tools. Or even better - enables these tools to be more efficient, reliable etc. Either way we will create legacy code from scratch. > > My proposal wouldn't require any changes to any of these utilities. > This design just adds a new mechanism that would allow end users to > set up archiving a different way with less overhead in hopes that it > will help them keep up. I suspect a lot of work has been put into the > archive tools you mentioned to make sure they can keep up with high > rates of WAL generation, so I'm skeptical that anything we do here > will really benefit them all that much. Ideally, we'd do something > that improves matters for everyone, though. I'm open to suggestions. This is something we've contemplated quite a bit and the last thing that I'd want to have is a requirement to configure a whole bunch of additional parameters to enable this. Why do we need to have so many new GUCs? I would have thought we'd probably be able to get away with just having the appropriate hooks and then telling folks to load the extension in shared_preload_libraries.. As for the hooks themselves, I'd certainly hope that they'd be designed to handle batches of WAL rather than individual ones as that's long been one of the main issues with the existing archive command approach. I appreciate that maybe that's less of an issue with a shared library but it's still something to consider. Admittedly, I haven't looked in depth at this patch set and am just going off of the description of them provided in the thread, so perhaps I missed something. Thanks, Stephen
On 10/4/21, 7:21 PM, "Stephen Frost" <sfrost@snowman.net> wrote: > This has something we've contemplated quite a bit and the last thing > that I'd want to have is a requirement to configure a whole bunch of > additional parameters to enable this. Why do we need to have some many > new GUCs? I would have thought we'd probably be able to get away with > just having the appropriate hooks and then telling folks to load the > extension in shared_preload_libraries.. That would certainly simplify my patch quite a bit. I'll do it this way in the next revision. > As for the hooks themselves, I'd certainly hope that they'd be designed > to handle batches of WAL rather than individual ones as that's long been > one of the main issues with the existing archive command approach. I > appreciate that maybe that's less of an issue with a shared library but > it's still something to consider. Will do. This seems like it should be easier with the hook because we can provide a way to return which files were successfully archived. Nathan
Greetings, * Bossart, Nathan (bossartn@amazon.com) wrote: > On 10/4/21, 7:21 PM, "Stephen Frost" <sfrost@snowman.net> wrote: > > This has something we've contemplated quite a bit and the last thing > > that I'd want to have is a requirement to configure a whole bunch of > > additional parameters to enable this. Why do we need to have some many > > new GUCs? I would have thought we'd probably be able to get away with > > just having the appropriate hooks and then telling folks to load the > > extension in shared_preload_libraries.. > > That would certainly simplify my patch quite a bit. I'll do it this > way in the next revision. > > > As for the hooks themselves, I'd certainly hope that they'd be designed > > to handle batches of WAL rather than individual ones as that's long been > > one of the main issues with the existing archive command approach. I > > appreciate that maybe that's less of an issue with a shared library but > > it's still something to consider. > > Will do. This seems like it should be easier with the hook because we > can provide a way to return which files were successfully archived. It's also been discussed, at least around the water cooler (as it were in pandemic times- aka our internal slack channels..) that the existing archive command might be reimplemented as an extension using these. Not sure if that's really necessary but it was a thought. In any case, thanks for working on this! Stephen
On 10/4/21, 8:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote: > It's also been discussed, at least around the water cooler (as it were > in pandemic times- aka our internal slack channels..) that the existing > archive command might be reimplemented as an extension using these. Not > sure if that's really necessary but it was a thought. In any case, > thanks for working on this! Interesting. I like the idea of having one code path for everything instead of branching for the hook and non-hook paths. Thanks for sharing your thoughts. Nathan
On Tue, Oct 5, 2021 at 5:32 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> On 10/4/21, 8:19 PM, "Stephen Frost" <sfrost@snowman.net> wrote:
>> It's also been discussed, at least around the water cooler (as it were
>> in pandemic times- aka our internal slack channels..) that the existing
>> archive command might be reimplemented as an extension using these. Not
>> sure if that's really necessary but it was a thought. In any case,
>> thanks for working on this!
> Interesting. I like the idea of having one code path for everything
> instead of branching for the hook and non-hook paths. Thanks for
> sharing your thoughts.
I remember having had this discussion a few times, I think mainly with Stephen and David as well (but not on their internal slack channels :P).
I definitely think that's the way to go. It gives a single path for everything which makes it simpler in the most critical parts. And once you have picked an implementation other than it, you're now completely rid of the old implementation. And of course the good old idea that having an extension already using the API is a good way to show that the API is in a good place.
As much as I dislike our current interface in archive_command, and would like to see it go away completely, I do believe we need to ship something that has it - if nothing else then for backwards compatibility. But an extension like this would also make it easier to eventually, down the road, deprecate this solution.
Oh, and please put said implementation in a better place than contrib :)
On 10/6/21, 1:34 PM, "Magnus Hagander" <magnus@hagander.net> wrote: > I definitely think that's the way to go. It gives a single path for everything which makes it simpler in the most criticalparts. And once you have picked an implementation other than it, you're now completely rid of the old implementation. And of course the good old idea that having an extension already using the API is a good way to show thatthe API is in a good place. > > As much as I dislike our current interface in archive_command, and would like to see it go away completely, I do believewe need to ship something that has it - if nothing else then for backwards compatibility. But an extension like thiswould also make it easier to eventually, down the road, deprecate this solution. > > Oh, and please put said implementation in a better place than contrib :) I've attached an attempt at moving the archive_command logic to its own module and replacing it with a hook. This was actually pretty straightforward. I think the biggest question is where to put the archive_command module, which I've called shell_archive. The only existing directory that looked to me like it might work is src/test/modules. It might be rather bold to relegate this functionality to a test module so quickly, but on the other hand, perhaps it's the right thing to do given we intend to deprecate it in the future. I'm curious what others think about this. I'm still working on the documentation updates, which are quite extensive. I haven't included any of those in the patch yet. Nathan
On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan <bossartn@amazon.com> wrote: > I think the biggest question is where to put the archive_command > module, which I've called shell_archive. The only existing directory > that looked to me like it might work is src/test/modules. It might be > rather bold to relegate this functionality to a test module so > quickly, but on the other hand, perhaps it's the right thing to do > given we intend to deprecate it in the future. I'm curious what > others think about this. I don't see that as being a viable path forward based on my customer interactions working here at EDB. I am not quite sure why we wouldn't just compile the functions into the server. Function pointers can point to core functions as surely as loadable modules. The present design isn't too congenial to that because it's relying on the shared library loading mechanism to wire the thing in place - but there's no reason it has to be that way. Logical decoding plugins don't work that way, for example. We could still have a GUC, say call it archive_method, that selects the module -- with 'shell' being a builtin method, and others being loadable as modules. If you set archive_method='shell' then you enable this module, and it has its own GUC, say call it archive_command, to configure the behavior. An advantage of this approach is that it's perfectly backward-compatible. I understand that archive_command is a hateful thing to many people here, but software has to serve the user base, not just the developers. Lots of people use archive_command and rely on it -- and are not interested in installing yet another piece of out-of-core software to do what $OTHERDB has built in. -- Robert Haas EDB: http://www.enterprisedb.com
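To illustrate the shape of that idea (every name below is hypothetical and chosen only for this sketch), the archiver would always call through a single callback, and archive_method would merely decide what that callback points to:

```c
/*
 * Hypothetical sketch of the function-pointer arrangement described
 * above: the archiver always calls through one callback, and an
 * archive_method setting only decides whether that callback is the
 * built-in 'shell' implementation (driven by archive_command) or one
 * supplied by a loadable module.  None of these names are real.
 */
#include <stdbool.h>
#include <string.h>

typedef bool (*ArchiveFileCB) (const char *file, const char *path);

static ArchiveFileCB ArchiveFileCallback;

/* built-in method: shells out via archive_command, as today */
static bool
shell_archive_file(const char *file, const char *path)
{
    /* placeholder for the existing system()-based implementation */
    return true;
}

/* placeholder for looking up a callback exported by a loaded module */
static ArchiveFileCB
load_archive_module_cb(const char *name)
{
    return NULL;
}

static void
select_archive_method(const char *archive_method)
{
    if (strcmp(archive_method, "shell") == 0)
        ArchiveFileCallback = shell_archive_file;
    else
        ArchiveFileCallback = load_archive_module_cb(archive_method);
}
```

With 'shell' wired in as a compiled-in default, existing archive_command setups would keep working unchanged, which is the backward-compatibility point made above.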
On 10/19/21 8:50 AM, Robert Haas wrote: > On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan <bossartn@amazon.com> wrote: >> I think the biggest question is where to put the archive_command >> module, which I've called shell_archive. The only existing directory >> that looked to me like it might work is src/test/modules. It might be >> rather bold to relegate this functionality to a test module so >> quickly, but on the other hand, perhaps it's the right thing to do >> given we intend to deprecate it in the future. I'm curious what >> others think about this. > > I don't see that as being a viable path forward based on my customer > interactions working here at EDB. > > I am not quite sure why we wouldn't just compile the functions into > the server. Function pointers can point to core functions as surely > as loadable modules. The present design isn't too congenial to that > because it's relying on the shared library loading mechanism to wire > the thing in place - but there's no reason it has to be that way. > Logical decoding plugins don't work that way, for example. We could > still have a GUC, say call it archive_method, that selects the module > -- with 'shell' being a builtin method, and others being loadable as > modules. If you set archive_method='shell' then you enable this > module, and it has its own GUC, say call it archive_command, to > configure the behavior. > > An advantage of this approach is that it's perfectly > backward-compatible. I understand that archive_command is a hateful > thing to many people here, but software has to serve the user base, > not just the developers. Lots of people use archive_command and rely > on it -- and are not interested in installing yet another piece of > out-of-core software to do what $OTHERDB has built in. +1 to all of this, certainly for the time being. The archive_command mechanism is not great, but it is simple, and this part is not really what makes writing a good archive command hard. I had also originally envisioned this as a default extension in core, but having the default 'shell' method built-in is certainly simpler. Regards, -- -David david@pgmasters.net
On Tue, Oct 19, 2021 at 2:50 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 18, 2021 at 7:25 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> I think the biggest question is where to put the archive_command
> module, which I've called shell_archive. The only existing directory
> that looked to me like it might work is src/test/modules. It might be
> rather bold to relegate this functionality to a test module so
> quickly, but on the other hand, perhaps it's the right thing to do
> given we intend to deprecate it in the future. I'm curious what
> others think about this.
I don't see that as being a viable path forward based on my customer
interactions working here at EDB.
I am not quite sure why we wouldn't just compile the functions into
the server. Function pointers can point to core functions as surely
as loadable modules. The present design isn't too congenial to that
because it's relying on the shared library loading mechanism to wire
the thing in place - but there's no reason it has to be that way.
Logical decoding plugins don't work that way, for example. We could
still have a GUC, say call it archive_method, that selects the module
-- with 'shell' being a builtin method, and others being loadable as
modules. If you set archive_method='shell' then you enable this
module, and it has its own GUC, say call it archive_command, to
configure the behavior.
Yeah, seems reasonable. It wouldn't serve as well as an example to developers, but then it's probably not the "loadable module" part of building it that people need examples of. So as long as it's using the same internal APIs and just happens to be compiled in by default, I see no problem with that.
But, is logical decoding really that great an example? I mean, we build pgoutput.so as a library, we don't provide it compiled-in. So we could build the "shell archiver" based on that pattern, in which case we should create a postmaster/shell_archiver directory or something like that?
It should definitely not go under "test".
An advantage of this approach is that it's perfectly
backward-compatible. I understand that archive_command is a hateful
thing to many people here, but software has to serve the user base,
not just the developers. Lots of people use archive_command and rely
on it -- and are not interested in installing yet another piece of
out-of-core software to do what $OTHERDB has built in.
Backwards compatibility is definitely a must, I'd say. Regardless of exactly how the backwards-compatible plugin is shipped, it should be what's turned on by default.
On 10/19/21, 6:39 AM, "David Steele" <david@pgmasters.net> wrote:
> On 10/19/21 8:50 AM, Robert Haas wrote:
>> I am not quite sure why we wouldn't just compile the functions into
>> the server. Function pointers can point to core functions as surely
>> as loadable modules. The present design isn't too congenial to that
>> because it's relying on the shared library loading mechanism to wire
>> the thing in place - but there's no reason it has to be that way.
>> Logical decoding plugins don't work that way, for example. We could
>> still have a GUC, say call it archive_method, that selects the module
>> -- with 'shell' being a builtin method, and others being loadable as
>> modules. If you set archive_method='shell' then you enable this
>> module, and it has its own GUC, say call it archive_command, to
>> configure the behavior.
>>
>> An advantage of this approach is that it's perfectly
>> backward-compatible. I understand that archive_command is a hateful
>> thing to many people here, but software has to serve the user base,
>> not just the developers. Lots of people use archive_command and rely
>> on it -- and are not interested in installing yet another piece of
>> out-of-core software to do what $OTHERDB has built in.
>
> +1 to all of this, certainly for the time being. The archive_command
> mechanism is not great, but it is simple, and this part is not really
> what makes writing a good archive command hard.
>
> I had also originally envisioned this as a default extension in core,
> but having the default 'shell' method built-in is certainly simpler.

I have no problem building it this way. It's certainly better for
backward compatibility, which I think everyone here feels is
important.

Robert's proposed design is a bit more like my original
proof-of-concept [0]. There, I added an archive_library GUC which was
basically an extension of shared_preload_libraries (which creates some
interesting problems in the library loading logic). You could only set
one of archive_command or archive_library at any given time. When the
archive_library was set, we ran that library's _PG_init() just like we
do for any other library, and then we set the archiver function
pointer to the library's _PG_archive() function. IIUC the main
difference between this design and what Robert proposes is that we'd
also move the existing archive_command stuff somewhere else and then
access it via the archiver function pointer. I think that is clearly
better than branching based on whether archive_command or
archive_library is set. (BTW I'm not wedded to these GUCs. If folks
would rather create something like the archive_method GUC, I think
that would work just as well.)

My original proof-of-concept also attempted to handle a bunch of other
shell command GUCs, but perhaps I'd better keep this focused on
archive_command for now. What we do here could serve as an example of
how to adjust the other shell command GUCs later on. I'll go ahead
and rework my patch to look more like what is being discussed here,
although I expect the exact design for the interface will continue to
evolve based on the feedback in this thread.

Nathan

[0] https://postgr.es/m/E9035E94-EC76-436E-B6C9-1C03FBD8EF54%40amazon.com
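To make the shape of that proof of concept concrete, here is a minimal module-side sketch. The bool-returning (file, path) signature for _PG_archive() and the server-exported PG_archive pointer are assumptions based on the description above, not code taken from the patch.

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* assumed to be provided by the server under this design */
typedef bool (*ArchiveFunc) (const char *file, const char *path);
extern ArchiveFunc PG_archive;

/* archive one WAL file; return true only once it is stored durably */
static bool
my_archive(const char *file, const char *path)
{
    /* e.g. copy "path" (pg_wal/<file>) to durable storage */
    return true;
}

void
_PG_init(void)
{
    /* with archive_library = 'my_archive_module', the server runs this
     * and we install our archiving function */
    PG_archive = my_archive;
}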
On Tue, Oct 19, 2021 at 10:19 AM Magnus Hagander <magnus@hagander.net> wrote:
> But, is logical decoding really that great an example? I mean, we
> build pgoutput.so as a library, we don't provide it compiled-in. So we
> could build the "shell archiver" based on that pattern, in which case
> we should create a postmaster/shell_archiver directory or something
> like that?

Well, I guess you could also use parallel contexts as an example.
There, the core facilities that most people will use are baked into
the server, but you can provide your own in an extension and the
parallel context stuff will happily call it for you if you so request.
I don't think the details here are too important. I'm just saying that
not everything needs to depend on _PG_init() as a way of bootstrapping
itself.

TBH, if I ran the zoo and also had infinite time to tinker with stuff
like this, I'd probably make a pass through the hooks we already have
and try to refactor as many of them as possible to use some mechanism
other than _PG_init() to bootstrap themselves. That mechanism actually
sucks. When we use other mechanisms -- like a language "C" function
that knows the shared object name and function name -- then load is
triggered when it's needed, and the user gets the behavior they want.
Similarly with logical decoding and FDWs -- you, as the user, say that
you want this or that kind of logical decoding or FDW or C function or
whatever -- and then the system either notices that it's already
loaded and does what you want, or notices that it's not loaded and
loads it, and then does what you want. But when the bootstrapping
mechanism is _PG_init(), then the user has got to make sure the
library is loaded at the correct time. They have to know whether it
should go into shared_preload_libraries or whether it should be put
into one of the other various GUCs or if it can be loaded on the fly
with LOAD. If they don't load it in the right way, or if it doesn't
get loaded at all, well then probably it just silently doesn't work.
Plus there can be weird cases if it gets loaded into some backends but
not others and things like that. And here we seem to have an
opportunity to improve the interface by not depending on it.

--
Robert Haas
EDB: http://www.enterprisedb.com
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> Backwards compatibility is definitely a must, I'd say. Regardless of
> exactly how the backwards-compatible plugin is shipped, it should be
> what's turned on by default.

I keep seeing this thrown around and I don't quite get why we feel this
is the case. I'm not completely against trying to maintain backwards
compatibility, but at the same time, we just went through changing
quite a bit around in v12 with the restore command and that's the
other half of this. Why are we so concerned about backwards
compatibility here when there was hardly any complaint raised about
breaking it in the restore case?

If maintaining compatibility makes this a lot more difficult or ugly,
then I'm against doing so. I don't know that to be the case, none of
the proposed approaches really sound all that bad to me, but I
certainly don't think we should be entirely avoiding the idea of
breaking backwards compatibility here. We literally just did that and
while there's been some noise about it, it's hardly risen to the level
of being "something we should never, ever, even consider doing again"
as seems to be implied on this thread.

For those who might argue that maintaining compatibility for archive
command is somehow more important than for restore command- allow me
to save you the trouble and just let you know that I don't buy off on
such an argument. If anything, it should be the opposite. You back up
your database all the time and you're likely to see much more quickly
if that stops working. Database restores, on the other hand, are
nearly always done in times of great stress and when you want things
to be very clear and easy to follow and for everything to 'just work'.

Thanks,

Stephen
On 10/19/21, 9:14 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> My original proof-of-concept also attempted to handle a bunch of other
> shell command GUCs, but perhaps I'd better keep this focused on
> archive_command for now. What we do here could serve as an example of
> how to adjust the other shell command GUCs later on. I'll go ahead
> and rework my patch to look more like what is being discussed here,
> although I expect the exact design for the interface will continue to
> evolve based on the feedback in this thread.

Alright, I reworked the patch a bit to maintain backward
compatibility. My initial intent for 0001 was to just do a clean
refactor to move the shell archiving stuff to its own file. However,
after I did that, I realized that adding the hook wouldn't be too much
more work, so I did that as well. This seems to be enough to support
custom archiving modules. I included a basic example of such a module
in 0002. 0002 is included primarily for demonstration purposes.

I do wonder if there are some further enhancements we should make to
the archiving module interface. With 0001 applied, archive_command is
silently ignored if you've preloaded a library that uses the hook.
There's no way to indicate that you actually want to use
archive_command or that you want to use a specific library as the
archive library. On the other hand, just adding the hook keeps things
simple, and it doesn't preclude future improvements in this area.

Nathan
On 10/20/21, 3:23 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> Alright, I reworked the patch a bit to maintain backward
> compatibility. My initial intent for 0001 was to just do a clean
> refactor to move the shell archiving stuff to its own file. However,
> after I did that, I realized that adding the hook wouldn't be too much
> more work, so I did that as well. This seems to be enough to support
> custom archiving modules. I included a basic example of such a module
> in 0002. 0002 is included primarily for demonstration purposes.

It looks like the FreeBSD build is failing because sys/wait.h is
missing. Here is an attempt at fixing that.

Nathan
On Tue, Oct 19, 2021 at 2:50 PM Stephen Frost <sfrost@snowman.net> wrote:
> I keep seeing this thrown around and I don't quite get why we feel this
> is the case. I'm not completely against trying to maintain backwards
> compatibility, but at the same time, we just went through changing quite
> a bit around in v12 with the restore command and that's the other half
> of this. Why are we so concerned about backwards compatibility here
> when there was hardly any complaint raised about breaking it in the
> restore case?

There are 0 references to restore_command in the v12 release notes.
Just in case you had the version number wrong in this email, I
compared the documentation for restore_command in v10 to the
documentation in v14. The differences seem to be only cosmetic. So I'm
not sure what functional change you think we made. It was probably
less significant than what was being discussed here in regards to
archive_command.

Also, more to the point, when there's a need to break backward
compatibility in order to get some improvement, it's worth
considering, but here there just isn't.

--
Robert Haas
EDB: http://www.enterprisedb.com
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Tue, Oct 19, 2021 at 2:50 PM Stephen Frost <sfrost@snowman.net> wrote:
> > I keep seeing this thrown around and I don't quite get why we feel this
> > is the case. I'm not completely against trying to maintain backwards
> > compatibility, but at the same time, we just went through changing quite
> > a bit around in v12 with the restore command and that's the other half
> > of this. Why are we so concerned about backwards compatibility here
> > when there was hardly any complaint raised about breaking it in the
> > restore case?
>
> There are 0 references to restore_command in the v12 release notes.
> Just in case you had the version number wrong in this email, I
> compared the documentation for restore_command in v10 to the
> documentation in v14. The differences seem to be only cosmetic. So I'm
> not sure what functional change you think we made. It was probably
> less significant than what was being discussed here in regards to
> archive_command.

restore_command used to be in recovery.conf, which disappeared with v12
and it now has to go into postgresql.auto.conf or postgresql.conf.

That's a huge breaking change.

> Also, more to the point, when there's a need to break backward
> compatibility in order to get some improvement, it's worth
> considering, but here there just isn't.

There won't be any thought towards a backwards-incompatible capability
if everyone is saying that we can't possibly break it. That's why I was
commenting on it.

Thanks,

Stephen
On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost <sfrost@snowman.net> wrote:
> restore_command used to be in recovery.conf, which disappeared with v12
> and it now has to go into postgresql.auto.conf or postgresql.conf.
>
> That's a huge breaking change.

Not in the same sense. Moving the functionality to a different
configuration file can and probably did cause a lot of problems for
people, but the same basic functionality was still available.

(Also, I'm pretty sure that the recovery.conf changes would have
happened years earlier if there hadn't been backward compatibility
concerns, from Simon in particular. So saying that there was "hardly
any complaint raised" in that case doesn't seem to me to be entirely
accurate.)

> > Also, more to the point, when there's a need to break backward
> > compatibility in order to get some improvement, it's worth
> > considering, but here there just isn't.
>
> There won't be any thought towards a backwards-incompatible capability
> if everyone is saying that we can't possibly break it. That's why I was
> commenting on it.

I can't speak for anyone else, but that is not what I am saying. I am
open to the idea of breaking it if we thereby get some valuable
benefit which cannot be obtained otherwise. But Nathan has now
implemented something which, from the sound of it, will allow us to
obtain all of the available benefits with no incompatibilities. If we
think of additional benefits that we cannot obtain without
incompatibilities, then we can consider that situation when it arises.
In the meantime, there's no need to go looking for reasons to break
stuff that works in existing releases.

--
Robert Haas
EDB: http://www.enterprisedb.com
On Thu, Oct 21, 2021 at 11:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost <sfrost@snowman.net> wrote:
> restore_command used to be in recovery.conf, which disappeared with v12
> and it now has to go into postgresql.auto.conf or postgresql.conf.
>
> That's a huge breaking change.
Not in the same sense. Moving the functionality to a different
configuration file can and probably did cause a lot of problems for
people, but the same basic functionality was still available.
Yeah.
And as a bonus it got a bunch of people to upgrade their backup software that suddenly stopped working. Or in some cases, to install backup software instead of using the hand-rolled scripts. So there were some good side-effects specifically to breaking it as well.
(Also, I'm pretty sure that the recovery.conf changes would have
happened years earlier if there hadn't been backward compatibility
concerns, from Simon in particular. So saying that there was "hardly
any complaint raised" in that case doesn't seem to me to be entirely
accurate.)
> > Also, more to the point, when there's a need to break backward
> > compatibility in order to get some improvement, it's worth
> > considering, but here there just isn't.
>
> There won't be any thought towards a backwards-incompatible capability
> if everyone is saying that we can't possibly break it. That's why I was
> commenting on it.
I can't speak for anyone else, but that is not what I am saying. I am
open to the idea of breaking it if we thereby get some valuable
benefit which cannot be obtained otherwise. But Nathan has now
implemented something which, from the sound of it, will allow us to
obtain all of the available benefits with no incompatibilities. If we
think of additional benefits that we cannot obtain without
incompatibilities, then we can consider that situation when it arises.
In the meantime, there's no need to go looking for reasons to break
stuff that works in existing releases.
Agreed.
On Thu, Oct 21, 2021 at 9:51 PM Bossart, Nathan <bossartn@amazon.com> wrote:
On 10/20/21, 3:23 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> Alright, I reworked the patch a bit to maintain backward
> compatibility. My initial intent for 0001 was to just do a clean
> refactor to move the shell archiving stuff to its own file. However,
> after I did that, I realized that adding the hook wouldn't be too much
> more work, so I did that as well. This seems to be enough to support
> custom archiving modules. I included a basic example of such a module
> in 0002. 0002 is included primarily for demonstration purposes.
It looks like the FreeBSD build is failing because sys/wait.h is
missing. Here is an attempt at fixing that.
I still like the idea of loading the library via a special parameter, archive_library or such.
One reason for that is that adding/removing modules in shared_preload_libraries has a terrible UX in that you have to replace the whole thing. This makes it much more complex to deal with when different modules just want to add to it.
E.g. my awesome backup program could set archive_library='my_awesome_backups', and know it didn't break anything else. But it couldn't set shared_preload_libraries='my_awesome_backups', because then it might break a bunch of other modules that used to be there. So it has to go try to parse the whole config and figure out where to make such modifications.
Now, this could *also* be solved by allowing shared_preload_libraries to be a "list" instead of a string, and allowing postgresql.conf to accept syntax like shared_preload_libraries+='my_awesome_backups'.
But without that level of functionality available, I think a separate parameter for the archive library would be a good thing.
Other than that:
+
+/*
+ * Is WAL archiving configured? For consistency with previous releases, this
+ * checks that archive_command is set when archiving via shell is enabled.
+ * Otherwise, we just check that an archive function is set, and it is the
+ * responsibility of that archive function to ensure it is properly configured.
+ */
+#define XLogArchivingConfigured() \
+ (PG_archive && (PG_archive != shell_archive || XLogArchiveCommand[0] != '\0'))
Wouldn't that be better as a callback into the module? So that shell_archive would implement the check for XLogArchiveCommand. Then another third party module can make its own decision on what to check. And PGarchive would then be a struct that holds a function pointer to the archive command and another function pointer to the isenabled command? (I think having a struct for it would be useful regardless -- for possible future extensions with more API points).
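Something along these lines, perhaps; the struct and field names below are invented and only meant to illustrate the struct-of-callbacks idea:

typedef struct ArchiveModuleCallbacks
{
    /* is the module configured well enough for archiving to proceed? */
    bool        (*check_configured) (void);

    /* archive one WAL file; return true once it is safely stored */
    bool        (*archive_file) (const char *file, const char *path);
} ArchiveModuleCallbacks;

The built-in shell method's check_configured callback would then simply test that archive_command is non-empty, and XLogArchivingConfigured() could call through the struct instead of looking at XLogArchiveCommand directly.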
On 10/22/21, 7:43 AM, "Magnus Hagander" <magnus@hagander.net> wrote:
> I still like the idea of loading the library via a special
> parameter, archive_library or such.

My first attempt [0] added a GUC like this, so I can speak to some of
the interesting design decisions that follow.

The simplest thing we could do would be to add the archive_library GUC
and to load that just like the library is at the end of
shared_preload_libraries. This would mean that the archive library
could be specified in either GUC, and there would effectively be no
difference between the two.

The next thing we could consider doing is adding a new boolean called
process_archive_library_in_progress, which would be analogous to
process_shared_preload_libraries_in_progress. If a library is loaded
from the archive_library GUC, its _PG_init() will be called with
process_archive_library_in_progress set. This also means that if a
library is specified in both shared_preload_libraries and
archive_library, we'd call its _PG_init() twice. The library could
then branch based on whether
process_shared_preload_libraries_in_progress or
process_archive_library_in_progress was set.

Another approach would be to add a new initialization function (e.g.,
PG_archive_init()) that would be used if the library is being loaded
from archive_library. Like before, you can use the library for both
shared_preload_libraries and archive_library, but your initialization
logic would be expected to go in separate functions. However, there
still wouldn't be anything forcing that. A library could still break
the rules and do everything in _PG_init() and be loaded via
shared_preload_libraries.

One more thing we could do is to discover the relevant symbols for
archiving in the library loading function. Rather than expecting the
initialization function to set the hook correctly, we'd just look up
the _PG_archive() function during loading. Again, a library could
probably still break the rules and do everything in
_PG_init()/shared_preload_libraries, but there would at least be a
nicer interface available.

I believe the main drawbacks of going down this path are the
additional complexity in the backend and the slippery slope of adding
all kinds of new GUCs in the future. My original patch also tried to
do something similar for some other shell command GUCs
(archive_cleanup_command, restore_command, and recovery_end_command).
While I'm going to try to keep this focused on archive_command for
now, presumably we'd eventually want the ability to use hooks for all
of these things. I don't know if we really want to incur a new GUC for
every single one of these.

To be clear, I'm not against adding a GUC if it seems like the right
thing to do. I just want to make sure we are aware of the tradeoffs
compared to a simple shared_preload_libraries approach with its
terrible UX.

> Wouldn't that be better as a callback into the module? So that
> shell_archive would implement the check for XLogArchiveCommand. Then
> another third party module can make its own decision on what to
> check. And PGarchive would then be a struct that holds a function
> pointer to the archive command and another function pointer to the
> isenabled command? (I think having a struct for it would be useful
> regardless -- for possible future extensions with more API points).

+1. This crossed my mind, too. I'll add this in the next revision.

Nathan

[0] https://postgr.es/m/E9035E94-EC76-436E-B6C9-1C03FBD8EF54%40amazon.com
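For the second option, the module-side branching might look roughly like this. process_archive_library_in_progress is the hypothetical flag described above; process_shared_preload_libraries_in_progress already exists in miscadmin.h.

#include "postgres.h"
#include "fmgr.h"
#include "miscadmin.h"      /* process_shared_preload_libraries_in_progress */

PG_MODULE_MAGIC;

/* hypothetical flag from the second option described above */
extern bool process_archive_library_in_progress;

void
_PG_init(void)
{
    if (process_shared_preload_libraries_in_progress)
    {
        /* define GUCs, request shared memory, register bgworkers, ... */
    }

    if (process_archive_library_in_progress)
    {
        /* install the archive callback(s) for the archiver to use */
    }
}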
On Fri, Oct 22, 2021 at 1:42 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Another approach would be to add a new initialization function (e.g.,
> PG_archive_init()) that would be used if the library is being loaded
> from archive_library. Like before, you can use the library for both
> shared_preload_libraries and archive_library, but your initialization
> logic would be expected to go in separate functions. However, there
> still wouldn't be anything forcing that. A library could still break
> the rules and do everything in _PG_init() and be loaded via
> shared_preload_libraries.

I was imagining something like what logical decoding does. In that
case, you make a _PG_output_plugin_init function and it returns a
table of callbacks. Then the core code invokes those callbacks at the
appropriate times.

--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/22/21, 4:35 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> I was imagining something like what logical decoding does. In that
> case, you make a _PG_output_plugin_init function and it returns a
> table of callbacks. Then the core code invokes those callbacks at the
> appropriate times.

Here is an attempt at doing this. Archive modules are expected to
declare _PG_archive_module_init(), which can define GUCs, register
background workers, etc. This function must at least define the
archive callbacks. For now, I've introduced two callbacks. The first
is for checking that the archive module is configured, and the second
contains the actual archiving logic. I've written this so that the
same library can be used for multiple purposes (e.g., it could be in
shared_preload_libraries and archive_library). I don't know if that's
really necessary, but it seemed to me like a reasonable way to handle
the changes to the library loading logic that we need anyway.

0002 is still a sample backup module, but I also added some handling
for preexisting archives. If the preexisting archive file has the same
contents as the current file to archive, archiving is allowed to
continue. If the contents don't match, archiving fails. This sample
module could still produce unexpected results if two servers were
sending archives to the same directory. I stopped short of adding
handling for that case, but that might be a good thing to tackle next.

Nathan
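To give a feel for the interface being described, here is a bare-bones module sketch. The struct layout and the exact callback signatures are assumptions for illustration and are not copied from the patch; my_archive_directory stands in for whatever GUCs a real module would define.

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* assumed shape of the callback table from the patch described above */
typedef struct ArchiveModuleCallbacks
{
    bool        (*check_configured_cb) (void);
    bool        (*archive_file_cb) (const char *file, const char *path);
} ArchiveModuleCallbacks;

/* hypothetical GUC the module would define in _PG_init() */
static char *my_archive_directory = NULL;

static bool
my_check_configured(void)
{
    return my_archive_directory != NULL && my_archive_directory[0] != '\0';
}

static bool
my_archive_file(const char *file, const char *path)
{
    /* copy "path" into my_archive_directory as "file", fsync it, and
     * only return true once the copy is durable */
    return true;
}

void
_PG_archive_module_init(ArchiveModuleCallbacks *cb)
{
    cb->check_configured_cb = my_check_configured;
    cb->archive_file_cb = my_archive_file;
}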
Greetings,

* Magnus Hagander (magnus@hagander.net) wrote:
> On Thu, Oct 21, 2021 at 11:05 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Thu, Oct 21, 2021 at 4:29 PM Stephen Frost <sfrost@snowman.net> wrote:
> > > restore_command used to be in recovery.conf, which disappeared with v12
> > > and it now has to go into postgresql.auto.conf or postgresql.conf.
> > >
> > > That's a huge breaking change.
> >
> > Not in the same sense. Moving the functionality to a different
> > configuration file can and probably did cause a lot of problems for
> > people, but the same basic functionality was still available.
>
> Yeah.
>
> And as a bonus it got a bunch of people to upgrade their backup software
> that suddenly stopped working. Or in some cases, to install backup software
> instead of using the hand-rolled scripts. So there were some good
> side-effects specifically to breaking it as well.

I feel like there's some confusion here- just to clear things up, I
wasn't suggesting that we wouldn't include the capability, just that
we should be open to changing the interface/configuration based on
what makes sense and not, necessarily, insist on perfect backwards
compatibility. Seems everyone else has come out in support of that as
well at this point and so I don't think there's much more to say here.

The original complaint I had made was that it felt like folks were
pushing hard on backwards compatibility for the sake of it and I was
just trying to make sure it's clear that we can, and do, break
backwards compatibility sometimes and the bar to clear isn't
necessarily all that high, though of course we should be gaining
something if we do decide to make such a change.

Thanks,

Stephen
On Sun, Oct 24, 2021 at 2:15 AM Bossart, Nathan <bossartn@amazon.com> wrote:
> Here is an attempt at doing this. Archive modules are expected to
> declare _PG_archive_module_init(), which can define GUCs, register
> background workers, etc. This function must at least define the
> archive callbacks. For now, I've introduced two callbacks. The first
> is for checking that the archive module is configured, and the second
> contains the actual archiving logic.

I don't see why this patch should need to make any changes to
internal_load_library(), PostmasterMain(), SubPostmasterMain(), or any
other central point of control, and I don't think it should.
pgarch_archiveXlog() can just load the library at the time it's
needed. That way it only gets loaded in the archiver process, and the
required changes are much more localized. Like instead of asserting
that the functions are initialized, just
load_external_function(libname, "_PG_archive_module_init") and call it
if they aren't.

I think the attempt in check_archive_command()/check_archive_library()
to force exactly one of those two to be set is not going to work well
and should be removed. In general, GUCs whose legal values depend on
the values of other GUCs don't end up working out well. I think what
should happen instead is that if archive_library=shell then
archive_command does whatever it does; otherwise archive_command is
without effect.

--
Robert Haas
EDB: http://www.enterprisedb.com
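A sketch of that archiver-side lazy loading might look as follows. load_external_function() is the existing dfmgr.c routine Robert mentions; the ArchiveModuleCallbacks type is the callback table sketched earlier in the thread, and XLogArchiveLibrary stands in for whatever variable ends up holding the archive_library setting.

/* Sketch only: resolve the archive module the first time the archiver
 * needs it, instead of wiring it up at postmaster start. */
typedef void (*ArchiveModuleInit) (ArchiveModuleCallbacks *cb);

static ArchiveModuleCallbacks ArchiveContext;
static bool ArchiveContextInitialized = false;

static void
LoadArchiveLibrary(void)
{
    ArchiveModuleInit archive_init;

    archive_init = (ArchiveModuleInit)
        load_external_function(XLogArchiveLibrary,
                               "_PG_archive_module_init",
                               false, NULL);

    if (archive_init == NULL)
        ereport(ERROR,
                (errmsg("archive library \"%s\" did not define _PG_archive_module_init",
                        XLogArchiveLibrary)));

    archive_init(&ArchiveContext);
    ArchiveContextInitialized = true;
}

static bool
pgarch_archive_file(const char *file, const char *path)
{
    if (!ArchiveContextInitialized)
        LoadArchiveLibrary();

    return ArchiveContext.archive_file_cb(file, path);
}

And per the last paragraph above, when archive_library = 'shell' the built-in callbacks would be installed instead, leaving archive_command to behave exactly as it does today.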
On 10/25/21, 10:02 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> I don't see why this patch should need to make any changes to
> internal_load_library(), PostmasterMain(), SubPostmasterMain(), or any
> other central point of control, and I don't think it should.
> pgarch_archiveXlog() can just load the library at the time it's
> needed. That way it only gets loaded in the archiver process, and the
> required changes are much more localized. Like instead of asserting
> that the functions are initialized, just
> load_external_function(libname, "_PG_archive_module_init") and call it
> if they aren't.

IIUC this would mean that archive modules that need to define GUCs or
register background workers would have to separately define a
_PG_init() and be loaded via shared_preload_libraries in addition to
archive_library. That doesn't seem too terrible to me, but it was
something I was trying to avoid.

> I think the attempt in check_archive_command()/check_archive_library()
> to force exactly one of those two to be set is not going to work well
> and should be removed. In general, GUCs whose legal values depend on
> the values of other GUCs don't end up working out well. I think what
> should happen instead is that if archive_library=shell then
> archive_command does whatever it does; otherwise archive_command is
> without effect.

I'm fine with this approach. I'll go this route in the next revision.

Nathan
On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> IIUC this would mean that archive modules that need to define GUCs or
> register background workers would have to separately define a
> _PG_init() and be loaded via shared_preload_libraries in addition to
> archive_library. That doesn't seem too terrible to me, but it was
> something I was trying to avoid.

Hmm. That doesn't seem like a terrible goal, but I think we should try
to find some way of achieving it that looks tidier than this does.

--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/25/21, 10:18 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> IIUC this would mean that archive modules that need to define GUCs or
>> register background workers would have to separately define a
>> _PG_init() and be loaded via shared_preload_libraries in addition to
>> archive_library. That doesn't seem too terrible to me, but it was
>> something I was trying to avoid.
>
> Hmm. That doesn't seem like a terrible goal, but I think we should try
> to find some way of achieving it that looks tidier than this does.

We could just treat archive_library as if it is tacked onto the
shared_preload_libraries list. I think I can make that look relatively
tidy.

Nathan
On 10/25/21, 10:50 AM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> On 10/25/21, 10:18 AM, "Robert Haas" <robertmhaas@gmail.com> wrote:
>> On Mon, Oct 25, 2021 at 1:14 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>>> IIUC this would mean that archive modules that need to define GUCs or
>>> register background workers would have to separately define a
>>> _PG_init() and be loaded via shared_preload_libraries in addition to
>>> archive_library. That doesn't seem too terrible to me, but it was
>>> something I was trying to avoid.
>>
>> Hmm. That doesn't seem like a terrible goal, but I think we should try
>> to find some way of achieving it that looks tidier than this does.
>
> We could just treat archive_library as if it is tacked onto the
> shared_preload_libraries list. I think I can make that look
> relatively tidy.

Alright, here is an attempt at that. With this revision, archive
libraries are preloaded (and _PG_init() is called), and the archiver
is responsible for calling _PG_archive_module_init() to get the
callbacks. I've also removed the GUC check hooks as previously
discussed.

Nathan
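A rough idea of what tacking archive_library onto the list could look like; XLogArchiveLibrary again stands in for the GUC variable, and load_file() is the existing dfmgr.c entry point, which is effectively a no-op for a library that has already been loaded.

/* Sketch: after shared_preload_libraries has been processed, also
 * preload the library named by archive_library, if any. */
static void
process_archive_library(void)
{
    if (XLogArchiveLibrary != NULL && XLogArchiveLibrary[0] != '\0')
        load_file(XLogArchiveLibrary, false);
}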
On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan <bossartn@amazon.com> wrote:
> Alright, here is an attempt at that. With this revision, archive
> libraries are preloaded (and _PG_init() is called), and the archiver
> is responsible for calling _PG_archive_module_init() to get the
> callbacks. I've also removed the GUC check hooks as previously
> discussed.

I would need to spend more time on this to have a detailed opinion on
all of it, but I agree that part looks better this way.

--
Robert Haas
EDB: http://www.enterprisedb.com
On 10/25/21, 1:29 PM, "Robert Haas" <robertmhaas@gmail.com> wrote:
> On Mon, Oct 25, 2021 at 3:45 PM Bossart, Nathan <bossartn@amazon.com> wrote:
>> Alright, here is an attempt at that. With this revision, archive
>> libraries are preloaded (and _PG_init() is called), and the archiver
>> is responsible for calling _PG_archive_module_init() to get the
>> callbacks. I've also removed the GUC check hooks as previously
>> discussed.
>
> I would need to spend more time on this to have a detailed opinion on
> all of it, but I agree that part looks better this way.

Great. Unless I see additional feedback on the basic design shortly,
I'll give the documentation updates a try.

Nathan
On 10/25/21, 1:41 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> Great. Unless I see additional feedback on the basic design shortly,
> I'll give the documentation updates a try.

Okay, here is a more complete patch with a first attempt at the
documentation changes. I tried to keep the changes to the existing
docs as minimal as possible, and then I added a new chapter that
describes what goes into creating an archive module. Separately, I
simplified the basic_archive module, moved it to src/test/modules, and
added a simple test. My goal is for this to serve as a basic example
and to provide some test coverage on the new infrastructure.

Nathan
Greetings,

* Bossart, Nathan (bossartn@amazon.com) wrote:
> On 10/25/21, 1:41 PM, "Bossart, Nathan" <bossartn@amazon.com> wrote:
> > Great. Unless I see additional feedback on the basic design shortly,
> > I'll give the documentation updates a try.
>
> Okay, here is a more complete patch with a first attempt at the
> documentation changes. I tried to keep the changes to the existing
> docs as minimal as possible, and then I added a new chapter that
> describes what goes into creating an archive module. Separately, I
> simplified the basic_archive module, moved it to src/test/modules,
> and added a simple test. My goal is for this to serve as a basic
> example and to provide some test coverage on the new infrastructure.

Definitely interested and plan to look at this more shortly, and
generally this all sounds good, but maybe we should have it be posted
under a new thread as it's moved pretty far from the subject and folks
might not appreciate what this is about at this point..?

Thanks,

Stephen
On 11/1/21, 10:57 AM, "Stephen Frost" <sfrost@snowman.net> wrote:
> Definitely interested and plan to look at this more shortly, and
> generally this all sounds good, but maybe we should have it be posted
> under a new thread as it's moved pretty far from the subject and folks
> might not appreciate what this is about at this point..?

Done: https://postgr.es/m/668D2428-F73B-475E-87AE-F89D67942270%40amazon.com

Looking forward to your feedback.

Nathan