Thread: WIP/PoC for parallel backup
Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
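For illustration, here is a tiny standalone sketch (not code from the patch) of the static, per-tablespace division described above. The tablespace names, file counts and tar-file naming are made up; a real implementation would walk the actual file list.

/*
 * Sketch only: split each tablespace's file list into one contiguous group
 * per worker, each worker producing its own tar file for that tablespace.
 */
#include <stdio.h>

static void
assign_groups(const char *tablespace, int nfiles, int nworkers)
{
    int per_worker = nfiles / nworkers;     /* assume it divides evenly */

    for (int w = 0; w < nworkers; w++)
    {
        int first = w * per_worker;
        int last = first + per_worker - 1;

        /* worker w archives files [first, last] into its own tar file */
        printf("worker %d: %s files %d..%d -> %s_worker%d.tar\n",
               w, tablespace, first, last, tablespace, w);
    }
}

int
main(void)
{
    assign_groups("t1", 20, 5);     /* 4 files of t1 per worker */
    assign_groups("t2", 10, 5);     /* 2 files of t2 per worker */
    return 0;
}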
Hi Asif
Interesting proposal. Bulk of the work in a backup is transferring files from source data directory to destination. Your patch is breaking this task down in multiple sets of files and transferring each set in parallel. This seems correct, however, your patch is also creating a new process to handle each set. Is that necessary? I think we should try to achieve this using multiple asynchronous libpq connections from a single basebackup process. That is to use PQconnectStartParams() interface instead of PQconnectdbParams(), which is currently used by basebackup. On the server side, it may still result in multiple backend processes per connection, and an attempt should be made to avoid that as well, but it seems complicated.
What do you think?
Asim
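As a minimal sketch of the suggestion above (not code from the patch): several libpq connections can be started from one process with PQconnectStartParams() and driven with PQconnectPoll(), instead of the blocking PQconnectdbParams(). The connection parameters and connection count below are placeholders, and a real client would wait on PQsocket() with select()/poll() rather than spinning.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

#define NCONN 4

int
main(void)
{
    const char *keywords[] = {"dbname", "replication", NULL};
    const char *values[] = {"postgres", "true", NULL};
    PGconn     *conns[NCONN];
    int         done[NCONN] = {0};
    int         remaining = NCONN;

    /* kick off all connection attempts without blocking */
    for (int i = 0; i < NCONN; i++)
    {
        conns[i] = PQconnectStartParams(keywords, values, 0);
        if (conns[i] == NULL || PQstatus(conns[i]) == CONNECTION_BAD)
        {
            fprintf(stderr, "could not start connection %d\n", i);
            exit(1);
        }
    }

    /* drive every connection's state machine until it succeeds or fails */
    while (remaining > 0)
    {
        for (int i = 0; i < NCONN; i++)
        {
            PostgresPollingStatusType st;

            if (done[i])
                continue;
            st = PQconnectPoll(conns[i]);
            if (st == PGRES_POLLING_OK)
            {
                done[i] = 1;
                remaining--;
            }
            else if (st == PGRES_POLLING_FAILED)
            {
                fprintf(stderr, "connection %d failed: %s",
                        i, PQerrorMessage(conns[i]));
                exit(1);
            }
            /* PGRES_POLLING_READING/WRITING: still connecting, poll again */
        }
    }

    /* all connections are ready; each one could now fetch its own file set */
    for (int i = 0; i < NCONN; i++)
        PQfinish(conns[i]);
    return 0;
}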
Greetings,
* Asif Rehman (asifr.rehman@gmail.com) wrote:
> On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
> > Interesting proposal. Bulk of the work in a backup is transferring files
> > from source data directory to destination. Your patch is breaking this
> > task down in multiple sets of files and transferring each set in parallel.
> > This seems correct, however, your patch is also creating a new process to
> > handle each set. Is that necessary? I think we should try to achieve this
> > using multiple asynchronous libpq connections from a single basebackup
> > process. That is to use PQconnectStartParams() interface instead of
> PQconnectdbParams(), which is currently used by basebackup. On the server
> > side, it may still result in multiple backend processes per connection, and
> > an attempt should be made to avoid that as well, but it seems complicated.
>
> Thanks Asim for the feedback. This is a good suggestion. The main idea I
> wanted to discuss is the design where we can open multiple backend
> connections to get the data instead of a single connection.
> On the client side we can have multiple approaches. One is to use
> asynchronous APIs (as suggested by you) and the other could be to decide
> between multi-process and multi-thread. The main point was that we can
> extract a lot of performance benefit by using multiple connections, and I built
> this POC to float the idea of how the parallel backup can work, since the
> core logic of getting the files using multiple connections will remain the
> same, whether we use asynchronous, multi-process or multi-threaded.
>
> I am going to address the division of files to be distributed evenly among
> multiple workers based on file sizes. That would allow us to get some
> concrete numbers and also let us gauge the benefits of the async versus the
> multiprocess/thread approach on the client side.
I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.
Thanks,
Stephen
Greetings,

* Ahsan Hadi (ahsan.hadi@gmail.com) wrote:
> On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > I would expect you to quickly want to support compression on the server
> > side, before the data is sent across the network, and possibly
> > encryption, and so it'd likely make sense to just have independent
> > processes and connections through which to do that.
>
> It would be interesting to see the benefits of compression (before the data
> is transferred over the network) on top of parallelism, since there is also
> some overhead associated with performing the compression. I agree with your
> suggestion of trying to add parallelism first and then try compression
> before the data is sent across the network.

You're welcome to take a look at pgbackrest for insight and to play with
regarding compression-before-transfer, how best to split up the files and
order them, encryption, et al. We've put quite a bit of effort into figuring
all of that out.

Thanks!

Stephen
On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> If the parallel option is there, then it will only do pg_start_backup, scans PGDATA and sends a list of file names.
So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.
> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
> pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.
> - STOP_BACKUP
> when all workers finish then, pg_basebackup will send STOP_BACKUP command.
This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.
> I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning them to worker processes. I intend to fine tune this by taking into consideration file sizes. Further, to add tar format support, I am considering that each worker process processes all files belonging to a tablespace in its list (i.e. creates and copies the tar file) before it processes the next tablespace. As a result, this will create tar files that are disjointed with respect to tablespace data. For example:
Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.
If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.
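As an illustration of the scheme just described (not code from the patch), here is a minimal pthread-based sketch: the list is sorted by size, descending, and each worker claims the next file as soon as it finishes the previous one. The file names, sizes, worker count and the fetch_file() stub are all made up.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    const char *name;
    long        size;
} BackupFileEntry;

static BackupFileEntry files[] = {
    {"base/16384/16400", 1073741824L},
    {"base/16384/16401", 8192L},
    {"base/16384/16402", 524288L},
    {"pg_wal/000000010000000000000001", 16777216L},
};
static const int nfiles = sizeof(files) / sizeof(files[0]);

static int next_file = 0;                       /* next unclaimed file */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static int
cmp_size_desc(const void *a, const void *b)
{
    long sa = ((const BackupFileEntry *) a)->size;
    long sb = ((const BackupFileEntry *) b)->size;

    return (sa < sb) - (sa > sb);               /* biggest first */
}

static void
fetch_file(const BackupFileEntry *f, long worker)
{
    /* placeholder for fetching the file over this worker's connection */
    printf("worker %ld fetching %s (%ld bytes)\n", worker, f->name, f->size);
}

static void *
worker_main(void *arg)
{
    long worker = (long) arg;

    for (;;)
    {
        int idx;

        pthread_mutex_lock(&list_lock);
        idx = (next_file < nfiles) ? next_file++ : -1;
        pthread_mutex_unlock(&list_lock);

        if (idx < 0)
            break;                              /* list exhausted */
        fetch_file(&files[idx], worker);
    }
    return NULL;
}

int
main(void)
{
    pthread_t workers[3];

    qsort(files, nfiles, sizeof(files[0]), cmp_size_desc);

    for (long i = 0; i < 3; i++)
        pthread_create(&workers[i], NULL, worker_main, (void *) i);
    for (int i = 0; i < 3; i++)
        pthread_join(workers[i], NULL);
    return 0;
}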
> Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:
1. Error! Tar format parallel backups are not supported.
2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.
3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.
I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.
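For what it's worth, here is a rough sketch of the single-tar design described as approach (3) above: N reader threads each fill a buffer with one complete file, and a single writer thread drains finished buffers into the archive so only one thread ever writes it. The queue is a minimal bounded ring; file reading and tar writing are just printf stubs, and the file names and counts are invented for the example.

#include <pthread.h>
#include <stdio.h>

#define NREADERS     3
#define QUEUE_SLOTS  (NREADERS + 1)     /* N + 1 buffers */
#define NFILES       9

typedef struct
{
    char    name[64];   /* a real buffer would also hold the file data */
} FileBuffer;

static FileBuffer queue[QUEUE_SLOTS];
static int q_head = 0, q_tail = 0, q_count = 0;
static int files_remaining = NFILES;    /* not yet claimed by a reader */
static int files_unwritten = NFILES;    /* not yet written to the tar */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *
reader_main(void *arg)
{
    (void) arg;
    for (;;)
    {
        FileBuffer buf;
        int     fileno;

        pthread_mutex_lock(&lock);
        if (files_remaining == 0)
        {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        fileno = --files_remaining;
        pthread_mutex_unlock(&lock);

        /* "read" the file into a private buffer outside the lock */
        snprintf(buf.name, sizeof(buf.name), "file_%d", fileno);

        /* hand the finished buffer over to the writer */
        pthread_mutex_lock(&lock);
        while (q_count == QUEUE_SLOTS)
            pthread_cond_wait(&not_full, &lock);
        queue[q_tail] = buf;
        q_tail = (q_tail + 1) % QUEUE_SLOTS;
        q_count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
}

static void *
writer_main(void *arg)
{
    (void) arg;
    while (files_unwritten > 0)     /* only the writer changes this counter */
    {
        FileBuffer buf;

        pthread_mutex_lock(&lock);
        while (q_count == 0)
            pthread_cond_wait(&not_empty, &lock);
        buf = queue[q_head];
        q_head = (q_head + 1) % QUEUE_SLOTS;
        q_count--;
        files_unwritten--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);

        /* append the buffered file to the single tar archive */
        printf("writer: adding %s to backup.tar\n", buf.name);
    }
    return NULL;
}

int
main(void)
{
    pthread_t readers[NREADERS], writer;

    pthread_create(&writer, NULL, writer_main, NULL);
    for (int i = 0; i < NREADERS; i++)
        pthread_create(&readers[i], NULL, reader_main, NULL);
    for (int i = 0; i < NREADERS; i++)
        pthread_join(readers[i], NULL);
    pthread_join(writer, NULL);
    return 0;
}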
Hi Robert,

Thanks for the feedback. Please see the comments below:

On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> If the parallel option is there, then it will only do pg_start_backup, scans PGDATA and sends a list of file names.
So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

Sure. I will add a separate command (START_BACKUP) for parallel.
> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
> pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

I considered this approach initially; however, I adopted the current strategy
to avoid multiple round trips between the server and clients and to save on
query processing time by issuing a single command rather than multiple ones.
Further, fetching multiple files at once will also aid in supporting the tar
format by utilising the existing ReceiveTarFile() function and will be able to
create a tarball per tablespace per worker.
> - STOP_BACKUP
> when all workers finish then, pg_basebackup will send STOP_BACKUP command.
This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.
> I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning them to worker processes. I intend to fine tune this by taking into consideration file sizes. Further, to add tar format support, I am considering that each worker process processes all files belonging to a tablespace in its list (i.e. creates and copies the tar file) before it processes the next tablespace. As a result, this will create tar files that are disjointed with respect to tablespace data. For example:
Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.
If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.

Ideally, I would like to support the tar format as well, which would be much
easier to implement when fetching multiple files at once, since that would
enable the existing functionality to be used without much change.
Your idea of sorting the files in descending order of size seems very appealing. I think we can do this and have the files divided among the workers one by one, i.e. the first file in the list goes to worker 1, the second to worker 2, and so on and so forth.
> Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:
1. Error! Tar format parallel backups are not supported.
2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.
3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.
I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.

Yes, for the tar format support, approach (2) is what I had in mind. Currently
I'm working on the implementation and will share the patch in a couple of days.
On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
>> > pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially, however, I adopted the current strategy to avoid multiple round trips between the server and clients and save on query processing time by issuing a single command rather than multiple ones. Further fetching multiple files at once will also aid in supporting the tar format by utilising the existing ReceiveTarFile() function and will be able to create a tarball per tablespace per worker.
I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.
However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.
To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.
> Ideally, I would like to support the tar format as well, which would be much
> easier to implement when fetching multiple files at once, since that would
> enable the existing functionality to be used without much change.

I think we should just have the client generate the tarfile. It'll require
duplicating some code, but it's not actually that much code or that
complicated from what I can see.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Based on my understanding your main concern is that the files won't be
> distributed fairly, i.e. one worker might get a big file and take more time
> while others get done early with smaller files? In this approach I have
> created a list of files in descending order based on their sizes so all the
> big files will come at the top. The maximum file size in PG is 1GB so if we
> have four workers who are picking up files from the list one by one, the
> worst case scenario is that one worker gets a file of 1GB to process while
> others get files of smaller size. However with this approach of descending
> files based on size and handing them out to workers one by one, there is a
> very high likelihood of workers getting work evenly. Does this address your
> concerns?

Somewhat, but I'm not sure it's good enough. There are lots of reasons why
two processes that are started at the same time with the same amount of work
might not finish at the same time.

I'm also not particularly excited about having the server do the sorting
based on file size. Seems like that ought to be the client's job, if the
client needs the sorting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
>> > pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially, however, I adopted the current strategy to avoid multiple round trips between the server and clients and save on query processing time by issuing a single command rather than multiple ones. Further fetching multiple files at once will also aid in supporting the tar format by utilising the existing ReceiveTarFile() function and will be able to create a tarball for per tablespace per worker.
I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.
However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.
To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.

I have updated the patch (see the attached patch) to include tablespace
support, tar format support, and all other base backup options to work in
parallel mode as well. As previously suggested, I have removed BASE_BACKUP
[PARALLEL] and have added START_BACKUP instead to start the backup. The tar
format will write multiple tar files depending upon the number of workers
specified. I have also made all commands (START_BACKUP/SEND_FILES_CONTENT/
STOP_BACKUP) accept the base_backup_opt_list. This way the command-line
options can also be provided to these commands. Since the command-line options
don't change once the backup initiates, I went this way instead of storing
them in shared state.
The START_BACKUP command will now return a sorted list of files in descending
order based on file sizes. This way, the larger files will be at the top of
the list, and hence these files will be assigned to workers one by one, making
it so that the larger files will be copied before other files.

Based on my understanding your main concern is that the files won't be
distributed fairly, i.e. one worker might get a big file and take more time
while others get done early with smaller files? In this approach I have
created a list of files in descending order based on their sizes so all the
big files will come at the top. The maximum file size in PG is 1GB so if we
have four workers who are picking up files from the list one by one, the
worst case scenario is that one worker gets a file of 1GB to process while
others get files of smaller size. However with this approach of descending
files based on size and handing them out to workers one by one, there is a
very high likelihood of workers getting work evenly. Does this address your
concerns?

Furthermore, the patch also includes a regression test. As
t/010_pg_basebackup.pl tests base backup comprehensively, I have duplicated
it to t/040_pg_basebackup_parallel.pl and added the parallel option in all of
its tests, to make sure parallel mode works as expected. The one thing that
differs from base backup is the file checksum reporting. In parallel mode, the
total number of checksum failures is not reported correctly; however, it will
abort the backup whenever a checksum failure occurs. This is because the
processes are not maintaining any shared state. I assume that it's not much
important to report the total number of failures vs noticing the failure and
aborting.

--
Asif Rehman
Thanks Asif for the patch. I am picking this up for review. The patch is a
bit big, so here are some very initial comments to make the review process
easier.
1) The patch seems to be doing a lot of code shuffling; I think it would be
easier to review if you could break out the cleanup as a separate patch.
Example:
a: setup_throttle
b: include_wal_files
2) As I can see, this patch basically has three major phases:
a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of the actual parallel backup.
c) Testcase
I would suggest breaking these out as three separate patches; that would
make reviewing the patch much easier.
3) In your patch you are preparing the backup manifest (a file giving
information about the data files). Robert Haas submitted the backup manifests
patch on another thread [1], and I think we should use that patch to get the
backup manifests for parallel backup.

Further, I will continue to review the patch, but meanwhile if you can break
up the patches, that will make the review process easier.
On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit. I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.
I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.
I still think that the files should be requested one at a time, not a
huge long list in a single command.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
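As a sketch of how a client might consume the file-list result set proposed above (file name, type 'd'/'f', size, mtime): the SEND_FILE_LIST command name is the one used later in this thread, not an existing replication command, and the connection string is a placeholder, so this is purely an illustration of the proposal.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn   *conn = PQconnectdb("dbname=postgres replication=true");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        exit(1);
    }

    res = PQexec(conn, "SEND_FILE_LIST");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "SEND_FILE_LIST failed: %s", PQerrorMessage(conn));
        exit(1);
    }

    for (int i = 0; i < PQntuples(res); i++)
    {
        const char *name = PQgetvalue(res, i, 0);
        char        type = *PQgetvalue(res, i, 1);      /* 'd' or 'f' */
        long        size = atol(PQgetvalue(res, i, 2));

        /* directories only need to be created; files are queued for a worker */
        printf("%c %10ld %s\n", type, size, name);
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}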
On Mon, Oct 7, 2019 at 9:43 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> What about have an API to get the single file or list of files? We will use
> a single file in our application and other tools can get the benefit of
> list of files.

That sounds a bit speculative to me. Who is to say that anyone will find that
useful? I mean, I think it's fine and good to build the functionality that we
need in a way that maximizes the likelihood that other tools can reuse that
functionality, and I think we should do that. But I don't think it's smart to
build functionality that we don't really need in the hope that somebody else
will find it useful unless we're pretty sure that they actually will. I don't
see that as being the case here; YMMV.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.

I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns the result set. I will add the
additional information.
I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other tools
have to use the pg_start_backup() and pg_stop_backup() functions to achieve
that. Since we are breaking the backup into multiple commands, I believe it
would be a good idea to have this option. I will include it in the next
revision of this patch.
I still think that the files should be requested one at a time, not a
huge long list in a single command.

Sure, will make the change.
On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.

I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns the result set. I will add the
additional information.
I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other tools
have to use the pg_start_backup() and pg_stop_backup() functions to achieve
that. Since we are breaking the backup into multiple commands, I believe it
would be a good idea to have this option. I will include it in the next
revision of this patch.
I still think that the files should be requested one at a time, not a
huge long list in a single command.

Sure, will make the change.

I have refactored the functionality into multiple smaller patches in order to
make the review process easier. I have divided the code into backend changes
and pg_basebackup changes. The backend replication system now supports the
following commands:

- START_BACKUP
- SEND_FILE_LIST
- SEND_FILES_CONTENT
- STOP_BACKUP

The START_BACKUP will not return the list of files; instead SEND_FILE_LIST is
used for that. The START_BACKUP
now calls pg_start_backup and returns starting WAL position, tablespace header information and content of backup label file.
Initially I was using tmp files to store the backup_label content, but that turned out to be a bad idea, because there can be multiple
non-exclusive backups running. The backup label information is needed by
stop_backup, so pg_basebackup will send it as part of STOP_BACKUP.

The SEND_FILE_LIST will return the list of files. It will be returned as a
result set having four columns (filename, type, size, mtime).

The SEND_FILES_CONTENT can now return a single file or multiple files as
required. There is not much change required to support both, so I believe it
will be more usable this way if other tools want to utilise it.

As per the suggestion from Robert, I am currently working on making changes in
pg_basebackup to fetch files one by one. However, that's not complete and the
attached patch is still using the old method of multi-file fetching to test
the backend commands. I will send an updated patch which will contain the
changes for fetching files one by one. I wanted to share the backend patch to
get some feedback in the meantime.

Thanks,

--
Asif Rehman
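For reference, here is a sketch of the proposed command sequence from the client's point of view. The command names are the ones from this proposal; the argument syntax, connection strings and file name are simplified assumptions, not the final grammar.

#include <stdio.h>
#include <libpq-fe.h>

static PGresult *
run(PGconn *conn, const char *cmd)
{
    PGresult *res = PQexec(conn, cmd);

    if (PQresultStatus(res) != PGRES_TUPLES_OK &&
        PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s failed: %s", cmd, PQerrorMessage(conn));
    return res;
}

int
main(void)
{
    PGconn *leader = PQconnectdb("replication=true dbname=postgres");
    PGconn *worker = PQconnectdb("replication=true dbname=postgres");

    if (PQstatus(leader) != CONNECTION_OK || PQstatus(worker) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    /* enter non-exclusive backup mode; returns start LSN and backup label */
    PQclear(run(leader, "START_BACKUP"));

    /* fetch the list of files to copy (name, type, size, mtime) */
    PQclear(run(leader, "SEND_FILE_LIST"));

    /* each worker connection requests files; shown here for a single file */
    PQclear(run(worker, "SEND_FILES_CONTENT ('base/1/1259')"));

    /* once every worker is done, finish the backup on the leader */
    PQclear(run(leader, "STOP_BACKUP"));

    PQfinish(worker);
    PQfinish(leader);
    return 0;
}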
I quickly tried to have a look at your 0001-refactor patch. Here are some
comments:

1. The patch fails to compile.

Sorry if I am missing something, but I am not able to understand why in the
new function collectTablespaces() you have added an extra parameter NULL while
calling sendTablespace(); it fails the compilation:

+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -g -O0 -Wall
-Werror -I../../../../src/include -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:12253:59: error: too many arguments to function call, expected 2, have 3
ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
                         ~~~~~~~~~~~~~~                  ^~~~

2. I think the patch needs to be run via pgindent. It does not follow the 80
column width. e.g.

+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool infotbssize, bool needtblspcmapfile)
+{

3. The comments in the refactored code appear to be redundant. For example,
the following comment:

/* Setup and activate network throttling, if client requested it */

appears thrice in the code: before calling setup_throttle(), in the prologue
of the function setup_throttle(), and above the if() in that function.
Similarly, the comment:

/* Collect information about all tablespaces */

in collectTablespaces().

4. In the function include_wal_files(), why is the parameter TimeLineID,
i.e. endtli, needed? I don't see it being used in the function at all. I think
you can safely get rid of it.

+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Attached are the updated patches.
I had a quick look over these changes and they look good overall.
However, here are a few review comments I caught while glancing over patches
0002 and 0003.
--- 0002 patch
1.
Can the lsn option be renamed to start-wal-location? That will be clearer too.
2.
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
I think it will be good if we keep this structure in a common place so that
the client can also use it.
3.
+ SEND_FILE_LIST,
+ SEND_FILES_CONTENT,
Can above two commands renamed to SEND_BACKUP_MANIFEST and SEND_BACKUP_FILE
respectively?
The reason behind the first name change is, we are not getting only file lists
here instead we are getting a few more details with that too. And for others,
it will be inline with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
4.
Typos:
non-exlusive => non-exclusive
retured => returned
optionaly => optionally
nessery => necessary
totoal => total
--- 0003 patch
1.
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
I think it will be good if it goes to simple_list.c. That will help in other
usages as well.
2.
Please revert these unnecessary changes:
@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
-
mapped_tblspc_path = get_tablespace_mapping(©buf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
3.
Typos:
retrive => retrieve
takecare => take care
tablespae => tablespace
4.
ParallelBackupEnd() function does not do anything for parallelism. Will it be
better to just rename it as EndBackup()?
5.
To pass a tablespace path to the server in SEND_FILES_CONTENT, you are reusing
a LABEL option, that seems odd. How about adding a new option for that?
6.
It will be good if we have some comments explaining what the function is
actually doing in its prologue. For functions like:
GetBackupFilesList()
ReceiveFiles()
create_workers_and_fetch()
Thanks
Thanks,--Asif Rehman
--Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
I had a detailed discussion with Robert Haas at PostgreConf Europe about parallel backup.
We discussed the current state of the patch and what needs to be done to get the patch committed.
- The current patch uses a process to implement parallelism. There are many
reasons we need to use threads instead of processes. To start with, as this is a client utility it makes
more sense to use threads. The data needs to be shared amongst different threads and the main process,
network to identify which are the real bottlenecks. In general, we agreed that we could start with fetching
- There is also an ongoing debate in this thread that we should have one single tar file for all files or one
reduce the management of multiple files, but in case of one file per thread, we end up with many tar
files. Therefore we need to have one master thread which is responsible for writing on tar file and all
the other threads will receive the data from the network and stream to the master thread. This also
supports the idea of using a thread-based model rather than a process-based approach because it
requires too much data sharing between processes. If we cannot achieve this, then we can disable the
- In the case of data sharing, we need to try to avoid unnecessary locking and more suitable algorithm to
solve the reader-writer problem is required.
On Fri, Oct 18, 2019 at 4:12 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
Attached are the updated patches.
I had a quick look over these changes and they look good overall.
However, here are my few review comments I caught while glancing the patches
0002 and 0003.
--- 0002 patch
1.
Can lsn option be renamed to start-wal-location? This will be more clear too.
2.
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
I think it will be good if we keep this structure in a common place so that
the client can also use it.
3.
+ SEND_FILE_LIST,
+ SEND_FILES_CONTENT,
Can above two commands renamed to SEND_BACKUP_MANIFEST and SEND_BACKUP_FILE
respectively?
The reason behind the first name change is, we are not getting only file lists
here instead we are getting a few more details with that too. And for others,
it will be inline with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
4.
Typos:
non-exlusive => non-exclusive
retured => returned
optionaly => optionally
nessery => necessary
totoal => total
--- 0003 patch
1.
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
I think it will be good if it goes to simple_list.c. That will help in other
usages as well.
2.
Please revert these unnecessary changes:
@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
-
mapped_tblspc_path = get_tablespace_mapping(©buf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
3.
Typos:
retrive => retrieve
takecare => take care
tablespae => tablespace
4.
ParallelBackupEnd() function does not do anything for parallelism. Will it be
better to just rename it as EndBackup()?
5.
To pass a tablespace path to the server in SEND_FILES_CONTENT, you are reusing
a LABEL option, that seems odd. How about adding a new option for that?
6.
It will be good if we have some comments explaining what the function is
actually doing in its prologue. For functions like:
GetBackupFilesList()
ReceiveFiles()
create_workers_and_fetch()
Thanks
Thanks,
--
Asif Rehman
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
I had a detailed discussion with Robert Haas at PostgresConf Europe about parallel backup.
We discussed the current state of the patch and what needs to be done to get the patch committed.

- The current patch uses a process to implement parallelism. There are many
reasons we need to use threads instead of processes. To start with, as this is a client utility it makes
more sense to use threads. The data needs to be shared amongst different threads and the main process,
and handling that is simpler as compared to interprocess communication.

- Fetching a single file or multiple files was also discussed. We concluded in our discussion that we
need to benchmark to see if disk I/O is a bottleneck or not and if parallel writing gives us
any benefit. This benchmark needs to be done on different hardware and different
networks to identify which are the real bottlenecks. In general, we agreed that we could start with fetching
one file at a time, but that will be revisited after the benchmarks are done.

- There is also an ongoing debate in this thread that we should have one single tar file for all files or one
tar file per thread. I really want to have a single tar file because the main purpose of the tar file is to
reduce the management of multiple files, but in the case of one file per thread, we end up with many tar
files. Therefore we need to have one master thread which is responsible for writing the tar file, and all
the other threads will receive the data from the network and stream it to the master thread. This also
supports the idea of using a thread-based model rather than a process-based approach, because the latter
requires too much data sharing between processes. If we cannot achieve this, then we can disable the
tar option for parallel backup in the first version.

- In the case of data sharing, we need to try to avoid unnecessary locking, and a more suitable algorithm to
solve the reader-writer problem is required.

--
Ibrar Ahmed
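To make the single-tar-writer idea concrete, here is a minimal sketch of how receiver threads could hand data to one writer thread through a mutex/condition-variable queue. This is not code from the patch; the names and queue depth are made up, and real code would also need error handling and an end-of-stream marker.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_DEPTH 64

typedef struct
{
    char   *data;
    size_t  len;
} Chunk;

typedef struct
{
    Chunk           slots[QUEUE_DEPTH];
    int             head;
    int             tail;
    int             count;
    pthread_mutex_t lock;
    pthread_cond_t  not_full;
    pthread_cond_t  not_empty;
} ChunkQueue;

static ChunkQueue queue = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
};

/* Receiver threads call this after reading a block from their connection. */
static void
chunk_queue_push(ChunkQueue *q, const char *buf, size_t len)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_DEPTH)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail].data = malloc(len);
    memcpy(q->slots[q->tail].data, buf, len);
    q->slots[q->tail].len = len;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* The single writer thread calls this and appends each chunk to the tar file. */
static Chunk
chunk_queue_pop(ChunkQueue *q)
{
    Chunk       c;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    c = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return c;
}

Each receiver thread would call chunk_queue_push(&queue, buf, len) for the data it pulls off its connection, while the writer loops over chunk_queue_pop(&queue), writes the chunk into the tar stream, and frees it.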
On Thu, Oct 24, 2019 at 3:21 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
I had a detailed discussion with Robert Haas at PostgresConf Europe about parallel backup.
We discussed the current state of the patch and what needs to be done to get the patch committed.

- The current patch uses a process to implement parallelism. There are many
reasons we need to use threads instead of processes. To start with, as this is a client utility it makes
more sense to use threads. The data needs to be shared amongst different threads and the main process,
handling that is simpler as compared to interprocess communication.

Yes, I agree. I have already converted the code to use threads instead of processes. This avoids the overhead
of interprocess communication.

With a single-file fetching strategy, this requires communication between competing threads/processes. To handle
that in a multiprocess application, it requires IPC. The current approach of multiple threads instead of processes
avoids this overhead.

- Fetching a single file or multiple files was also discussed. We concluded in our discussion that we
need to benchmark to see if disk I/O is a bottleneck or not and if parallel writing gives us
any benefit. This benchmark needs to be done on different hardware and different
networks to identify which are the real bottlenecks. In general, we agreed that we could start with fetching
one file at a time but that will be revisited after the benchmarks are done.

I'll share the updated patch in the next couple of days. After that, I'll work on benchmarking that in
the different environments that I have.

- There is also an ongoing debate in this thread that we should have one single tar file for all files or one
tar file per thread. I really want to have a single tar file because the main purpose of the tar file is to
reduce the management of multiple files, but in case of one file per thread, we end up with many tar
files. Therefore we need to have one master thread which is responsible for writing the tar file and all
the other threads will receive the data from the network and stream to the master thread. This also
supports the idea of using a thread-based model rather than a process-based approach because it
requires too much data sharing between processes. If we cannot achieve this, then we can disable the
tar option for parallel backup in the first version.

I am in favour of disabling the tar format for the first version of parallel backup.

- In the case of data sharing, we need to try to avoid unnecessary locking and a more suitable algorithm to
solve the reader-writer problem is required.

--
Ibrar Ahmed

--
Asif Rehman
Attachment
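For what the client side of the thread-per-connection, one-file-at-a-time approach just described might look like, here is a rough sketch. It is not the patch's code: WorkerContext, backup_worker(), and the literal command spelling (SEND_BACKUP_FILES here; the name changed over the course of this thread) are placeholders, and writing the received data to disk is elided.

#include <libpq-fe.h>
#include <pthread.h>
#include <stdio.h>

typedef struct
{
    char          **files;      /* names obtained from the file-list command */
    int             nfiles;
    int             next;       /* index of the next unclaimed file */
    pthread_mutex_t lock;
    const char     *conninfo;   /* e.g. "host=... user=... replication=true" */
} WorkerContext;

/* Ask the server for one file and drain the COPY stream it answers with. */
static void
fetch_one_file(PGconn *conn, const char *name)
{
    char        query[1024];
    PGresult   *res;

    snprintf(query, sizeof(query), "SEND_BACKUP_FILES ('%s')", name);
    res = PQexec(conn, query);
    if (PQresultStatus(res) == PGRES_COPY_OUT)
    {
        char       *buf;
        int         len;

        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            /* write 'len' bytes from 'buf' into the local copy of 'name' */
            PQfreemem(buf);
        }
        PQclear(PQgetResult(conn));     /* collect the command completion */
    }
    PQclear(res);
}

/* Each worker thread owns one connection and competes for files via a mutex. */
static void *
backup_worker(void *arg)
{
    WorkerContext *ctx = (WorkerContext *) arg;
    PGconn         *conn = PQconnectdb(ctx->conninfo);

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }

    for (;;)
    {
        int         idx;

        pthread_mutex_lock(&ctx->lock);
        idx = (ctx->next < ctx->nfiles) ? ctx->next++ : -1;
        pthread_mutex_unlock(&ctx->lock);
        if (idx < 0)
            break;

        fetch_one_file(conn, ctx->files[idx]);
    }

    PQfinish(conn);
    return NULL;
}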
On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patch to include the changes suggested by Jeevan. This patch also implements the thread workers instead of
> processes and fetches a single file at a time. The tar format has been disabled for first version of parallel backup.

Looking at 0001-0003:

It's not clear to me what the purpose of the start WAL location is supposed to be. As far as I can see, SendBackupFiles() stores it in a variable which is then used for exactly nothing, and nothing else uses it. It seems like that would be part of a potential incremental backup feature, but I don't see what it's got to do with parallel full backup.

The tablespace_path option appears entirely unused, and I don't know why that should be necessary here, either.

STORE_BACKUPFILE() seems like maybe it should be a function rather than a macro, and also probably be renamed, because it doesn't store files and the argument's not necessarily a file.

SendBackupManifest() does not send a backup manifest in the sense contemplated by the email thread on that subject. It sends a file list. That seems like the right idea - IMHO, anyway - but you need to do a thorough renaming.

I think it would be fine to decide that this facility won't support exclusive-mode backup.

I don't think much of having both sendDir() and sendDir_(). The latter name is inconsistent with any naming convention we have, and there seems to be no reason not to just add an argument to sendDir() and change the callers.

I think we should rename - perhaps as a preparatory patch - the sizeonly flag to dryrun, or something like that.

The resource cleanup does not look right. You've included calls to PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup() and StopBackup(), but what happens if there is an error or even a clean shutdown of the connection in between? I think that there needs to be some change here to ensure that a walsender will always call base_backup_cleanup() when it exits; I think that'd probably remove the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones we have already. This might also be something that could be done as a separate, preparatory refactoring patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patch to include the changes suggested by Jeevan. This patch also implements the thread workers instead of
> processes and fetches a single file at a time. The tar format has been disabled for first version of parallel backup.
Looking at 0001-0003:
It's not clear to me what the purpose of the start WAL location is
supposed to be. As far as I can see, SendBackupFiles() stores it in a
variable which is then used for exactly nothing, and nothing else uses
it. It seems like that would be part of a potential incremental
backup feature, but I don't see what it's got to do with parallel full
backup.

'startptr' is used by sendFile() during checksum verification. Since
SendBackupFiles() is using sendFile we have to set a valid WAL location.

The tablespace_path option appears entirely unused, and I don't know
why that should be necessary here, either.

This is to calculate the basepathlen. We need to exclude the tablespace location (or
base path) from the filename before it is sent to the client with the sendFile call. I added
this option primarily to avoid performing string manipulation on the filename to extract the
tablespace location and then calculate the basepathlen.

Alternatively, we can do it by extracting the base path from the received filename. What
do you suggest?
STORE_BACKUPFILE() seems like maybe it should be a function rather
than a macro, and also probably be renamed, because it doesn't store
files and the argument's not necessarily a file.

Sure.
SendBackupManifest() does not send a backup manifest in the sense
contemplated by the email thread on that subject. It sends a file
list. That seems like the right idea - IMHO, anyway - but you need to
do a thorough renaming.

I'm considering the following command names:

START_BACKUP
- Starts the backup process
SEND_BACKUP_FILELIST (Instead of SEND_BACKUP_MANIFEST)
- Sends the list of all files (along with file information such as filename, file type (directory/file/link),
file size and file mtime for each file) to be backed up.
SEND_BACKUP_FILES
- Sends one or more files to the client.
STOP_BACKUP
- Stops the backup process.
I'll update the function names accordingly after your confirmation. Of course, suggestions for
better names are welcome.
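As a rough illustration of how a client might drive this command set end to end (a sketch only, not the patch's code; option spellings follow the syntax proposed later in this thread, and result/COPY handling is omitted):

#include <libpq-fe.h>

static void
run_parallel_backup(const char *conninfo)
{
    /* conninfo is assumed to request a replication connection,
     * e.g. "host=... user=... replication=true" */
    PGconn     *conn = PQconnectdb(conninfo);
    PGresult   *res;

    res = PQexec(conn, "START_BACKUP LABEL 'parallel backup'");
    PQclear(res);

    /* One row per file (name, type, size, mtime); the client distributes
     * these names among its worker connections. */
    res = PQexec(conn, "SEND_BACKUP_FILELIST");
    PQclear(res);

    /*
     * Each worker then requests files individually, for example:
     *   SEND_BACKUP_FILES ('base/1/1259') MAX_RATE 32768
     * and consumes the returned COPY stream.
     */

    res = PQexec(conn, "STOP_BACKUP");
    PQclear(res);

    PQfinish(conn);
}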
I think it would be fine to decide that this facility won't support
exclusive-mode backup.

Sure. Will drop this patch.
I don't think much of having both sendDir() and sendDir_(). The latter
name is inconsistent with any naming convention we have, and there
seems to be no reason not to just add an argument to sendDir() and
change the callers.
I think we should rename - perhaps as a preparatory patch - the
sizeonly flag to dryrun, or something like that.

Sure, will take care of it.
The resource cleanup does not look right. You've included calls to
PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
and StopBackup(), but what happens if there is an error or even a
clean shutdown of the connection in between? I think that there needs
to be some change here to ensure that a walsender will always call
base_backup_cleanup() when it exits; I think that'd probably remove
the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
we have already. This might also be something that could be done as a
separate, preparatory refactoring patch.

You're right. I didn't handle this case properly. I will remove the PG_ENSURE_ERROR_CLEANUP
calls and replace them with a before_shmem_exit handler. This way, whenever the backend process exits,
base_backup_cleanup will be called:
- If it exits before calling do_pg_stop_backup, base_backup_cleanup will take care of cleanup.
- Otherwise, in the case of a clean shutdown (after calling do_pg_stop_backup), base_backup_cleanup
will simply return without doing anything.
Attachment
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v4.patch
- 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v4.patch
- 0005-pg_basebackup-changes-for-parallel-backup_v4.patch
- 0004-backend-changes-for-parallel-backup_v4.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit_v4.patch
- 0006-parallel-backup-testcase_v4.patch
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
Ugh, global variables.
Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.
You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace location (or
> base path) from the filename before it is sent to the client with sendFile call. I added
> this option primarily to avoid performing string manipulation on filename to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received filename. What
> do you suggest?
I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
Ugh, global variables.
Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.

Are you expecting something like the attached patch? Basically I have reorganised the grammar
rules so each command can have the options required by it.

I was feeling a bit reluctant for this change because it may add some unwanted grammar rules in
the replication grammar. Since these commands are using the same options as base backup, maybe we could throw an error inside the relevant functions on unwanted options?
You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)

Yes, I will add it in the next patch.
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace location (or
> base path) from the filename before it is sent to the client with sendFile call. I added
> this option primarily to avoid performing string manipulation on filename to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received filename. What
> do you suggest?
I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
Attachment
- 0004-backend-changes-for-parallel-backup.patch
- 0005-pg_basebackup-changes-for-parallel-backup.patch
- 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup.patch
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit.patch
- 0006-parallel-backup-testcase.patch
- 0007-parallel-backup-documentation.patch
- SEND_BACKUP_FILES ( 'FILE' [, ...] ) [ MAX_RATE rate ] [ NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]
- SEND_BACKUP_FILELIST

On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
Ugh, global variables.
Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.

Are you expecting something like the attached patch? Basically I have reorganised the grammar
rules so each command can have the options required by it.

I was feeling a bit reluctant for this change because it may add some unwanted grammar rules in
the replication grammar. Since these commands are using the same options as base backup, maybe we could throw an error inside the relevant functions on unwanted options?
You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)

Yes, I will add it in the next patch.
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace location (or
> base path) from the filename before it is sent to the client with sendFile call. I added
> this option primarily to avoid performing string manipulation on filename to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received filename. What
> do you suggest?
I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

I have updated the replication grammar with some new rules to differentiate the options production
for base backup and the newly added commands.

I have also created a separate patch to include the documentation changes. The current syntax is as below:

- START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
Attachment
- 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v6.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit_v6.patch
- 0005-pg_basebackup-changes-for-parallel-backup_v6.patch
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v6.patch
- 0004-backend-changes-for-parallel-backup_v6.patch
- 0006-parallel-backup-testcase_v6.patch
- 0007-parallel-backup-documentation_v6.patch
Sorry, I sent the wrong patches. Please see the correct version of the patches (_v6).
1.
+ XLogRecPtr wal_location;
Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.
2.
+ int32 size;
Should we use size_t here?
3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?
5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c
6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
Please update these comments as we fetch one file at a time.
7.
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.
8.
Please indent code within 80 char limit at all applicable places.
9.
Please fix following typos:
identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories
=====
The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.
Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.
[1] https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
--
Asif Rehman
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Sorry, I sent the wrong patches. Please see the correct version of the patches (_v6).

Review comments on these patches:
1.
+ XLogRecPtr wal_location;
Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.
2.
+ int32 size;
Should we use size_t here?
3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?
I think that this function should not take care of any of the above. Instead, it should
5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c
6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
Please update these comments as we fetch one file at a time.
7.
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.
8.
Please indent code within 80 char limit at all applicable places.
9.
Please fix following typos:
identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories
=====
The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.
Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.
On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?
I had the same thought, but I think it's better to keep them separate.
Somebody might want to use the SEND_BACKUP_FILELIST command for
something other than a backup (I actually think it should be called
just SEND_FILE_LIST)
- SEND_BACKUP_FILELIST --> SEND_FILE_LIST
. Somebody might want to start a backup without
getting a file list because they're going to copy the files at the FS
level. Somebody might want to get a list of files to process after
somebody else has started the backup on another connection. Or maybe
nobody wants to do any of those things, but it doesn't seem to cost us
much of anything to split the commands, so I think we should.
On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?
I had the same thought, but I think it's better to keep them separate.
Somebody might want to use the SEND_BACKUP_FILELIST command for
something other than a backup (I actually think it should be called
just SEND_FILE_LIST)

Sure. Thanks for the recommendation. To keep the function names in sync, I intend to do the following renamings:
- SEND_BACKUP_FILES --> SEND_FILES
- SEND_BACKUP_FILELIST --> SEND_FILE_LIST

Somebody might want to start a backup without
getting a file list because they're going to copy the files at the FS
level. Somebody might want to get a list of files to process after
somebody else has started the backup on another connection. Or maybe
nobody wants to do any of those things, but it doesn't seem to cost us
much of anything to split the commands, so I think we should.

+1
Attachment
- 0007-parallel-backup-documentation_v7.patch
- 0004-Parallel-Backup-Backend-Replication-commands_v7.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit_v7.patch
- 0006-parallel-backup-testcase_v7.patch
- 0005-Parallel-Backup-pg_basebackup_v7.patch
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v7.patch
- 0001-removed-PG_ENSURE_ERROR_CLEANUP-macro-from-basebacku_v7.patch
On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patches (v7 attached) and have taken care of all issues pointed by Jeevan, additionally
> ran the pgindent on each patch. Furthermore, Command names have been renamed as suggested and I
> have simplified the SendFiles function. Client can only request the regular files, any other kind such as
> directories or symlinks will be skipped, the client will be responsible for taking care of such.
Hi,
Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().
So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".
0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:
- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work.
Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.
- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.
- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.
I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.
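If the client did end up writing tablespace_map itself, the mechanics would be small. A sketch, assuming the file keeps its existing layout of one "OID path" line per tablespace; the struct and function names here are made up:

#include <stdio.h>

typedef struct
{
    const char *oid;            /* tablespace OID, i.e. the pg_tblspc symlink name */
    const char *linktarget;     /* absolute path the symlink points to */
} TablespaceEntry;

/* Write backupdir/tablespace_map from link targets learned via the file list. */
static int
write_tablespace_map(const char *backupdir, const TablespaceEntry *ts, int n)
{
    char        path[1024];
    FILE       *fp;
    int         i;

    snprintf(path, sizeof(path), "%s/tablespace_map", backupdir);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    for (i = 0; i < n; i++)
        fprintf(fp, "%s %s\n", ts[i].oid, ts[i].linktarget);

    fclose(fp);
    return 0;
}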
Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILE command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASE_BACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.
One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.
A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), and that's
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:
+ /* ... and pg_control after everything else. */
...which (1) is an odd thing to say when this is the first thing this
particular function is going to send, and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.
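On the pg_basebackup side that could be as small as appending pg_control to the end of the request list the client already builds. A sketch using the frontend SimpleStringList helpers; the function name here is made up:

#include "fe_utils/simple_list.h"

static void
finalize_request_list(SimpleStringList *files)
{
    /*
     * Ask for pg_control last, after every other file, so the server does
     * not need any special case for it.
     */
    simple_string_list_append(files, "global/pg_control");
}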
Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.
On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patches (v7 attached) and have taken care of all issues pointed by Jeevan, additionally
> ran the pgindent on each patch. Furthermore, Command names have been renamed as suggested and I
> have simplified the SendFiles function. Client can only request the regular files, any other kind such as
> directories or symlinks will be skipped, the client will be responsible for taking care of such.
Hi,
Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().
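In other words, the two cases could differ roughly as in the sketch below. This is a paraphrase of the suggestion, not actual patch code; as far as I recall from that commit, the Datum argument of do_pg_abort_backup() is the flag controlling whether the "backup stopped too early" warning is emitted.

/* BASE_BACKUP-style commands: one command starts and ends the backup, so
 * the handler is installed only for the duration of the command and emits
 * no warning if the command is interrupted. */
PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
{
    /* ... send the entire base backup ... */
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));

/* START_BACKUP-style commands: the backup outlives this command, so keep
 * a handler installed until backend exit (or a clean STOP_BACKUP). */
register_persistent_abort_backup_handler();
/* ... then put the system into backup mode, as do_pg_start_backup() does ... */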
So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".Sure. will do.0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:
- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work.

I'll remove this duplication by eliminating this call from the START_BACKUP and SEND_FILE_LIST functions. More about this is explained later in this email.

Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.

Concurrent CREATE/DROP TABLESPACE commands can happen, and that will be resolved by the WAL files collected for the backup. I don't think we can do anything when objects are created or dropped in between start and stop backup. BASE_BACKUP also relies on the WAL files to handle such a scenario and does not error out when some relation files go away.
- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.

It's a file-level property only. The throttle functionality relies on global variables. StartBackup() and StopBackup() are calling the setup_throttle() function to disable the throttling. I should have been more explicit here by passing -1 to setup_throttle(), illustrating that throttling is disabled, instead of using 'opt->maxrate' (although it defaults to -1 for these functions). I'll remove the setup_throttle() call from both functions.
- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.

As I understand it, you are suggesting adding another command to fetch the list of WAL files, which would be called by the client after executing stop backup. Once the client gets that list, it starts requesting the WAL files one by one. So I will add a LIST_WAL_FILES command that will take start_lsn and end_lsn as arguments and return the list of WAL files between these LSNs. Something like this:

LIST_WAL_FILES 'start_lsn' 'end_lsn';
I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.

do_pg_start_backup() is collecting the tablespace information anyway to build the tablespace_map for BASE_BACKUP. So returning the same seemed better than adding a new command for the same information, hence the multiple calls to collectTablespaces() [to be renamed to collect_tablespaces()].

tablespace_map can be constructed by the client, but then BASE_BACKUP is returning it as part of the full backup. If clients in parallel mode are to construct this themselves, these will seem like two different approaches. Perhaps this should be done for BASE_BACKUP as well?

I'll refactor the do_pg_start_backup() function to remove the code related to tablespace information collection (moving it to collect_tablespaces()) and tablespace_map file creation, so that this function does not collect this information unnecessarily. The perform_base_backup() function can collect and send the tablespace information to the client, and then the client can construct the tablespace_map file.

I'll add a new command to fetch the list of tablespaces, i.e. LIST_TABLESPACES, which will return the tablespace information to the client for parallel mode. And I will refactor the START_BACKUP and STOP_BACKUP commands so that they only do the specific job of putting the system into backup mode or out of it, nothing else. These commands should only return the start and end LSN to the client.
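For reference, the tablespace_map format itself is simple enough for a client to produce once it has, for each tablespace, the symlink name under pg_tblspc and the link target; each line of the file is "<oid> <absolute path>". A made-up sketch (none of these names are from the patch):

#include <stdio.h>

typedef struct
{
    const char *oid;        /* e.g. "16384", the symlink name in pg_tblspc */
    const char *target;     /* e.g. "/tmp/ts", the link target */
} tblspc_entry;

static int
write_tablespace_map(const char *basedir, const tblspc_entry *entries, int n)
{
    char        path[1024];
    FILE       *fp;
    int         i;

    snprintf(path, sizeof(path), "%s/tablespace_map", basedir);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* one "<oid> <path>" line per tablespace, as the server writes today */
    for (i = 0; i < n; i++)
        fprintf(fp, "%s %s\n", entries[i].oid, entries[i].target);

    fclose(fp);
    return 0;
}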
Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILE command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASEBACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.
One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.That was a mistake in my code. maxrate should've been equally dividedamongst all threads. I agree that we should move this to the client-side.When a thread exits, its share should also be equally divided amongstthe remaining threads (i.e. recalculate maxrate for each remaining thread).Say we have 4 running threads with each allocation 25% of the bandwidth.Thread 1 exits. We recalculate bandwidth and assign the remaining 3 threads33.33% each. This solves one problem that you had identified. However,it doesn't solve where one (or more) thread is not fully consuming theirallocated share. I'm not really sure how we can solve it properly. Suggestionsare welcome.
A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), but that's not
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:
+ /* ... and pg_control after everything else. */
...which (1) is an odd thing to say when this is the first thing this
particular function is to send and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.
Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.

Thank you Robert for the detailed review. I really appreciate your insights and very precise feedback. After the changes suggested above, the design on a high level will look something like this:

=== SEQUENTIAL EXECUTION ===

START_BACKUP [LABEL | FAST]
- Starts backup on the server
- Returns the start LSN to the client

LIST_TABLESPACES
- Sends a list of all tablespaces to the client

Loop over the LIST_TABLESPACES result:
- LIST_FILES [tablespace]
- Sends the file list for the given tablespace
- Create a list of all files

=== PARALLEL EXECUTION ===

Thread loop until the list of files is exhausted:
SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
- If the checksum is enabled then WAL_START_LOCATION is required.
- Can request the server to send one or more files, but we are requesting one at a time
- Pick the next file from the list of files
- Threads sleep after the list is exhausted
- All threads are sleeping

=== SEQUENTIAL EXECUTION ===

STOP_BACKUP [NOWAIT]
- Stops backup mode
- Returns the end LSN

If --wal-method=fetch then:
LIST_WAL_FILES 'start_lsn' 'end_lsn'
- Sends a list of WAL files between the start LSN and end LSN

=== PARALLEL EXECUTION ===

Thread loop until the list of WAL files is exhausted:
SEND_FILE <WAL file>
- Can request the server to send one or more files, but we are requesting one WAL file at a time
- Pick the next file from the list of WAL files
- Threads terminate and set their status as completed/terminated

=== SEQUENTIAL EXECUTION ===

Cleanup
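To make the flow above concrete, a client driving these proposed commands over a replication connection might look roughly like the following libpq sketch. None of these replication commands exist in any released server; they are only the grammar proposed in this thread, the file name and connection parameters are illustrative, and error handling is omitted.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn;
    PGresult   *res;

    conn = PQconnectdb("host=localhost user=postgres replication=true");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* put the server into backup mode; the design returns the start LSN */
    res = PQexec(conn, "START_BACKUP LABEL 'parallel backup' FAST");
    printf("start LSN: %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);

    /* discover tablespaces, then the files in each of them */
    res = PQexec(conn, "LIST_TABLESPACES");
    /* ... loop over rows, issuing LIST_FILES <tablespace> for each ... */
    PQclear(res);

    /* worker connections would then fetch one file at a time, e.g.: */
    res = PQexec(conn, "SEND_FILES ('base/1/1259') NOVERIFY_CHECKSUMS");
    PQclear(res);

    /* leave backup mode; WAL files are then listed via LIST_WAL_FILES */
    res = PQexec(conn, "STOP_BACKUP NOWAIT");
    PQclear(res);

    PQfinish(conn);
    return 0;
}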
Attachment
Here are the updated patches, taking care of the issues pointed out earlier. This patch adds the following commands (with specified options):

START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X'] [NOVERIFY_CHECKSUMS]

Parallel backup is not making any use of the tablespace map, so I have removed that option from the above commands. There is a patch pending to remove the exclusive backup; we can further refactor the do_pg_start_backup function at that time, to remove the tablespace information and move the creation of the tablespace_map file to the client.

I have disabled the maxrate option for parallel backup. I intend to send out a separate patch for it. Robert previously suggested to implement throttling on the client-side. I found the original email thread [1] where throttling was proposed and added to the server. In that thread, it was originally implemented on the client-side, but per many suggestions, it was moved to the server-side.

So, I have a few suggestions on how we can implement this:

1- have another option for pg_basebackup (i.e. per-worker-maxrate) where the user could choose the bandwidth allocation for each worker. This approach can be implemented on the client-side as well as on the server-side.

2- have the maxrate be divided among workers equally at first, and then let the main thread keep adjusting it whenever one of the workers finishes. I believe this would only be possible if we handle throttling on the client. Also, as I understand it, implementing this will introduce an additional mutex for handling of bandwidth consumption data so that the rate may be adjusted according to data received by threads.

--
Asif Rehman
--
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Hi Asif,

On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Here are the updated patches, taking care of the issues pointed out earlier.

The latest changes look good to me. However, the patch set is missing the documentation. Please add those.

Thanks
--Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachment
Thanks Jeevan. Here is the documentation patch.

On Mon, Feb 10, 2020 at 6:49 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
> The latest changes look good to me. However, the patch set is missing the documentation.
> Please add those.

--
Asif Rehman
Hi,

I have created a commitfest entry.

On Mon, Feb 17, 2020 at 1:39 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Thanks Jeevan. Here is the documentation patch.

--
Asif Rehman
Hi Asif

I have started testing this feature. I have applied the v6 patch on commit a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan). I got a few observations, please take a look.
--if backup failed, backup directory is not getting removed.
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty
--giving large number of jobs leading segmentation fault.
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new process for connection: Resource temporarily unavailable
could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily unavailable
Segmentation fault (core dumped)
--stack-trace
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D /tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
46 if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
#1 0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
#2 0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
#3 0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4 exit (status=1) at exit.c:100
#5 0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at pg_basebackup.c:2713
#6 0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
#7 0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at pg_basebackup.c:2668
--with tablespace is in the same directory as data, parallel_backup crashed
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.
postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)
--stack-trace
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
#1 0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at pg_basebackup.c:2739
#2 0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
#3 0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at pg_basebackup.c:2668
(gdb)
Attachment
- 0001-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v9.patch
- 0004-Parallel-Backup-pg_basebackup_v9.patch
- 0002-Refactor-some-backup-code-to-increase-reusability.-T_v9.patch
- 0003-Parallel-Backup-Backend-Replication-commands_v9.patch
- 0005-parallel-backup-testcase_v9.patch
- 0006-parallel-backup-documentation_v9.patch
On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
> Hi Asif
> I have started testing this feature. I have applied the v6 patch on commit a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
> I got a few observations, please take a look.

Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)

--
Asif Rehman
Thanks for the patches. I have verified the reported issues with the new patches; the issues are fixed now.

I got another observation where, if a new slot name is given without the -C option, it leads to a server crash error.

[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot test_bkp_slot
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test_bkp_slot" does not exist
pg_basebackup: error: could not list backup files: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing data directory "/tmp/bkp"
On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
> I got another observation where, if a new slot name is given without the -C option, it leads to a server crash error.

It seems to be expected behavior. The START_BACKUP command has been executed, and pg_basebackup tries to start a WAL streaming process with a non-existent slot, which results in an error. So the backup is aborted while terminating all other processes.
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test_bkp_slot" does not exist
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/tmp/bkp"
--Asif Rehman
Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
for data in multiple threads, throttling seems important here. Can you please
explain why you have disabled that?
3. As we are always fetching a single file and as Robert suggested, let's rename
SEND_FILES to SEND_FILE instead.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.
5. Typos:
tablspace => tablespace
safly => safely
6. parallel_backup_run() needs some comments explaining the states it goes
through, i.e. the PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On testing further, I found when taking backup with -R, pg_basebackup crashed
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Hi Asif,

Thanks.

Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
data in multiple threads throttling seems important here. Can you please
explain why have you disabled that?
3. As we are always fetching a single file and as Robert suggested, let rename
SEND_FILES to SEND_FILE instead.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.
5. Typos:
tablspace => tablespace
safly => safely
6. parallel_backup_run() needs some comments explaining the states it goes
through PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)
[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory
Hi Asif,
On testing further, I found that when taking a backup with -R, pg_basebackup crashed. This crash is not consistently reproducible.

[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi Asif,

Thanks.

Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
data in multiple threads throttling seems important here. Can you please
explain why have you disabled that?
3. As we are always fetching a single file and as Robert suggested, let rename
SEND_FILES to SEND_FILE instead.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.
5. Typos:
tablspace => tablespace
safly => safely
6. parallel_backup_run() needs some comments explaining the states it goes
through PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
While testing further, I observed that parallel backup is not able to take a backup of a standby server.
mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">> /tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">> /tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf
./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
[edb@localhost bin]$ ./psql postgres -p 5432 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)
[edb@localhost bin]$ ./psql postgres -p 5433 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR: the standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"
#same is working fine without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSION
Hi Asif,

In another scenario, the bkp data is corrupted for the tablespace. Again, this is not reproducible every time, but if I run the same set of commands I get the same error.

[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)
[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory
Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Hi Asif,
On testing further, I found that when taking a backup with -R, pg_basebackup crashed. This crash is not consistently reproducible.
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
Thanks & Regards,
Rajkumar Raghuwanshi
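For what it's worth, the faulting line in the backtrace is the hand-off of the next file to a worker. A self-contained sketch of that hand-off, with a NULL check added, looks roughly like the following; the struct and function names are assumed from the backtrace, and the actual fix (per the follow-up mail) turned out to be an uninitialized pointer rather than this traversal:

#include <pthread.h>
#include <stddef.h>

typedef struct FetchFile { struct FetchFile *next; } FetchFile;
typedef struct BackupInfo { FetchFile *curr; pthread_mutex_t lock; } BackupInfo;

/* Hand the next pending file to the calling worker, or NULL when the list is empty. */
static FetchFile *
next_file_to_fetch(BackupInfo *backupinfo)
{
    FetchFile *fetchfile;

    pthread_mutex_lock(&backupinfo->lock);
    fetchfile = backupinfo->curr;
    if (fetchfile != NULL)
        backupinfo->curr = fetchfile->next;   /* the line from pg_basebackup.c:3175 */
    pthread_mutex_unlock(&backupinfo->lock);

    return fetchfile;
}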
On the review comments:
1. Comments seem out-dated. -- Fixed.
2. maxrate / throttling -- see below.
3. Rename SEND_FILES to SEND_FILE. -- Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one go, that's why the name.
4. Does pthread_create() work on Windows? -- The patch is updated to add support for the Windows platform.
5. Typos -- Done.
6. Comments explaining the PB_* states -- Done.
7. free_filelist() in the PB_FETCH_WAL_FILES case -- Done.

The corrupted tablespace and crash, reported by Rajkumar, have been fixed. A pointer variable remained uninitialized, which in turn caused the system to misbehave.

Attached is the updated set of patches. AFAIK, to complete the parallel backup feature set, there remain these sub-features:

1- Parallel backup does not work with a standby server. In parallel backup, the server spawns multiple processes and there is no shared state being maintained, so currently there is no way to tell the multiple processes whether the standby was promoted during the backup, since START_BACKUP was called.

2- Throttling. Robert previously suggested that we implement throttling on the client side. However, I found a previous discussion where it was advocated to be added to the backend instead [1]. So it was better to have a consensus before moving the throttle function to the client. That's why, for the time being, I have disabled it and have asked for suggestions on how to move forward.

It seems to me that we have to maintain a shared state in order to support taking a backup from a standby. Also, there is a new feature recently committed for backup progress reporting in the backend (pg_stat_progress_basebackup), added via commit e65497df. For parallel backup to update these stats, a shared state will be required.

Since multiple pg_basebackup invocations can be running at the same time, maintaining a shared state can become a little complex, unless we disallow taking multiple parallel backups.

So, proceeding with this patch, I will be working on:
- throttling to be implemented on the client side.
- adding a shared state to handle backup from the standby.

-- Asif Rehman
Attachment
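To make the client-side throttling plan above concrete, a minimal sketch of a rate limiter that all worker threads could share might look like the following; every name here is assumed for illustration and none of it comes from the attached patches:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

typedef struct Throttle
{
    pthread_mutex_t lock;
    uint64_t        bytes_per_sec;    /* the --max-rate budget                 */
    uint64_t        bytes_this_sec;   /* consumed in the current second window */
    time_t          window_start;
} Throttle;

/* Called by a worker after it has received nbytes from its connection. */
static void
throttle_wait(Throttle *t, uint64_t nbytes)
{
    bool over;

    pthread_mutex_lock(&t->lock);
    if (time(NULL) != t->window_start)
    {
        t->window_start = time(NULL);
        t->bytes_this_sec = 0;
    }
    t->bytes_this_sec += nbytes;
    over = t->bytes_this_sec > t->bytes_per_sec;
    pthread_mutex_unlock(&t->lock);

    if (over)
        usleep(100 * 1000);   /* crude back-off; real code would sleep until
                               * the current one-second window rolls over */
}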
Thanks Asif, I have re-verified the reported issues. Except the standby backup, the others are fixed.
Thanks & Regards,
Rajkumar Raghuwanshi
Yes, as Asif mentioned, he is working on the standby issue and adding bandwidth-throttling functionality to parallel backup. It would be good to get some feedback from Robert on Asif's previous email about the design considerations for standby-server support and throttling. I believe all the other points mentioned by Robert in this thread are addressed by Asif, so it would be good to hear about any other concerns that are not addressed.
Thanks,
-- Ahsan
Highgo Software (Canada/China/Pakistan)
URL: http://www.highgo.ca
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/F1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_58692"
pg_basebackup: backup worker (0) created
….
…..
…..
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
pg_basebackup: backup worker (1018) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
…
…
…
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
pg_basebackup: backup worker (1989) created
pg_basebackup: error: could not create file "/home/edb/Desktop/backup//global/4183": Too many open files
pg_basebackup: error: could not create file "/home/edb/Desktop/backup//global/3592": Too many open files
pg_basebackup: error: could not create file "/home/edb/Desktop/backup//global/4177": Too many open files
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_commit_ts pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact
global pg_dynshmem pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc pg_wal
[edb@localhost bin]$
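The run above asks for far more workers than the process has file descriptors for, so the extra connections and output files fail with "Too many open files". One possible guard, sketched purely as an illustration (the arithmetic and the function are assumptions, not the patch's behaviour), would be to clamp --jobs against the descriptor limit before spawning workers:

#include <unistd.h>

/* Clamp the requested number of workers to what the fd limit can support. */
static int
clamp_numworkers(int requested)
{
    long max_files = sysconf(_SC_OPEN_MAX);
    /* assume each worker needs a socket plus the file it is writing, and
     * leave some headroom for pg_basebackup's own descriptors */
    long usable = (max_files > 64) ? (max_files - 64) / 2 : 1;

    return (requested > usable) ? (int) usable : requested;
}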
Hi Asif,
My colleague Kashif Zeeshan reported an issue off-list; posting it here, please take a look.
When executing two backups at the same time, we get a FATAL error due to max_wal_senders, yet instead of exiting, the backup is reported as completed. And when trying to start the server from the backup cluster, we get an error:
[edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57849"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C3000050
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57848"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C2000348
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/ -o "-p 5438" start
pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster directory
Thanks & Regards,
Rajkumar Raghuwanshi
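Assuming each worker opens its own replication connection, plus one for the background WAL receiver, two concurrent runs with --jobs=8 need more slots than the default max_wal_senders of 10, so the later workers fail as shown while the run still reports success. A pre-flight check along these lines (a sketch only; check_wal_sender_budget() is not part of the patch) could turn that into a hard error up front:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Complain if the server cannot possibly grant jobs+1 walsender slots. */
static void
check_wal_sender_budget(PGconn *conn, int numworkers)
{
    PGresult *res = PQexec(conn, "SHOW max_wal_senders");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        int max_wal_senders = atoi(PQgetvalue(res, 0, 0));

        if (numworkers + 1 > max_wal_senders)
            fprintf(stderr,
                    "pg_basebackup: error: --jobs=%d needs %d replication connections "
                    "but max_wal_senders is %d\n",
                    numworkers, numworkers + 1, max_wal_senders);
    }
    PQclear(res);
}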
While testing further I observed parallel backup is not able to take backup of standby server.
mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">> /tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">> /tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf
./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
[edb@localhost bin]$ ./psql postgres -p 5432 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)
[edb@localhost bin]$ ./psql postgres -p 5433 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR: the standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"
#same is working fine without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSIONThanks & Regards,Rajkumar RaghuwanshiOn Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:Hi Asif,In another scenarios, bkp data is corrupted for tablespace. again this is not reproducible everytime,but If I am running the same set of commands I am getting the same error.[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)
[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directoryThanks & Regards,Rajkumar RaghuwanshiOn Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:Hi Asif,
On testing further, I found when taking backup with -R, pg_basebackup crashedthis crash is not consistently reproducible.[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115(gdb)Thanks & Regards,Rajkumar RaghuwanshiOn Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:Hi Asif,Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64).(V9 of the patches are attached).
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.Fixed.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
data in multiple threads throttling seems important here. Can you please
explain why have you disabled that?
3. As we are always fetching a single file and as Robert suggested, let rename
SEND_FILES to SEND_FILE instead.Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in onego, that's why the name.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.patch is updated to add support for the Windows platform.
5. Typos:
tablspace => tablespace
safly => safelyDone.6. parallel_backup_run() needs some comments explaining the states it goes
through PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;Done.
Why free_filelist() is not called in PB_FETCH_WAL_FILES case?Done.
The corrupted tablespace and crash, reported by Rajkumar, have been fixed. A pointer variable remained uninitialized, which in turn caused the system to misbehave.
Attached is the updated set of patches. AFAIK, to complete the parallel backup feature set, there remain three sub-features:
1- Parallel backup does not work with a standby server. In parallel backup, the server spawns multiple processes and there is no shared state being maintained. So currently, there is no way to tell the multiple processes if the standby was promoted during the backup since START_BACKUP was called.
2- Throttling. Robert previously suggested that we implement throttling on the client-side. However, I found a previous discussion where it was advocated to be added to the backend instead[1].
So, it was better to have a consensus before moving the throttle function to the client. That's why, for the time being, I have disabled it and have asked for suggestions on it to move forward.
It seems to me that we have to maintain a shared state in order to support taking a backup from a standby. Also, there is a new feature recently committed for backup progress reporting in the backend (pg_stat_progress_basebackup). This functionality was recently added via commit e65497df. For parallel backup to update these stats, a shared state will be required.
Since multiple pg_basebackup processes can be running at the same time, maintaining a shared state can become a little complex, unless we disallow taking multiple parallel backups.
So proceeding on with this patch, I will be working on:
- throttling to be implemented on the client-side.
- adding a shared state to handle backup from the standby.
--
Asif Rehman
--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
--

The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000. The errors were generated for workers starting from backup worker=1017.

On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
> The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000. The errors were generated for workers starting from backup worker=1017.

It wasn't the fact that you set max_wal_senders to 2000. It was the fact that you specified 1990 parallel workers. By so doing, you overloaded the machine, which is why everything failed. That's to be expected.
--
--

On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
<kashif.zeeshan@enterprisedb.com> wrote:
> Thanks alot Robert,
> In this case the backup folder was not being emptied as the backup was failed, the cleanup should be done in this case too.
Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--

Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
The cleanup is done in the cases I have seen so far with base pg_basebackup functionality (not including the parallel backup feature), with the message "pg_basebackup: removing contents of data directory".
A similar case was also fixed for parallel backup, reported by Rajkumar, where the contents of the backup folder were not cleaned up after the error.
On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
> go, that's why the name.
I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?
> 1- parallel backup does not work with a standby server. In parallel backup, the server
> spawns multiple processes and there is no shared state being maintained. So currently,
> no way to tell multiple processes if the standby was promoted during the backup since
> the START_BACKUP was called.
Why would you need to do that? As long as the process where
STOP_BACKUP can do the check, that seems good enough.
> 2- throttling. Robert previously suggested that we implement throttling on the client-side.
> However, I found a previous discussion where it was advocated to be added to the
> backend instead[1].
>
> So, it was better to have a consensus before moving the throttle function to the client.
> That’s why for the time being I have disabled it and have asked for suggestions on it
> to move forward.
>
> It seems to me that we have to maintain a shared state in order to support taking backup
> from standby. Also, there is a new feature recently committed for backup progress
> reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
> added via this commit ID: e65497df. For parallel backup to update these stats, a shared
> state will be required.
I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.
> Since multiple pg_basebackup can be running at the same time, maintaining a shared state
> can become a little complex, unless we disallow taking multiple parallel backups.
I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.
There are two possible options:
(1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
(2) (Preferred Option) Use the WAL start location as the BackupID.
This BackupID should be given back in the response to the start backup command. All client workers
must append this ID to all parallel backup replication commands, so that we can use this identifier
to search for that particular backup. Does that sound good?
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.
Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
Okay, so if we are to add another replication command 'JOIN_BACKUP unique_backup_id'
to make workers find the relevant shared state, there won't be any need for changing
the grammar for any other command. START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:
typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;
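Just to make that concrete, a minimal sketch of how such an ID could be produced with pg_strong_random() and hex-encoded into the backupid field above; the helper name and error wording are made up for illustration:

static void
generate_backup_id(char *backupid, size_t bufsize)
{
    uint8       raw[16];
    static const char hexdigits[] = "0123456789abcdef";

    Assert(bufsize >= sizeof(raw) * 2 + 1);

    if (!pg_strong_random(raw, sizeof(raw)))
        ereport(ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("could not generate backup ID")));

    /* hex-encode so the ID can be passed back in replication commands */
    for (int i = 0; i < (int) sizeof(raw); i++)
    {
        backupid[i * 2] = hexdigits[raw[i] >> 4];
        backupid[i * 2 + 1] = hexdigits[raw[i] & 0x0F];
    }
    backupid[sizeof(raw) * 2] = '\0';
}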
--

After the recent backup manifest addition, the patches needed to be rebased, and a few things needed reconsideration, like making sure that parallel backup creates the manifest file correctly.
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Hi Asif
When a non-existent slot is used with a tablespace then the correct error is displayed, but the backup folder is not cleaned up and leaves a corrupt backup.
Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$
If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone
[edb@localhost bin]$
On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.
Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
Okay, so if we are to add another replication command 'JOIN_BACKUP unique_backup_id'
to make workers find the relevant shared state, there won't be any need for changing
the grammar for any other command. START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;

The shared state structure entries would be maintained by a shared hash table. There will be one structure per parallel backup. Since a single parallel backup can engage more than one WAL sender, I think max_wal_senders might be a little too much; perhaps max_wal_senders/2, since there will be at least 2 connections per parallel backup? Alternatively, we can add a new GUC that defines the maximum number of concurrent parallel backups, i.e. 'max_concurrent_backups_allowed = 10' perhaps, or we can make it user-configurable.

The key would be "backupid=hex_encode(pg_strong_random(16))".

Checking for standby promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery and keep checking it whenever send_file() is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - the throttling logic remains the same as for non-parallel backup, with the exception that multiple threads will now be updating it. So in parallel backup, this will represent the overall bytes that have been transferred. The workers would sleep if they have exceeded the limit. Hence, the shared state carries a lock to safely update the throttling value atomically (a rough sketch of this update follows below).

Progress reporting:
Although I think we should add progress reporting for parallel backup as a separate patch, the relevant entries for progress reporting such as 'backup_total' and 'backup_streamed' would then be added to this structure as well.

Grammar:
There is a change in the result set being returned for the START_BACKUP command; unique_backup_id is added. Additionally, the JOIN_BACKUP replication command is added. SEND_FILES has been renamed to SEND_FILE. There are no other changes to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id

STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label

JOIN_BACKUP 'unique_backup_id'
- attaches a shared state identified by 'unique_backup_id' to a backend process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]

--
Asif Rehman
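For illustration only, a rough sketch of the shared throttling update described above; the helper name is made up, and the sleep handling and rate bookkeeping are left out:

static bool
parallel_throttle_account(BackupSharedState *state, size_t nbytes, uint64 limit)
{
    bool        must_sleep = false;

    /* multiple workers update the shared counter, so take the spinlock */
    SpinLockAcquire(&state->lock);
    state->throttling_counter += nbytes;
    if ((uint64) state->throttling_counter >= limit)
    {
        state->throttling_counter = 0;
        must_sleep = true;      /* caller sleeps until the rate interval elapses */
    }
    SpinLockRelease(&state->lock);

    return must_sleep;
}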
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
--

On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
> Hi Asif
> When a non-existent slot is used with a tablespace then the correct error is displayed, but the backup folder is not cleaned up and leaves a corrupt backup.
Hi Asif
A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
The same case, when executed with pg_basebackup without the parallel backup patch, does the proper cleanup.
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$
Thanks
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
Attachment
Hi,
Thanks, Kashif and Rajkumar. I have fixed the reported issues.
I have added the shared state as previously described. The new grammar changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16), hex-encode it, and return it in the result set.
- It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this because 'startptr' is now accessible through the shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled by default, so I have disabled it for parallel backup mode and have generated a warning so that the user is aware of it and does not expect it in the backup.
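To make the intended flow concrete, here is a rough sketch of the order of commands a parallel client could issue against this grammar; it is illustrative only (connection setup and result checking are omitted, and the backup id shown is a placeholder):

#include "libpq-fe.h"

static void
parallel_backup_sketch(PGconn *conn_leader, PGconn *conn_worker)
{
    PGresult   *res;

    /* leader starts the backup; the result set carries startptr, tli and the backup id */
    res = PQexec(conn_leader, "START_BACKUP LABEL 'example' FAST");
    PQclear(res);

    /* a worker attaches to the leader's shared state via the backup id */
    res = PQexec(conn_worker, "JOIN_BACKUP 'deadbeefdeadbeefdeadbeefdeadbeef'");
    PQclear(res);

    /* the worker asks for the file list, then fetches files one at a time */
    res = PQexec(conn_worker, "LIST_FILES");
    PQclear(res);
    res = PQexec(conn_worker, "SEND_FILE ('base/1/1259')");
    PQclear(res);

    /* the leader finishes the backup once every worker is done */
    res = PQexec(conn_leader, "STOP_BACKUP");
    PQclear(res);
}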
--
Asif Rehman
--
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
> by default, so I have disabled it for parallel backup mode and have generated a warning so that
> the user is aware of it and does not expect it in the backup.
So, are you working on making it work? I don't think a parallel backup feature should be creating a backup with no manifest.
On Fri, Apr 3, 2020 at 4:46 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Non-parallel backup already does the early error checking. I only intended
> to make parallel behave the same as non-parallel here. So, I agree with
> you that the behavior of parallel backup should be consistent with the
> non-parallel one. Please see the code snippet below from
> basebackup.c:sendDir()

Oh, OK. So then we need to preserve that behavior, I think. Sorry, I
didn't realize the check was happening there.

> I am thinking of the following struct for shared state:
>> typedef struct
>> {
>> char backupid[NAMEDATALEN];
>> XLogRecPtr startptr;
>> slock_t lock;
>> int64 throttling_counter;
>> bool backup_started_in_recovery;
>> } BackupSharedState;

Looks broadly reasonable. Can anything other than lock and
throttling_counter change while it's running? If not, how about using
pg_atomic_uint64 for the throttling counter, and dropping lock? If
that gets too complicated it's OK to keep it as you have it.

> The shared state structure entries would be maintained by a shared hash table.
> There will be one structure per parallel backup. Since a single parallel backup
> can engage more than one wal sender, so I think max_wal_senders might be a little
> too much; perhaps max_wal_senders/2 since there will be at least 2 connections
> per parallel backup? Alternatively, we can set a new GUC that defines the maximum
> number of concurrent parallel backups i.e. 'max_concurrent_backups_allowed = 10'
> perhaps, or we can make it user-configurable.

I don't think you need a hash table. Linear search should be fine. And
I see no point in dividing max_wal_senders by 2 either. The default is
*10*. You'd need to increase that by more than an order of magnitude
for a hash table to be needed, and more than that for the shared memory
consumption to matter.

> The key would be "backupid=hex_encode(pg_strong_random(16))"

wfm

> Progress Reporting:
> Although I think we should add progress-reporting for parallel backup as a
> separate patch. The relevant entries for progress-reporting such as
> 'backup_total' and 'backup_streamed' would be then added to this structure
> as well.

I mean, you can separate it for review if you wish, but it would need
to be committed together.

> START_BACKUP [LABEL '<label>'] [FAST]
> - returns startptr, tli, backup_label, unique_backup_id

OK. But what if I want to use this interface for a non-parallel backup?

> STOP_BACKUP [NOWAIT]
> - returns startptr, tli, backup_label

I don't think it makes sense for STOP_BACKUP to return the same values
that START_BACKUP already returned. Presumably STOP_BACKUP should
return the end LSN. It could also return the backup label and
tablespace map files, as the corresponding SQL function does, unless
there's some better way of returning those in this case.

> JOIN_BACKUP 'unique_backup_id'
> - attaches a shared state identified by 'unique_backup_id' to a backend process.

OK.

> LIST_TABLESPACES [PROGRESS]

OK.

> LIST_FILES [TABLESPACE]

OK.

> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

Why not just LIST_WAL_FILES 'startptr' 'endptr'?

> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]

Why parens? That seems useless. Maybe it would make sense to have
SEND_DATA_FILE 'datafilename' and SEND_WAL_FILE 'walfilename' as
separate commands. But not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
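For what it's worth, the pg_atomic_uint64 variant suggested above could look roughly like this. This is a sketch under that assumption only; init_throttling() and account_bytes() are made-up names, not anything from the patch.

#include "postgres.h"

#include "access/xlogdefs.h"
#include "port/atomics.h"

/* Variant of the proposed struct with an atomic counter instead of a spinlock. */
typedef struct BackupSharedState
{
	char		backupid[NAMEDATALEN];
	XLogRecPtr	startptr;
	pg_atomic_uint64 throttling_counter;
	bool		backup_started_in_recovery;
} BackupSharedState;

/* START_BACKUP would initialize the counter once. */
static void
init_throttling(BackupSharedState *state)
{
	pg_atomic_init_u64(&state->throttling_counter, 0);
}

/* Each worker adds the bytes it just sent; returns true if it should sleep. */
static bool
account_bytes(BackupSharedState *state, uint64 nbytes, uint64 limit)
{
	uint64		previous = pg_atomic_fetch_add_u64(&state->throttling_counter, nbytes);

	return (previous + nbytes >= limit);
}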
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it. I'm gonna
try to find some time for that as soon as I can.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Asif,

Thanks for the new patches. The patches need to be rebased on head; I am getting a failure while applying the 0003 patch.

[edb@localhost postgresql]$ git apply v11/0003-Parallel-Backup-Backend-Replication-commands.patch
error: patch failed: src/backend/storage/ipc/ipci.c:147
error: src/backend/storage/ipc/ipci.c: patch does not apply

I have applied the v11 patches on commit 23ba3b5ee278847e4fad913b80950edb2838fd35 to test further. pg_basebackup has a new option "--no-estimate-size", and pg_basebackup crashes when using this option.

[edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2
Segmentation fault (core dumped)

--stacktrace
[edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
Loaded symbols for /lib64/libselinux.so.1
Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2'.
Program terminated with signal 11, Segmentation fault.
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
298 while (ISSPACE (*s))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
#1 0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
#2 0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at pg_basebackup.c:2526

Thanks & Regards,
Rajkumar Raghuwanshi

On Tue, Apr 7, 2020 at 11:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.
It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.
I'm gonna try to find some time for that as soon as I can.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
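The backtrace above ends in atoi() being called with a NULL pointer, which suggests an optional command-line value is parsed without a NULL check when --no-estimate-size changes which arguments are present. A defensive pattern might look like the following; parse_optional_int() and jobs_arg are hypothetical names used only for illustration, and this is not the actual pg_basebackup code.

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse an optional numeric command-line value without
 * crashing when it was never supplied.
 */
static int
parse_optional_int(const char *value, int default_value)
{
	if (value == NULL || *value == '\0')
		return default_value;
	return atoi(value);
}

/* e.g. numjobs = parse_optional_int(jobs_arg, 1); rather than atoi(jobs_arg) */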
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Hi,
>
> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>
> I have added the shared state as previously described. The new grammar changes are as follows:
>
> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
> - This will generate a unique backupid using pg_strong_random(16), hex-encode it, and return it in the result set.
> - It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.
>
> JOIN_BACKUP 'backup_id'
> - finds 'backup_id' in the hashtable and attaches it to the server process.
>
> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
> - renamed SEND_FILES to SEND_FILE
> - removed START_WAL_LOCATION from this because 'startptr' is now accessible through shared state.
>
> There is no change in other commands:
>
> STOP_BACKUP [NOWAIT]
> LIST_TABLESPACES [PROGRESS]
> LIST_FILES [TABLESPACE]
> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>
> The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
> by default, so I have disabled it for parallel backup mode and have generated a warning so that
> the user is aware of it and does not expect it in the backup.

Hi Asif

I have verified the bug fixes; one bug is fixed and working now as expected. While verifying the other bug fixes I faced the following issues, please have a look.

1) The bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added a description only, as the steps were given in previous emails for each bug whose fix I tried to verify. A backtrace is also added with each case, and it points to one bug for both cases.

a) The backup failed with the error "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
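The crash is in cleanup_workers() joining a thread id of 0 at exit, i.e. joining worker slots whose threads were never started. A defensive cleanup loop might look like this; backup_worker and the 'started' flag are assumptions for illustration, not the patch's actual code.

#include <pthread.h>
#include <stdbool.h>

typedef struct backup_worker
{
	pthread_t	thread;
	bool		started;		/* set only after pthread_create() succeeds */
} backup_worker;

/* Hypothetical cleanup: join only the workers that were actually created. */
static void
cleanup_workers(backup_worker *workers, int nworkers)
{
	int			i;

	for (i = 0; i < nworkers; i++)
	{
		if (workers[i].started)
		{
			pthread_join(workers[i].thread, NULL);
			workers[i].started = false;
		}
	}
}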
b) When executing two backups at the same time, a FATAL error is raised due to max_wal_senders, and instead of exiting, the backup completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)

2) The following bug is not fixed yet.

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but then the backup folder is not cleaned up and a corrupt backup is left behind.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$

Thanks
Kashif Zeeshan

On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

When a non-existent slot is used with a tablespace, the correct error is displayed, but then the backup folder is not cleaned up and a corrupt backup is left behind.

Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$

If the same case is executed without the parallel backup patch, then the backup folder is cleaned up after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone

Hi Asif

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but then the backup folder is not cleaned up and a corrupt backup is left behind. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

The same case, when executed on pg_basebackup without the parallel backup patch, does proper cleanup.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.

Okay, so if we are to add another replication command 'JOIN_BACKUP unique_backup_id'
to make workers find the relevant shared state. There won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:

typedef struct
{
char backupid[NAMEDATALEN];
XLogRecPtr startptr;
slock_t lock;
int64 throttling_counter;
bool backup_started_in_recovery;
} BackupSharedState;
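As a side note on the id scheme discussed above (pg_strong_random plus hex encoding), a minimal server-side sketch could look like the following; generate_backup_id() is a made-up name, not from the patch.

#include "postgres.h"

/* Hypothetical helper: build a 32-character hex backup id from 16 random bytes. */
static void
generate_backup_id(char *dst, size_t dstlen)
{
	uint8		raw[16];
	int			i;

	if (!pg_strong_random(raw, sizeof(raw)))
		ereport(ERROR, (errmsg("could not generate random backup id")));

	Assert(dstlen >= sizeof(raw) * 2 + 1);
	for (i = 0; i < (int) sizeof(raw); i++)
		snprintf(dst + i * 2, 3, "%02x", raw[i]);
}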
rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.
For this, 'backup_streamed' is moved to a shared structure (BackupState) as
While files are being transferred from server to client. The main process remains
- Added backup manifest:
The backend workers maintain their own manifest file which contains a list of files
The main process will read these individual files and concatenate them into a single file
The manifest file is created when the following command is received:
BUILD_MANIFEST 'backupid'
This is a new replication command. It is sent when pg_basebackup has copied all the
Attachment
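A rough sketch of the concatenation step described above, where the leader appends each worker's partial manifest onto the final one; append_partial_manifest() and the file handling are assumptions of mine, not the patch's code.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: append the contents of 'partial_path' onto the open final manifest. */
static int
append_partial_manifest(FILE *final_manifest, const char *partial_path)
{
	FILE	   *src = fopen(partial_path, "rb");
	char		buf[8192];
	size_t		nread;

	if (src == NULL)
		return -1;

	while ((nread = fread(buf, 1, sizeof(buf), src)) > 0)
	{
		if (fwrite(buf, 1, nread, final_manifest) != nread)
		{
			fclose(src);
			return -1;
		}
	}

	fclose(src);
	return 0;
}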
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR: could not open file "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file or directory
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$
On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16), hex-encode it, and return it in the result set.
- It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this because 'startptr' is now accessible through shared state.

There is no change in other commands:

STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled by default, so I have disabled it for parallel backup mode and have generated a warning so that the user is aware of it and does not expect it in the backup.

Hi Asif

I have verified the bug fixes; one bug is fixed and working now as expected. While verifying the other bug fixes I faced the following issues, please have a look.

1) The bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added a description only, as the steps were given in previous emails for each bug whose fix I tried to verify. A backtrace is also added with each case, and it points to one bug for both cases.

a) The backup failed with the error "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
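Both backtraces show cleanup_workers() calling pthread_join() with threadid=0, i.e. joining a thread that was never created because the connection attempt for that worker failed. A hedged sketch of the kind of guard that avoids this in an atexit cleanup path; the worker array and field names are assumptions, not the patch's code:

/*
 * Sketch only: skip workers whose threads were never started, so the
 * atexit cleanup cannot pthread_join() a zero thread id.
 */
#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    pthread_t   thread;
    bool        started;        /* set true only after pthread_create succeeds */
} WorkerSketch;

static void
cleanup_workers_sketch(WorkerSketch *workers, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
    {
        if (!workers[i].started)
            continue;           /* never created; nothing to join */
        pthread_join(workers[i].thread, NULL);
        workers[i].started = false;
    }
}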
b) When executing two backups at the same time, a FATAL error is raised due to max_wal_senders, but instead of exiting, the backup got completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)

2) The following bug is not fixed yet.

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$

Thanks
Kashif Zeeshan

On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

When a non-existent slot is used with a tablespace, the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup.

Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$

If the same case is executed without the parallel backup patch, then the backup folder is cleaned up after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone

Hi Asif

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

The same case, when executed with pg_basebackup without the parallel backup patch, does the proper cleanup.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
 * Check if the postmaster has signaled us to exit, and abort with an
 * error in that case. The error handler further up will call
 * do_pg_abort_backup() for us. Also check that if the backup was
 * started while still in recovery, the server wasn't promoted.
 * do_pg_stop_backup() will check that too, but it's better to stop
 * the backup early than continue to the end and fail there.
 */
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("the standby was promoted during online backup"),
             errhint("This means that the backup being taken is corrupt "
                     "and should not be used. "
                     "Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
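A hedged sketch of that arrangement (a fixed array in shared memory, one slot per max_wal_senders, looked up by backup ID; struct and function names are assumptions, not the patch's code):

#include <stdbool.h>
#include <string.h>

#define BACKUP_ID_MAXLEN 64     /* "NAMEDATALEN-or-something" */

typedef struct
{
    bool    in_use;
    char    backupid[BACKUP_ID_MAXLEN];
} BackupSlotSketch;

/* shared-memory array, sized to max_wal_senders at startup */
static BackupSlotSketch *backup_slots;
static int  n_backup_slots;

/* START_BACKUP copies the ID into a free slot; JOIN_BACKUP finds it again. */
static BackupSlotSketch *
find_backup_slot(const char *backupid)
{
    for (int i = 0; i < n_backup_slots; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, backupid) == 0)
            return &backup_slots[i];
    }
    return NULL;                /* caller would report an unknown backup id */
}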
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.

Okay, so if we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state, there won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.

I am thinking of the following struct for the shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;
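A minimal sketch of producing the hash key described just below, i.e. backupid = hex_encode(pg_strong_random(16)); the function name and error message are illustrative, not the patch's code, and it assumes backend includes for pg_strong_random(), hex_encode(), palloc(), and ereport():

/* Sketch only: a 16-byte random ID, hex-encoded into a 32-character string. */
static char *
generate_backup_id(void)
{
    uint8       rand[16];
    char       *id = palloc(2 * sizeof(rand) + 1);

    if (!pg_strong_random(rand, sizeof(rand)))
        ereport(ERROR,
                (errmsg("could not generate random backup identifier")));

    hex_encode((const char *) rand, sizeof(rand), id);
    id[2 * sizeof(rand)] = '\0';
    return id;
}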
The shared state structure entries would be maintained by a shared hash table.
There will be one structure per parallel backup. Since a single parallel backup
can engage more than one WAL sender, I think max_wal_senders might be a little
too much; perhaps max_wal_senders/2, since there will be at least 2 connections
per parallel backup? Alternatively, we can add a new GUC that defines the maximum
number of concurrent parallel backups, e.g. ‘max_concurrent_backups_allowed = 10’,
or we can make it user-configurable.

The key would be “backupid=hex_encode(pg_strong_random(16))”.

Checking for Standby Promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery
and keep checking it whenever send_file() is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same
as for non-parallel backup, with the exception that multiple processes will now be
updating it. So in parallel backup, this will represent the overall bytes that
have been transferred. The workers will sleep if they have exceeded the limit.
Hence, the shared state carries a lock to safely update the throttling value
atomically.

Progress Reporting:
Although I think we should add progress reporting for parallel backup as a
separate patch, the relevant entries for progress reporting such as
‘backup_total’ and ‘backup_streamed’ would then be added to this structure
as well.

Grammar:
There is a change in the result set being returned for the START_BACKUP command;
unique_backup_id is added. Additionally, the JOIN_BACKUP replication command is
added. SEND_FILES has been renamed to SEND_FILE. There are no other changes
to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id

STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label

JOIN_BACKUP ‘unique_backup_id’
- attaches a shared state identified by ‘unique_backup_id’ to a backend process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
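To make the grammar concrete, here is a hedged client-side sketch of how a leader and one worker connection might use START_BACKUP and JOIN_BACKUP over libpq; the column position of unique_backup_id and the expected result statuses are assumptions based on the result sets listed above, not settled protocol:

#include <stdio.h>
#include "libpq-fe.h"

static int
start_and_join(PGconn *leader, PGconn *worker)
{
    PGresult   *res;
    char        backupid[64];
    char        join[128];

    res = PQexec(leader, "START_BACKUP LABEL 'parallel demo' FAST");
    if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) < 1)
    {
        PQclear(res);
        return -1;
    }
    /* assumes unique_backup_id is the fourth column of the result set */
    snprintf(backupid, sizeof(backupid), "%s", PQgetvalue(res, 0, 3));
    PQclear(res);

    snprintf(join, sizeof(join), "JOIN_BACKUP '%s'", backupid);
    res = PQexec(worker, join);
    if (PQresultStatus(res) != PGRES_COMMAND_OK &&
        PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        PQclear(res);
        return -1;
    }
    PQclear(res);
    return 0;
}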
Hi,

rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.

- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a
pg_atomic_uint64 variable. The worker processes will keep incrementing this variable
while files are being transferred from server to client; the main process remains
in an idle state. So after each increment, the worker process will signal the master to
update the stats in the pg_stat_progress_basebackup view. The 'tablespace_streamed'
column is not updated and will remain empty, because multiple workers may be copying
files from different tablespaces.

- Added backup manifest:
The backend workers maintain their own manifest file, which contains a list of files
that are being transferred by that worker. Once all backup files are transferred, the
workers create a temp file ('pg_tempdir/temp_file_prefix_backupid.workerid') to write
the content of the manifest file from the BufFile. The workers won't add the header,
nor the WAL information, in their manifest; these two will be added by the main
process while merging all worker manifest files.

The main process will read these individual files and concatenate them into a single file,
which is then sent back to the client.

The manifest file is created when the following command is received:

BUILD_MANIFEST 'backupid'

This is a new replication command. It is sent when pg_basebackup has copied all the
$PGDATA files, including WAL files.
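The concatenation step itself can be quite small. An illustrative sketch only, using plain stdio rather than the patch's BufFile machinery; the temp-file path format and helper name are assumptions:

/*
 * Sketch: the leader appends one worker's temporary manifest fragment to
 * the final manifest; the header and WAL information are written by the
 * leader itself before and after this loop.
 */
#include <stdio.h>

static void
append_worker_manifest(FILE *out, const char *backupid, int workerid)
{
    char    path[1024];
    char    buf[8192];
    size_t  n;
    FILE   *in;

    snprintf(path, sizeof(path), "base/pgsql_tmp/pgsql_tmp_%s.%d",
             backupid, workerid);
    in = fopen(path, "r");
    if (in == NULL)
        return;                 /* real code reports an error here */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
}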
--

Hi Asif

Getting the following error on parallel backup when the --no-manifest option is used.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR: could not open file "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file or directory
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$
Thanks

--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
Attachment
On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
+typedef struct
+{
...
+} BackupFile;
+
+typedef struct
+{
...
+} BackupState;
These structures need comments.
+list_wal_files_opt_list:
+ SCONST SCONST
{
- $$ = makeDefElem("manifest_checksums",
-
(Node *)makeString($2), -1);
+ $$ = list_make2(
+ makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1),
+ makeDefElem("end_wal_location",
+ (Node *)makeString($2), -1));
+
}
This seems like an unnecessarily complicated parse representation. The
DefElems seem to be completely unnecessary here.
@@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
So the design here is that SendBaseBackup() is now going to do a bunch
of things that are NOT sending a base backup? With no updates to the
comments of that function and no change to the process title it sets?
- return (manifest->buffile != NULL);
+ return (manifest && manifest->buffile != NULL);
Heck no. It appears that you didn't even bother reading the function
header comment.
+ * Send a single resultset containing XLogRecPtr record (in text format)
+ * TimelineID and backup label.
*/
static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
+ StringInfo label, char *backupid)
This just casually breaks wire protocol compatibility, which seems
completely unacceptable.
+ if (strlen(opt->tablespace) > 0)
+ sendTablespace(opt->tablespace, NULL, true, NULL, &files);
+ else
+ sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
+
+ SendFilesHeader(files);
So I guess the idea here is that we buffer the entire list of files in
memory, regardless of size, and then we send it out afterwards. That
doesn't seem like a good idea. The list of files might be very large.
We probably need some code refactoring here rather than just piling
more and more different responsibilities onto sendTablespace() and
sendDir().
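One direction such a refactoring could take is to pass a per-file callback down the traversal so each entry is emitted as it is found instead of being accumulated in a List. A hedged, self-contained illustration only (plain POSIX, non-recursive for brevity; none of these names come from the patch or basebackup.c):

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

typedef void (*per_file_cb) (const char *path, void *arg);

/* Walk 'root' and invoke cb for each regular file; nothing is buffered. */
static void
scan_dir(const char *root, per_file_cb cb, void *arg)
{
    DIR            *dir = opendir(root);
    struct dirent  *de;
    struct stat     st;
    char            path[4096];

    if (dir == NULL)
        return;
    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", root, de->d_name);
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            cb(path, arg);      /* emit this file immediately */
    }
    closedir(dir);
}

static void
emit_file_row(const char *path, void *arg)
{
    printf("%s\n", path);       /* stand-in for sending one result-set row */
}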
+ if (state->parallel_mode)
+ SpinLockAcquire(&state->lock);
+
+ state->throttling_counter += increment;
+
+ if (state->parallel_mode)
+ SpinLockRelease(&state->lock);
I don't like this much. It seems to me that we would do better to use
atomics here all the time, instead of conditional spinlocks.
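A hedged sketch of the unconditional-atomics version being suggested here, using PostgreSQL's pg_atomic_uint64 API; the struct, field, and function names are illustrative, not the patch's code:

#include "postgres.h"
#include "port/atomics.h"

/* Illustrative shared-state fragment. */
typedef struct
{
    pg_atomic_uint64    throttling_counter; /* bytes sent by all workers */
} ThrottleStateSketch;

/* Every worker calls this after sending 'increment' bytes. */
static void
add_throttling_bytes(ThrottleStateSketch *state, int64 increment)
{
    /* one unconditional atomic add: no spinlock, no parallel/serial branch */
    (void) pg_atomic_fetch_add_u64(&state->throttling_counter, increment);
}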
+static void
+send_file(basebackup_options *opt, char *file, bool missing_ok)
...
+ if (file == NULL)
+ return;
That seems totally inappropriate.
+ sendFile(file, file + basepathlen, &statbuf,
true, InvalidOid, NULL, NULL);
Maybe I'm misunderstanding, but this looks like it's going to write a
tar header, even though we're not writing a tarfile.
+ else
+ ereport(WARNING,
+ (errmsg("skipping special file
or directory \"%s\"", file)));
So, if the user asks for a directory or symlink, what's going to
happen is that they're going to receive an empty file, and get a
warning. That sounds like terrible behavior.
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes it may not report total checksum count, but it will error
+ * out,terminating the backup.
+ */
In other words, the patch breaks the feature. Not that the feature in
question works particularly well as things stand, but this makes it
worse.
I think this patch (0003) is in really bad shape. I'm having second
thoughts about the design, but it's kind of hard to even have a
discussion about the design when the patch is riddled with minor
problems like inadequate comments, failure to update existing
comments, and breaking a bunch of things. I understand that sometimes
things get missed, but this is version 14 of a patch that's been
kicking around since last August.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp1 location '/tmp/test_bkp/tblsp1';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text) tablespace tblsp1;"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -T tablespace option');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/test_bkp/bkp -T /tmp/test_bkp/tblsp1=/tmp/test_bkp/tblsp2 -j 4
[edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16390" is present on disk but not in the manifest
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16388" is present on disk but not in the manifest
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16385" is present on disk but not in the manifest
pg_verifybackup: error: "/PG_13_202004074/13530/16388" is present in the manifest but not on disk
pg_verifybackup: error: "/PG_13_202004074/13530/16390" is present in the manifest but not on disk
pg_verifybackup: error: "/PG_13_202004074/13530/16385" is present in the manifest but not on disk
[edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp1
backup successfully verified
--

On Wed, 15 Apr 2020 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
Fair enough. Some of this is also due to backup-related features, i.e. backup manifest and progress reporting, that got committed to master towards the tail end of PG-13. Rushing to get the parallel backup feature compatible with these features also caused some of the oversights.
--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
On Wed, Apr 15, 2020 at 4:49 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
> Fair enough. Some of this is also due to backup related features i.e backup manifest, progress reporting that got committed to master towards the tail end of PG-13. Rushing to get parallel backup feature compatible with these features also caused some of the oversights.

Sure, but there's also no point in rushing out a feature that's in a state where it's got no chance of being acceptable, and quite a number of these problems are not new, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 14, 2020 at 6:32 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

Getting the following error on parallel backup when the --no-manifest option is used.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR: could not open file "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file or directory
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D
/home/edb/Desktop/backup --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_27407"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/4000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label pg_commit_ts pg_ident.conf pg_notify pg_snapshots
pg_subtrans PG_VERSION postgresql.auto.conf
base pg_dynshmem pg_logical pg_replslot pg_stat
pg_tblspc pg_wal postgresql.conf
global pg_hba.conf pg_multixact pg_serial pg_stat_tmp
pg_twophase pg_xact
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
Thanks

On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16) and hex-encode it, which is then returned as the result set.
- It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this because 'startptr' is now accessible through the shared state.

There is no change in other commands:

STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled by default, so I have disabled it for parallel backup mode and have generated a warning so that the user is aware of it and does not expect it in the backup.

Hi Asif

I have verified the bug fixes; one bug is fixed and working now as expected. For the verification of the other bug fixes I faced the following issues, please have a look.

1) The following bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added a description only, as the steps were given in previous emails for each bug I tried to verify the fix. A backtrace is also added with each case, which points to one bug for both cases.

a) The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
b) When executing two backups at the same time, a FATAL error is raised due to max_wal_senders, but instead of exiting, the backup got completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)

2) The following bug is not fixed yet.

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up and a corrupt backup is left behind.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$

Thanks
Kashif Zeeshan

On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

When a non-existent slot is used with a tablespace, the correct error is displayed, but the backup folder is not cleaned up and a corrupt backup is left behind.

Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$

If the same case is executed without the parallel backup patch, the backup folder is cleaned up after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone

Hi Asif

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up and a corrupt backup is left behind. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

When the same case is executed with pg_basebackup without the parallel backup patch, proper cleanup is done.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
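To make the slot-array idea concrete, here is a rough sketch of what JOIN_BACKUP's lookup could look like; every name below is illustrative, shared-memory setup and error handling are elided, and START_BACKUP would claim a free slot under the same lock:

#include "postgres.h"
#include "replication/walsender.h"   /* max_wal_senders */
#include "storage/lwlock.h"

typedef struct BackupSlot
{
    bool    in_use;
    char    backup_id[NAMEDATALEN];
    /* ... per-backup shared state: start LSN, throttling counter, ... */
} BackupSlot;

/* One array in shared memory, sized to max_wal_senders at startup. */
static BackupSlot *BackupSlots;
static LWLock     *BackupSlotLock;

/* JOIN_BACKUP: find the slot that START_BACKUP claimed under this id. */
static BackupSlot *
FindBackupSlot(const char *backup_id)
{
    BackupSlot *result = NULL;

    LWLockAcquire(BackupSlotLock, LW_SHARED);
    for (int i = 0; i < max_wal_senders; i++)
    {
        if (BackupSlots[i].in_use &&
            strcmp(BackupSlots[i].backup_id, backup_id) == 0)
        {
            result = &BackupSlots[i];
            break;
        }
    }
    LWLockRelease(BackupSlotLock);
    return result;              /* NULL: no such backup in progress */
}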
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.

Okay, so if we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state. There won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.
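As a concrete reading of the pg_strong_random()-plus-hex-encoding suggestion above, here is a hedged sketch of server-side id generation; the function name and the error wording are invented for illustration:

#include "postgres.h"

#define BACKUP_ID_RAW_LEN 16        /* 16 random bytes -> 32 hex characters */

static void
generate_backup_id(char *dst, size_t dstlen)
{
    uint8   raw[BACKUP_ID_RAW_LEN];

    Assert(dstlen >= BACKUP_ID_RAW_LEN * 2 + 1);

    if (!pg_strong_random(raw, sizeof(raw)))
        ereport(ERROR,
                (errmsg("could not generate random backup id")));

    /* hex-encode: hard to guess and effectively collision-free */
    for (int i = 0; i < BACKUP_ID_RAW_LEN; i++)
        snprintf(dst + i * 2, 3, "%02x", raw[i]);
}

The resulting 32-character string would fit comfortably in a char[NAMEDATALEN] field of the shared state.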
I am thinking of the following struct for the shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;
The shared state structure entries would be maintained by a shared hash table.

There will be one structure per parallel backup. Since a single parallel backup can engage more than one WAL sender, I think max_wal_senders might be a little too much; perhaps max_wal_senders/2, since there will be at least 2 connections per parallel backup? Alternatively, we can add a new GUC that defines the maximum number of concurrent parallel backups, i.e. ‘max_concurrent_backups_allowed = 10’ perhaps, or we can make it user-configurable.

The key would be “backupid=hex_encode(pg_random_strong(16))” (a sketch of this lookup follows the grammar below).

Checking for standby promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery and keep checking it whenever send_file() is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same as for non-parallel backup, with the exception that multiple threads will now be updating it. So in a parallel backup, this will represent the overall bytes that have been transferred. The workers would sleep if they have exceeded the limit. Hence, the shared state carries a lock to safely update the throttling value atomically.

Progress reporting:
I think we should add progress reporting for parallel backup as a separate patch. The relevant entries for progress reporting, such as ‘backup_total’ and ‘backup_streamed’, would then be added to this structure as well.

Grammar:
There is a change in the result set being returned for the START_BACKUP command; unique_backup_id is added. Additionally, a JOIN_BACKUP replication command is added. SEND_FILES has been renamed to SEND_FILE. There are no other changes to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id

STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label

JOIN_BACKUP ‘unique_backup_id’
- attaches a shared state identified by ‘unique_backup_id’ to a backend process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
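Here is a minimal sketch of the shared hash table lookup referred to above, assuming the BackupSharedState struct shown earlier; the helper names are illustrative, PG13-era dynahash flags are assumed, and locking around hash_search() is elided:

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/shmem.h"
#include "storage/spin.h"
#include "utils/hsearch.h"

#define BACKUP_HASH_SIZE 10

typedef struct BackupSharedState
{
    char        backupid[NAMEDATALEN];  /* hash key */
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;

static HTAB *BackupStateHash;

static void
BackupStateShmemInit(void)
{
    HASHCTL     info;

    MemSet(&info, 0, sizeof(info));
    info.keysize = NAMEDATALEN;
    info.entrysize = sizeof(BackupSharedState);

    BackupStateHash = ShmemInitHash("parallel backup state",
                                    BACKUP_HASH_SIZE, BACKUP_HASH_SIZE,
                                    &info, HASH_ELEM);
}

/* JOIN_BACKUP 'backup_id': attach this walsender to an existing backup. */
static BackupSharedState *
AttachToBackup(const char *backupid)
{
    bool        found;
    BackupSharedState *state;

    state = (BackupSharedState *) hash_search(BackupStateHash, backupid,
                                              HASH_FIND, &found);
    if (!found)
        ereport(ERROR,
                (errmsg("parallel backup \"%s\" does not exist", backupid)));
    return state;
}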
Hi,

Rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.

- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a pg_atomic_uint64 variable. The worker processes will keep incrementing this variable while files are being transferred from server to client. The main process remains in an idle state, so after each increment, the worker process will signal the master to update the stats in the pg_stat_progress_basebackup view (see the sketch after these notes). The 'tablespace_streamed' column is not updated and will remain empty; this is because multiple workers may be copying files from different tablespaces.
- Added backup manifest:
The backend workers maintain their own manifest file, which contains a list of files that are being transferred by the worker. Once all backup files are transferred, the workers will create a temp file as ('pg_tempdir/temp_file_prefix_backupid.workerid') to write the content of the manifest file from BufFile. The workers won’t add the header, nor the WAL information, in their manifest. These two will be added by the main process while merging all worker manifest files.

The main process will read these individual files and concatenate them into a single file, which is then sent back to the client.

The manifest file is created when the following command is received:

BUILD_MANIFEST 'backupid'

This is a new replication command. It is sent when pg_basebackup has copied all the $PGDATA files including WAL files.
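As referenced in the progress-reporting note above, here is a hedged sketch of the worker/leader split: workers bump a shared atomic counter and nudge the leader, which publishes the total to pg_stat_progress_basebackup. The shared struct name and the signalling mechanism are illustrative only:

#include "postgres.h"
#include "commands/progress.h"
#include "pgstat.h"
#include "port/atomics.h"

typedef struct BackupProgressShared
{
    pg_atomic_uint64 backup_streamed;   /* bytes sent so far, all workers */
} BackupProgressShared;

/* worker side: account for 'len' bytes just sent to the client */
static void
worker_report_bytes(BackupProgressShared *shared, uint64 len)
{
    pg_atomic_add_fetch_u64(&shared->backup_streamed, len);
    /* ... then signal the leader, e.g. by setting its latch ... */
}

/* leader side: publish the current total to the progress view */
static void
leader_update_progress(BackupProgressShared *shared)
{
    pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_STREAMED,
                                 (int64) pg_atomic_read_u64(&shared->backup_streamed));
}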
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company

--
Asif Rehman
--

On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:- This will generate a unique backupid using pg_strong_random(16) and hex-encodedHi,Thanks, Kashif and Rajkumar. I have fixed the reported issues.I have added the shared state as previously described. The new grammar changesare as follows:START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]it. which is then returned as the result set.- It will also create a shared state and add it to the hashtable. The hash table size is setto BACKUP_HASH_SIZE=10, but since hashtable can expand dynamically, I think it'ssufficient initial size. max_wal_senders is not used, because it can be set to quite alarge values.JOIN_BACKUP 'backup_id'- finds 'backup_id' in hashtable and attaches it to server process.SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]- renamed SEND_FILES to SEND_FILE- removed START_WAL_LOCATION from this because 'startptr' is now accessible throughshared state.There is no change in other commands:STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']The current patches (v11) have been rebased to the latest master. The backup manifest is enabledby default, so I have disabled it for parallel backup mode and have generated a warning so thatuser is aware of it and not expect it in the backup.Hi AsifI have verified the bug fixes, one bug is fixed and working now as expectedFor the verification of the other bug fixes faced following issues, please have a look.1) Following bug fixes mentioned below are generating segmentation fault.Please note for reference I have added a description only as steps were given in previous emails of each bug I tried to verify the fix. Backtrace is also added with each case which points to one bug for both the cases.a) The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when the max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
b) When executing two backups at the same time, getting FATAL error due to max_wal_senders and instead of exit Backup got completed.[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)2) The following bug is not fixed yetA similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$ThanksKashif ZeeshanOn Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:Hi AsifWhen a non-existent slot is used with tablespace then correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.Steps=======edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undoneHi AsifA similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup. I think one bug fix will solve all these cases where clean up is not done when parallel backup is failed.[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$Same case when executed on pg_basebackup without the Parallel backup patch then proper clean up is done.[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$ThanksOn Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.Okay so If we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state. There won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:typedef struct
{
char backupid[NAMEDATALEN];
XLogRecPtr startptr;
slock_t lock;
int64 throttling_counter;
bool backup_started_in_recovery;
} BackupSharedState;
The shared state structure entries would be maintained by a shared hash table.There will be one structure per parallel backup. Since a single parallel backupcan engage more than one wal sender, so I think max_wal_senders might be a littletoo much; perhaps max_wal_senders/2 since there will be at least 2 connectionsper parallel backup? Alternatively, we can set a new GUC that defines the maximumnumber of for concurrent parallel backups i.e. ‘max_concurent_backups_allowed = 10’perhaps, or we can make it user-configurable.The key would be “backupid=hex_encode(pg_random_strong(16))”Checking for Standby Promotion:At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recoveryand keep checking it whenever send_file () is called to send a new file.Throttling:BackupSharedState.throttling_counter - The throttling logic remains the sameas for non-parallel backup with the exception that multiple threads will now beupdating it. So in parallel backup, this will represent the overall bytes thathave been transferred. So the workers would sleep if they have exceeded thelimit. Hence, the shared state carries a lock to safely update the throttlingvalue atomically.Progress Reporting:Although I think we should add progress-reporting for parallel backup as aseparate patch. The relevant entries for progress-reporting such as‘backup_total’ and ‘backup_streamed’ would be then added to this structureas well.Grammar:There is a change in the resultset being returned for START_BACKUP command;unique_backup_id is added. Additionally, JOIN_BACKUP replication command isadded. SEND_FILES has been renamed to SEND_FILE. There are no other changesto the grammar.START_BACKUP [LABEL '<label>'] [FAST]- returns startptr, tli, backup_label, unique_backup_idSTOP_BACKUP [NOWAIT]- returns startptr, tli, backup_labelJOIN_BACKUP ‘unique_backup_id’- attaches a shared state identified by ‘unique_backup_id’ to a backend process.LIST_TABLESPACES [PROGRESS]LIST_FILES [TABLESPACE]LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]Hi,
rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$
- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a pg_atomic_uint64 variable, which the worker processes keep incrementing.
While files are being transferred from server to client, the main process remains in an idle state, so after each increment the worker process signals the master to update the stats in the pg_stat_progress_basebackup view. The 'tablespace_streamed' column is not updated and will remain empty, because multiple workers may be copying files from different tablespaces.
- Added backup manifest:
The backend workers maintain their own manifest file, which contains a list of the files that are being transferred by that worker. Once all backup files are transferred, the workers create a temp file as ('pg_tempdir/temp_file_prefix_backupid.workerid') to write the content of the manifest file from the BufFile. The workers won't add the header, nor the WAL information, to their manifest. These two will be added by the main process while merging all worker manifest files (a rough sketch of this merge step is shown below).
The main process will read these individual files and concatenate them into a single file, which is then sent back to the client.
The manifest file is created when the following command is received:
BUILD_MANIFEST 'backupid'
This is a new replication command. It is sent when pg_basebackup has copied all the $PGDATA files, including WAL files.
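To make the merge step concrete, a minimal sketch of how the main process could concatenate the per-worker manifest temp files is given below. The path layout, helper steps, and function name are assumptions for illustration only, not the patch's actual code:

/* Illustrative sketch only; names and error handling are simplified. */
#include <stdio.h>
#include <unistd.h>

static void
merge_worker_manifests(const char *tmpdir, const char *backupid,
                       int nworkers, FILE *out)
{
    /* the main process writes the manifest header itself (omitted here) */
    for (int i = 0; i < nworkers; i++)
    {
        char    path[1024];
        char    buf[8192];
        size_t  nread;
        FILE   *in;

        /* one temp file per worker: <tmpdir>/temp_file_prefix_<backupid>.<workerid> */
        snprintf(path, sizeof(path), "%s/temp_file_prefix_%s.%d", tmpdir, backupid, i);
        in = fopen(path, "rb");
        if (in == NULL)
            continue;               /* simplified error handling */

        while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, nread, out);

        fclose(in);
        unlink(path);
    }
    /* WAL entries and the manifest checksum would be appended by the main process here */
}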
--

I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.

Have we done any performance testing with this patch to see the benefits? If so, can you point me to the results? If not, then can we perform some tests on large backups to see the benefits of this patch/idea?
Hi,

I did some tests a while back, and here are the results. The tests were done to simulate a live database environment using pgbench.

Machine configuration used for this test:
Instance Type: t2.xlarge
Volume Type : io1
Memory (MiB) : 16384
vCPU # : 4
Architecture : X86_64
IOP : 16000
Database Size (GB) : 102

The setup consists of 3 machines:
- one for the database instance
- one for the pg_basebackup client and
- one for pgbench with some parallel workers, simulating SELECT loads.

basebackup | 4 workers | 8 Workers | 16 workers
Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15
(pgbench running with 50 parallel clients simulating SELECT load)

Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35
(pgbench running with 100 parallel clients simulating SELECT load)

On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> > I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
>
> Have we done any performance testing with this patch to see the benefits? If so, can you point me to the results? If not, then can we perform some tests on large backups to see the benefits of this patch/idea?

--
Asif Rehman
Hi Asif,

On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> basebackup | 4 workers | 8 Workers | 16 workers
> Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15
> (pgbench running with 50 parallel client simulating SELECT load)
>
> Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35
> (pgbench running with 100 parallel client simulating SELECT load)

Well that looks a bit strange. All 4, 8 and 16 worker backup configurations seem to have taken the same time. Is it because the machine CPUs are only 4? In that case, did you try to run with 2 workers and compare that with the 4-worker time?

Also, just to clarify and be sure - was there anything else running on any of these 3 machines while the backup was in progress?

Regards,
Jeevan Ladhe
On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote: > > I did some tests a while back, and here are the results. The tests were done to simulate > a live database environment using pgbench. > > machine configuration used for this test: > Instance Type: t2.xlarge > Volume Type : io1 > Memory (MiB) : 16384 > vCPU # : 4 > Architecture : X86_64 > IOP : 16000 > Database Size (GB) : 102 > > The setup consist of 3 machines. > - one for database instances > - one for pg_basebackup client and > - one for pgbench with some parallel workers, simulating SELECT loads. > > basebackup | 4 workers | 8 Workers | 16 workers > Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15 > (pgbench running with 50 parallel client simulating SELECT load) > > Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35 > (pgbench running with 100 parallel client simulating SELECT load) > Thanks for sharing the results, these show nice speedup! However, I think we should try to find what exactly causes this speed up. If you see the recent discussion on another thread related to this topic, Andres, pointed out that he doesn't think that we can gain much by having multiple connections[1]. It might be due to some internal limitations (like small buffers) [2] due to which we are seeing these speedups. It might help if you can share the perf reports of the server-side and pg_basebackup side. We don't need pgbench type workload to see what caused speed up. [1] - https://www.postgresql.org/message-id/20200420201922.55ab7ovg6535suyz%40alap3.anarazel.de [2] - https://www.postgresql.org/message-id/20200421064420.z7eattzqbunbutz3%40alap3.anarazel.de -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> >
> > I did some tests a while back, and here are the results. The tests were done to simulate
> > a live database environment using pgbench.
> >
> > machine configuration used for this test:
> > Instance Type: t2.xlarge
> > Volume Type : io1
> > Memory (MiB) : 16384
> > vCPU # : 4
> > Architecture : X86_64
> > IOP : 16000
> > Database Size (GB) : 102
> >
> > The setup consist of 3 machines.
> > - one for database instances
> > - one for pg_basebackup client and
> > - one for pgbench with some parallel workers, simulating SELECT loads.
> >
> > basebackup | 4 workers | 8 Workers | 16 workers
> > Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15
> > (pgbench running with 50 parallel client simulating SELECT load)
> >
> > Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35
> > (pgbench running with 100 parallel client simulating SELECT load)
> >
>
> Thanks for sharing the results, these show nice speedup! However, I
> think we should try to find what exactly causes this speed up. If you
> see the recent discussion on another thread related to this topic,
> Andres, pointed out that he doesn't think that we can gain much by
> having multiple connections[1]. It might be due to some internal
> limitations (like small buffers) [2] due to which we are seeing these
> speedups. It might help if you can share the perf reports of the
> server-side and pg_basebackup side.
>
Just to be clear, we need perf reports both with and without patch-set.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote: > > On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote: >> > > >> > > I did some tests a while back, and here are the results. The tests were done to simulate >> > > a live database environment using pgbench. >> > > >> > > machine configuration used for this test: >> > > Instance Type: t2.xlarge >> > > Volume Type : io1 >> > > Memory (MiB) : 16384 >> > > vCPU # : 4 >> > > Architecture : X86_64 >> > > IOP : 16000 >> > > Database Size (GB) : 102 >> > > >> > > The setup consist of 3 machines. >> > > - one for database instances >> > > - one for pg_basebackup client and >> > > - one for pgbench with some parallel workers, simulating SELECT loads. >> > > >> > > basebackup | 4 workers | 8 Workers | 16 workers >> > > Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15 >> > > (pgbench running with 50 parallel client simulating SELECT load) >> > > >> > > Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35 >> > > (pgbench running with 100 parallel client simulating SELECT load) >> > > >> > >> > Thanks for sharing the results, these show nice speedup! However, I >> > think we should try to find what exactly causes this speed up. If you >> > see the recent discussion on another thread related to this topic, >> > Andres, pointed out that he doesn't think that we can gain much by >> > having multiple connections[1]. It might be due to some internal >> > limitations (like small buffers) [2] due to which we are seeing these >> > speedups. It might help if you can share the perf reports of the >> > server-side and pg_basebackup side. >> > >> >> Just to be clear, we need perf reports both with and without patch-set. > > > These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallelbackup and share the results and perf reports. > Sounds good. I think we should also try to run the test with 1 worker as well. The reason it will be good to see the results with 1 worker is that we can know if the technique to send file by file as is done in this patch is better or worse than the current HEAD code. So, it will be good to see the results of an unpatched code, 1 worker, 2 workers, 4 workers, etc. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi Asif, I am reviewing your recent patch and found the patch is not applicable on latest master. Could you please resolve the conflicts and update a new patch? Thanks, Dipesh EnterpriseDB: http://www.enterprisedb.com
+typedef struct
+{
...
+} BackupFile;
+
+typedef struct
+{
...
+} BackupState;
These structures need comments.
+list_wal_files_opt_list:
+ SCONST SCONST
{
- $$ = makeDefElem("manifest_checksums",
-
(Node *)makeString($2), -1);
+ $$ = list_make2(
+ makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1),
+ makeDefElem("end_wal_location",
+ (Node *)makeString($2), -1));
+
}
This seems like an unnecessarily complicated parse representation. The
DefElems seem to be completely unnecessary here.
@@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
So the design here is that SendBaseBackup() is now going to do a bunch
of things that are NOT sending a base backup? With no updates to the
comments of that function and no change to the process title it sets?
- return (manifest->buffile != NULL);
+ return (manifest && manifest->buffile != NULL);
Heck no. It appears that you didn't even bother reading the function
header comment.
object is always available. Anyways I have removed this check for 003 patch
as well.
+ * Send a single resultset containing XLogRecPtr record (in text format)
+ * TimelineID and backup label.
*/
static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
+ StringInfo label, char *backupid)
This just casually breaks wire protocol compatibility, which seems
completely unacceptable.
returns startptr, tli, backup label and backupid. So I had extended this result set.
+ if (strlen(opt->tablespace) > 0)
+ sendTablespace(opt->tablespace, NULL, true, NULL, &files);
+ else
+ sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
+
+ SendFilesHeader(files);
So I guess the idea here is that we buffer the entire list of files in
memory, regardless of size, and then we send it out afterwards. That
doesn't seem like a good idea. The list of files might be very large.
We probably need some code refactoring here rather than just piling
more and more different responsibilities onto sendTablespace() and
sendDir().
Currently, basebackup only does compression on the client-side. So, I suggest we stick with
+ if (state->parallel_mode)
+ SpinLockAcquire(&state->lock);
+
+ state->throttling_counter += increment;
+
+ if (state->parallel_mode)
+ SpinLockRelease(&state->lock);
I don't like this much. It seems to me that we would do better to use
atomics here all the time, instead of conditional spinlocks.
Okay, I have added throttling_counter as an atomic; however, a lock is still required
for throttling_counter %= throttling_sample.
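For illustration, the combination being described might look roughly like the sketch below. It assumes the shared state holds a pg_atomic_uint64 counter and a spinlock; the struct and function names are placeholders, not the patch's actual code:

/* Rough sketch of the approach described above, not the patch's actual code. */
#include "postgres.h"
#include "port/atomics.h"
#include "storage/spin.h"

typedef struct ParallelThrottleState        /* illustrative */
{
    pg_atomic_uint64 throttling_counter;
    slock_t          lock;
} ParallelThrottleState;

static void
throttle_parallel(ParallelThrottleState *state, int64 increment, uint64 throttling_sample)
{
    /* each worker adds its transferred bytes without taking the lock ... */
    pg_atomic_fetch_add_u64(&state->throttling_counter, increment);

    if (pg_atomic_read_u64(&state->throttling_counter) >= throttling_sample)
    {
        /*
         * ... but the wrap-around (counter %= throttling_sample) rewrites the
         * whole value, so it is still done under the shared lock.
         */
        SpinLockAcquire(&state->lock);
        pg_atomic_write_u64(&state->throttling_counter,
                            pg_atomic_read_u64(&state->throttling_counter) % throttling_sample);
        SpinLockRelease(&state->lock);

        /* sleep here until the next throttling interval, as basebackup.c does */
    }
}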
+static void
+send_file(basebackup_options *opt, char *file, bool missing_ok)
...
+ if (file == NULL)
+ return;
That seems totally inappropriate.
+ sendFile(file, file + basepathlen, &statbuf,
true, InvalidOid, NULL, NULL);
Maybe I'm misunderstanding, but this looks like it's going to write a
tar header, even though we're not writing a tarfile.
sendFile() always sends files with tar header included, even if the backup mode
is plain. pg_basebackup also expects the same. That's the current behavior of
the system.
Otherwise, we will have to duplicate this function which would be doing the pretty
much same thing, except the tar header.
+ else
+ ereport(WARNING,
+ (errmsg("skipping special file
or directory \"%s\"", file)));
So, if the user asks for a directory or symlink, what's going to
happen is that they're going to receive an empty file, and get a
warning. That sounds like terrible behavior.
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes it may not report total checksum count, but it will error
+ * out,terminating the backup.
+ */
In other words, the patch breaks the feature. Not that the feature in
question works particularly well as things stand, but this makes it
worse.
the total count across workers, So it will have the same behavior as current.
Attachment
On Wed, Apr 22, 2020 at 10:18 AM Asif Rehman <asifr.rehman@gmail.com> wrote: > I don't foresee memory to be a challenge here. Assuming a database containing 10240 > relation files (that max reach to 10 TB of size), the list will occupy approximately 102MB > of space in memory. This obviously can be reduced, but it doesn’t seem too bad either. > One way of doing it is by fetching a smaller set of files and clients can result in the next > set if the current one is processed; perhaps fetch initially per table space and request for > next one once the current one is done with. The more concerning case is when someone has a lot of small files. > Okay have added throttling_counter as atomic. however a lock is still required > for throttling_counter%=throttling_sample. Well, if you can't get rid of the lock, using a atomics is pointless. >> + sendFile(file, file + basepathlen, &statbuf, >> true, InvalidOid, NULL, NULL); >> >> Maybe I'm misunderstanding, but this looks like it's going to write a >> tar header, even though we're not writing a tarfile. > > sendFile() always sends files with tar header included, even if the backup mode > is plain. pg_basebackup also expects the same. That's the current behavior of > the system. > > Otherwise, we will have to duplicate this function which would be doing the pretty > much same thing, except the tar header. Well, as I said before, the solution to that problem is refactoring, not crummy interfaces. You're never going to persuade any committer who understands what that code actually does to commit it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Dipesh,

The rebased and updated patch is attached. It's rebased to (9f2c4ede).
On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Hi Dipesh,
> The rebased and updated patch is attached. Its rebased to (9f2c4ede).

Make is failing for v15 patch.

gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -g -O0 -I. -I. -I../../../src/include -D_GNU_SOURCE -c -o basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
In file included from basebackup.c:33:
../../../src/include/replication/backup_manifest.h:37: error: redefinition of typedef ‘manifest_info’
../../../src/include/replication/basebackup.h:35: note: previous declaration of ‘manifest_info’ was here
make[3]: *** [basebackup.o] Error 1
make[3]: Leaving directory `/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
make[2]: *** [replication-recursive] Error 2
On Thu, Apr 23, 2020 at 11:43 AM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
> Make is failing for v15 patch.
> ../../../src/include/replication/backup_manifest.h:37: error: redefinition of typedef ‘manifest_info’
> ../../../src/include/replication/basebackup.h:35: note: previous declaration of ‘manifest_info’ was here

I just compiled on a clean source and it's compiling fine. Can you see if you have a clean source tree?
Hi, Here is the parallel backup performance test results with and without the patch "parallel_backup_v15" on AWS cloud environment. Two "t2.xlarge" machines were used: one for Postgres server and the other one for pg_basebackup with the same machine configuration showing below. Machine configuration: Instance Type :t2.xlarge Volume type :io1 Memory (MiB) :16GB vCPU # :4 Architecture :x86_64 IOP :6000 Database Size (GB) :108 Performance test results: without patch: real 18m49.346s user 1m24.178s sys 7m2.966s 1 worker with patch: real 18m43.201s user 1m55.787s sys 7m24.724s 2 worker with patch: real 18m47.373s user 2m22.970s sys 11m23.891s 4 worker with patch: real 18m46.878s user 2m26.791s sys 13m14.716s As required, I didn't have the pgbench running in parallel like we did in the previous benchmark. The perf report files for both Postgres server and pg_basebackup sides are attached. The files are listed like below. i.e. without patch 1 worker, and with patch 1, 2, 4 workers. perf report on Postgres server side: perf.data-postgres-without-parallel_backup_v15.txt perf.data-postgres-with-parallel_backup_v15-j1.txt perf.data-postgres-with-parallel_backup_v15-j2.txt perf.data-postgres-with-parallel_backup_v15-j4.txt perf report on pg_basebackup side: perf.data-pg_basebackup-without-parallel_backup_v15.txt perf.data-pg_basebackup-with-parallel_backup_v15-j1.txt perf.data-pg_basebackup-with-parallel_backup_v15-j2.txt perf.data-pg_basebackup-with-parallel_backup_v15-j4.txt If any more information required please let me know. On 2020-04-21 7:12 a.m., Amit Kapila wrote: > On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote: >> On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote: >>>>> I did some tests a while back, and here are the results. The tests were done to simulate >>>>> a live database environment using pgbench. >>>>> >>>>> machine configuration used for this test: >>>>> Instance Type: t2.xlarge >>>>> Volume Type : io1 >>>>> Memory (MiB) : 16384 >>>>> vCPU # : 4 >>>>> Architecture : X86_64 >>>>> IOP : 16000 >>>>> Database Size (GB) : 102 >>>>> >>>>> The setup consist of 3 machines. >>>>> - one for database instances >>>>> - one for pg_basebackup client and >>>>> - one for pgbench with some parallel workers, simulating SELECT loads. >>>>> >>>>> basebackup | 4 workers | 8 Workers | 16 workers >>>>> Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15 >>>>> (pgbench running with 50 parallel client simulating SELECT load) >>>>> >>>>> Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35 >>>>> (pgbench running with 100 parallel client simulating SELECT load) >>>>> >>>> Thanks for sharing the results, these show nice speedup! However, I >>>> think we should try to find what exactly causes this speed up. If you >>>> see the recent discussion on another thread related to this topic, >>>> Andres, pointed out that he doesn't think that we can gain much by >>>> having multiple connections[1]. It might be due to some internal >>>> limitations (like small buffers) [2] due to which we are seeing these >>>> speedups. It might help if you can share the perf reports of the >>>> server-side and pg_basebackup side. >>>> >>> Just to be clear, we need perf reports both with and without patch-set. 
>> >> These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallelbackup and share the results and perf reports. >> > Sounds good. I think we should also try to run the test with 1 worker > as well. The reason it will be good to see the results with 1 worker > is that we can know if the technique to send file by file as is done > in this patch is better or worse than the current HEAD code. So, it > will be good to see the results of an unpatched code, 1 worker, 2 > workers, 4 workers, etc. > -- David Software Engineer Highgo Software Inc. (Canada) www.highgo.ca
Attachment
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote: > > Hi, > > Here is the parallel backup performance test results with and without > the patch "parallel_backup_v15" on AWS cloud environment. Two > "t2.xlarge" machines were used: one for Postgres server and the other > one for pg_basebackup with the same machine configuration showing below. > > Machine configuration: > Instance Type :t2.xlarge > Volume type :io1 > Memory (MiB) :16GB > vCPU # :4 > Architecture :x86_64 > IOP :6000 > Database Size (GB) :108 > > Performance test results: > without patch: > real 18m49.346s > user 1m24.178s > sys 7m2.966s > > 1 worker with patch: > real 18m43.201s > user 1m55.787s > sys 7m24.724s > > 2 worker with patch: > real 18m47.373s > user 2m22.970s > sys 11m23.891s > > 4 worker with patch: > real 18m46.878s > user 2m26.791s > sys 13m14.716s > > As required, I didn't have the pgbench running in parallel like we did > in the previous benchmark. > So, there doesn't seem to be any significant improvement in this scenario. Now, it is not clear why there was a significant improvement in the previous run where pgbench was also running simultaneously. I am not sure but maybe it is because when a lot of other backends were running (performing read-only workload) the backend that was responsible for doing backup was getting frequently scheduled out and it slowed down the overall backup process. And when we start using multiple backends for backup one or other backup process is always running making the overall backup faster. One idea to find this out is to check how much time backup takes when we run it with and without pgbench workload on HEAD (aka unpatched code). Even if what I am saying is true or there is some other reason due to which we are seeing speedup in some cases (where there is a concurrent workload), it might not make the case for using multiple backends for backup but still, it is good to find that information as it might help in designing this feature better. > The perf report files for both Postgres server and pg_basebackup sides > are attached. > It is not clear which functions are taking more time or for which functions time is reduced as function symbols are not present in the reports. I think you can refer "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to take profiles and additionally use -fno-omit-frame-pointer during configure (you can use CFLAGS="-fno-omit-frame-pointer during configure). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi,

We at EnterpriseDB did some performance testing around this parallel backup to check how beneficial it is, and below are the results. In this testing, we run the backup:
1) Without Asif's patch
2) With Asif's patch and a combination of 1, 2, 4, 8 workers.

We ran those tests on two setups:
1) Client and Server both on the same machine (Local backups)
2) Client and server on a different machine (remote backups)
Machine details:
1: Server (on which local backups performed and used as server for remote backups)
2: Client (Used as a client for remote backups)
Server:
RAM: 500 GB
CPU details: Architecture: x86_64, CPU op-mode(s): 32-bit, 64-bit, Byte Order: Little Endian, CPU(s): 128, On-line CPU(s) list: 0-127, Thread(s) per core: 2, Core(s) per socket: 8, Socket(s): 8, NUMA node(s): 8
Filesystem: ext4

Client:
RAM: 490 GB
CPU details: Architecture: ppc64le, Byte Order: Little Endian, CPU(s): 192, On-line CPU(s) list: 0-191, Thread(s) per core: 8, Core(s) per socket: 1, Socket(s): 24
Filesystem: ext4
Below are the results for the local test:

Data size | without parallel backup patch | parallel backup with 1 worker | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 2 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 4 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 8 workers | % performance increased/decreased compared to normal backup (without patch) |
10 GB (10 tables - each table around 1.05 GB) | real 0m27.016s user 0m3.378s sys 0m23.059s | real 0m30.314s user 0m3.575s sys 0m22.946s | 12% performance decreased | real 0m20.400s user 0m3.622s sys 0m29.670s | 27% performance increased | real 0m15.331s user 0m3.706s sys 0m39.189s | 43% performance increased | real 0m15.094s user 0m3.915s sys 1m23.350s | 44% performance increased. |
50GB (50 tables - each table around 1.05 GB) | real 2m11.049s user 0m16.464s sys 2m1.757s | real 2m26.621s user 0m18.497s sys 2m4.792s | 21% performance decreased | real 1m9.581s user 0m18.298s sys 2m12.030s | 46% performance increased | real 0m53.894s user 0m18.588s sys 2m47.390s | 58% performance increased. | real 0m55.373s user 0m18.423s sys 5m57.470s | 57% performance increased. |
100GB (100 tables - each table around 1.05 GB) | real 4m4.776s user 0m33.699s sys 3m27.777s | real 4m20.862s user 0m35.753s sys 3m28.262s | 6% performance decreased | real 2m37.411s user 0m36.440s sys 4m16.424s | 35% performance increased | real 1m49.503s user 0m37.200s sys 5m58.077s | 55% performance increased | real 1m36.762s user 0m36.987s sys 9m36.906s | 60% performance increased. |
200GB (200 tables - each table around 1.05 GB) | real 10m34.998s user 1m8.471s sys 7m21.520s | real 11m30.899s user 1m12.933s sys 8m14.496s | 8% performance decreased | real 6m8.481s user 1m13.771s sys 9m31.216s | 41% performance increased | real 4m2.403s user 1m18.331s sys 12m29.661s | 61% performance increased | real 4m3.768s user 1m24.547s sys 15m21.421s | 61% performance increased |
Results for the remote test:

Data size | without parallel backup patch | parallel backup with 1 worker | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 2 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 4 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 8 workers | % performance increased/decreased compared to normal backup (without patch) |
10 GB (10 tables - each table around 1.05 GB) | real 1m36.829s user 0m2.124s sys 0m14.004s | real 1m37.598s user 0m3.272s sys 0m11.110s | 0.8% performance decreased | real 1m36.753s user 0m2.627s sys 0m15.312s | 0.08% performance increased. | real 1m37.212s user 0m3.835s sys 0m13.221s | 0.3% performance decreased. | real 1m36.977s user 0m4.475s sys 0m17.937s | 0.1% performance decreased. |
50GB (50 tables - each table around 1.05 GB) | real 7m54.211s user 0m10.826s sys 1m10.435s | real 7m55.603s user 0m16.535s sys 1m8.147s | 0.2% performance decreased | real 7m53.499s user 0m18.131s sys 1m8.822s | 0.1% performance increased. | real 7m54.687s user 0m15.818s sys 1m30.991s | 0.1% performance decreased | real 7m54.658s user 0m20.783s sys 1m34.460s | 0.1% performance decreased |
100GB (100 tables - each table around 1.05 GB) | real 15m45.776s user 0m21.802s sys 2m59.006s | real 15m46.315s user 0m32.499s sys 2m47.245s | 0.05% performance decreased | real 15m46.065s user 0m28.877s sys 2m21.181s | 0.03% performance decreased | real 15m47.793s user 0m30.932s sys 2m36.708s | 0.2% performance decreased | real 15m47.129s user 0m35.151s sys 3m23.572s | 0.14% performance decreased. |
200GB (200 tables - each table around 1.05 GB) | real 32m55.720s user 0m50.602s sys 5m38.875s | real 31m30.602s user 0m45.377s sys 4m57.405s | 4% performance increased | real 31m30.214s user 0m55.023s sys 5m8.689s | 4% performance increased | real 31m31.187s user 1m13.390s sys 5m40.861s | 4% performance increased | real 31m31.729s user 1m4.955s sys 6m35.774s | 4% performance decreased |
Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
Attaching captured output for:
1) Top command output on the server every 5 seconds
2) pg_stat_activity output every 5 seconds
3) Top command output on the client every 5 seconds
Do let me know if anyone has further questions/inputs for the benchmarking. Thanks to Rushabh Lathia for helping me with this testing.
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> Here is the parallel backup performance test results with and without
> the patch "parallel_backup_v15" on AWS cloud environment. Two
> "t2.xlarge" machines were used: one for Postgres server and the other
> one for pg_basebackup with the same machine configuration showing below.
>
> Machine configuration:
> Instance Type :t2.xlarge
> Volume type :io1
> Memory (MiB) :16GB
> vCPU # :4
> Architecture :x86_64
> IOP :6000
> Database Size (GB) :108
>
> Performance test results:
> without patch:
> real 18m49.346s
> user 1m24.178s
> sys 7m2.966s
>
> 1 worker with patch:
> real 18m43.201s
> user 1m55.787s
> sys 7m24.724s
>
> 2 worker with patch:
> real 18m47.373s
> user 2m22.970s
> sys 11m23.891s
>
> 4 worker with patch:
> real 18m46.878s
> user 2m26.791s
> sys 13m14.716s
>
> As required, I didn't have the pgbench running in parallel like we did
> in the previous benchmark.
>
So, there doesn't seem to be any significant improvement in this
scenario. Now, it is not clear why there was a significant
improvement in the previous run where pgbench was also running
simultaneously. I am not sure but maybe it is because when a lot of
other backends were running (performing read-only workload) the
backend that was responsible for doing backup was getting frequently
scheduled out and it slowed down the overall backup process. And when
we start using multiple backends for backup one or other backup
process is always running making the overall backup faster. One idea
to find this out is to check how much time backup takes when we run it
with and without pgbench workload on HEAD (aka unpatched code). Even
if what I am saying is true or there is some other reason due to which
we are seeing speedup in some cases (where there is a concurrent
workload), it might not make the case for using multiple backends for
backup but still, it is good to find that information as it might help
in designing this feature better.
> The perf report files for both Postgres server and pg_basebackup sides
> are attached.
>
It is not clear which functions are taking more time or for which
functions time is reduced as function symbols are not present in the
reports. I think you can refer
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles and additionally use -fno-omit-frame-pointer during
configure (you can use CFLAGS="-fno-omit-frame-pointer during
configure).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi,
Thanks a lot for sharing the test results. Here are our test results using perf on three AWS t2.xlarge instances with the configuration below.
Machine configuration:
Instance Type :t2.xlarge
Volume type :io1
Memory (MiB) :16GB
vCPU # :4
Architecture :x86_64
IOP :6000
Database Size (GB) :45 (Server)
case 1: postgres server: without patch and without load
* Disk I/O:
# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.65% postgres [kernel.kallsyms] [k] __elv_add_request
2.27% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... .............................................
#
64.73% swapper [kernel.kallsyms] [k] native_safe_halt
10.89% postgres [vdso] [.] __vdso_gettimeofday
5.64% postgres [kernel.kallsyms] [k] do_syscall_64
5.43% postgres libpthread-2.26.so [.] __libc_recv
1.72% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
* Network:
# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
91.58% swapper [kernel.kallsyms] [k] consume_skb
7.09% postgres [kernel.kallsyms] [k] consume_skb
0.61% kswapd0 [kernel.kallsyms] [k] consume_skb
0.44% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
case 1: pg_basebackup client: without patch and without load
* Disk I/O:
# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
96.78% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
2.82% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.29% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.09% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................. .............................................
#
87.99% swapper [kernel.kallsyms] [k] native_safe_halt
3.14% swapper [kernel.kallsyms] [k] __lock_text_start
0.48% swapper [kernel.kallsyms] [k] __softirqentry_text_start
0.37% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.35% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
95.12% swapper [kernel.kallsyms] [k] consume_skb
3.23% pg_basebackup [kernel.kallsyms] [k] consume_skb
0.83% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.45% kswapd0 [kernel.kallsyms] [k] consume_skb
case 2: postgres server: with patch and with load, 4 backup workers on client side
* Disk I/O:
# Samples: 3M of event 'block:block_rq_insert'
# Event count (approx.): 3634542
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
98.88% postgres [kernel.kallsyms] [k] __elv_add_request
0.66% perf [kernel.kallsyms] [k] __elv_add_request
0.42% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.01% sshd [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 9M of event 'cpu-clock'
# Event count (approx.): 2299129250000
#
# Overhead Command Shared Object Symbol
# ........ ............... ..................... .............................................
#
52.73% swapper [kernel.kallsyms] [k] native_safe_halt
8.31% postgres [vdso] [.] __vdso_gettimeofday
4.46% postgres [kernel.kallsyms] [k] do_syscall_64
4.16% postgres libpthread-2.26.so [.] __libc_recv
1.58% postgres [kernel.kallsyms] [k] __lock_text_start
1.52% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
0.81% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6048795
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
85.81% postgres [kernel.kallsyms] [k] consume_skb
12.03% swapper [kernel.kallsyms] [k] consume_skb
0.97% postgres [kernel.kallsyms] [k] __consume_stateless_skb
0.85% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
0.24% perf [kernel.kallsyms] [k] consume_skb
case 2: pg_basebackup 4 workers: with patch and with load
* Disk I/O:
# Samples: 372K of event 'block:block_rq_insert'
# Event count (approx.): 372360
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.26% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
1.45% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.95% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.14% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 4M of event 'cpu-clock'
# Event count (approx.): 1234071000000
#
# Overhead Command Shared Object Symbol
# ........ ............... ........................ .................................................
#
89.25% swapper [kernel.kallsyms] [k] native_safe_halt
0.93% pg_basebackup [kernel.kallsyms] [k] __lock_text_start
0.91% swapper [kernel.kallsyms] [k] __lock_text_start
0.69% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.45% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6449013
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
90.28% pg_basebackup [kernel.kallsyms] [k] consume_skb
9.09% swapper [kernel.kallsyms] [k] consume_skb
0.29% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.21% sshd [kernel.kallsyms] [k] consume_skb
The detailed perf report is attached, with different scenarios, i.e. without the patch (with and without load, for server and client), and with the patch (with and without load, for 1, 2, 4, 8 workers, for both server and client). The file names should be self-explanatory.
Let me know if more information is required.
Best regards,
David
Attachment
On 2020-04-29 5:41 a.m., Suraj Kharage wrote:Hi,We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -1) Without Asif’s patch2) With Asif’s patch and combination of workers 1,2,4,8.We run those test on two setup
1) Client and Server both on the same machine (Local backups)
2) Client and server on a different machine (remote backups)
Machine details:
1: Server (on which local backups performed and used as server for remote backups)
2: Client (Used as a client for remote backups)
Server:
RAM: 500 GB CPU details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Filesystem: ext4
Client:
RAM: 490 GB CPU details:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
Filesystem: ext4
Below are the results for the local test:
For each data size, the real/user/sys times are listed for the normal backup (without the patch) and for parallel backup with 1, 2, 4 and 8 workers; percentages are relative to the normal backup without the patch.

10 GB (10 tables - each table around 1.05 GB)
  without patch : real 0m27.016s, user 0m3.378s, sys 0m23.059s
  1 worker      : real 0m30.314s, user 0m3.575s, sys 0m22.946s  (12% performance decrease)
  2 workers     : real 0m20.400s, user 0m3.622s, sys 0m29.670s  (27% performance increase)
  4 workers     : real 0m15.331s, user 0m3.706s, sys 0m39.189s  (43% performance increase)
  8 workers     : real 0m15.094s, user 0m3.915s, sys 1m23.350s  (44% performance increase)

50 GB (50 tables - each table around 1.05 GB)
  without patch : real 2m11.049s, user 0m16.464s, sys 2m1.757s
  1 worker      : real 2m26.621s, user 0m18.497s, sys 2m4.792s  (21% performance decrease)
  2 workers     : real 1m9.581s, user 0m18.298s, sys 2m12.030s  (46% performance increase)
  4 workers     : real 0m53.894s, user 0m18.588s, sys 2m47.390s  (58% performance increase)
  8 workers     : real 0m55.373s, user 0m18.423s, sys 5m57.470s  (57% performance increase)

100 GB (100 tables - each table around 1.05 GB)
  without patch : real 4m4.776s, user 0m33.699s, sys 3m27.777s
  1 worker      : real 4m20.862s, user 0m35.753s, sys 3m28.262s  (6% performance decrease)
  2 workers     : real 2m37.411s, user 0m36.440s, sys 4m16.424s  (35% performance increase)
  4 workers     : real 1m49.503s, user 0m37.200s, sys 5m58.077s  (55% performance increase)
  8 workers     : real 1m36.762s, user 0m36.987s, sys 9m36.906s  (60% performance increase)

200 GB (200 tables - each table around 1.05 GB)
  without patch : real 10m34.998s, user 1m8.471s, sys 7m21.520s
  1 worker      : real 11m30.899s, user 1m12.933s, sys 8m14.496s  (8% performance decrease)
  2 workers     : real 6m8.481s, user 1m13.771s, sys 9m31.216s  (41% performance increase)
  4 workers     : real 4m2.403s, user 1m18.331s, sys 12m29.661s  (61% performance increase)
  8 workers     : real 4m3.768s, user 1m24.547s, sys 15m21.421s  (61% performance increase)

Results for the remote test:
10 GB (10 tables - each table around 1.05 GB)
  without patch : real 1m36.829s, user 0m2.124s, sys 0m14.004s
  1 worker      : real 1m37.598s, user 0m3.272s, sys 0m11.110s  (0.8% performance decrease)
  2 workers     : real 1m36.753s, user 0m2.627s, sys 0m15.312s  (0.08% performance increase)
  4 workers     : real 1m37.212s, user 0m3.835s, sys 0m13.221s  (0.3% performance decrease)
  8 workers     : real 1m36.977s, user 0m4.475s, sys 0m17.937s  (0.1% performance decrease)

50 GB (50 tables - each table around 1.05 GB)
  without patch : real 7m54.211s, user 0m10.826s, sys 1m10.435s
  1 worker      : real 7m55.603s, user 0m16.535s, sys 1m8.147s  (0.2% performance decrease)
  2 workers     : real 7m53.499s, user 0m18.131s, sys 1m8.822s  (0.1% performance increase)
  4 workers     : real 7m54.687s, user 0m15.818s, sys 1m30.991s  (0.1% performance decrease)
  8 workers     : real 7m54.658s, user 0m20.783s, sys 1m34.460s  (0.1% performance decrease)

100 GB (100 tables - each table around 1.05 GB)
  without patch : real 15m45.776s, user 0m21.802s, sys 2m59.006s
  1 worker      : real 15m46.315s, user 0m32.499s, sys 2m47.245s  (0.05% performance decrease)
  2 workers     : real 15m46.065s, user 0m28.877s, sys 2m21.181s  (0.03% performance decrease)
  4 workers     : real 15m47.793s, user 0m30.932s, sys 2m36.708s  (0.2% performance decrease)
  8 workers     : real 15m47.129s, user 0m35.151s, sys 3m23.572s  (0.14% performance decrease)

200 GB (200 tables - each table around 1.05 GB)
  without patch : real 32m55.720s, user 0m50.602s, sys 5m38.875s
  1 worker      : real 31m30.602s, user 0m45.377s, sys 4m57.405s  (4% performance increase)
  2 workers     : real 31m30.214s, user 0m55.023s, sys 5m8.689s  (4% performance increase)
  4 workers     : real 31m31.187s, user 1m13.390s, sys 5m40.861s  (4% performance increase)
  8 workers     : real 31m31.729s, user 1m4.955s, sys 6m35.774s  (4% performance increase)

With client and server on the same machine, the results show around 50% improvement in the parallel runs with 4 and 8 workers. We don’t see a huge additional performance improvement as more workers are added.
Whereas, when the client and server are on different machines, we don’t see any major benefit in performance. This result matches the testing results posted by David Zhang up-thread.
We ran the test for a 100GB backup with 4 parallel workers to see the CPU usage and other information. What we noticed is that the server is consuming almost 100% CPU the whole time, and pg_stat_activity shows that the server is busy with ClientWrite most of the time.
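(The real/user/sys figures above are in the format produced by time(1); presumably each run was along the lines of the command below, where the host, port, target directory and the -j worker-count option are illustrative rather than the exact invocation used.)
  time pg_basebackup -h $PG_SERVER -p 5432 -D /mnt/backup/data -j 4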
Attaching captured output for
1) Top command output on the server every 5 seconds
2) pg_stat_activity output every 5 seconds
3) Top command output on the client every 5 seconds
Do let me know if anyone has further questions/inputs for the benchmarking.
Thanks to Rushabh Lathia for helping me with this testing.
On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> Here is the parallel backup performance test results with and without
> the patch "parallel_backup_v15" on AWS cloud environment. Two
> "t2.xlarge" machines were used: one for Postgres server and the other
> one for pg_basebackup with the same machine configuration showing below.
>
> Machine configuration:
> Instance Type :t2.xlarge
> Volume type :io1
> Memory (MiB) :16GB
> vCPU # :4
> Architecture :x86_64
> IOP :6000
> Database Size (GB) :108
>
> Performance test results:
> without patch:
> real 18m49.346s
> user 1m24.178s
> sys 7m2.966s
>
> 1 worker with patch:
> real 18m43.201s
> user 1m55.787s
> sys 7m24.724s
>
> 2 worker with patch:
> real 18m47.373s
> user 2m22.970s
> sys 11m23.891s
>
> 4 worker with patch:
> real 18m46.878s
> user 2m26.791s
> sys 13m14.716s
>
> As required, I didn't have the pgbench running in parallel like we did
> in the previous benchmark.
>
So, there doesn't seem to be any significant improvement in this
scenario. Now, it is not clear why there was a significant
improvement in the previous run where pgbench was also running
simultaneously. I am not sure but maybe it is because when a lot of
other backends were running (performing read-only workload) the
backend that was responsible for doing backup was getting frequently
scheduled out and it slowed down the overall backup process. And when
we start using multiple backends for backup one or other backup
process is always running making the overall backup faster. One idea
to find this out is to check how much time backup takes when we run it
with and without pgbench workload on HEAD (aka unpatched code). Even
if what I am saying is true or there is some other reason due to which
we are seeing speedup in some cases (where there is a concurrent
workload), it might not make the case for using multiple backends for
backup but still, it is good to find that information as it might help
in designing this feature better.
> The perf report files for both Postgres server and pg_basebackup sides
> are attached.
>
It is not clear which functions are taking more time or for which
functions time is reduced as function symbols are not present in the
reports. I think you can refer
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles and additionally use -fno-omit-frame-pointer during
configure (you can use CFLAGS="-fno-omit-frame-pointer" during
configure).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
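(For reference, the rebuild-and-profile sequence Amit suggests above amounts to roughly the sketch below, following the wiki page he links; the prefix, output file names and the 60-second sampling window are illustrative, not the exact commands used in these runs.)
  ./configure CFLAGS="-fno-omit-frame-pointer" --prefix=/home/ec2-user/after
  perf record -g -a -o postgres_profile.data -- sleep 60
  perf report -g -i postgres_profile.data --stdio > postgres_profile.txt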
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
> ...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>

Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020-04-30 2:18 a.m., Sumanta Mukherjee wrote:
Hi,
Would it be possible to put in the absolute numbers of the perf so that it is easier to understand the amount of improvement with and without the patch and different loads and workers.
Here are the parameters used to record the perf data on both the server and client side, for example, after applying the v15 patch, using 4 workers with load:
perf record -o postgres_patch_j4_load -e block:block_rq_insert -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- /home/ec2-user/after/bin/postgres -D /mnt/test/data
perf record -o backup_patch_j4_load -e block:block_rq_insert -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- /home/ec2-user/after/bin/pg_basebackup -h ${PG_SERVER} -p 5432 -D /mnt/backup/data -v
And this is how the report is generated.
perf report -i postgres_patch_j4_load --stdio > postgres_patch_j4_load.txt
The original perf data files are still available. Can you please clarify which parameter you would like to be added for regenerating the report, or whether any other parameters need to be added to recreate the perf.data and then generate the report?
I am also unsure why the swapper is taking such a huge percentage of the absolute time in the base run of just the postgres server and pg_basebackup client.
On Thu, Apr 30, 2020 at 1:18 PM David Zhang <david.zhang@highgo.ca> wrote:
Hi,
Thanks a lot for sharing the test results. Here are our test results using perf on three AWS t2.xlarge instances with the configuration below.
Machine configuration:
Instance Type :t2.xlarge
Volume type :io1
Memory (MiB) :16GB
vCPU # :4
Architecture :x86_64
IOP :6000
Database Size (GB) :45 (Server)
case 1: postgres server: without patch and without load
* Disk I/O:
# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.65% postgres [kernel.kallsyms] [k] __elv_add_request
2.27% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... .............................................
#
64.73% swapper [kernel.kallsyms] [k] native_safe_halt
10.89% postgres [vdso] [.] __vdso_gettimeofday
5.64% postgres [kernel.kallsyms] [k] do_syscall_64
5.43% postgres libpthread-2.26.so [.] __libc_recv
1.72% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
* Network:
# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
91.58% swapper [kernel.kallsyms] [k] consume_skb
7.09% postgres [kernel.kallsyms] [k] consume_skb
0.61% kswapd0 [kernel.kallsyms] [k] consume_skb
0.44% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
case 1: pg_basebackup client: without patch and without load
* Disk I/O:
# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
96.78% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
2.82% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.29% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.09% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................. .............................................
#
87.99% swapper [kernel.kallsyms] [k] native_safe_halt
3.14% swapper [kernel.kallsyms] [k] __lock_text_start
0.48% swapper [kernel.kallsyms] [k] __softirqentry_text_start
0.37% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.35% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
95.12% swapper [kernel.kallsyms] [k] consume_skb
3.23% pg_basebackup [kernel.kallsyms] [k] consume_skb
0.83% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.45% kswapd0 [kernel.kallsyms] [k] consume_skb
...
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>
Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?
During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows? Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?
[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>
Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?
In the particular setup, the client and server were on different machines.
During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows? Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?
[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>
Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?
In the particular setup, the client and server were on different machines.
During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows? Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?
[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
Have you had the chance to try this out?
1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_j4_100GB.txt)
The total number of events (pg_stat_activity) captured during above runs:
- 314 events for normal backups
- 316 events for parallel backups (-j 4)
BaseBackupRead wait event numbers: (newly added)
37 - in normal backups
25 - in the parallel backup (-j 4)
ClientWrite wait event numbers:
175 - in normal backup
1098 - in parallel backups
ClientRead wait event numbers:
0 - ClientRead in normal backup
326 - ClientRead in parallel backups for diff processes. (all in idle state)
Thanks,
Attachment
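(For context, the wait-event counts above come from periodically sampling pg_stat_activity while the backup runs; a minimal sketch of such a sampling loop is below. The interval, output file, and absence of any filtering are illustrative assumptions, not the exact script used.)
  while true; do
    psql -X -At -c "SELECT now(), pid, state, wait_event_type, wait_event FROM pg_stat_activity;" >> pg_stat_activity_samples.txt
    sleep 3
  done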
On Thu, May 21, 2020 at 11:36 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>
> On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
>>
>>>>
>>>> During an offlist discussion with Robert, he pointed out that current
>>>> basebackup's code doesn't account for the wait event for the reading
>>>> of files which can change what pg_stat_activity shows? Can you please
>>>> apply his latest patch to improve basebackup.c's code [1] which will
>>>> take care of that waitevent before getting the data again?
>>>>
>>>> [1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>>
>>>
>>> Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
>>
>>
>> Have you had the chance to try this out?
>
>
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events after every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_j4_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)
>
It might be interesting to see why ClientRead/ClientWrite has increased
so much and can we reduce it?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 21, 2020 at 2:06 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events after every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_j4_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)

So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.

Do we have the same results for the local backup case, where the patch helped?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.
Do we have the same results for the local backup case, where the patch helped?
- 82 events for normal backups
- 31 events for parallel backups (-j 4)
BaseBackupRead wait event numbers: (newly added)
24 - in normal backups
14 - in parallel backup (-j 4)
ClientWrite wait event numbers:
8 - in normal backup
43 - in parallel backups
ClientRead wait event numbers:
0 - ClientRead in normal backup
32 - ClientRead in parallel backups for diff processes.
Attachment
Taking a backup is an I/O-intensive workload, so parallelizing it through multiple worker threads/processes creates an overhead of its own. So what precisely are we optimizing here? Looking at a running database system in any environment, I see the following potential scenarios playing out. These are probably clear to everyone here, but I'm listing them for completeness and clarity.
Locally Running Backup:
(1) Server has no clients connected other than base backup.
(2) Server has other clients connected which are actively performing operations causing disk I/O.
Remotely Running Backup:
(3) Server has no clients connected other than remote base backup.
(4) Server has other clients connected which are actively performing operations causing disk I/O.
Others:
(5) Server or the system running base backup has other processes competing for disk or network bandwidth.
Generally speaking, I see that parallelization could potentially benefit in scenarios (2), (4) and (5) with the reason being that having more than one thread increases the likelihood that backup will now get a bigger time slice for disk I/O and network bandwidth. With (1) and (3), since there are no competing processes, addition of multiple threads or processes will only increase CPU overhead whilst still getting the same network and disk time slice. In this particular case, the performance will degrade.
IMHO, that’s why adding other load on the server, perhaps by running pgbench simultaneously, may show improved performance for parallel backup. Also, running parallel backup on a local laptop more often than not yields improved performance.
There are obviously other factors that may impact the performance, like the type of I/O scheduler being used, whether CFQ or some other.
IMHO, parallel backup has obvious performance benefits, but we need to ensure that users understand that there is potential for slower backup if there is no competition for resources.
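(On the I/O scheduler point above: on Linux the scheduler in effect can be checked per block device, e.g. as below; the device name is illustrative and will differ per system.)
  cat /sys/block/xvda/queue/scheduler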
On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.
Do we have the same results for the local backup case, where the patch helped?
Here is the result for local backup case (100GB data). Attaching the captured logs.
The total number of events (pg_stat_activity) captured during local runs:
- 82 events for normal backups
- 31 events for parallel backups (-j 4)
BaseBackupRead wait event numbers: (newly added)
24 - in normal backups
14 - in parallel backup (-j 4)
ClientWrite wait event numbers:
8 - in normal backup
43 - in parallel backups
ClientRead wait event numbers:
0 - ClientRead in normal backup
32 - ClientRead in parallel backups for diff processes.
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
On Thu, Jun 11, 2020 at 1:41 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
> As far I understand, parallel backup is not a mandatory performance feature, rather, one at user's discretion. This IMHO indicates that it will benefit some users and it may not others.
>
> IMHO, parallel backup has obvious performance benefits, but we need to ensure that users understand that there is potential for slower backup if there is no competition for resources.

I am sure that nobody is arguing that the patch has to be beneficial in all cases in order to justify applying it. However, there are several good arguments against proceeding with this patch:

* Every version of the patch that has been reviewed by anybody has been riddled with errors. Over and over again, testers have found serious bugs, and code reviewers have noticed lots of problems, too.

* This approach requires rewriting a lot of current functionality, either by moving it to the client side or by restructuring it to work with parallelism. That's a lot of work, and it seems likely to generate more work in the future as people continue to add features. It's one thing to add a feature that doesn't benefit everybody; it's another thing to add a feature that doesn't benefit everybody and also hinders future development. See http://postgr.es/m/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com for more discussion of these issues.

* The scenarios in which the patch delivers a performance benefit are narrow and somewhat contrived. In remote backup scenarios, AIUI, the patch hasn't been shown to help. In local backups, it does, but how likely is it that you are going to do your local backups over the wire protocol instead of by direct file copy, which is probably much faster? I agree that if your server is overloaded, having multiple processes competing for the server resources will allow backup to get a larger slice relative to other things, but that seems like a pretty hackish and inefficient solution to that problem. You could also argue that we could provide a feature to prioritize some queries over other queries by running them with tons of parallel workers just to convince the OS to give them more resources, and I guess that would work, but it would also waste tons of resources and possibly cripple or even crash your system if you used it enough. The same argument applies here.

* Even when the patch does provide a benefit, it seems to max out at about 2.5X. Clearly it's nice to have something go 2.5X faster, but the point is that it doesn't scale beyond that no matter how many workers you add. That doesn't automatically mean that something is a bad idea, but it is a concern. At the very least, we should be able to say why it doesn't scale any better than that.

* Actually, we have some hints about that. Over at http://postgr.es/m/20200503174922.mfzzdafa5g4rlhez@alap3.anarazel.de Andres has shown that too much concurrency when copying files results in a dramatic performance reduction, and that a lot of the reason why concurrency helps in the first place has to do with the fact that pg_basebackup does not have any cache control (no fallocate, sync_file_range(WRITE), posix_fadvise(DONTNEED)). When those things are added the performance gets better and the benefits of concurrency are reduced. I suspect that would also be true for this patch. It would be unreasonable to commit a large patch, especially one that would hinder future development, if we could get the same benefits from a small patch that would not do so.
I am not in a position to tell you how to spend your time, so you can certainly pursue this patch if you wish. However, I think it's probably not the best use of time. Even if you fixed all the bugs and reimplemented all of the functionality that needs reimplementing in order to make this approach work, it still doesn't make sense to commit the patch if either (a) we can obtain the same benefit, or most of it, from a much simpler patch or (b) the patch is going to make it significantly harder to develop other features that we want to have, especially if those features seem likely to be more beneficial than what this patch offers. I think both of those are likely true here.

For an example of (b), consider compression of tar files on the server side before transmission to the client. If you take the approach this patch does and move tarfile construction to the client, that is impossible. Now you can argue (and perhaps you will) that this would just mean someone has to choose between using this feature and using that feature, and why should users not have such a choice? That is a fair argument, but my counter-argument is that users shouldn't be forced into making that choice. If the parallel feature is beneficial enough to justify having it, then it ought to be designed in such a way that it works with the other features we also want to have rather than forcing users to choose between them. Since I have already proposed (on the other thread linked above) a design that would make that possible, and this design does not, I have a hard time understanding why we would pick this one, especially given all of the other disadvantages which it seems to have.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 12 Jun 2020, at 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
> I am sure that nobody is arguing that the patch has to be beneficial
> in all cases in order to justify applying it. However, there are
> several good arguments against proceding with this patch:
This thread has stalled with no resolution to the raised issues, and the latest
version of the patch (v15) posted no longer applies (I only tried 0001 which
failed, the green tick in the CFBot is due to it mistakenly thinking an attached
report is a patch). I'm marking this patch Returned with Feedback. Please
open a new CF entry when there is a new version of the patch.
cheers ./daniel
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca