Thread: WIP/PoC for parallel backup
Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
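For illustration, here is a tiny standalone sketch (not code from the patch) of the static, per-tablespace division described above. The tablespace names, file counts and tar-file naming are made up; a real implementation would walk the actual file list.

/*
 * Sketch only: split each tablespace's file list into one contiguous group
 * per worker, each worker producing its own tar file for that tablespace.
 */
#include <stdio.h>

static void
assign_groups(const char *tablespace, int nfiles, int nworkers)
{
    int per_worker = nfiles / nworkers;     /* assume it divides evenly */

    for (int w = 0; w < nworkers; w++)
    {
        int first = w * per_worker;
        int last = first + per_worker - 1;

        /* worker w archives files [first, last] into its own tar file */
        printf("worker %d: %s files %d..%d -> %s_worker%d.tar\n",
               w, tablespace, first, last, tablespace, w);
    }
}

int
main(void)
{
    assign_groups("t1", 20, 5);     /* 4 files of t1 per worker */
    assign_groups("t2", 10, 5);     /* 2 files of t2 per worker */
    return 0;
}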
Hi Asif
Interesting proposal. Bulk of the work in a backup is transferring files from source data directory to destination. Your patch is breaking this task down in multiple sets of files and transferring each set in parallel. This seems correct, however, your patch is also creating a new process to handle each set. Is that necessary? I think we should try to achieve this using multiple asynchronous libpq connections from a single basebackup process. That is to use PQconnectStartParams() interface instead of PQconnectdbParams(), which is currently used by basebackup. On the server side, it may still result in multiple backend processes per connection, and an attempt should be made to avoid that as well, but it seems complicated.
What do you think?
Asim
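As a minimal sketch of the suggestion above (not code from the patch): several libpq connections can be started from one process with PQconnectStartParams() and driven with PQconnectPoll(), instead of the blocking PQconnectdbParams(). The connection parameters and connection count below are placeholders, and a real client would wait on PQsocket() with select()/poll() rather than spinning.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

#define NCONN 4

int
main(void)
{
    const char *keywords[] = {"dbname", "replication", NULL};
    const char *values[] = {"postgres", "true", NULL};
    PGconn     *conns[NCONN];
    int         done[NCONN] = {0};
    int         remaining = NCONN;

    /* kick off all connection attempts without blocking */
    for (int i = 0; i < NCONN; i++)
    {
        conns[i] = PQconnectStartParams(keywords, values, 0);
        if (conns[i] == NULL || PQstatus(conns[i]) == CONNECTION_BAD)
        {
            fprintf(stderr, "could not start connection %d\n", i);
            exit(1);
        }
    }

    /* drive every connection's state machine until it succeeds or fails */
    while (remaining > 0)
    {
        for (int i = 0; i < NCONN; i++)
        {
            PostgresPollingStatusType st;

            if (done[i])
                continue;
            st = PQconnectPoll(conns[i]);
            if (st == PGRES_POLLING_OK)
            {
                done[i] = 1;
                remaining--;
            }
            else if (st == PGRES_POLLING_FAILED)
            {
                fprintf(stderr, "connection %d failed: %s",
                        i, PQerrorMessage(conns[i]));
                exit(1);
            }
            /* PGRES_POLLING_READING/WRITING: still connecting, poll again */
        }
    }

    /* all connections are ready; each one could now fetch its own file set */
    for (int i = 0; i < NCONN; i++)
        PQfinish(conns[i]);
    return 0;
}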
Greetings,
* Asif Rehman (asifr.rehman@gmail.com) wrote:
> On Fri, Aug 23, 2019 at 3:18 PM Asim R P <apraveen@pivotal.io> wrote:
> > Interesting proposal. Bulk of the work in a backup is transferring files
> > from source data directory to destination. Your patch is breaking this
> > task down in multiple sets of files and transferring each set in parallel.
> > This seems correct, however, your patch is also creating a new process to
> > handle each set. Is that necessary? I think we should try to achieve this
> > using multiple asynchronous libpq connections from a single basebackup
> > process. That is to use PQconnectStartParams() interface instead of
> PQconnectdbParams(), which is currently used by basebackup. On the server
> > side, it may still result in multiple backend processes per connection, and
> > an attempt should be made to avoid that as well, but it seems complicated.
>
> Thanks Asim for the feedback. This is a good suggestion. The main idea I
> wanted to discuss is the design where we can open multiple backend
> connections to get the data instead of a single connection.
> On the client side we can have multiple approaches. One is to use
> asynchronous APIs (as suggested by you) and the other could be to decide
> between multi-process and multi-thread. The main point was that we can
> extract a lot of performance benefit by using multiple connections, and I built
> this POC to float the idea of how the parallel backup can work, since the
> core logic of getting the files using multiple connections will remain the
> same, whether we use asynchronous, multi-process or multi-threaded.
>
> I am going to address the division of files to be distributed evenly among
> multiple workers based on file sizes. That would allow us to get some
> concrete numbers and also let us gauge the benefits of the async versus the
> multiprocess/thread approach on the client side.
I would expect you to quickly want to support compression on the server
side, before the data is sent across the network, and possibly
encryption, and so it'd likely make sense to just have independent
processes and connections through which to do that.
Thanks,
Stephen
Greetings,

* Ahsan Hadi (ahsan.hadi@gmail.com) wrote:
> On Fri, 23 Aug 2019 at 10:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > I would expect you to quickly want to support compression on the server
> > side, before the data is sent across the network, and possibly
> > encryption, and so it'd likely make sense to just have independent
> > processes and connections through which to do that.
>
> It would be interesting to see the benefits of compression (before the data
> is transferred over the network) on top of parallelism, since there is also
> some overhead associated with performing the compression. I agree with your
> suggestion of trying to add parallelism first and then try compression
> before the data is sent across the network.

You're welcome to take a look at pgbackrest for insight and to play with
regarding compression-before-transfer, how best to split up the files and
order them, encryption, et al. We've put quite a bit of effort into figuring
all of that out.

Thanks!

Stephen
On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> If the parallel option is there, then it will only do pg_start_backup, scans PGDATA and sends a list of file names.
So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.
> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
> pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.
> - STOP_BACKUP
> when all workers finish then, pg_basebackup will send STOP_BACKUP command.
This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.
> I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning them to worker processes. I intend to fine tune this by taking into consideration file sizes. Further, to add tar format support, I am considering that each worker process processes all files belonging to a tablespace in its list (i.e. creates and copies the tar file) before it processes the next tablespace. As a result, this will create tar files that are disjointed with respect to tablespace data. For example:
Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.
If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.
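As an illustration of the scheme just described (not code from the patch), here is a minimal pthread-based sketch: the list is sorted by size, descending, and each worker claims the next file as soon as it finishes the previous one. The file names, sizes, worker count and the fetch_file() stub are all made up.

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct
{
    const char *name;
    long        size;
} BackupFileEntry;

static BackupFileEntry files[] = {
    {"base/16384/16400", 1073741824L},
    {"base/16384/16401", 8192L},
    {"base/16384/16402", 524288L},
    {"pg_wal/000000010000000000000001", 16777216L},
};
static const int nfiles = sizeof(files) / sizeof(files[0]);

static int next_file = 0;                       /* next unclaimed file */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

static int
cmp_size_desc(const void *a, const void *b)
{
    long sa = ((const BackupFileEntry *) a)->size;
    long sb = ((const BackupFileEntry *) b)->size;

    return (sa < sb) - (sa > sb);               /* biggest first */
}

static void
fetch_file(const BackupFileEntry *f, long worker)
{
    /* placeholder for fetching the file over this worker's connection */
    printf("worker %ld fetching %s (%ld bytes)\n", worker, f->name, f->size);
}

static void *
worker_main(void *arg)
{
    long worker = (long) arg;

    for (;;)
    {
        int idx;

        pthread_mutex_lock(&list_lock);
        idx = (next_file < nfiles) ? next_file++ : -1;
        pthread_mutex_unlock(&list_lock);

        if (idx < 0)
            break;                              /* list exhausted */
        fetch_file(&files[idx], worker);
    }
    return NULL;
}

int
main(void)
{
    pthread_t workers[3];

    qsort(files, nfiles, sizeof(files[0]), cmp_size_desc);

    for (long i = 0; i < 3; i++)
        pthread_create(&workers[i], NULL, worker_main, (void *) i);
    for (int i = 0; i < 3; i++)
        pthread_join(workers[i], NULL);
    return 0;
}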
> Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:
1. Error! Tar format parallel backups are not supported.
2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.
3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.
I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.
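For what it's worth, here is a rough sketch of the single-tar design described as approach (3) above: N reader threads each fill a buffer with one complete file, and a single writer thread drains finished buffers into the archive so only one thread ever writes it. The queue is a minimal bounded ring; file reading and tar writing are just printf stubs, and the file names and counts are invented for the example.

#include <pthread.h>
#include <stdio.h>

#define NREADERS     3
#define QUEUE_SLOTS  (NREADERS + 1)     /* N + 1 buffers */
#define NFILES       9

typedef struct
{
    char    name[64];   /* a real buffer would also hold the file data */
} FileBuffer;

static FileBuffer queue[QUEUE_SLOTS];
static int q_head = 0, q_tail = 0, q_count = 0;
static int files_remaining = NFILES;    /* not yet claimed by a reader */
static int files_unwritten = NFILES;    /* not yet written to the tar */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;

static void *
reader_main(void *arg)
{
    (void) arg;
    for (;;)
    {
        FileBuffer buf;
        int     fileno;

        pthread_mutex_lock(&lock);
        if (files_remaining == 0)
        {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        fileno = --files_remaining;
        pthread_mutex_unlock(&lock);

        /* "read" the file into a private buffer outside the lock */
        snprintf(buf.name, sizeof(buf.name), "file_%d", fileno);

        /* hand the finished buffer over to the writer */
        pthread_mutex_lock(&lock);
        while (q_count == QUEUE_SLOTS)
            pthread_cond_wait(&not_full, &lock);
        queue[q_tail] = buf;
        q_tail = (q_tail + 1) % QUEUE_SLOTS;
        q_count++;
        pthread_cond_signal(&not_empty);
        pthread_mutex_unlock(&lock);
    }
}

static void *
writer_main(void *arg)
{
    (void) arg;
    while (files_unwritten > 0)     /* only the writer changes this counter */
    {
        FileBuffer buf;

        pthread_mutex_lock(&lock);
        while (q_count == 0)
            pthread_cond_wait(&not_empty, &lock);
        buf = queue[q_head];
        q_head = (q_head + 1) % QUEUE_SLOTS;
        q_count--;
        files_unwritten--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);

        /* append the buffered file to the single tar archive */
        printf("writer: adding %s to backup.tar\n", buf.name);
    }
    return NULL;
}

int
main(void)
{
    pthread_t readers[NREADERS], writer;

    pthread_create(&writer, NULL, writer_main, NULL);
    for (int i = 0; i < NREADERS; i++)
        pthread_create(&readers[i], NULL, reader_main, NULL);
    for (int i = 0; i < NREADERS; i++)
        pthread_join(readers[i], NULL);
    pthread_join(writer, NULL);
    return 0;
}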
Hi Robert,

Thanks for the feedback. Please see the comments below:

On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> If the parallel option is there, then it will only do pg_start_backup, scans PGDATA and sends a list of file names.
So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

Sure. I will add a separate command (START_BACKUP) for parallel.
> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
> pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

I considered this approach initially; however, I adopted the current strategy
to avoid multiple round trips between the server and clients and to save on
query processing time by issuing a single command rather than multiple ones.
Further, fetching multiple files at once will also aid in supporting the tar
format by utilising the existing ReceiveTarFile() function and will be able to
create a tarball per tablespace per worker.
> - STOP_BACKUP
> when all workers finish then, pg_basebackup will send STOP_BACKUP command.
This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASEBACKUP PARALLEL.
> I have done a basic proof of concept (POC), which is also attached. I would appreciate some input on this. So far, I am simply dividing the list equally and assigning them to worker processes. I intend to fine tune this by taking into consideration file sizes. Further, to add tar format support, I am considering that each worker process processes all files belonging to a tablespace in its list (i.e. creates and copies the tar file) before it processes the next tablespace. As a result, this will create tar files that are disjointed with respect to tablespace data. For example:
Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen. Guessing isn't intrinsically bad, but guessing when you could
be sure of doing the right thing *is* bad.
If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.

Ideally, I would like to support the tar format as well, which would be much
easier to implement when fetching multiple files at once, since that would
enable the existing functionality to be used without much change.
Your idea of sorting the files in descending order of size seems very appealing. I think we can do this and have the files divided among the workers one by one, i.e. the first file in the list goes to worker 1, the second to worker 2, and so on and so forth.
> Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace t2 has 10. Ignoring all other factors for the sake of this example, each worker process will get a group of 4 files of t1 and 2 files of t2. Each process will create 2 tar files, one for t1 containing 4 files and another for t2 containing 2 files.
This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:
1. Error! Tar format parallel backups are not supported.
2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.
3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.
I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.

Yes, for the tar format support, approach (2) is what I had in mind. Currently
I'm working on the implementation and will share the patch in a couple of days.
On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
>> > pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially, however, I adopted the current strategy to avoid multiple round trips between the server and clients and save on query processing time by issuing a single command rather than multiple ones. Further fetching multiple files at once will also aid in supporting the tar format by utilising the existing ReceiveTarFile() function and will be able to create a tarball per tablespace per worker.
I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.
However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.
To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.
> Ideally, I would like to support the tar format as well, which would be much
> easier to implement when fetching multiple files at once, since that would
> enable the existing functionality to be used without much change.

I think we should just have the client generate the tarfile. It'll require
duplicating some code, but it's not actually that much code or that
complicated from what I can see.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, Oct 4, 2019 at 7:02 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Based on my understanding your main concern is that the files won't be
> distributed fairly, i.e. one worker might get a big file and take more time
> while others get done early with smaller files? In this approach I have
> created a list of files in descending order based on their sizes so all the
> big files will come at the top. The maximum file size in PG is 1GB so if we
> have four workers who are picking up files from the list one by one, the
> worst case scenario is that one worker gets a file of 1GB to process while
> others get files of smaller size. However with this approach of descending
> files based on size and handing them out to workers one by one, there is a
> very high likelihood of workers getting work evenly. Does this address your
> concerns?

Somewhat, but I'm not sure it's good enough. There are lots of reasons why
two processes that are started at the same time with the same amount of work
might not finish at the same time.

I'm also not particularly excited about having the server do the sorting
based on file size. Seems like that ought to be the client's job, if the
client needs the sorting.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Oct 3, 2019 at 6:40 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Fri, Sep 27, 2019 at 12:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> > - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
>> > pg_basebackup will then send back a list of filenames in this command. This commands will be send by each worker and that worker will be getting the said files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially, however, I adopted the current strategy to avoid multiple round trips between the server and clients and save on query processing time by issuing a single command rather than multiple ones. Further fetching multiple files at once will also aid in supporting the tar format by utilising the existing ReceiveTarFile() function and will be able to create a tarball for per tablespace per worker.
I think that sending multiple filenames on a line could save some time
when there are lots of very small files, because then the round-trip
overhead could be significant.
However, if you've got mostly big files, I think this is going to be a
loser. It'll be fine if you're able to divide the work exactly evenly,
but that's pretty hard to do, because some workers may succeed in
copying the data faster than others for a variety of reasons: some
data is in memory, some data has to be read from disk, different data
may need to be read from different disks that run at different speeds,
not all the network connections may run at the same speed. Remember
that the backup's not done until the last worker finishes, and so
there may well be a significant advantage in terms of overall speed in
putting some energy into making sure that they finish as close to each
other in time as possible.
To put that another way, the first time all the workers except one get
done while the last one still has 10GB of data to copy, somebody's
going to be unhappy.

I have updated the patch (see the attached patch) to include tablespace
support, tar format support, and all other base backup options to work in
parallel mode as well. As previously suggested, I have removed BASE_BACKUP
[PARALLEL] and have added START_BACKUP instead to start the backup. The tar
format will write multiple tar files depending upon the number of workers
specified. I have also made all commands (START_BACKUP/SEND_FILES_CONTENT/
STOP_BACKUP) accept the base_backup_opt_list. This way the command-line
options can also be provided to these commands. Since the command-line options
don't change once the backup initiates, I went this way instead of storing
them in shared state.
The START_BACKUP command will now return a sorted list of files in descending
order based on file sizes. This way, the larger files will be at the top of
the list, and hence these files will be assigned to workers one by one, making
it so that the larger files will be copied before other files.

Based on my understanding your main concern is that the files won't be
distributed fairly, i.e. one worker might get a big file and take more time
while others get done early with smaller files? In this approach I have
created a list of files in descending order based on their sizes so all the
big files will come at the top. The maximum file size in PG is 1GB so if we
have four workers who are picking up files from the list one by one, the
worst case scenario is that one worker gets a file of 1GB to process while
others get files of smaller size. However with this approach of descending
files based on size and handing them out to workers one by one, there is a
very high likelihood of workers getting work evenly. Does this address your
concerns?

Furthermore, the patch also includes a regression test. As
t/010_pg_basebackup.pl tests base backup comprehensively, I have duplicated
it to t/040_pg_basebackup_parallel.pl and added the parallel option in all of
its tests, to make sure parallel mode works as expected. The one thing that
differs from base backup is the file checksum reporting. In parallel mode, the
total number of checksum failures is not reported correctly; however, it will
abort the backup whenever a checksum failure occurs. This is because the
processes are not maintaining any shared state. I assume that it's not much
important to report the total number of failures vs noticing the failure and
aborting.

--
Asif Rehman
Thanks Asif for the patch. I am picking this up for review. The patch is a
bit big, so here are some very initial comments to make the review process
easier.
1) The patch seems to be doing a lot of code shuffling; I think it would be
easier to review if you could break out the cleanup as a separate patch.
Example:
a: setup_throttle
b: include_wal_files
2) As I can see, this patch basically has three major phases:
a) Introducing new commands like START_BACKUP, SEND_FILES_CONTENT and
STOP_BACKUP.
b) Implementation of the actual parallel backup.
c) Testcase
I would suggest breaking these out as three separate patches; that would
make reviewing the patch much easier.
3) In your patch you are preparing the backup manifest (a file giving
information about the data files). Robert Haas submitted the backup manifests
patch on another thread [1], and I think we should use that patch to get the
backup manifests for parallel backup.

Further, I will continue to review the patch, but meanwhile if you can break
up the patches, that will make the review process easier.
On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit. I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.
I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.
I still think that the files should be requested one at a time, not a
huge long list in a single command.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
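As a sketch of how a client might consume the file-list result set proposed above (file name, type 'd'/'f', size, mtime): the SEND_FILE_LIST command name is the one used later in this thread, not an existing replication command, and the connection string is a placeholder, so this is purely an illustration of the proposal.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn   *conn = PQconnectdb("dbname=postgres replication=true");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        exit(1);
    }

    res = PQexec(conn, "SEND_FILE_LIST");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "SEND_FILE_LIST failed: %s", PQerrorMessage(conn));
        exit(1);
    }

    for (int i = 0; i < PQntuples(res); i++)
    {
        const char *name = PQgetvalue(res, i, 0);
        char        type = *PQgetvalue(res, i, 1);      /* 'd' or 'f' */
        long        size = atol(PQgetvalue(res, i, 2));

        /* directories only need to be created; files are queued for a worker */
        printf("%c %10ld %s\n", type, size, name);
    }

    PQclear(res);
    PQfinish(conn);
    return 0;
}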
On Mon, Oct 7, 2019 at 9:43 AM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
> What about have an API to get the single file or list of files? We will use
> a single file in our application and other tools can get the benefit of
> list of files.

That sounds a bit speculative to me. Who is to say that anyone will find that
useful? I mean, I think it's fine and good to build the functionality that we
need in a way that maximizes the likelihood that other tools can reuse that
functionality, and I think we should do that. But I don't think it's smart to
build functionality that we don't really need in the hope that somebody else
will find it useful unless we're pretty sure that they actually will. I don't
see that as being the case here; YMMV.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.

I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns the result set. I will add the
additional information.
I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other tools
have to use the pg_start_backup() and pg_stop_backup() functions to achieve
that. Since we are breaking the backup into multiple commands, I believe it
would be a good idea to have this option. I will include it in the next
revision of this patch.
I still think that the files should be requested one at a time, not a
huge long list in a single command.

Sure, will make the change.
On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Sure. Though the backup manifest patch calculates and includes the checksum of backup files and is done
> while the file is being transferred to the frontend-end. The manifest file itself is copied at the
> very end of the backup. In parallel backup, I need the list of filenames before file contents are transferred, in
> order to divide them into multiple workers. For that, the manifest file has to be available when START_BACKUP
> is called.
>
> That means, backup manifest should support its creation while excluding the checksum during START_BACKUP().
> I also need the directory information as well for two reasons:
>
> - In plain format, base path has to exist before we can write the file. we can extract the base path from the file
> but doing that for all files does not seem a good idea.
> - base backup does not include the content of some directories but those directories although empty, are still
> expected in PGDATA.
>
> I can make these changes part of parallel backup (which would be on top of backup manifest patch) or
> these changes can be done as part of manifest patch and then parallel can use them.
>
> Robert what do you suggest?
I think we should probably not use backup manifests here, actually. I
initially thought that would be a good idea, but after further thought
it seems like it just complicates the code to no real benefit.

Okay.

I
suggest that the START_BACKUP command just return a result set, like a
query, with perhaps four columns: file name, file type ('d' for
directory or 'f' for file), file size, file mtime. pg_basebackup will
ignore the mtime, but some other tools might find that useful
information.

Yes, the current patch already returns the result set. I will add the
additional information.
I wonder if we should also split START_BACKUP (which should enter
non-exclusive backup mode) from GET_FILE_LIST, in case some other
client program wants to use one of those but not the other. I think
that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other tools
have to use the pg_start_backup() and pg_stop_backup() functions to achieve
that. Since we are breaking the backup into multiple commands, I believe it
would be a good idea to have this option. I will include it in the next
revision of this patch.
I still think that the files should be requested one at a time, not a
huge long list in a single command.

Sure, will make the change.

I have refactored the functionality into multiple smaller patches in order to
make the review process easier. I have divided the code into backend changes
and pg_basebackup changes. The backend replication system now supports the
following commands:

- START_BACKUP
- SEND_FILE_LIST
- SEND_FILES_CONTENT
- STOP_BACKUP

The START_BACKUP will not return the list of files; instead SEND_FILE_LIST is
used for that. The START_BACKUP
now calls pg_start_backup and returns starting WAL position, tablespace header information and content of backup label file.
Initially I was using tmp files to store the backup_label content, but that turned out to be a bad idea, because there can be multiple
non-exclusive backups running. The backup label information is needed by
stop_backup, so pg_basebackup will send it as part of STOP_BACKUP.

The SEND_FILE_LIST will return the list of files. It will be returned as a
result set having four columns (filename, type, size, mtime).

The SEND_FILES_CONTENT can now return a single file or multiple files as
required. There is not much change required to support both, so I believe it
will be more usable this way if other tools want to utilise it.

As per the suggestion from Robert, I am currently working on making changes in
pg_basebackup to fetch files one by one. However, that's not complete and the
attached patch is still using the old method of multi-file fetching to test
the backend commands. I will send an updated patch which will contain the
changes for fetching files one by one. I wanted to share the backend patch to
get some feedback in the meantime.

Thanks,

--
Asif Rehman
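For reference, here is a sketch of the proposed command sequence from the client's point of view. The command names are the ones from this proposal; the argument syntax, connection strings and file name are simplified assumptions, not the final grammar.

#include <stdio.h>
#include <libpq-fe.h>

static PGresult *
run(PGconn *conn, const char *cmd)
{
    PGresult *res = PQexec(conn, cmd);

    if (PQresultStatus(res) != PGRES_TUPLES_OK &&
        PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s failed: %s", cmd, PQerrorMessage(conn));
    return res;
}

int
main(void)
{
    PGconn *leader = PQconnectdb("replication=true dbname=postgres");
    PGconn *worker = PQconnectdb("replication=true dbname=postgres");

    if (PQstatus(leader) != CONNECTION_OK || PQstatus(worker) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed\n");
        return 1;
    }

    /* enter non-exclusive backup mode; returns start LSN and backup label */
    PQclear(run(leader, "START_BACKUP"));

    /* fetch the list of files to copy (name, type, size, mtime) */
    PQclear(run(leader, "SEND_FILE_LIST"));

    /* each worker connection requests files; shown here for a single file */
    PQclear(run(worker, "SEND_FILES_CONTENT ('base/1/1259')"));

    /* once every worker is done, finish the backup on the leader */
    PQclear(run(leader, "STOP_BACKUP"));

    PQfinish(worker);
    PQfinish(leader);
    return 0;
}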
I quickly tried to have a look at your 0001-refactor patch. Here are some
comments:

1. The patch fails to compile.

Sorry if I am missing something, but I am not able to understand why in the
new function collectTablespaces() you have added an extra parameter NULL while
calling sendTablespace(); it fails the compilation:

+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -g -O0 -Wall
-Werror -I../../../../src/include -c -o xlog.o xlog.c -MMD -MP -MF .deps/xlog.Po
xlog.c:12253:59: error: too many arguments to function call, expected 2, have 3
ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
                         ~~~~~~~~~~~~~~                  ^~~~

2. I think the patch needs to be run via pgindent. It does not follow the 80
column width. e.g.

+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool infotbssize, bool needtblspcmapfile)
+{

3. The comments in the refactored code appear to be redundant. For example,
the following comment:

/* Setup and activate network throttling, if client requested it */

appears thrice in the code: before calling setup_throttle(), in the prologue
of the function setup_throttle(), and above the if() in that function.
Similarly, the comment:

/* Collect information about all tablespaces */

in collectTablespaces().

4. In the function include_wal_files(), why is the parameter TimeLineID,
i.e. endtli, needed? I don't see it being used in the function at all. I think
you can safely get rid of it.

+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)
On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.rehman@gmail.com> wrote:

Attached are the updated patches.
I had a quick look over these changes and they look good overall.
However, here are a few review comments I caught while glancing over patches
0002 and 0003.
--- 0002 patch
1.
Can the lsn option be renamed to start-wal-location? That will be clearer too.
2.
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
I think it will be good if we keep this structure in a common place so that
the client can also use it.
3.
+ SEND_FILE_LIST,
+ SEND_FILES_CONTENT,
Can above two commands renamed to SEND_BACKUP_MANIFEST and SEND_BACKUP_FILE
respectively?
The reason behind the first name change is, we are not getting only file lists
here instead we are getting a few more details with that too. And for others,
it will be inline with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
4.
Typos:
non-exlusive => non-exclusive
retured => returned
optionaly => optionally
nessery => necessary
totoal => total
--- 0003 patch
1.
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
I think it will be good if it goes to simple_list.c. That will help in other
usages as well.
2.
Please revert these unnecessary changes:
@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
-
mapped_tblspc_path = get_tablespace_mapping(©buf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
3.
Typos:
retrive => retrieve
takecare => take care
tablespae => tablespace
4.
ParallelBackupEnd() function does not do anything for parallelism. Will it be
better to just rename it as EndBackup()?
5.
To pass a tablespace path to the server in SEND_FILES_CONTENT, you are reusing
a LABEL option, that seems odd. How about adding a new option for that?
6.
It will be good if we have some comments explaining what the function is
actually doing in its prologue. For functions like:
GetBackupFilesList()
ReceiveFiles()
create_workers_and_fetch()
Thanks
Thanks,--Asif Rehman
--Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
I had a detailed discussion with Robert Haas at PostgreConf Europe about parallel backup.
We discussed the current state of the patch and what needs to be done to get the patch committed.
- The current patch uses a process to implement parallelism. There are many
reasons we need to use threads instead of processes. To start with, as this is a client utility it makes
more sense to use threads. The data needs to be shared amongst different threads and the main process,
network to identify which are the real bottlenecks. In general, we agreed that we could start with fetching
- There is also an ongoing debate in this thread that we should have one single tar file for all files or one
reduce the management of multiple files, but in case of one file per thread, we end up with many tar
files. Therefore we need to have one master thread which is responsible for writing on tar file and all
the other threads will receive the data from the network and stream to the master thread. This also
supports the idea of using a thread-based model rather than a process-based approach because it
requires too much data sharing between processes. If we cannot achieve this, then we can disable the
- In the case of data sharing, we need to try to avoid unnecessary locking and more suitable algorithm to
solve the reader-writer problem is required.
On Fri, Oct 18, 2019 at 4:12 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
On Thu, Oct 17, 2019 at 10:51 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
Attached are the updated patches.
I had a quick look over these changes and they look good overall.
However, here are my few review comments I caught while glancing the patches
0002 and 0003.
--- 0002 patch
1.
Can lsn option be renamed to start-wal-location? This will be more clear too.
2.
+typedef struct
+{
+ char name[MAXPGPATH];
+ char type;
+ int32 size;
+ time_t mtime;
+} BackupFile;
I think it will be good if we keep this structure in a common place so that
the client can also use it.
3.
+ SEND_FILE_LIST,
+ SEND_FILES_CONTENT,
Can above two commands renamed to SEND_BACKUP_MANIFEST and SEND_BACKUP_FILE
respectively?
The reason behind the first name change is, we are not getting only file lists
here instead we are getting a few more details with that too. And for others,
it will be inline with START_BACKUP/STOP_BACKUP/SEND_BACKUP_MANIFEST.
4.
Typos:
non-exlusive => non-exclusive
retured => returned
optionaly => optionally
nessery => necessary
totoal => total
--- 0003 patch
1.
+static int
+simple_list_length(SimpleStringList *list)
+{
+ int len = 0;
+ SimpleStringListCell *cell;
+
+ for (cell = list->head; cell; cell = cell->next, len++)
+ ;
+
+ return len;
+}
I think it will be good if it goes to simple_list.c. That will help in other
usages as well.
2.
Please revert these unnecessary changes:
@@ -1475,6 +1575,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
*/
snprintf(filename, sizeof(filename), "%s/%s", current_path,
copybuf);
+
if (filename[strlen(filename) - 1] == '/')
{
/*
@@ -1528,8 +1622,8 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
* can map them too.)
*/
filename[strlen(filename) - 1] = '\0'; /* Remove trailing slash */
-
mapped_tblspc_path = get_tablespace_mapping(©buf[157]);
+
if (symlink(mapped_tblspc_path, filename) != 0)
{
pg_log_error("could not create symbolic link from \"%s\" to \"%s\": %m",
3.
Typos:
retrive => retrieve
takecare => take care
tablespae => tablespace
4.
ParallelBackupEnd() function does not do anything for parallelism. Will it be
better to just rename it as EndBackup()?
5.
To pass a tablespace path to the server in SEND_FILES_CONTENT, you are reusing
a LABEL option, that seems odd. How about adding a new option for that?
6.
It will be good if we have some comments explaining what the function is
actually doing in its prologue. For functions like:
GetBackupFilesList()
ReceiveFiles()
create_workers_and_fetch()
Thanks
Thanks,
--
Asif Rehman
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
I had a detailed discussion with Robert Haas at PostgresConf Europe about parallel backup.
We discussed the current state of the patch and what needs to be done to get the patch committed.

- The current patch uses a process to implement parallelism. There are many
reasons we need to use threads instead of processes. To start with, as this is a client utility it makes
more sense to use threads. The data needs to be shared amongst different threads and the main process,
and handling that is simpler as compared to interprocess communication.

- Fetching a single file or multiple files was also discussed. We concluded in our discussion that we
need to benchmark to see if disk I/O is a bottleneck or not and if parallel writing gives us
any benefit. This benchmark needs to be done on different hardware and different
networks to identify which are the real bottlenecks. In general, we agreed that we could start with fetching
one file at a time, but that will be revisited after the benchmarks are done.

- There is also an ongoing debate in this thread that we should have one single tar file for all files or one
tar file per thread. I really want to have a single tar file because the main purpose of the tar file is to
reduce the management of multiple files, but in the case of one file per thread, we end up with many tar
files. Therefore we need to have one master thread which is responsible for writing the tar file, and all
the other threads will receive the data from the network and stream it to the master thread. This also
supports the idea of using a thread-based model rather than a process-based approach, because the latter
requires too much data sharing between processes. If we cannot achieve this, then we can disable the
tar option for parallel backup in the first version.

- In the case of data sharing, we need to try to avoid unnecessary locking, and a more suitable algorithm to
solve the reader-writer problem is required.

--
Ibrar Ahmed
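To make the single-tar-writer idea concrete, here is a minimal sketch of how receiver threads could hand data to one writer thread through a mutex/condition-variable queue. This is not code from the patch; the names and queue depth are made up, and real code would also need error handling and an end-of-stream marker.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define QUEUE_DEPTH 64

typedef struct
{
    char   *data;
    size_t  len;
} Chunk;

typedef struct
{
    Chunk           slots[QUEUE_DEPTH];
    int             head;
    int             tail;
    int             count;
    pthread_mutex_t lock;
    pthread_cond_t  not_full;
    pthread_cond_t  not_empty;
} ChunkQueue;

static ChunkQueue queue = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .not_full = PTHREAD_COND_INITIALIZER,
    .not_empty = PTHREAD_COND_INITIALIZER,
};

/* Receiver threads call this after reading a block from their connection. */
static void
chunk_queue_push(ChunkQueue *q, const char *buf, size_t len)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_DEPTH)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->slots[q->tail].data = malloc(len);
    memcpy(q->slots[q->tail].data, buf, len);
    q->slots[q->tail].len = len;
    q->tail = (q->tail + 1) % QUEUE_DEPTH;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* The single writer thread calls this and appends each chunk to the tar file. */
static Chunk
chunk_queue_pop(ChunkQueue *q)
{
    Chunk       c;

    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    c = q->slots[q->head];
    q->head = (q->head + 1) % QUEUE_DEPTH;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return c;
}

Each receiver thread would call chunk_queue_push(&queue, buf, len) for the data it pulls off its connection, while the writer loops over chunk_queue_pop(&queue), writes the chunk into the tar stream, and frees it.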
On Thu, Oct 24, 2019 at 3:21 PM Ibrar Ahmed <ibrar.ahmad@gmail.com> wrote:
I had a detailed discussion with Robert Haas at PostgresConf Europe about parallel backup.
We discussed the current state of the patch and what needs to be done to get the patch committed.

- The current patch uses a process to implement parallelism. There are many
reasons we need to use threads instead of processes. To start with, as this is a client utility it makes
more sense to use threads. The data needs to be shared amongst different threads and the main process,
handling that is simpler as compared to interprocess communication.

Yes, I agree. I have already converted the code to use threads instead of processes. This avoids the overhead
of interprocess communication.

With a single-file fetching strategy, this requires communication between competing threads/processes. To handle
that in a multiprocess application, it requires IPC. The current approach of multiple threads instead of processes
avoids this overhead.

- Fetching a single file or multiple files was also discussed. We concluded in our discussion that we
need to benchmark to see if disk I/O is a bottleneck or not and if parallel writing gives us
any benefit. This benchmark needs to be done on different hardware and different
networks to identify which are the real bottlenecks. In general, we agreed that we could start with fetching
one file at a time but that will be revisited after the benchmarks are done.

I'll share the updated patch in the next couple of days. After that, I'll work on benchmarking that in
the different environments that I have.

- There is also an ongoing debate in this thread that we should have one single tar file for all files or one
tar file per thread. I really want to have a single tar file because the main purpose of the tar file is to
reduce the management of multiple files, but in case of one file per thread, we end up with many tar
files. Therefore we need to have one master thread which is responsible for writing the tar file and all
the other threads will receive the data from the network and stream to the master thread. This also
supports the idea of using a thread-based model rather than a process-based approach because it
requires too much data sharing between processes. If we cannot achieve this, then we can disable the
tar option for parallel backup in the first version.

I am in favour of disabling the tar format for the first version of parallel backup.

- In the case of data sharing, we need to try to avoid unnecessary locking and a more suitable algorithm to
solve the reader-writer problem is required.

--
Ibrar Ahmed

--
Asif Rehman
Attachment
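For what the client side of the thread-per-connection, one-file-at-a-time approach just described might look like, here is a rough sketch. It is not the patch's code: WorkerContext, backup_worker(), and the literal command spelling (SEND_BACKUP_FILES here; the name changed over the course of this thread) are placeholders, and writing the received data to disk is elided.

#include <libpq-fe.h>
#include <pthread.h>
#include <stdio.h>

typedef struct
{
    char          **files;      /* names obtained from the file-list command */
    int             nfiles;
    int             next;       /* index of the next unclaimed file */
    pthread_mutex_t lock;
    const char     *conninfo;   /* e.g. "host=... user=... replication=true" */
} WorkerContext;

/* Ask the server for one file and drain the COPY stream it answers with. */
static void
fetch_one_file(PGconn *conn, const char *name)
{
    char        query[1024];
    PGresult   *res;

    snprintf(query, sizeof(query), "SEND_BACKUP_FILES ('%s')", name);
    res = PQexec(conn, query);
    if (PQresultStatus(res) == PGRES_COPY_OUT)
    {
        char       *buf;
        int         len;

        while ((len = PQgetCopyData(conn, &buf, 0)) > 0)
        {
            /* write 'len' bytes from 'buf' into the local copy of 'name' */
            PQfreemem(buf);
        }
        PQclear(PQgetResult(conn));     /* collect the command completion */
    }
    PQclear(res);
}

/* Each worker thread owns one connection and competes for files via a mutex. */
static void *
backup_worker(void *arg)
{
    WorkerContext *ctx = (WorkerContext *) arg;
    PGconn         *conn = PQconnectdb(ctx->conninfo);

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return NULL;
    }

    for (;;)
    {
        int         idx;

        pthread_mutex_lock(&ctx->lock);
        idx = (ctx->next < ctx->nfiles) ? ctx->next++ : -1;
        pthread_mutex_unlock(&ctx->lock);
        if (idx < 0)
            break;

        fetch_one_file(conn, ctx->files[idx]);
    }

    PQfinish(conn);
    return NULL;
}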
On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patch to include the changes suggested by Jeevan. This patch also implements the thread workers instead of
> processes and fetches a single file at a time. The tar format has been disabled for first version of parallel backup.

Looking at 0001-0003:

It's not clear to me what the purpose of the start WAL location is supposed to be. As far as I can see, SendBackupFiles() stores it in a variable which is then used for exactly nothing, and nothing else uses it. It seems like that would be part of a potential incremental backup feature, but I don't see what it's got to do with parallel full backup.

The tablespace_path option appears entirely unused, and I don't know why that should be necessary here, either.

STORE_BACKUPFILE() seems like maybe it should be a function rather than a macro, and also probably be renamed, because it doesn't store files and the argument's not necessarily a file.

SendBackupManifest() does not send a backup manifest in the sense contemplated by the email thread on that subject. It sends a file list. That seems like the right idea - IMHO, anyway - but you need to do a thorough renaming.

I think it would be fine to decide that this facility won't support exclusive-mode backup.

I don't think much of having both sendDir() and sendDir_(). The latter name is inconsistent with any naming convention we have, and there seems to be no reason not to just add an argument to sendDir() and change the callers.

I think we should rename - perhaps as a preparatory patch - the sizeonly flag to dryrun, or something like that.

The resource cleanup does not look right. You've included calls to PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup() and StopBackup(), but what happens if there is an error or even a clean shutdown of the connection in between? I think that there needs to be some change here to ensure that a walsender will always call base_backup_cleanup() when it exits; I think that'd probably remove the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones we have already. This might also be something that could be done as a separate, preparatory refactoring patch.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, Oct 28, 2019 at 8:29 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Oct 28, 2019 at 10:03 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patch to include the changes suggested by Jeevan. This patch also implements the thread workers instead of
> processes and fetches a single file at a time. The tar format has been disabled for first version of parallel backup.
Looking at 0001-0003:
It's not clear to me what the purpose of the start WAL location is
supposed to be. As far as I can see, SendBackupFiles() stores it in a
variable which is then used for exactly nothing, and nothing else uses
it. It seems like that would be part of a potential incremental
backup feature, but I don't see what it's got to do with parallel full
backup.

'startptr' is used by sendFile() during checksum verification. Since
SendBackupFiles() is using sendFile we have to set a valid WAL location.

The tablespace_path option appears entirely unused, and I don't know
why that should be necessary here, either.

This is to calculate the basepathlen. We need to exclude the tablespace location (or
base path) from the filename before it is sent to the client with the sendFile call. I added
this option primarily to avoid performing string manipulation on the filename to extract the
tablespace location and then calculate the basepathlen.

Alternatively, we can do it by extracting the base path from the received filename. What
do you suggest?
STORE_BACKUPFILE() seems like maybe it should be a function rather
than a macro, and also probably be renamed, because it doesn't store
files and the argument's not necessarily a file.

Sure.
SendBackupManifest() does not send a backup manifest in the sense
contemplated by the email thread on that subject. It sends a file
list. That seems like the right idea - IMHO, anyway - but you need to
do a thorough renaming.

I'm considering the following command names:

START_BACKUP
- Starts the backup process
SEND_BACKUP_FILELIST (Instead of SEND_BACKUP_MANIFEST)
- Sends the list of all files (along with file information such as filename, file type (directory/file/link),
file size and file mtime for each file) to be backed up.
SEND_BACKUP_FILES
- Sends one or more files to the client.
STOP_BACKUP
- Stops the backup process.
I'll update the function names accordingly after your confirmation. Of course, suggestions for
better names are welcome.
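As a rough illustration of how a client might drive this command set end to end (a sketch only, not the patch's code; option spellings follow the syntax proposed later in this thread, and result/COPY handling is omitted):

#include <libpq-fe.h>

static void
run_parallel_backup(const char *conninfo)
{
    /* conninfo is assumed to request a replication connection,
     * e.g. "host=... user=... replication=true" */
    PGconn     *conn = PQconnectdb(conninfo);
    PGresult   *res;

    res = PQexec(conn, "START_BACKUP LABEL 'parallel backup'");
    PQclear(res);

    /* One row per file (name, type, size, mtime); the client distributes
     * these names among its worker connections. */
    res = PQexec(conn, "SEND_BACKUP_FILELIST");
    PQclear(res);

    /*
     * Each worker then requests files individually, for example:
     *   SEND_BACKUP_FILES ('base/1/1259') MAX_RATE 32768
     * and consumes the returned COPY stream.
     */

    res = PQexec(conn, "STOP_BACKUP");
    PQclear(res);

    PQfinish(conn);
}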
I think it would be fine to decide that this facility won't support
exclusive-mode backup.

Sure. Will drop this patch.
I don't think much of having both sendDir() and sendDir_(). The latter
name is inconsistent with any naming convention we have, and there
seems to be no reason not to just add an argument to sendDir() and
change the callers.
I think we should rename - perhaps as a preparatory patch - the
sizeonly flag to dryrun, or something like that.

Sure, will take care of it.
The resource cleanup does not look right. You've included calls to
PG_ENSURE_ERROR_CLEANUP(base_backup_cleanup, 0) in both StartBackup()
and StopBackup(), but what happens if there is an error or even a
clean shutdown of the connection in between? I think that there needs
to be some change here to ensure that a walsender will always call
base_backup_cleanup() when it exits; I think that'd probably remove
the need for any PG_ENSURE_ERROR_CLEANUP calls at all, including ones
we have already. This might also be something that could be done as a
separate, preparatory refactoring patch.

You're right. I didn't handle this case properly. I will remove the PG_ENSURE_ERROR_CLEANUP
calls and replace them with a before_shmem_exit handler. This way, whenever the backend process exits,
base_backup_cleanup will be called:
- If it exits before calling do_pg_stop_backup, base_backup_cleanup will take care of cleanup.
- Otherwise, in the case of a clean shutdown (after calling do_pg_stop_backup), base_backup_cleanup
will simply return without doing anything.
Attachment
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v4.patch
- 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v4.patch
- 0005-pg_basebackup-changes-for-parallel-backup_v4.patch
- 0004-backend-changes-for-parallel-backup_v4.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit_v4.patch
- 0006-parallel-backup-testcase_v4.patch
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
Ugh, global variables.
Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.
You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace location (or
> base path) from the filename before it is sent to the client with sendFile call. I added
> this option primarily to avoid performing string manipulation on filename to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received filename. What
> do you suggest?
I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
Ugh, global variables.
Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.

Are you expecting something like the attached patch? Basically I have reorganised the grammar
rules so each command can have the options required by it.

I was feeling a bit reluctant for this change because it may add some unwanted grammar rules in
the replication grammar. Since these commands are using the same options as base backup, maybe we could throw an error inside the relevant functions on unwanted options?
You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)

Yes, I will add it in the next patch.
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace location (or
> base path) from the filename before it is sent to the client with sendFile call. I added
> this option primarily to avoid performing string manipulation on filename to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received filename. What
> do you suggest?
I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
Attachment
- 0004-backend-changes-for-parallel-backup.patch
- 0005-pg_basebackup-changes-for-parallel-backup.patch
- 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup.patch
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit.patch
- 0006-parallel-backup-testcase.patch
- 0007-parallel-backup-documentation.patch
- SEND_BACKUP_FILES ( 'FILE' [, ...] ) [ MAX_RATE rate ] [ NOVERIFY_CHECKSUMS ] [ START_WAL_LOCATION ]
- SEND_BACKUP_FILELIST

On Mon, Nov 4, 2019 at 6:08 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Fri, Nov 1, 2019 at 8:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Oct 30, 2019 at 10:16 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> 'startptr' is used by sendFile() during checksum verification. Since
> SendBackupFiles() is using sendFIle we have to set a valid WAL location.
Ugh, global variables.
Why are START_BACKUP, SEND_BACKUP_FILELIST, SEND_BACKUP_FILES, and
STOP_BACKUP all using the same base_backup_opt_list production as
BASE_BACKUP? Presumably most of those options are not applicable to
most of those commands, and the productions should therefore be
separated.

Are you expecting something like the attached patch? Basically I have reorganised the grammar
rules so each command can have the options required by it.

I was feeling a bit reluctant for this change because it may add some unwanted grammar rules in
the replication grammar. Since these commands are using the same options as base backup, maybe we could throw an error inside the relevant functions on unwanted options?
You should add docs, too. I wouldn't have to guess what some of this
stuff was for if you wrote documentation explaining what this stuff
was for. :-)

Yes, I will add it in the next patch.
>> The tablespace_path option appears entirely unused, and I don't know
>> why that should be necessary here, either.
>
> This is to calculate the basepathlen. We need to exclude the tablespace location (or
> base path) from the filename before it is sent to the client with sendFile call. I added
> this option primarily to avoid performing string manipulation on filename to extract the
> tablespace location and then calculate the basepathlen.
>
> Alternatively we can do it by extracting the base path from the received filename. What
> do you suggest?
I don't think the server needs any information from the client in
order to be able to exclude the tablespace location from the pathname.
Whatever it needs to know, it should be able to figure out, just as it
would in a non-parallel backup.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

I have updated the replication grammar with some new rules to differentiate the options production
for base backup and the newly added commands.

I have also created a separate patch to include the documentation changes. The current syntax is as below:

- START_BACKUP [ LABEL 'label' ] [ PROGRESS ] [ FAST ] [ TABLESPACE_MAP ]
- STOP_BACKUP [ LABEL 'label' ] [ WAL ] [ NOWAIT ]
Attachment
- 0001-remove-PG_ENSURE_ERROR_CLEANUP-macro-from-basebackup_v6.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit_v6.patch
- 0005-pg_basebackup-changes-for-parallel-backup_v6.patch
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v6.patch
- 0004-backend-changes-for-parallel-backup_v6.patch
- 0006-parallel-backup-testcase_v6.patch
- 0007-parallel-backup-documentation_v6.patch
Sorry, I sent the wrong patches. Please see the correct version of the patches (_v6).
1.
+ XLogRecPtr wal_location;
Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.
2.
+ int32 size;
Should we use size_t here?
3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?
5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c
6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
Please update these comments as we fetch one file at a time.
7.
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.
8.
Please indent code within 80 char limit at all applicable places.
9.
Please fix following typos:
identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories
=====
The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.
Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.
[1] https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
--
Asif Rehman
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Sorry, I sent the wrong patches. Please see the correct version of the patches (_v6).

Review comments on these patches:
1.
+ XLogRecPtr wal_location;
Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.
2.
+ int32 size;
Should we use size_t here?
3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?
I think that this function should not take care of any of the above. Instead, it should
5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c
6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
Please update these comments as we fetch one file at a time.
7.
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.
8.
Please indent code within 80 char limit at all applicable places.
9.
Please fix following typos:
identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories
=====
The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.
Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.
On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?
I had the same thought, but I think it's better to keep them separate.
Somebody might want to use the SEND_BACKUP_FILELIST command for
something other than a backup (I actually think it should be called
just SEND_FILE_LIST)
- SEND_BACKUP_FILELIST --> SEND_FILE_LIST
. Somebody might want to start a backup without
getting a file list because they're going to copy the files at the FS
level. Somebody might want to get a list of files to process after
somebody else has started the backup on another connection. Or maybe
nobody wants to do any of those things, but it doesn't seem to cost us
much of anything to split the commands, so I think we should.
On Thu, Nov 28, 2019 at 12:57 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Nov 27, 2019 at 3:38 AM Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
> Can't we return the file list with START_BACKUP itself?
I had the same thought, but I think it's better to keep them separate.
Somebody might want to use the SEND_BACKUP_FILELIST command for
something other than a backup (I actually think it should be called
just SEND_FILE_LIST)

Sure. Thanks for the recommendation. To keep the function names in sync, I intend to do the following renamings:
- SEND_BACKUP_FILES --> SEND_FILES
- SEND_BACKUP_FILELIST --> SEND_FILE_LIST

Somebody might want to start a backup without
getting a file list because they're going to copy the files at the FS
level. Somebody might want to get a list of files to process after
somebody else has started the backup on another connection. Or maybe
nobody wants to do any of those things, but it doesn't seem to cost us
much of anything to split the commands, so I think we should.

+1
Attachment
- 0007-parallel-backup-documentation_v7.patch
- 0004-Parallel-Backup-Backend-Replication-commands_v7.patch
- 0003-Refactor-some-basebackup-code-to-increase-reusabilit_v7.patch
- 0006-parallel-backup-testcase_v7.patch
- 0005-Parallel-Backup-pg_basebackup_v7.patch
- 0002-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v7.patch
- 0001-removed-PG_ENSURE_ERROR_CLEANUP-macro-from-basebacku_v7.patch
On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patches (v7 attached) and have taken care of all issues pointed by Jeevan, additionally
> ran the pgindent on each patch. Furthermore, Command names have been renamed as suggested and I
> have simplified the SendFiles function. Client can only request the regular files, any other kind such as
> directories or symlinks will be skipped, the client will be responsible for taking care of such.
Hi,
Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().
So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".
0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:
- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work.
Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.
- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.
- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.
I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.
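If the client did end up writing tablespace_map itself, the mechanics would be small. A sketch, assuming the file keeps its existing layout of one "OID path" line per tablespace; the struct and function names here are made up:

#include <stdio.h>

typedef struct
{
    const char *oid;            /* tablespace OID, i.e. the pg_tblspc symlink name */
    const char *linktarget;     /* absolute path the symlink points to */
} TablespaceEntry;

/* Write backupdir/tablespace_map from link targets learned via the file list. */
static int
write_tablespace_map(const char *backupdir, const TablespaceEntry *ts, int n)
{
    char        path[1024];
    FILE       *fp;
    int         i;

    snprintf(path, sizeof(path), "%s/tablespace_map", backupdir);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    for (i = 0; i < n; i++)
        fprintf(fp, "%s %s\n", ts[i].oid, ts[i].linktarget);

    fclose(fp);
    return 0;
}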
Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILE command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASE_BACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.
One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.
A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), and that's
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:
+ /* ... and pg_control after everything else. */
...which (1) is an odd thing to say when this is the first thing this
particular function is going to send, and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.
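On the pg_basebackup side that could be as small as appending pg_control to the end of the request list the client already builds. A sketch using the frontend SimpleStringList helpers; the function name here is made up:

#include "fe_utils/simple_list.h"

static void
finalize_request_list(SimpleStringList *files)
{
    /*
     * Ask for pg_control last, after every other file, so the server does
     * not need any special case for it.
     */
    simple_string_list_append(files, "global/pg_control");
}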
Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.
On Thu, Dec 19, 2019 at 10:47 PM Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patches (v7 attached) and have taken care of all issues pointed by Jeevan, additionally
> ran the pgindent on each patch. Furthermore, Command names have been renamed as suggested and I
> have simplified the SendFiles function. Client can only request the regular files, any other kind such as
> directories or symlinks will be skipped, the client will be responsible for taking care of such.
Hi,
Patch 0001 of this series conflicts with my recent commit
303640199d0436c5e7acdf50b837a027b5726594; that commit was actually
inspired by some previous study of 0001. That being said, I think 0001
has the wrong idea. There's no reason that I can see why it should be
correct to remove the PG_ENSURE_ERROR_CLEANUP calls from
perform_base_backup(). It's true that if we register a long-lived
before_shmem_exit hook, then the backup will get cleaned up even
without the PG_ENSURE_ERROR_CLEANUP block, but there's also the
question of the warning message. I think that our goal should be to
emit the warning message about a backup being stopped too early if the
user uses either pg_start_backup() or the new START_BACKUP command and
does not end the backup with either pg_stop_backup() or the new
STOP_BACKUP command -- but not if a single command that both starts
and ends a backup, like BASE_BACKUP, is interrupted. To accomplish
that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we
need to temporarily register do_pg_abort_backup() as a
before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during
commands like BASE_BACKUP() -- and for things like pg_start_backup()
or the new START_BACKUP command, we just need to add a single call to
register_persistent_abort_backup_handler().
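In other words, the two cases could differ roughly as in the sketch below. This is a paraphrase of the suggestion, not actual patch code; as far as I recall from that commit, the Datum argument of do_pg_abort_backup() is the flag controlling whether the "backup stopped too early" warning is emitted.

/* BASE_BACKUP-style commands: one command starts and ends the backup, so
 * the handler is installed only for the duration of the command and emits
 * no warning if the command is interrupted. */
PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
{
    /* ... send the entire base backup ... */
}
PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));

/* START_BACKUP-style commands: the backup outlives this command, so keep
 * a handler installed until backend exit (or a clean STOP_BACKUP). */
register_persistent_abort_backup_handler();
/* ... then put the system into backup mode, as do_pg_start_backup() does ... */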
So I think you can drop 0001, and then in the patch that actually
introduces START_BACKUP, add the call to
register_persistent_abort_backup_handler() before calling
do_pg_start_backup(). Also in that patch, also adjust the warning text
that do_pg_abort_backup() emits to be more generic e.g. "aborting
backup due to backend exiting while a non-exclusive backup is in
progress".Sure. will do.0003 creates three new functions, moving code from
do_pg_start_backup() to a new function collectTablespaces() and from
perform_base_backup() to new functions setup_throttle() and
include_wal_files(). I'm skeptical about all of these changes. One
general nitpick is that the way these function names are capitalized
and punctuated does not seem to have been chosen very consistently;
how about name_like_this() throughout? A bit more substantively:
- collectTablespaces() is factored out of do_pg_start_backup() so that
it can also be used by SendFileList(), but that means that a client is
going to invoke START_BACKUP, indirectly calling collectTablespaces(),
and then immediately afterward the client is probably going to call
SEND_FILE_LIST, which will again call collectTablespaces(). That does
not appear to be super-great. For one thing, it's duplicate work,
although because SendFileList() is going to pass infotbssize as false,
it's not a lot of duplicated work.

I'll remove this duplication by eliminating this call from the START_BACKUP and SEND_FILE_LIST functions. More about this is explained later in this email.

Also, what happens if the two calls
to collectTablespaces() return different answers due to concurrent
CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but
it seems like there is at least the possibility of bugs if different
parts of the backup have different notions of what tablespaces exist.

Concurrent CREATE/DROP TABLESPACE commands can happen, and that will be resolved by the WAL files collected for the backup. I don't think we can do anything when objects are created or dropped in between start and stop backup. BASE_BACKUP also relies on the WAL files to handle such a scenario and does not error out when some relation files go away.
- setup_throttle() is factored out of perform_base_backup() so that it
can be called in StartBackup() and StopBackup() and SendFiles(). This
seems extremely odd. Why does it make any sense to give the user an
option to activate throttling when *ending* a backup? Why does it make
sense to give the user a chance to enable throttling *both* at the
startup of a backup *and also* for each individual file. If we're
going to support throttling here, it seems like it should be either a
backup-level property or a file-level property, not both.

It's a file-level property only. The throttle functionality relies on global variables. StartBackup() and StopBackup() are calling the setup_throttle() function to disable the throttling. I should have been more explicit here by passing -1 to setup_throttle(), illustrating that throttling is disabled, instead of using 'opt->maxrate' (although it defaults to -1 for these functions). I'll remove the setup_throttle() call from both functions.
- include_wal_files() is factored out of perform_base_backup() so that
it can be called by StopBackup(). This seems like a poor design
decision. The idea behind the BASE_BACKUP command is that you run that
one command, and the server sends you everything. The idea in this new
way of doing business is that the client requests the individual files
it wants -- except for the WAL files, which are for some reason not
requested individually but sent all together as part of the
STOP_BACKUP response. It seems like it would be more consistent if the
client were to decide which WAL files it needs and request them one by
one, just as we do with other files.

As I understand it, you are suggesting adding another command to fetch the list of WAL files, which would be called by the client after executing stop backup. Once the client gets that list, it starts requesting the WAL files one by one. So I will add a LIST_WAL_FILES command that will take start_lsn and end_lsn as arguments and return the list of WAL files between these LSNs. Something like this:

LIST_WAL_FILES 'start_lsn' 'end_lsn';
I think there's a common theme to all of these complaints, which is
that you haven't done enough to move things that are the
responsibility of the backend in the BASE_BACKUP model to the frontend
in this model. I started wondering, for example, whether it might not
be better to have the client rather than the server construct the
tablespace_map file. After all, the client needs to get the list of
files anyway (hence SEND_FILE_LIST) and if it's got that then it knows
almost enough to construct the tablespace map. The only additional
thing it needs is the full pathname to which the link points. But, it
seems that we could fairly easily extend SEND_FILE_LIST to send, for
files that are symbolic links, the target of the link, using a new
column. Or alternatively, using a separate command, so that instead of
just sending a single SEND_FILE_LIST command, the client might first
ask for a tablespace list and then might ask for a list of files
within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid>
for each tablespace, with 0 for the main tablespace, perhaps). I'm not
sure which way is better.

do_pg_start_backup() is collecting the tablespace information anyway to build the tablespace_map for BASE_BACKUP. So returning the same seemed better than adding a new command for the same information, hence the multiple calls to collectTablespaces() [to be renamed to collect_tablespaces()].

tablespace_map can be constructed by the client, but then BASE_BACKUP is returning it as part of the full backup. If clients in parallel mode are to construct this themselves, these will seem like two different approaches. Perhaps this should be done for BASE_BACKUP as well?

I'll refactor the do_pg_start_backup() function to remove the code related to tablespace information collection (moving it to collect_tablespaces()) and tablespace_map file creation, so that this function does not collect this information unnecessarily. The perform_base_backup() function can collect and send the tablespace information to the client, and then the client can construct the tablespace_map file.

I'll add a new command to fetch the list of tablespaces, i.e. LIST_TABLESPACES, which will return the tablespace information to the client for parallel mode. And I will refactor the START_BACKUP and STOP_BACKUP commands so that they only do the specific job of putting the system into backup mode or out of it, nothing else. These commands should only return the start and end LSN to the client.
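For reference, the tablespace_map format itself is simple enough for a client to produce once it has, for each tablespace, the symlink name under pg_tblspc and the link target; each line of the file is "<oid> <absolute path>". A made-up sketch (none of these names are from the patch):

#include <stdio.h>

typedef struct
{
    const char *oid;        /* e.g. "16384", the symlink name in pg_tblspc */
    const char *target;     /* e.g. "/tmp/ts", the link target */
} tblspc_entry;

static int
write_tablespace_map(const char *basedir, const tblspc_entry *entries, int n)
{
    char        path[1024];
    FILE       *fp;
    int         i;

    snprintf(path, sizeof(path), "%s/tablespace_map", basedir);
    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    /* one "<oid> <path>" line per tablespace, as the server writes today */
    for (i = 0; i < n; i++)
        fprintf(fp, "%s %s\n", entries[i].oid, entries[i].target);

    fclose(fp);
    return 0;
}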
Similarly, for throttling, I have a hard time understanding how what
you've got here is going to work reasonably. It looks like each client
is just going to request whatever MAX_RATE the user specifies, but the
result of that will be that the actual transfer rate is probably a
multiple of the specified rate, approximately equal to the specified
rate times the number of clients. That's probably not what the user
wants. You could take the specified rate and divide it by the number
of workers, but limiting each of 4 workers to a quarter of the rate
will probably lead to a combined rate of less than the specified
rate, because if one worker doesn't use all of the bandwidth to which
it's entitled, or even exits earlier than the others, the other
workers don't get to go any faster as a result. Another problem is
that, in the current approach, throttling applies overall to the
entire backup, but in this approach, it is applied separately to each
SEND_FILE command. In the current approach, if one file finishes a
little faster or slower than anticipated, the next file in the tarball
will be sent a little slower or faster to compensate. But in this
approach, each SEND_FILES command is throttled separately, so this
property is lost. Furthermore, while BASEBACKUP sends data
continuously, this approach naturally involves pauses between
commands. If files are large, that won't matter much, but if they're
small and numerous, it will tend to cause the actual transfer rate to
be less than the throttling rate.
One potential way to solve this problem is... move it to the client
side. Instead of making it the server's job not to send data too fast,
make it the client's job not to receive data too fast. Let the server
backends write as fast as they want, and on the pg_basebackup side,
have the threads coordinate with each other so that they don't read
data faster than the configured rate. That's not quite the same thing,
though, because the server can get ahead by the size of the client's
receive buffers plus whatever data is on the wire. I don't know
whether that's a big enough problem to be worth caring about. If it
is, then I think we need some server infrastructure to "group
throttle" a group of cooperating backends.That was a mistake in my code. maxrate should've been equally dividedamongst all threads. I agree that we should move this to the client-side.When a thread exits, its share should also be equally divided amongstthe remaining threads (i.e. recalculate maxrate for each remaining thread).Say we have 4 running threads with each allocation 25% of the bandwidth.Thread 1 exits. We recalculate bandwidth and assign the remaining 3 threads33.33% each. This solves one problem that you had identified. However,it doesn't solve where one (or more) thread is not fully consuming theirallocated share. I'm not really sure how we can solve it properly. Suggestionsare welcome.
A general comment about 0004 is that it seems like you've proceeded by
taking the code from perform_base_backup() and spreading it across
several different functions without, necessarily, as much thought as
is needed there. For instance, StartBackup() looks like just the
beginning of perform_base_backup(). But, why shouldn't it instead look
like pg_start_backup() -- in fact, a simplified version that only
handles the non-exclusive backup case? Is the extra stuff it's doing
really appropriate? I've already complained about the
tablespace-related stuff here and the throttling, but there's more.
Setting statrelpath here will probably break if somebody tries to use
SEND_FILES without first calling START_BACKUP. Sending the
backup_label file here is oddly asymmetric, because that's done by
pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup()
looks like it's just the end of perform_base_backup(), but that's not
pretty strange-looking too. Again, I've already complained about
include_wal_files() being part of this, but there's also:
+ /* ... and pg_control after everything else. */
...which (1) is an odd thing to say when this is the first thing this
particular function is to send and (2) is another example of a sloppy
division of labor between client and server; apparently, the client is
supposed to know not to request pg_control, because the server is
going to send it unsolicited. There's no particular reason to have
this a special case. The client could just request it last. And then
the server code wouldn't need a special case, and you wouldn't have
this odd logic split between the client and the server.
Overall, I think this needs a lot more work. The overall idea's not
wrong, but there seem to be a very large number of details which, at
least to me, do not seem to be correct.

Thank you Robert for the detailed review. I really appreciate your insights and very precise feedback. After the changes suggested above, the design on a high level will look something like this:

=== SEQUENTIAL EXECUTION ===

START_BACKUP [LABEL | FAST]
- Starts backup on the server
- Returns the start LSN to the client

LIST_TABLESPACES
- Sends a list of all tablespaces to the client

Loop over the LIST_TABLESPACES result:
- LIST_FILES [tablespace]
- Sends the file list for the given tablespace
- Create a list of all files

=== PARALLEL EXECUTION ===

Thread loop until the list of files is exhausted:
SEND_FILE <file(s)> [CHECKSUM | WAL_START_LOCATION]
- If the checksum is enabled then WAL_START_LOCATION is required.
- Can request the server to send one or more files, but we are requesting one at a time
- Pick the next file from the list of files
- Threads sleep after the list is exhausted
- All threads are sleeping

=== SEQUENTIAL EXECUTION ===

STOP_BACKUP [NOWAIT]
- Stops backup mode
- Returns the end LSN

If --wal-method=fetch then:
LIST_WAL_FILES 'start_lsn' 'end_lsn'
- Sends a list of WAL files between the start LSN and end LSN

=== PARALLEL EXECUTION ===

Thread loop until the list of WAL files is exhausted:
SEND_FILE <WAL file>
- Can request the server to send one or more files, but we are requesting one WAL file at a time
- Pick the next file from the list of WAL files
- Threads terminate and set their status as completed/terminated

=== SEQUENTIAL EXECUTION ===

Cleanup
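To make the flow above concrete, a client driving these proposed commands over a replication connection might look roughly like the following libpq sketch. None of these replication commands exist in any released server; they are only the grammar proposed in this thread, the file name and connection parameters are illustrative, and error handling is omitted.

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn     *conn;
    PGresult   *res;

    conn = PQconnectdb("host=localhost user=postgres replication=true");
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* put the server into backup mode; the design returns the start LSN */
    res = PQexec(conn, "START_BACKUP LABEL 'parallel backup' FAST");
    printf("start LSN: %s\n", PQgetvalue(res, 0, 0));
    PQclear(res);

    /* discover tablespaces, then the files in each of them */
    res = PQexec(conn, "LIST_TABLESPACES");
    /* ... loop over rows, issuing LIST_FILES <tablespace> for each ... */
    PQclear(res);

    /* worker connections would then fetch one file at a time, e.g.: */
    res = PQexec(conn, "SEND_FILES ('base/1/1259') NOVERIFY_CHECKSUMS");
    PQclear(res);

    /* leave backup mode; WAL files are then listed via LIST_WAL_FILES */
    res = PQexec(conn, "STOP_BACKUP NOWAIT");
    PQclear(res);

    PQfinish(conn);
    return 0;
}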
Attachment
Here are the updated patches, taking care of the issues pointed out earlier. This patch adds the following commands (with specified options):

START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X'] [NOVERIFY_CHECKSUMS]

Parallel backup is not making any use of the tablespace map, so I have removed that option from the above commands. There is a patch pending to remove the exclusive backup; we can further refactor the do_pg_start_backup function at that time, to remove the tablespace information and move the creation of the tablespace_map file to the client.

I have disabled the maxrate option for parallel backup. I intend to send out a separate patch for it. Robert previously suggested to implement throttling on the client-side. I found the original email thread [1] where throttling was proposed and added to the server. In that thread, it was originally implemented on the client-side, but per many suggestions, it was moved to the server-side.

So, I have a few suggestions on how we can implement this:

1- have another option for pg_basebackup (i.e. per-worker-maxrate) where the user could choose the bandwidth allocation for each worker. This approach can be implemented on the client-side as well as on the server-side.

2- have the maxrate be divided among workers equally at first, and then let the main thread keep adjusting it whenever one of the workers finishes. I believe this would only be possible if we handle throttling on the client. Also, as I understand it, implementing this will introduce an additional mutex for handling of bandwidth consumption data so that the rate may be adjusted according to data received by threads.

--
Asif Rehman
--
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Hi Asif,

On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Here are the updated patches, taking care of the issues pointed out earlier.

The latest changes look good to me. However, the patch set is missing the documentation. Please add those.

Thanks
--Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Attachment
Thanks Jeevan. Here is the documentation patch.

On Mon, Feb 10, 2020 at 6:49 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
> The latest changes look good to me. However, the patch set is missing the documentation.
> Please add those.

--
Asif Rehman
Hi,

I have created a commitfest entry.

On Mon, Feb 17, 2020 at 1:39 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Thanks Jeevan. Here is the documentation patch.

--
Asif Rehman
Hi Asif

I have started testing this feature. I have applied the v6 patch on commit a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan). I got a few observations, please take a look.
--if backup failed, backup directory is not getting removed.
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty
--giving large number of jobs leading segmentation fault.
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new process for connection: Resource temporarily unavailable
could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily unavailable
Segmentation fault (core dumped)
--stack-trace
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D /tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
46 if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
#1 0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
#2 0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
#3 0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4 exit (status=1) at exit.c:100
#5 0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at pg_basebackup.c:2713
#6 0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
#7 0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at pg_basebackup.c:2668
--with tablespace is in the same directory as data, parallel_backup crashed
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.
postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)
--stack-trace
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
#1 0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at pg_basebackup.c:2739
#2 0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
#3 0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at pg_basebackup.c:2668
(gdb)
Attachment
- 0001-Rename-sizeonly-to-dryrun-for-few-functions-in-baseb_v9.patch
- 0004-Parallel-Backup-pg_basebackup_v9.patch
- 0002-Refactor-some-backup-code-to-increase-reusability.-T_v9.patch
- 0003-Parallel-Backup-Backend-Replication-commands_v9.patch
- 0005-parallel-backup-testcase_v9.patch
- 0006-parallel-backup-documentation_v9.patch
On Wed, Mar 11, 2020 at 2:38 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
> Hi Asif
> I have started testing this feature. I have applied the v6 patch on commit a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
> I got a few observations, please take a look.

Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)

--
Asif Rehman
Thanks for the patches. I have verified the reported issues with the new patches; the issues are fixed now.

I got another observation where, if a new slot name is given without the -C option, it leads to a server crash error.

[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 4 -D /tmp/bkp --slot test_bkp_slot
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test_bkp_slot" does not exist
pg_basebackup: error: could not list backup files: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing data directory "/tmp/bkp"
On Mon, Mar 16, 2020 at 11:08 AM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
> I got another observation where, if a new slot name is given without the -C option, it leads to a server crash error.

It seems to be expected behavior. The START_BACKUP command has been executed, and pg_basebackup tries to start a WAL streaming process with a non-existent slot, which results in an error. So the backup is aborted while terminating all other processes.
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test_bkp_slot" does not exist
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory "/tmp/bkp"
--Asif Rehman
Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
for data in multiple threads, throttling seems important here. Can you please
explain why you have disabled that?
3. As we are always fetching a single file and as Robert suggested, let's rename
SEND_FILES to SEND_FILE instead.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.
5. Typos:
tablspace => tablespace
safly => safely
6. parallel_backup_run() needs some comments explaining the states it goes
through, i.e. the PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On testing further, I found when taking backup with -R, pg_basebackup crashed
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Hi Asif,

Thanks.

Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
data in multiple threads throttling seems important here. Can you please
explain why have you disabled that?
3. As we are always fetching a single file and as Robert suggested, let rename
SEND_FILES to SEND_FILE instead.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.
5. Typos:
tablspace => tablespace
safly => safely
6. parallel_backup_run() needs some comments explaining the states it goes
through PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)
[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory
Hi Asif,
On testing further, I found that when taking a backup with -R, pg_basebackup crashed. This crash is not consistently reproducible.

[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)

Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi Asif,

Thanks.

Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64). (V9 of the patches are attached.)
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
data in multiple threads throttling seems important here. Can you please
explain why have you disabled that?
3. As we are always fetching a single file and as Robert suggested, let rename
SEND_FILES to SEND_FILE instead.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.
5. Typos:
tablspace => tablespace
safly => safely
6. parallel_backup_run() needs some comments explaining the states it goes
through PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;
Why is free_filelist() not called in the PB_FETCH_WAL_FILES case?

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
While testing further, I observed that parallel backup is not able to take a backup of a standby server.
mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">> /tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">> /tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf
./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
[edb@localhost bin]$ ./psql postgres -p 5432 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)
[edb@localhost bin]$ ./psql postgres -p 5433 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR: the standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"
#same is working fine without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSION
Hi Asif,

In another scenario, the bkp data is corrupted for the tablespace. Again, this is not reproducible every time, but if I run the same set of commands I get the same error.

[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)
[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directory
Thanks & Regards,
Rajkumar Raghuwanshi

On Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
Hi Asif,
On testing further, I found that when taking a backup with -R, pg_basebackup crashed. This crash is not consistently reproducible.
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb)
Thanks & Regards,
Rajkumar Raghuwanshi
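For what it's worth, the faulting line in the backtrace is the hand-off of the next file to a worker. A self-contained sketch of that hand-off, with a NULL check added, looks roughly like the following; the struct and function names are assumed from the backtrace, and the actual fix (per the follow-up mail) turned out to be an uninitialized pointer rather than this traversal:

#include <pthread.h>
#include <stddef.h>

typedef struct FetchFile { struct FetchFile *next; } FetchFile;
typedef struct BackupInfo { FetchFile *curr; pthread_mutex_t lock; } BackupInfo;

/* Hand the next pending file to the calling worker, or NULL when the list is empty. */
static FetchFile *
next_file_to_fetch(BackupInfo *backupinfo)
{
    FetchFile *fetchfile;

    pthread_mutex_lock(&backupinfo->lock);
    fetchfile = backupinfo->curr;
    if (fetchfile != NULL)
        backupinfo->curr = fetchfile->next;   /* the line from pg_basebackup.c:3175 */
    pthread_mutex_unlock(&backupinfo->lock);

    return fetchfile;
}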
On the review comments:
1. Comments seem out-dated. -- Fixed.
2. maxrate / throttling -- see below.
3. Rename SEND_FILES to SEND_FILE. -- Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one go, that's why the name.
4. Does pthread_create() work on Windows? -- The patch is updated to add support for the Windows platform.
5. Typos -- Done.
6. Comments explaining the PB_* states -- Done.
7. free_filelist() in the PB_FETCH_WAL_FILES case -- Done.

The corrupted tablespace and crash, reported by Rajkumar, have been fixed. A pointer variable remained uninitialized, which in turn caused the system to misbehave.

Attached is the updated set of patches. AFAIK, to complete the parallel backup feature set, there remain these sub-features:

1- Parallel backup does not work with a standby server. In parallel backup, the server spawns multiple processes and there is no shared state being maintained, so currently there is no way to tell the multiple processes whether the standby was promoted during the backup, since START_BACKUP was called.

2- Throttling. Robert previously suggested that we implement throttling on the client side. However, I found a previous discussion where it was advocated to be added to the backend instead [1]. So it was better to have a consensus before moving the throttle function to the client. That's why, for the time being, I have disabled it and have asked for suggestions on how to move forward.

It seems to me that we have to maintain a shared state in order to support taking a backup from a standby. Also, there is a new feature recently committed for backup progress reporting in the backend (pg_stat_progress_basebackup), added via commit e65497df. For parallel backup to update these stats, a shared state will be required.

Since multiple pg_basebackup invocations can be running at the same time, maintaining a shared state can become a little complex, unless we disallow taking multiple parallel backups.

So, proceeding with this patch, I will be working on:
- throttling to be implemented on the client side.
- adding a shared state to handle backup from the standby.

-- Asif Rehman
Attachment
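To make the client-side throttling plan above concrete, a minimal sketch of a rate limiter that all worker threads could share might look like the following; every name here is assumed for illustration and none of it comes from the attached patches:

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

typedef struct Throttle
{
    pthread_mutex_t lock;
    uint64_t        bytes_per_sec;    /* the --max-rate budget                 */
    uint64_t        bytes_this_sec;   /* consumed in the current second window */
    time_t          window_start;
} Throttle;

/* Called by a worker after it has received nbytes from its connection. */
static void
throttle_wait(Throttle *t, uint64_t nbytes)
{
    bool over;

    pthread_mutex_lock(&t->lock);
    if (time(NULL) != t->window_start)
    {
        t->window_start = time(NULL);
        t->bytes_this_sec = 0;
    }
    t->bytes_this_sec += nbytes;
    over = t->bytes_this_sec > t->bytes_per_sec;
    pthread_mutex_unlock(&t->lock);

    if (over)
        usleep(100 * 1000);   /* crude back-off; real code would sleep until
                               * the current one-second window rolls over */
}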
Thanks Asif, I have re-verified the reported issues. Except the standby backup, the others are fixed.
Thanks & Regards,
Rajkumar Raghuwanshi
Yes, as Asif mentioned, he is working on the standby issue and adding bandwidth-throttling functionality to parallel backup. It would be good to get some feedback from Robert on Asif's previous email about the design considerations for standby-server support and throttling. I believe all the other points mentioned by Robert in this thread are addressed by Asif, so it would be good to hear about any other concerns that are not addressed.
Thanks,
-- Ahsan
Highgo Software (Canada/China/Pakistan)
URL: http://www.highgo.ca
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/F1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_58692"
pg_basebackup: backup worker (0) created
….
…..
…..
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
pg_basebackup: backup worker (1018) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
…
…
…
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
pg_basebackup: backup worker (1989) created
pg_basebackup: error: could not create file "/home/edb/Desktop/backup//global/4183": Too many open files
pg_basebackup: error: could not create file "/home/edb/Desktop/backup//global/3592": Too many open files
pg_basebackup: error: could not create file "/home/edb/Desktop/backup//global/4177": Too many open files
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_commit_ts pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact
global pg_dynshmem pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc pg_wal
[edb@localhost bin]$
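The run above asks for far more workers than the process has file descriptors for, so the extra connections and output files fail with "Too many open files". One possible guard, sketched purely as an illustration (the arithmetic and the function are assumptions, not the patch's behaviour), would be to clamp --jobs against the descriptor limit before spawning workers:

#include <unistd.h>

/* Clamp the requested number of workers to what the fd limit can support. */
static int
clamp_numworkers(int requested)
{
    long max_files = sysconf(_SC_OPEN_MAX);
    /* assume each worker needs a socket plus the file it is writing, and
     * leave some headroom for pg_basebackup's own descriptors */
    long usable = (max_files > 64) ? (max_files - 64) / 2 : 1;

    return (requested > usable) ? (int) usable : requested;
}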
Hi Asif,
My colleague Kashif Zeeshan reported an issue off-list; posting it here, please take a look.
When executing two backups at the same time, we get a FATAL error due to max_wal_senders, yet instead of exiting, the backup is reported as completed. And when trying to start the server from the backup cluster, we get an error:
[edb@localhost bin]$ ./pgbench -i -s 200 -h localhost -p 5432 postgres
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C2000270 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57849"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C3000050
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C20001C0 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57848"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (3) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (4) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (5) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: backup worker (7) created
pg_basebackup: write-ahead log end point: 0/C2000348
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$ ./pg_ctl -D /home/edb/Desktop/backup1/ -o "-p 5438" start
pg_ctl: directory "/home/edb/Desktop/backup1" is not a database cluster directory
Thanks & Regards,
Rajkumar Raghuwanshi
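Assuming each worker opens its own replication connection, plus one for the background WAL receiver, two concurrent runs with --jobs=8 need more slots than the default max_wal_senders of 10, so the later workers fail as shown while the run still reports success. A pre-flight check along these lines (a sketch only; check_wal_sender_budget() is not part of the patch) could turn that into a hard error up front:

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

/* Complain if the server cannot possibly grant jobs+1 walsender slots. */
static void
check_wal_sender_budget(PGconn *conn, int numworkers)
{
    PGresult *res = PQexec(conn, "SHOW max_wal_senders");

    if (PQresultStatus(res) == PGRES_TUPLES_OK)
    {
        int max_wal_senders = atoi(PQgetvalue(res, 0, 0));

        if (numworkers + 1 > max_wal_senders)
            fprintf(stderr,
                    "pg_basebackup: error: --jobs=%d needs %d replication connections "
                    "but max_wal_senders is %d\n",
                    numworkers, numworkers + 1, max_wal_senders);
    }
    PQclear(res);
}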
While testing further I observed parallel backup is not able to take backup of standby server.
mkdir /tmp/archive_dir
echo "archive_mode='on'">> data/postgresql.conf
echo "archive_command='cp %p /tmp/archive_dir/%f'">> data/postgresql.conf
./pg_ctl -D data -l logs start
./pg_basebackup -p 5432 -Fp -R -D /tmp/slave
echo "primary_conninfo='host=127.0.0.1 port=5432 user=edb'">> /tmp/slave/postgresql.conf
echo "restore_command='cp /tmp/archive_dir/%f %p'">> /tmp/slave/postgresql.conf
echo "promote_trigger_file='/tmp/failover.log'">> /tmp/slave/postgresql.conf
./pg_ctl -D /tmp/slave -l /tmp/slave_logs -o "-p 5433" start -c
[edb@localhost bin]$ ./psql postgres -p 5432 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
f
(1 row)
[edb@localhost bin]$ ./psql postgres -p 5433 -c "select pg_is_in_recovery();"
pg_is_in_recovery
-------------------
t
(1 row)
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 6
pg_basebackup: error: could not list backup files: ERROR: the standby was promoted during online backup
HINT: This means that the backup being taken is corrupt and should not be used. Try taking another online backup.
pg_basebackup: removing data directory "/tmp/bkp_s"
#same is working fine without parallel backup
[edb@localhost bin]$ ./pg_basebackup -p 5433 -D /tmp/bkp_s --jobs 1
[edb@localhost bin]$ ls /tmp/bkp_s/PG_VERSION
/tmp/bkp_s/PG_VERSIONThanks & Regards,Rajkumar RaghuwanshiOn Thu, Mar 19, 2020 at 4:11 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:Hi Asif,In another scenarios, bkp data is corrupted for tablespace. again this is not reproducible everytime,but If I am running the same set of commands I am getting the same error.[edb@localhost bin]$ ./pg_ctl -D data -l logfile start
waiting for server to start.... done
server started
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /tmp/tblsp
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp location '/tmp/tblsp';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create database testdb tablespace tblsp;"
CREATE DATABASE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "create table testtbl (a text);"
CREATE TABLE
[edb@localhost bin]$ ./psql testdb -p 5432 -c "insert into testtbl values ('parallel_backup with tablespace');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/bkp -T /tmp/tblsp=/tmp/tblsp_bkp --jobs 2
[edb@localhost bin]$ ./pg_ctl -D /tmp/bkp -l /tmp/bkp_logs -o "-p 5555" start
waiting for server to start.... done
server started
[edb@localhost bin]$ ./psql postgres -p 5555 -c "select * from pg_tablespace where spcname like 'tblsp%' or spcname = 'pg_default'";
oid | spcname | spcowner | spcacl | spcoptions
-------+------------+----------+--------+------------
1663 | pg_default | 10 | |
16384 | tblsp | 10 | |
(2 rows)
[edb@localhost bin]$ ./psql testdb -p 5555 -c "select * from testtbl";
psql: error: could not connect to server: FATAL: "pg_tblspc/16384/PG_13_202003051/16385" is not a valid data directory
DETAIL: File "pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION" is missing.
[edb@localhost bin]$
[edb@localhost bin]$ ls data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
data/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
[edb@localhost bin]$ ls /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION
ls: cannot access /tmp/bkp/pg_tblspc/16384/PG_13_202003051/16385/PG_VERSION: No such file or directoryThanks & Regards,Rajkumar RaghuwanshiOn Mon, Mar 16, 2020 at 6:19 PM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:Hi Asif,
On testing further, I found when taking backup with -R, pg_basebackup crashedthis crash is not consistently reproducible.[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text);"CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -R recovery-conf');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R
Segmentation fault (core dumped)
stack trace looks the same as it was on earlier reported crash with tablespace.
--stack trace
[edb@localhost bin]$ gdb -q -c core.37915 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 -j 2 -D /tmp/test_bkp/bkp -R'.
Program terminated with signal 11, Segmentation fault.
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
3175 backupinfo->curr = fetchfile->next;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x00000000004099ee in worker_get_files (wstate=0xc1e458) at pg_basebackup.c:3175
#1 0x0000000000408a9e in worker_run (arg=0xc1e458) at pg_basebackup.c:2715
#2 0x0000003921a07aa1 in start_thread (arg=0x7f72207c0700) at pthread_create.c:301
#3 0x00000039212e8c4d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115(gdb)Thanks & Regards,Rajkumar RaghuwanshiOn Mon, Mar 16, 2020 at 2:14 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:Hi Asif,Thanks Rajkumar. I have fixed the above issues and have rebased the patch to the latest master (b7f64c64).(V9 of the patches are attached).
I had a further review of the patches and here are my few observations:
1.
+/*
+ * stop_backup() - ends an online backup
+ *
+ * The function is called at the end of an online backup. It sends out pg_control
+ * file, optionally WAL segments and ending WAL location.
+ */
Comments seem out-dated.Fixed.
2. With parallel jobs, maxrate is now not supported. Since we are now asking
data in multiple threads throttling seems important here. Can you please
explain why have you disabled that?
3. As we are always fetching a single file and as Robert suggested, let rename
SEND_FILES to SEND_FILE instead.Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in onego, that's why the name.
4. Does this work on Windows? I mean does pthread_create() work on Windows?
I asked this as I see that pgbench has its own implementation for
pthread_create() for WIN32 but this patch doesn't.patch is updated to add support for the Windows platform.
5. Typos:
tablspace => tablespace
safly => safelyDone.6. parallel_backup_run() needs some comments explaining the states it goes
through PB_* states.
7.
+ case PB_FETCH_REL_FILES: /* fetch files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_STOP_BACKUP;
+ free_filelist(backupinfo);
+ }
+ break;
+ case PB_FETCH_WAL_FILES: /* fetch WAL files from server */
+ if (backupinfo->activeworkers == 0)
+ {
+ backupinfo->backupstate = PB_BACKUP_COMPLETE;
+ }
+ break;Done.
Why free_filelist() is not called in PB_FETCH_WAL_FILES case?Done.
The corrupted tablespace and crash, reported by Rajkumar, have been fixed. A pointer variable remained uninitialized, which in turn caused the system to misbehave.
Attached is the updated set of patches. AFAIK, to complete the parallel backup feature set, there remain three sub-features:
1- Parallel backup does not work with a standby server. In parallel backup, the server spawns multiple processes and there is no shared state being maintained. So currently, there is no way to tell the multiple processes if the standby was promoted during the backup since START_BACKUP was called.
2- Throttling. Robert previously suggested that we implement throttling on the client-side. However, I found a previous discussion where it was advocated to be added to the backend instead[1].
So, it was better to have a consensus before moving the throttle function to the client. That's why, for the time being, I have disabled it and have asked for suggestions on it to move forward.
It seems to me that we have to maintain a shared state in order to support taking a backup from a standby. Also, there is a new feature recently committed for backup progress reporting in the backend (pg_stat_progress_basebackup). This functionality was recently added via commit e65497df. For parallel backup to update these stats, a shared state will be required.
Since multiple pg_basebackup processes can be running at the same time, maintaining a shared state can become a little complex, unless we disallow taking multiple parallel backups.
So proceeding on with this patch, I will be working on:
- throttling to be implemented on the client-side.
- adding a shared state to handle backup from the standby.
--
Asif Rehman
--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
--

The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000. The errors were generated for workers starting from backup worker=1017.

On Thu, Apr 2, 2020 at 7:30 AM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
> The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000. The errors were generated for workers starting from backup worker=1017.

It wasn't the fact that you set max_wal_senders to 2000. It was the fact that you specified 1990 parallel workers. By so doing, you overloaded the machine, which is why everything failed. That's to be expected.
--
--

On Thu, Apr 2, 2020 at 7:55 AM Kashif Zeeshan
<kashif.zeeshan@enterprisedb.com> wrote:
> Thanks alot Robert,
> In this case the backup folder was not being emptied as the backup was failed, the cleanup should be done in this case too.
Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--

Does it fail to clean up the backup folder in all cases where the
backup failed, or just in this case?
The cleanup is done in the cases I have seen so far with base pg_basebackup functionality (not including the parallel backup feature), with the message "pg_basebackup: removing contents of data directory".
A similar case was also fixed for parallel backup, reported by Rajkumar, where the contents of the backup folder were not cleaned up after the error.
On Fri, Mar 27, 2020 at 1:34 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Yes, we are fetching a single file. However, SEND_FILES is still capable of fetching multiple files in one
> go, that's why the name.
I don't see why it should work that way. If we're fetching individual
files, why have an unused capability to fetch multiple files?
> 1- parallel backup does not work with a standby server. In parallel backup, the server
> spawns multiple processes and there is no shared state being maintained. So currently,
> no way to tell multiple processes if the standby was promoted during the backup since
> the START_BACKUP was called.
Why would you need to do that? As long as the process where
STOP_BACKUP can do the check, that seems good enough.
> 2- throttling. Robert previously suggested that we implement throttling on the client-side.
> However, I found a previous discussion where it was advocated to be added to the
> backend instead[1].
>
> So, it was better to have a consensus before moving the throttle function to the client.
> That’s why for the time being I have disabled it and have asked for suggestions on it
> to move forward.
>
> It seems to me that we have to maintain a shared state in order to support taking backup
> from standby. Also, there is a new feature recently committed for backup progress
> reporting in the backend (pg_stat_progress_basebackup). This functionality was recently
> added via this commit ID: e65497df. For parallel backup to update these stats, a shared
> state will be required.
I've come around to the view that a shared state is a good idea and
that throttling on the server-side makes more sense. I'm not clear on
whether we need shared state only for throttling or whether we need it
for more than that. Another possible reason might be for the
progress-reporting stuff that just got added.
> Since multiple pg_basebackup can be running at the same time, maintaining a shared state
> can become a little complex, unless we disallow taking multiple parallel backups.
I do not see why it would be necessary to disallow taking multiple
parallel backups. You just need to have multiple copies of the shared
state and a way to decide which one to use for any particular backup.
I guess that is a little complex, but only a little.
There are two possible options:
(1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
(2) (Preferred Option) Use the WAL start location as the BackupID.
This BackupID should be given back in the response to the start backup command. All client workers
must append this ID to all parallel backup replication commands, so that we can use this identifier
to search for that particular backup. Does that sound good?
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.
Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
Okay, so if we are to add another replication command 'JOIN_BACKUP unique_backup_id'
to make workers find the relevant shared state, there won't be any need for changing
the grammar for any other command. START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:
typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;
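Just to make that concrete, a minimal sketch of how such an ID could be produced with pg_strong_random() and hex-encoded into the backupid field above; the helper name and error wording are made up for illustration:

static void
generate_backup_id(char *backupid, size_t bufsize)
{
    uint8       raw[16];
    static const char hexdigits[] = "0123456789abcdef";

    Assert(bufsize >= sizeof(raw) * 2 + 1);

    if (!pg_strong_random(raw, sizeof(raw)))
        ereport(ERROR,
                (errcode(ERRCODE_INTERNAL_ERROR),
                 errmsg("could not generate backup ID")));

    /* hex-encode so the ID can be passed back in replication commands */
    for (int i = 0; i < (int) sizeof(raw); i++)
    {
        backupid[i * 2] = hexdigits[raw[i] >> 4];
        backupid[i * 2 + 1] = hexdigits[raw[i] & 0x0F];
    }
    backupid[sizeof(raw) * 2] = '\0';
}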
--

After the recent backup manifest addition, the patches needed to be rebased, and a few things needed reconsideration, like making sure that parallel backup creates the manifest file correctly.
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Hi Asif
When a non-existent slot is used with a tablespace then the correct error is displayed, but the backup folder is not cleaned up and leaves a corrupt backup.
Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$
If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone
[edb@localhost bin]$
On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.
Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.
Okay, so if we are to add another replication command 'JOIN_BACKUP unique_backup_id'
to make workers find the relevant shared state, there won't be any need for changing
the grammar for any other command. START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;

The shared state structure entries would be maintained by a shared hash table. There will be one structure per parallel backup. Since a single parallel backup can engage more than one WAL sender, I think max_wal_senders might be a little too much; perhaps max_wal_senders/2, since there will be at least 2 connections per parallel backup? Alternatively, we can add a new GUC that defines the maximum number of concurrent parallel backups, i.e. 'max_concurrent_backups_allowed = 10' perhaps, or we can make it user-configurable.

The key would be "backupid=hex_encode(pg_strong_random(16))".

Checking for standby promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery and keep checking it whenever send_file() is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - the throttling logic remains the same as for non-parallel backup, with the exception that multiple threads will now be updating it. So in parallel backup, this will represent the overall bytes that have been transferred. The workers would sleep if they have exceeded the limit. Hence, the shared state carries a lock to safely update the throttling value atomically (a rough sketch of this update follows below).

Progress reporting:
Although I think we should add progress reporting for parallel backup as a separate patch, the relevant entries for progress reporting such as 'backup_total' and 'backup_streamed' would then be added to this structure as well.

Grammar:
There is a change in the result set being returned for the START_BACKUP command; unique_backup_id is added. Additionally, the JOIN_BACKUP replication command is added. SEND_FILES has been renamed to SEND_FILE. There are no other changes to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id

STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label

JOIN_BACKUP 'unique_backup_id'
- attaches a shared state identified by 'unique_backup_id' to a backend process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]

--
Asif Rehman
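For illustration only, a rough sketch of the shared throttling update described above; the helper name is made up, and the sleep handling and rate bookkeeping are left out:

static bool
parallel_throttle_account(BackupSharedState *state, size_t nbytes, uint64 limit)
{
    bool        must_sleep = false;

    /* multiple workers update the shared counter, so take the spinlock */
    SpinLockAcquire(&state->lock);
    state->throttling_counter += nbytes;
    if ((uint64) state->throttling_counter >= limit)
    {
        state->throttling_counter = 0;
        must_sleep = true;      /* caller sleeps until the rate interval elapses */
    }
    SpinLockRelease(&state->lock);

    return must_sleep;
}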
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
--

On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
> Hi Asif
> When a non-existent slot is used with a tablespace then the correct error is displayed, but the backup folder is not cleaned up and leaves a corrupt backup.
Hi Asif
A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
The same case, when executed with pg_basebackup without the parallel backup patch, does the proper cleanup.
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$
Thanks
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
Attachment
Hi,
Thanks, Kashif and Rajkumar. I have fixed the reported issues.
I have added the shared state as previously described. The new grammar changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16), hex-encode it, and return it in the result set.
- It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this because 'startptr' is now accessible through the shared state.

There is no change in other commands:
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled by default, so I have disabled it for parallel backup mode and have generated a warning so that the user is aware of it and does not expect it in the backup.
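To make the intended flow concrete, here is a rough sketch of the order of commands a parallel client could issue against this grammar; it is illustrative only (connection setup and result checking are omitted, and the backup id shown is a placeholder):

#include "libpq-fe.h"

static void
parallel_backup_sketch(PGconn *conn_leader, PGconn *conn_worker)
{
    PGresult   *res;

    /* leader starts the backup; the result set carries startptr, tli and the backup id */
    res = PQexec(conn_leader, "START_BACKUP LABEL 'example' FAST");
    PQclear(res);

    /* a worker attaches to the leader's shared state via the backup id */
    res = PQexec(conn_worker, "JOIN_BACKUP 'deadbeefdeadbeefdeadbeefdeadbeef'");
    PQclear(res);

    /* the worker asks for the file list, then fetches files one at a time */
    res = PQexec(conn_worker, "LIST_FILES");
    PQclear(res);
    res = PQexec(conn_worker, "SEND_FILE ('base/1/1259')");
    PQclear(res);

    /* the leader finishes the backup once every worker is done */
    res = PQexec(conn_leader, "STOP_BACKUP");
    PQclear(res);
}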
--
Asif Rehman
--
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
On Tue, Apr 7, 2020 at 10:14 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
> by default, so I have disabled it for parallel backup mode and have generated a warning so that
> the user is aware of it and does not expect it in the backup.
So, are you working on making it work? I don't think a parallel backup feature should be creating a backup with no manifest.
On Fri, Apr 3, 2020 at 4:46 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Non-parallel backup already does the early error checking. I only intended
> to make parallel behave the same as non-parallel here. So, I agree with
> you that the behavior of parallel backup should be consistent with the
> non-parallel one. Please see the code snippet below from
> basebackup.c:sendDir()

Oh, OK. So then we need to preserve that behavior, I think. Sorry, I
didn't realize the check was happening there.

> I am thinking of the following struct for shared state:
>> typedef struct
>> {
>> char backupid[NAMEDATALEN];
>> XLogRecPtr startptr;
>> slock_t lock;
>> int64 throttling_counter;
>> bool backup_started_in_recovery;
>> } BackupSharedState;

Looks broadly reasonable. Can anything other than lock and
throttling_counter change while it's running? If not, how about using
pg_atomic_uint64 for the throttling counter, and dropping lock? If
that gets too complicated it's OK to keep it as you have it.

> The shared state structure entries would be maintained by a shared hash table.
> There will be one structure per parallel backup. Since a single parallel backup
> can engage more than one wal sender, so I think max_wal_senders might be a little
> too much; perhaps max_wal_senders/2 since there will be at least 2 connections
> per parallel backup? Alternatively, we can set a new GUC that defines the maximum
> number of concurrent parallel backups i.e. 'max_concurrent_backups_allowed = 10'
> perhaps, or we can make it user-configurable.

I don't think you need a hash table. Linear search should be fine. And
I see no point in dividing max_wal_senders by 2 either. The default is
*10*. You'd need to increase that by more than an order of magnitude
for a hash table to be needed, and more than that for the shared memory
consumption to matter.

> The key would be "backupid=hex_encode(pg_strong_random(16))"

wfm

> Progress Reporting:
> Although I think we should add progress-reporting for parallel backup as a
> separate patch. The relevant entries for progress-reporting such as
> 'backup_total' and 'backup_streamed' would be then added to this structure
> as well.

I mean, you can separate it for review if you wish, but it would need
to be committed together.

> START_BACKUP [LABEL '<label>'] [FAST]
> - returns startptr, tli, backup_label, unique_backup_id

OK. But what if I want to use this interface for a non-parallel backup?

> STOP_BACKUP [NOWAIT]
> - returns startptr, tli, backup_label

I don't think it makes sense for STOP_BACKUP to return the same values
that START_BACKUP already returned. Presumably STOP_BACKUP should
return the end LSN. It could also return the backup label and
tablespace map files, as the corresponding SQL function does, unless
there's some better way of returning those in this case.

> JOIN_BACKUP 'unique_backup_id'
> - attaches a shared state identified by 'unique_backup_id' to a backend process.

OK.

> LIST_TABLESPACES [PROGRESS]

OK.

> LIST_FILES [TABLESPACE]

OK.

> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

Why not just LIST_WAL_FILES 'startptr' 'endptr'?

> SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]

Why parens? That seems useless. Maybe it would make sense to have
SEND_DATA_FILE 'datafilename' and SEND_WAL_FILE 'walfilename' as
separate commands. But not sure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
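For what it's worth, the pg_atomic_uint64 variant suggested above could look roughly like this. This is a sketch under that assumption only; init_throttling() and account_bytes() are made-up names, not anything from the patch.

#include "postgres.h"

#include "access/xlogdefs.h"
#include "port/atomics.h"

/* Variant of the proposed struct with an atomic counter instead of a spinlock. */
typedef struct BackupSharedState
{
	char		backupid[NAMEDATALEN];
	XLogRecPtr	startptr;
	pg_atomic_uint64 throttling_counter;
	bool		backup_started_in_recovery;
} BackupSharedState;

/* START_BACKUP would initialize the counter once. */
static void
init_throttling(BackupSharedState *state)
{
	pg_atomic_init_u64(&state->throttling_counter, 0);
}

/* Each worker adds the bytes it just sent; returns true if it should sleep. */
static bool
account_bytes(BackupSharedState *state, uint64 nbytes, uint64 limit)
{
	uint64		previous = pg_atomic_fetch_add_u64(&state->throttling_counter, nbytes);

	return (previous + nbytes >= limit);
}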
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.

It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it. I'm gonna
try to find some time for that as soon as I can.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi Asif,

Thanks for the new patches. The patches need to be rebased on head; I am getting a failure while applying the 0003 patch.

[edb@localhost postgresql]$ git apply v11/0003-Parallel-Backup-Backend-Replication-commands.patch
error: patch failed: src/backend/storage/ipc/ipci.c:147
error: src/backend/storage/ipc/ipci.c: patch does not apply

I have applied the v11 patches on commit 23ba3b5ee278847e4fad913b80950edb2838fd35 to test further. pg_basebackup has a new option "--no-estimate-size", and pg_basebackup crashes when using this option.

[edb@localhost bin]$ ./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2
Segmentation fault (core dumped)

--stacktrace
[edb@localhost bin]$ gdb -q -c core.80438 pg_basebackup
Loaded symbols for /lib64/libselinux.so.1
Core was generated by `./pg_basebackup -D /tmp/bkp --no-estimate-size --jobs=2'.
Program terminated with signal 11, Segmentation fault.
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
298 while (ISSPACE (*s))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 ____strtol_l_internal (nptr=0x0, endptr=0x0, base=10, group=<value optimized out>, loc=0x392158ee40) at ../stdlib/strtol_l.c:298
#1 0x0000003921233b30 in atoi (nptr=<value optimized out>) at atoi.c:28
#2 0x000000000040841e in main (argc=5, argv=0x7ffeaa6fb968) at pg_basebackup.c:2526

Thanks & Regards,
Rajkumar Raghuwanshi

On Tue, Apr 7, 2020 at 11:07 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 7, 2020 at 1:25 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I will, however parallel backup is already quite a large patch. So I think we should first
> agree on the current work before adding a backup manifest and progress-reporting support.
It's going to be needed for commit, but it may make sense for us to do
more review of what you've got here before we worry about it.
I'm gonna try to find some time for that as soon as I can.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachment
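The backtrace above ends in atoi() being called with a NULL pointer, which suggests an optional command-line value is parsed without a NULL check when --no-estimate-size changes which arguments are present. A defensive pattern might look like the following; parse_optional_int() and jobs_arg are hypothetical names used only for illustration, and this is not the actual pg_basebackup code.

#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse an optional numeric command-line value without
 * crashing when it was never supplied.
 */
static int
parse_optional_int(const char *value, int default_value)
{
	if (value == NULL || *value == '\0')
		return default_value;
	return atoi(value);
}

/* e.g. numjobs = parse_optional_int(jobs_arg, 1); rather than atoi(jobs_arg) */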
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Hi,
>
> Thanks, Kashif and Rajkumar. I have fixed the reported issues.
>
> I have added the shared state as previously described. The new grammar changes are as follows:
>
> START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
> - This will generate a unique backupid using pg_strong_random(16), hex-encode it, and return it in the result set.
> - It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.
>
> JOIN_BACKUP 'backup_id'
> - finds 'backup_id' in the hashtable and attaches it to the server process.
>
> SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
> - renamed SEND_FILES to SEND_FILE
> - removed START_WAL_LOCATION from this because 'startptr' is now accessible through shared state.
>
> There is no change in other commands:
>
> STOP_BACKUP [NOWAIT]
> LIST_TABLESPACES [PROGRESS]
> LIST_FILES [TABLESPACE]
> LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
>
> The current patches (v11) have been rebased to the latest master. The backup manifest is enabled
> by default, so I have disabled it for parallel backup mode and have generated a warning so that
> the user is aware of it and does not expect it in the backup.

Hi Asif

I have verified the bug fixes; one bug is fixed and working now as expected. While verifying the other bug fixes I faced the following issues, please have a look.

1) The bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added a description only, as the steps were given in previous emails for each bug whose fix I tried to verify. A backtrace is also added with each case, and it points to one bug for both cases.

a) The backup failed with the error "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
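The crash is in cleanup_workers() joining a thread id of 0 at exit, i.e. joining worker slots whose threads were never started. A defensive cleanup loop might look like this; backup_worker and the 'started' flag are assumptions for illustration, not the patch's actual code.

#include <pthread.h>
#include <stdbool.h>

typedef struct backup_worker
{
	pthread_t	thread;
	bool		started;		/* set only after pthread_create() succeeds */
} backup_worker;

/* Hypothetical cleanup: join only the workers that were actually created. */
static void
cleanup_workers(backup_worker *workers, int nworkers)
{
	int			i;

	for (i = 0; i < nworkers; i++)
	{
		if (workers[i].started)
		{
			pthread_join(workers[i].thread, NULL);
			workers[i].started = false;
		}
	}
}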
b) When executing two backups at the same time, a FATAL error is raised due to max_wal_senders, and instead of exiting, the backup completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)

2) The following bug is not fixed yet.

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but then the backup folder is not cleaned up and a corrupt backup is left behind.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$

Thanks
Kashif Zeeshan

On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

When a non-existent slot is used with a tablespace, the correct error is displayed, but then the backup folder is not cleaned up and a corrupt backup is left behind.

Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$

If the same case is executed without the parallel backup patch, then the backup folder is cleaned up after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone

Hi Asif

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but then the backup folder is not cleaned up and a corrupt backup is left behind. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

The same case, when executed on pg_basebackup without the parallel backup patch, does proper cleanup.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.

Okay, so if we are to add another replication command 'JOIN_BACKUP unique_backup_id'
to make workers find the relevant shared state. There won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:

typedef struct
{
char backupid[NAMEDATALEN];
XLogRecPtr startptr;
slock_t lock;
int64 throttling_counter;
bool backup_started_in_recovery;
} BackupSharedState;
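As a side note on the id scheme discussed above (pg_strong_random plus hex encoding), a minimal server-side sketch could look like the following; generate_backup_id() is a made-up name, not from the patch.

#include "postgres.h"

/* Hypothetical helper: build a 32-character hex backup id from 16 random bytes. */
static void
generate_backup_id(char *dst, size_t dstlen)
{
	uint8		raw[16];
	int			i;

	if (!pg_strong_random(raw, sizeof(raw)))
		ereport(ERROR, (errmsg("could not generate random backup id")));

	Assert(dstlen >= sizeof(raw) * 2 + 1);
	for (i = 0; i < (int) sizeof(raw); i++)
		snprintf(dst + i * 2, 3, "%02x", raw[i]);
}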
rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.
For this, 'backup_streamed' is moved to a shared structure (BackupState) as
While files are being transferred from server to client. The main process remains
- Added backup manifest:
The backend workers maintain their own manifest file which contains a list of files
The main process will read these individual files and concatenate them into a single file
The manifest file is created when the following command is received:
BUILD_MANIFEST 'backupid'
This is a new replication command. It is sent when pg_basebackup has copied all the
Attachment
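A rough sketch of the concatenation step described above, where the leader appends each worker's partial manifest onto the final one; append_partial_manifest() and the file handling are assumptions of mine, not the patch's code.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical: append the contents of 'partial_path' onto the open final manifest. */
static int
append_partial_manifest(FILE *final_manifest, const char *partial_path)
{
	FILE	   *src = fopen(partial_path, "rb");
	char		buf[8192];
	size_t		nread;

	if (src == NULL)
		return -1;

	while ((nread = fread(buf, 1, sizeof(buf), src)) > 0)
	{
		if (fwrite(buf, 1, nread, final_manifest) != nread)
		{
			fclose(src);
			return -1;
		}
	}

	fclose(src);
	return 0;
}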
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR: could not open file "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file or directory
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$
On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16), hex-encode it, and return it in the result set.
- It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this because 'startptr' is now accessible through shared state.

There is no change in other commands:

STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled by default, so I have disabled it for parallel backup mode and have generated a warning so that the user is aware of it and does not expect it in the backup.

Hi Asif

I have verified the bug fixes; one bug is fixed and working now as expected. While verifying the other bug fixes I faced the following issues, please have a look.

1) The bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added a description only, as the steps were given in previous emails for each bug whose fix I tried to verify. A backtrace is also added with each case, and it points to one bug for both cases.

a) The backup failed with the error "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
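Both backtraces show cleanup_workers() calling pthread_join() with threadid=0, i.e. joining a thread that was never created because the connection attempt for that worker failed. A hedged sketch of the kind of guard that avoids this in an atexit cleanup path; the worker array and field names are assumptions, not the patch's code:

/*
 * Sketch only: skip workers whose threads were never started, so the
 * atexit cleanup cannot pthread_join() a zero thread id.
 */
#include <pthread.h>
#include <stdbool.h>

typedef struct
{
    pthread_t   thread;
    bool        started;        /* set true only after pthread_create succeeds */
} WorkerSketch;

static void
cleanup_workers_sketch(WorkerSketch *workers, int nworkers)
{
    for (int i = 0; i < nworkers; i++)
    {
        if (!workers[i].started)
            continue;           /* never created; nothing to join */
        pthread_join(workers[i].thread, NULL);
        workers[i].started = false;
    }
}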
b) When executing two backups at the same time, a FATAL error is raised due to max_wal_senders, but instead of exiting, the backup got completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)

2) The following bug is not fixed yet.

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$

Thanks
Kashif Zeeshan

On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

When a non-existent slot is used with a tablespace, the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup.

Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$

If the same case is executed without the parallel backup patch, then the backup folder is cleaned up after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone

Hi Asif

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up, leaving a corrupt backup. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

The same case, when executed with pg_basebackup without the parallel backup patch, does the proper cleanup.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
 * Check if the postmaster has signaled us to exit, and abort with an
 * error in that case. The error handler further up will call
 * do_pg_abort_backup() for us. Also check that if the backup was
 * started while still in recovery, the server wasn't promoted.
 * do_pg_stop_backup() will check that too, but it's better to stop
 * the backup early than continue to the end and fail there.
 */
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
    ereport(ERROR,
            (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
             errmsg("the standby was promoted during online backup"),
             errhint("This means that the backup being taken is corrupt "
                     "and should not be used. "
                     "Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
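A hedged sketch of that arrangement (a fixed array in shared memory, one slot per max_wal_senders, looked up by backup ID; struct and function names are assumptions, not the patch's code):

#include <stdbool.h>
#include <string.h>

#define BACKUP_ID_MAXLEN 64     /* "NAMEDATALEN-or-something" */

typedef struct
{
    bool    in_use;
    char    backupid[BACKUP_ID_MAXLEN];
} BackupSlotSketch;

/* shared-memory array, sized to max_wal_senders at startup */
static BackupSlotSketch *backup_slots;
static int  n_backup_slots;

/* START_BACKUP copies the ID into a free slot; JOIN_BACKUP finds it again. */
static BackupSlotSketch *
find_backup_slot(const char *backupid)
{
    for (int i = 0; i < n_backup_slots; i++)
    {
        if (backup_slots[i].in_use &&
            strcmp(backup_slots[i].backupid, backupid) == 0)
            return &backup_slots[i];
    }
    return NULL;                /* caller would report an unknown backup id */
}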
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader process's PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.

Okay, so if we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state, there won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.

I am thinking of the following struct for the shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;
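A minimal sketch of producing the hash key described just below, i.e. backupid = hex_encode(pg_strong_random(16)); the function name and error message are illustrative, not the patch's code, and it assumes backend includes for pg_strong_random(), hex_encode(), palloc(), and ereport():

/* Sketch only: a 16-byte random ID, hex-encoded into a 32-character string. */
static char *
generate_backup_id(void)
{
    uint8       rand[16];
    char       *id = palloc(2 * sizeof(rand) + 1);

    if (!pg_strong_random(rand, sizeof(rand)))
        ereport(ERROR,
                (errmsg("could not generate random backup identifier")));

    hex_encode((const char *) rand, sizeof(rand), id);
    id[2 * sizeof(rand)] = '\0';
    return id;
}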
The shared state structure entries would be maintained by a shared hash table.
There will be one structure per parallel backup. Since a single parallel backup
can engage more than one WAL sender, I think max_wal_senders might be a little
too much; perhaps max_wal_senders/2, since there will be at least 2 connections
per parallel backup? Alternatively, we can add a new GUC that defines the maximum
number of concurrent parallel backups, e.g. ‘max_concurrent_backups_allowed = 10’,
or we can make it user-configurable.

The key would be “backupid=hex_encode(pg_strong_random(16))”.

Checking for Standby Promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery
and keep checking it whenever send_file() is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same
as for non-parallel backup, with the exception that multiple processes will now be
updating it. So in parallel backup, this will represent the overall bytes that
have been transferred. The workers will sleep if they have exceeded the limit.
Hence, the shared state carries a lock to safely update the throttling value
atomically.

Progress Reporting:
Although I think we should add progress reporting for parallel backup as a
separate patch, the relevant entries for progress reporting such as
‘backup_total’ and ‘backup_streamed’ would then be added to this structure
as well.

Grammar:
There is a change in the result set being returned for the START_BACKUP command;
unique_backup_id is added. Additionally, the JOIN_BACKUP replication command is
added. SEND_FILES has been renamed to SEND_FILE. There are no other changes
to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id

STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label

JOIN_BACKUP ‘unique_backup_id’
- attaches a shared state identified by ‘unique_backup_id’ to a backend process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
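To make the grammar concrete, here is a hedged client-side sketch of how a leader and one worker connection might use START_BACKUP and JOIN_BACKUP over libpq; the column position of unique_backup_id and the expected result statuses are assumptions based on the result sets listed above, not settled protocol:

#include <stdio.h>
#include "libpq-fe.h"

static int
start_and_join(PGconn *leader, PGconn *worker)
{
    PGresult   *res;
    char        backupid[64];
    char        join[128];

    res = PQexec(leader, "START_BACKUP LABEL 'parallel demo' FAST");
    if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) < 1)
    {
        PQclear(res);
        return -1;
    }
    /* assumes unique_backup_id is the fourth column of the result set */
    snprintf(backupid, sizeof(backupid), "%s", PQgetvalue(res, 0, 3));
    PQclear(res);

    snprintf(join, sizeof(join), "JOIN_BACKUP '%s'", backupid);
    res = PQexec(worker, join);
    if (PQresultStatus(res) != PGRES_COMMAND_OK &&
        PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        PQclear(res);
        return -1;
    }
    PQclear(res);
    return 0;
}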
Hi,

rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.

- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a
pg_atomic_uint64 variable. The worker processes will keep incrementing this variable
while files are being transferred from server to client; the main process remains
in an idle state. So after each increment, the worker process will signal the master to
update the stats in the pg_stat_progress_basebackup view. The 'tablespace_streamed'
column is not updated and will remain empty, because multiple workers may be copying
files from different tablespaces.

- Added backup manifest:
The backend workers maintain their own manifest file, which contains a list of files
that are being transferred by that worker. Once all backup files are transferred, the
workers create a temp file ('pg_tempdir/temp_file_prefix_backupid.workerid') to write
the content of the manifest file from the BufFile. The workers won't add the header,
nor the WAL information, in their manifest; these two will be added by the main
process while merging all worker manifest files.

The main process will read these individual files and concatenate them into a single file,
which is then sent back to the client.

The manifest file is created when the following command is received:

BUILD_MANIFEST 'backupid'

This is a new replication command. It is sent when pg_basebackup has copied all the
$PGDATA files, including WAL files.
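The concatenation step itself can be quite small. An illustrative sketch only, using plain stdio rather than the patch's BufFile machinery; the temp-file path format and helper name are assumptions:

/*
 * Sketch: the leader appends one worker's temporary manifest fragment to
 * the final manifest; the header and WAL information are written by the
 * leader itself before and after this loop.
 */
#include <stdio.h>

static void
append_worker_manifest(FILE *out, const char *backupid, int workerid)
{
    char    path[1024];
    char    buf[8192];
    size_t  n;
    FILE   *in;

    snprintf(path, sizeof(path), "base/pgsql_tmp/pgsql_tmp_%s.%d",
             backupid, workerid);
    in = fopen(path, "r");
    if (in == NULL)
        return;                 /* real code reports an error here */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
}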
--

Hi Asif

Getting the following error on parallel backup when the --no-manifest option is used.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR: could not open file "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file or directory
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$
Thanks

--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company
Attachment
On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
+typedef struct
+{
...
+} BackupFile;
+
+typedef struct
+{
...
+} BackupState;
These structures need comments.
+list_wal_files_opt_list:
+ SCONST SCONST
{
- $$ = makeDefElem("manifest_checksums",
-
(Node *)makeString($2), -1);
+ $$ = list_make2(
+ makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1),
+ makeDefElem("end_wal_location",
+ (Node *)makeString($2), -1));
+
}
This seems like an unnecessarily complicated parse representation. The
DefElems seem to be completely unnecessary here.
@@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
So the design here is that SendBaseBackup() is now going to do a bunch
of things that are NOT sending a base backup? With no updates to the
comments of that function and no change to the process title it sets?
- return (manifest->buffile != NULL);
+ return (manifest && manifest->buffile != NULL);
Heck no. It appears that you didn't even bother reading the function
header comment.
+ * Send a single resultset containing XLogRecPtr record (in text format)
+ * TimelineID and backup label.
*/
static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
+ StringInfo label, char *backupid)
This just casually breaks wire protocol compatibility, which seems
completely unacceptable.
+ if (strlen(opt->tablespace) > 0)
+ sendTablespace(opt->tablespace, NULL, true, NULL, &files);
+ else
+ sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
+
+ SendFilesHeader(files);
So I guess the idea here is that we buffer the entire list of files in
memory, regardless of size, and then we send it out afterwards. That
doesn't seem like a good idea. The list of files might be very large.
We probably need some code refactoring here rather than just piling
more and more different responsibilities onto sendTablespace() and
sendDir().
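One direction such a refactoring could take is to pass a per-file callback down the traversal so each entry is emitted as it is found instead of being accumulated in a List. A hedged, self-contained illustration only (plain POSIX, non-recursive for brevity; none of these names come from the patch or basebackup.c):

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

typedef void (*per_file_cb) (const char *path, void *arg);

/* Walk 'root' and invoke cb for each regular file; nothing is buffered. */
static void
scan_dir(const char *root, per_file_cb cb, void *arg)
{
    DIR            *dir = opendir(root);
    struct dirent  *de;
    struct stat     st;
    char            path[4096];

    if (dir == NULL)
        return;
    while ((de = readdir(dir)) != NULL)
    {
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", root, de->d_name);
        if (stat(path, &st) == 0 && S_ISREG(st.st_mode))
            cb(path, arg);      /* emit this file immediately */
    }
    closedir(dir);
}

static void
emit_file_row(const char *path, void *arg)
{
    printf("%s\n", path);       /* stand-in for sending one result-set row */
}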
+ if (state->parallel_mode)
+ SpinLockAcquire(&state->lock);
+
+ state->throttling_counter += increment;
+
+ if (state->parallel_mode)
+ SpinLockRelease(&state->lock);
I don't like this much. It seems to me that we would do better to use
atomics here all the time, instead of conditional spinlocks.
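A hedged sketch of the unconditional-atomics version being suggested here, using PostgreSQL's pg_atomic_uint64 API; the struct, field, and function names are illustrative, not the patch's code:

#include "postgres.h"
#include "port/atomics.h"

/* Illustrative shared-state fragment. */
typedef struct
{
    pg_atomic_uint64    throttling_counter; /* bytes sent by all workers */
} ThrottleStateSketch;

/* Every worker calls this after sending 'increment' bytes. */
static void
add_throttling_bytes(ThrottleStateSketch *state, int64 increment)
{
    /* one unconditional atomic add: no spinlock, no parallel/serial branch */
    (void) pg_atomic_fetch_add_u64(&state->throttling_counter, increment);
}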
+static void
+send_file(basebackup_options *opt, char *file, bool missing_ok)
...
+ if (file == NULL)
+ return;
That seems totally inappropriate.
+ sendFile(file, file + basepathlen, &statbuf,
true, InvalidOid, NULL, NULL);
Maybe I'm misunderstanding, but this looks like it's going to write a
tar header, even though we're not writing a tarfile.
+ else
+ ereport(WARNING,
+ (errmsg("skipping special file
or directory \"%s\"", file)));
So, if the user asks for a directory or symlink, what's going to
happen is that they're going to receive an empty file, and get a
warning. That sounds like terrible behavior.
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes it may not report total checksum count, but it will error
+ * out,terminating the backup.
+ */
In other words, the patch breaks the feature. Not that the feature in
question works particularly well as things stand, but this makes it
worse.
I think this patch (0003) is in really bad shape. I'm having second
thoughts about the design, but it's kind of hard to even have a
discussion about the design when the patch is riddled with minor
problems like inadequate comments, failure to update existing
comments, and breaking a bunch of things. I understand that sometimes
things get missed, but this is version 14 of a patch that's been
kicking around since last August.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create tablespace tblsp1 location '/tmp/test_bkp/tblsp1';"
CREATE TABLESPACE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "create table test (a text) tablespace tblsp1;"
CREATE TABLE
[edb@localhost bin]$ ./psql postgres -p 5432 -c "insert into test values ('parallel_backup with -T tablespace option');"
INSERT 0 1
[edb@localhost bin]$ ./pg_basebackup -p 5432 -D /tmp/test_bkp/bkp -T /tmp/test_bkp/tblsp1=/tmp/test_bkp/tblsp2 -j 4
[edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16390" is present on disk but not in the manifest
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16388" is present on disk but not in the manifest
pg_verifybackup: error: "pg_tblspc/16384/PG_13_202004074/13530/16385" is present on disk but not in the manifest
pg_verifybackup: error: "/PG_13_202004074/13530/16388" is present in the manifest but not on disk
pg_verifybackup: error: "/PG_13_202004074/13530/16390" is present in the manifest but not on disk
pg_verifybackup: error: "/PG_13_202004074/13530/16385" is present in the manifest but not on disk
[edb@localhost bin]$ ./pg_verifybackup /tmp/test_bkp/bkp1
backup successfully verified
--

On Wed, 15 Apr 2020 at 1:49 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Apr 14, 2020 at 10:37 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
Fair enough. Some of this is also due to backup-related features, i.e. backup manifest and progress reporting, that got committed to master towards the tail end of PG-13. Rushing to get the parallel backup feature compatible with these features also caused some of the oversights.
--
Highgo Software (Canada/China/Pakistan)
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
On Wed, Apr 15, 2020 at 4:49 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
> Fair enough. Some of this is also due to backup related features i.e backup manifest, progress reporting that got committed to master towards the tail end of PG-13. Rushing to get parallel backup feature compatible with these features also caused some of the oversights.

Sure, but there's also no point in rushing out a feature that's in a state where it's got no chance of being acceptable, and quite a number of these problems are not new, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Apr 14, 2020 at 6:32 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

Getting the following error on parallel backup when the --no-manifest option is used.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D /home/edb/Desktop/backup/ --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_10223"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/2000100
pg_basebackup: error: could not get data for 'BUILD_MANIFEST': ERROR: could not open file "base/pgsql_tmp/pgsql_tmp_b4ef5ac0fd150b2a28caf626bbb1bef2.1": No such file or directory
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 5 -D
/home/edb/Desktop/backup --no-manifest
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/4000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_27407"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: write-ahead log end point: 0/4000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: base backup completed
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label pg_commit_ts pg_ident.conf pg_notify pg_snapshots
pg_subtrans PG_VERSION postgresql.auto.conf
base pg_dynshmem pg_logical pg_replslot pg_stat
pg_tblspc pg_wal postgresql.conf
global pg_hba.conf pg_multixact pg_serial pg_stat_tmp
pg_twophase pg_xact
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
Thanks

On Tue, Apr 14, 2020 at 5:33 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Hi,

Thanks, Kashif and Rajkumar. I have fixed the reported issues.

I have added the shared state as previously described. The new grammar changes are as follows:

START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]
- This will generate a unique backupid using pg_strong_random(16) and hex-encode it, which is then returned as the result set.
- It will also create a shared state and add it to the hashtable. The hash table size is set to BACKUP_HASH_SIZE=10, but since the hashtable can expand dynamically, I think it's a sufficient initial size. max_wal_senders is not used, because it can be set to quite large values.

JOIN_BACKUP 'backup_id'
- finds 'backup_id' in the hashtable and attaches it to the server process.

SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]
- renamed SEND_FILES to SEND_FILE
- removed START_WAL_LOCATION from this because 'startptr' is now accessible through the shared state.

There is no change in other commands:

STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']

The current patches (v11) have been rebased to the latest master. The backup manifest is enabled by default, so I have disabled it for parallel backup mode and have generated a warning so that the user is aware of it and does not expect it in the backup.

Hi Asif

I have verified the bug fixes; one bug is fixed and working now as expected. For the verification of the other bug fixes I faced the following issues, please have a look.

1) The following bug fixes mentioned below are generating a segmentation fault.

Please note that for reference I have added a description only, as the steps were given in previous emails for each bug I tried to verify the fix. A backtrace is also added with each case, which points to one bug for both cases.

a) The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
b) When executing two backups at the same time, a FATAL error is raised due to max_wal_senders, but instead of exiting, the backup got completed.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)

2) The following bug is not fixed yet.

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up and a corrupt backup is left behind.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$

Thanks
Kashif Zeeshan

On Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:
On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:

Hi Asif

When a non-existent slot is used with a tablespace, the correct error is displayed, but the backup folder is not cleaned up and a corrupt backup is left behind.

Steps
=======
[edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$

If the same case is executed without the parallel backup patch, the backup folder is cleaned up after the error is displayed.

[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undone

Hi Asif

A similar case is when the DB server is shut down while the parallel backup is in progress: the correct error is displayed, but the backup folder is not cleaned up and a corrupt backup is left behind. I think one bug fix will solve all these cases where cleanup is not done when a parallel backup fails.

[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$

When the same case is executed with pg_basebackup without the parallel backup patch, proper cleanup is done.

[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$

Thanks

On Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.

Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
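To make the slot-array idea concrete, here is a rough sketch of what JOIN_BACKUP's lookup could look like; every name below is illustrative, shared-memory setup and error handling are elided, and START_BACKUP would claim a free slot under the same lock:

#include "postgres.h"
#include "replication/walsender.h"   /* max_wal_senders */
#include "storage/lwlock.h"

typedef struct BackupSlot
{
    bool    in_use;
    char    backup_id[NAMEDATALEN];
    /* ... per-backup shared state: start LSN, throttling counter, ... */
} BackupSlot;

/* One array in shared memory, sized to max_wal_senders at startup. */
static BackupSlot *BackupSlots;
static LWLock     *BackupSlotLock;

/* JOIN_BACKUP: find the slot that START_BACKUP claimed under this id. */
static BackupSlot *
FindBackupSlot(const char *backup_id)
{
    BackupSlot *result = NULL;

    LWLockAcquire(BackupSlotLock, LW_SHARED);
    for (int i = 0; i < max_wal_senders; i++)
    {
        if (BackupSlots[i].in_use &&
            strcmp(BackupSlots[i].backup_id, backup_id) == 0)
        {
            result = &BackupSlots[i];
            break;
        }
    }
    LWLockRelease(BackupSlotLock);
    return result;              /* NULL: no such backup in progress */
}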
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.

Okay, so if we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state. There won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.
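As a concrete reading of the pg_strong_random()-plus-hex-encoding suggestion above, here is a hedged sketch of server-side id generation; the function name and the error wording are invented for illustration:

#include "postgres.h"

#define BACKUP_ID_RAW_LEN 16        /* 16 random bytes -> 32 hex characters */

static void
generate_backup_id(char *dst, size_t dstlen)
{
    uint8   raw[BACKUP_ID_RAW_LEN];

    Assert(dstlen >= BACKUP_ID_RAW_LEN * 2 + 1);

    if (!pg_strong_random(raw, sizeof(raw)))
        ereport(ERROR,
                (errmsg("could not generate random backup id")));

    /* hex-encode: hard to guess and effectively collision-free */
    for (int i = 0; i < BACKUP_ID_RAW_LEN; i++)
        snprintf(dst + i * 2, 3, "%02x", raw[i]);
}

The resulting 32-character string would fit comfortably in a char[NAMEDATALEN] field of the shared state.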
I am thinking of the following struct for the shared state:

typedef struct
{
    char        backupid[NAMEDATALEN];
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;
The shared state structure entries would be maintained by a shared hash table.

There will be one structure per parallel backup. Since a single parallel backup can engage more than one WAL sender, I think max_wal_senders might be a little too much; perhaps max_wal_senders/2, since there will be at least 2 connections per parallel backup? Alternatively, we can add a new GUC that defines the maximum number of concurrent parallel backups, i.e. ‘max_concurrent_backups_allowed = 10’ perhaps, or we can make it user-configurable.

The key would be “backupid=hex_encode(pg_random_strong(16))” (a sketch of this lookup follows the grammar below).

Checking for standby promotion:
At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recovery and keep checking it whenever send_file() is called to send a new file.

Throttling:
BackupSharedState.throttling_counter - The throttling logic remains the same as for non-parallel backup, with the exception that multiple threads will now be updating it. So in a parallel backup, this will represent the overall bytes that have been transferred. The workers would sleep if they have exceeded the limit. Hence, the shared state carries a lock to safely update the throttling value atomically.

Progress reporting:
I think we should add progress reporting for parallel backup as a separate patch. The relevant entries for progress reporting, such as ‘backup_total’ and ‘backup_streamed’, would then be added to this structure as well.

Grammar:
There is a change in the result set being returned for the START_BACKUP command; unique_backup_id is added. Additionally, a JOIN_BACKUP replication command is added. SEND_FILES has been renamed to SEND_FILE. There are no other changes to the grammar.

START_BACKUP [LABEL '<label>'] [FAST]
- returns startptr, tli, backup_label, unique_backup_id

STOP_BACKUP [NOWAIT]
- returns startptr, tli, backup_label

JOIN_BACKUP ‘unique_backup_id’
- attaches a shared state identified by ‘unique_backup_id’ to a backend process.

LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]
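Here is a minimal sketch of the shared hash table lookup referred to above, assuming the BackupSharedState struct shown earlier; the helper names are illustrative, PG13-era dynahash flags are assumed, and locking around hash_search() is elided:

#include "postgres.h"
#include "access/xlogdefs.h"
#include "storage/shmem.h"
#include "storage/spin.h"
#include "utils/hsearch.h"

#define BACKUP_HASH_SIZE 10

typedef struct BackupSharedState
{
    char        backupid[NAMEDATALEN];  /* hash key */
    XLogRecPtr  startptr;
    slock_t     lock;
    int64       throttling_counter;
    bool        backup_started_in_recovery;
} BackupSharedState;

static HTAB *BackupStateHash;

static void
BackupStateShmemInit(void)
{
    HASHCTL     info;

    MemSet(&info, 0, sizeof(info));
    info.keysize = NAMEDATALEN;
    info.entrysize = sizeof(BackupSharedState);

    BackupStateHash = ShmemInitHash("parallel backup state",
                                    BACKUP_HASH_SIZE, BACKUP_HASH_SIZE,
                                    &info, HASH_ELEM);
}

/* JOIN_BACKUP 'backup_id': attach this walsender to an existing backup. */
static BackupSharedState *
AttachToBackup(const char *backupid)
{
    bool        found;
    BackupSharedState *state;

    state = (BackupSharedState *) hash_search(BackupStateHash, backupid,
                                              HASH_FIND, &found);
    if (!found)
        ereport(ERROR,
                (errmsg("parallel backup \"%s\" does not exist", backupid)));
    return state;
}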
Hi,

Rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.

- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a pg_atomic_uint64 variable. The worker processes will keep incrementing this variable while files are being transferred from server to client. The main process remains in an idle state, so after each increment, the worker process will signal the master to update the stats in the pg_stat_progress_basebackup view (see the sketch after these notes). The 'tablespace_streamed' column is not updated and will remain empty; this is because multiple workers may be copying files from different tablespaces.
- Added backup manifest:
The backend workers maintain their own manifest file, which contains a list of files that are being transferred by the worker. Once all backup files are transferred, the workers will create a temp file as ('pg_tempdir/temp_file_prefix_backupid.workerid') to write the content of the manifest file from BufFile. The workers won’t add the header, nor the WAL information, in their manifest. These two will be added by the main process while merging all worker manifest files.

The main process will read these individual files and concatenate them into a single file, which is then sent back to the client.

The manifest file is created when the following command is received:

BUILD_MANIFEST 'backupid'

This is a new replication command. It is sent when pg_basebackup has copied all the $PGDATA files including WAL files.
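As referenced in the progress-reporting note above, here is a hedged sketch of the worker/leader split: workers bump a shared atomic counter and nudge the leader, which publishes the total to pg_stat_progress_basebackup. The shared struct name and the signalling mechanism are illustrative only:

#include "postgres.h"
#include "commands/progress.h"
#include "pgstat.h"
#include "port/atomics.h"

typedef struct BackupProgressShared
{
    pg_atomic_uint64 backup_streamed;   /* bytes sent so far, all workers */
} BackupProgressShared;

/* worker side: account for 'len' bytes just sent to the client */
static void
worker_report_bytes(BackupProgressShared *shared, uint64 len)
{
    pg_atomic_add_fetch_u64(&shared->backup_streamed, len);
    /* ... then signal the leader, e.g. by setting its latch ... */
}

/* leader side: publish the current total to the progress view */
static void
leader_update_progress(BackupProgressShared *shared)
{
    pgstat_progress_update_param(PROGRESS_BASEBACKUP_BACKUP_STREAMED,
                                 (int64) pg_atomic_read_u64(&shared->backup_streamed));
}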
--
Regards
====================================
Kashif Zeeshan
Lead Quality Assurance Engineer / Manager
EnterpriseDB Corporation
The Enterprise Postgres Company

--
Asif Rehman
--

On Wed, Apr 8, 2020 at 6:53 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:On Tue, Apr 7, 2020 at 9:44 PM Asif Rehman <asifr.rehman@gmail.com> wrote:- This will generate a unique backupid using pg_strong_random(16) and hex-encodedHi,Thanks, Kashif and Rajkumar. I have fixed the reported issues.I have added the shared state as previously described. The new grammar changesare as follows:START_BACKUP [LABEL '<label>'] [FAST] [MAX_RATE %d]it. which is then returned as the result set.- It will also create a shared state and add it to the hashtable. The hash table size is setto BACKUP_HASH_SIZE=10, but since hashtable can expand dynamically, I think it'ssufficient initial size. max_wal_senders is not used, because it can be set to quite alarge values.JOIN_BACKUP 'backup_id'- finds 'backup_id' in hashtable and attaches it to server process.SEND_FILE '(' 'FILE' ')' [NOVERIFY_CHECKSUMS]- renamed SEND_FILES to SEND_FILE- removed START_WAL_LOCATION from this because 'startptr' is now accessible throughshared state.There is no change in other commands:STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']The current patches (v11) have been rebased to the latest master. The backup manifest is enabledby default, so I have disabled it for parallel backup mode and have generated a warning so thatuser is aware of it and not expect it in the backup.Hi AsifI have verified the bug fixes, one bug is fixed and working now as expectedFor the verification of the other bug fixes faced following issues, please have a look.1) Following bug fixes mentioned below are generating segmentation fault.Please note for reference I have added a description only as steps were given in previous emails of each bug I tried to verify the fix. Backtrace is also added with each case which points to one bug for both the cases.a) The backup failed with errors "error: could not connect to server: could not look up local user ID 1000: Too many open files" when the max_wal_senders was set to 2000.
[edb@localhost bin]$ ./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_9925"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
….
….
pg_basebackup: backup worker (1014) created
pg_basebackup: backup worker (1015) created
pg_basebackup: backup worker (1016) created
pg_basebackup: backup worker (1017) created
pg_basebackup: error: could not connect to server: could not look up local user ID 1000: Too many open files
Segmentation fault
[edb@localhost bin]$[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.13219.localhost.localdomain.1586349551
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 13219]
[New LWP 13222]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 1990 -D /home/edb/Desktop/backup/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f2226f76a49 in __run_exit_handlers (status=1, listp=0x7f22272f86c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f2226f76a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x952ca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffe3dabc718) at pg_basebackup.c:2765
(gdb)
b) When executing two backups at the same time, getting FATAL error due to max_wal_senders and instead of exit Backup got completed.[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 1/DA000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_17066"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
Segmentation fault (core dumped)
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ gdb pg_basebackup /tmp/cores/core.pg_basebackup.17041.localhost.localdomain.1586353696
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/edb/Communtiy_Parallel_backup/postgresql/inst/bin/pg_basebackup...done.
[New LWP 17041]
[New LWP 17067]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `./pg_basebackup -v -j 8 -D /home/edb/Desktop/backup1/'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
47 if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) bt
#0 pthread_join (threadid=0, thread_return=0x0) at pthread_join.c:47
#1 0x000000000040904a in cleanup_workers () at pg_basebackup.c:2978
#2 0x0000000000403806 in disconnect_atexit () at pg_basebackup.c:332
#3 0x00007f051edc1a49 in __run_exit_handlers (status=1, listp=0x7f051f1436c8 <__exit_funcs>, run_list_atexit=run_list_atexit@entry=true) at exit.c:77
#4 0x00007f051edc1a95 in __GI_exit (status=<optimized out>) at exit.c:99
#5 0x0000000000408c54 in create_parallel_workers (backupinfo=0x1c6dca0) at pg_basebackup.c:2811
#6 0x000000000040798f in BaseBackup () at pg_basebackup.c:2211
#7 0x0000000000408b4d in main (argc=6, argv=0x7ffdb76a6d68) at pg_basebackup.c:2765
(gdb)2) The following bug is not fixed yetA similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$ThanksKashif ZeeshanOn Tue, Apr 7, 2020 at 4:03 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:On Fri, Apr 3, 2020 at 3:01 PM Kashif Zeeshan <kashif.zeeshan@enterprisedb.com> wrote:Hi AsifWhen a non-existent slot is used with tablespace then correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup.Steps=======edb@localhost bin]$
[edb@localhost bin]$ mkdir /home/edb/tbl1
[edb@localhost bin]$ mkdir /home/edb/tbl_res
[edb@localhost bin]$
postgres=# create tablespace tbl1 location '/home/edb/tbl1';
CREATE TABLESPACE
postgres=#
postgres=# create table t1 (a int) tablespace tbl1;
CREATE TABLE
postgres=# insert into t1 values(100);
INSERT 0 1
postgres=# insert into t1 values(200);
INSERT 0 1
postgres=# insert into t1 values(300);
INSERT 0 1
postgres=#
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -j 2 -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2E000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test" does not exist
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: write-ahead log end point: 0/2E000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child thread exited with error 1
[edb@localhost bin]$
backup folder not cleaned
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
backup_label global pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION pg_xact postgresql.conf
base pg_commit_ts pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_wal postgresql.auto.conf
[edb@localhost bin]$If the same case is executed without the parallel backup patch then the backup folder is cleaned after the error is displayed.[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -T /home/edb/tbl1=/home/edb/tbl_res -S test999
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/2B000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: replication slot "test999" does not exist
pg_basebackup: write-ahead log end point: 0/2B000100
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory " /home/edb/Desktop/backup"
pg_basebackup: changes to tablespace directories will not be undoneHi AsifA similar case is when DB Server is shut down while the Parallel Backup is in progress then the correct error is displayed but then the backup folder is not cleaned and leaves a corrupt backup. I think one bug fix will solve all these cases where clean up is not done when parallel backup is failed.[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C1000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_57337"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$Same case when executed on pg_basebackup without the Parallel backup patch then proper clean up is done.[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/C5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_5590"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
[edb@localhost bin]$ThanksOn Fri, Apr 3, 2020 at 1:46 PM Asif Rehman <asifr.rehman@gmail.com> wrote:On Thu, Apr 2, 2020 at 8:45 PM Robert Haas <robertmhaas@gmail.com> wrote:On Thu, Apr 2, 2020 at 11:17 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
>> Why would you need to do that? As long as the process where
>> STOP_BACKUP can do the check, that seems good enough.
>
> Yes, but the user will get the error only after the STOP_BACKUP, not while the backup is
> in progress. So if the backup is a large one, early error detection would be much beneficial.
> This is the current behavior of non-parallel backup as well.
Because non-parallel backup does not feature early detection of this
error, it is not necessary to make parallel backup do so. Indeed, it
is undesirable. If you want to fix that problem, do it on a separate
thread in a separate patch. A patch proposing to make parallel backup
inconsistent in behavior with non-parallel backup will be rejected, at
least if I have anything to say about it.
TBH, fixing this doesn't seem like an urgent problem to me. The
current situation is not great, but promotions ought to be relatively
infrequent, so I'm not sure it's a huge problem in practice. It is
also worth considering whether the right fix is to figure out how to
make that case actually work, rather than just making it fail quicker.
I don't currently understand the reason for the prohibition so I can't
express an intelligent opinion on what the right answer is here, but
it seems like it ought to be investigated before somebody goes and
builds a bunch of infrastructure to make the error more timely.Non-parallel backup already does the early error checking. I only intended
to make parallel behave the same as non-parallel here. So, I agree with
you that the behavior of parallel backup should be consistent with the
non-parallel one. Please see the code snippet below from
basebackup.c:sendDir()
/*
* Check if the postmaster has signaled us to exit, and abort with an
* error in that case. The error handler further up will call
* do_pg_abort_backup() for us. Also check that if the backup was
* started while still in recovery, the server wasn't promoted.
* do_pg_stop_backup() will check that too, but it's better to stop
* the backup early than continue to the end and fail there.
*/
CHECK_FOR_INTERRUPTS();
if (RecoveryInProgress() != backup_started_in_recovery)
ereport(ERROR,
(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg("the standby was promoted during online backup"),
errhint("This means that the backup being taken is corrupt "
"and should not be used. "
"Try taking another online backup.")));
> Okay, then I will add the shared state. And since we are adding the shared state, we can use
> that for throttling, progress-reporting and standby early error checking.
Please propose a grammar here for all the new replication commands you
plan to add before going and implement everything. That will make it
easier to hash out the design without forcing you to keep changing the
code. Your design should include a sketch of how several sets of
coordinating backends taking several concurrent parallel backups will
end up with one shared state per parallel backup.
> There are two possible options:
>
> (1) Server may generate a unique ID i.e. BackupID=<unique_string> OR
> (2) (Preferred Option) Use the WAL start location as the BackupID.
>
> This BackupID should be given back as a response to start backup command. All client workers
> must append this ID to all parallel backup replication commands. So that we can use this identifier
> to search for that particular backup. Does that sound good?
Using the WAL start location as the backup ID seems like it might be
problematic -- could a single checkpoint not end up as the start
location for multiple backups started at the same time? Whether that's
possible now or not, it seems unwise to hard-wire that assumption into
the wire protocol.
I was thinking that perhaps the client should generate a unique backup
ID, e.g. leader does:
START_BACKUP unique_backup_id [options]...
And then others do:
JOIN_BACKUP unique_backup_id
My thought is that you will have a number of shared memory structure
equal to max_wal_senders, each one large enough to hold the shared
state for one backup. The shared state will include
char[NAMEDATALEN-or-something] which will be used to hold the backup
ID. START_BACKUP would allocate one and copy the name into it;
JOIN_BACKUP would search for one by name.
If you want to generate the name on the server side, then I suppose
START_BACKUP would return a result set that includes the backup ID,
and clients would have to specify that same backup ID when invoking
JOIN_BACKUP. The rest would stay the same. I am not sure which way is
better. Either way, the backup ID should be something long and hard to
guess, not e.g. the leader processes' PID. I think we should generate
it using pg_strong_random, say 8 or 16 bytes, and then hex-encode the
result to get a string. That way there's almost no risk of two backup
IDs colliding accidentally, and even if we somehow had a malicious
user trying to screw up somebody else's parallel backup by choosing a
colliding backup ID, it would be pretty hard to have any success. A
user with enough access to do that sort of thing can probably cause a
lot worse problems anyway, but it seems pretty easy to guard against
intentional collisions robustly here, so I think we should.Okay so If we are to add another replication command ‘JOIN_BACKUP unique_backup_id’
to make workers find the relevant shared state. There won't be any need for changing
the grammar for any other command. The START_BACKUP can return the unique_backup_id
in the result set.
I am thinking of the following struct for shared state:typedef struct
{
char backupid[NAMEDATALEN];
XLogRecPtr startptr;
slock_t lock;
int64 throttling_counter;
bool backup_started_in_recovery;
} BackupSharedState;
The shared state structure entries would be maintained by a shared hash table.There will be one structure per parallel backup. Since a single parallel backupcan engage more than one wal sender, so I think max_wal_senders might be a littletoo much; perhaps max_wal_senders/2 since there will be at least 2 connectionsper parallel backup? Alternatively, we can set a new GUC that defines the maximumnumber of for concurrent parallel backups i.e. ‘max_concurent_backups_allowed = 10’perhaps, or we can make it user-configurable.The key would be “backupid=hex_encode(pg_random_strong(16))”Checking for Standby Promotion:At the START_BACKUP command, we initialize BackupSharedState.backup_started_in_recoveryand keep checking it whenever send_file () is called to send a new file.Throttling:BackupSharedState.throttling_counter - The throttling logic remains the sameas for non-parallel backup with the exception that multiple threads will now beupdating it. So in parallel backup, this will represent the overall bytes thathave been transferred. So the workers would sleep if they have exceeded thelimit. Hence, the shared state carries a lock to safely update the throttlingvalue atomically.Progress Reporting:Although I think we should add progress-reporting for parallel backup as aseparate patch. The relevant entries for progress-reporting such as‘backup_total’ and ‘backup_streamed’ would be then added to this structureas well.Grammar:There is a change in the resultset being returned for START_BACKUP command;unique_backup_id is added. Additionally, JOIN_BACKUP replication command isadded. SEND_FILES has been renamed to SEND_FILE. There are no other changesto the grammar.START_BACKUP [LABEL '<label>'] [FAST]- returns startptr, tli, backup_label, unique_backup_idSTOP_BACKUP [NOWAIT]- returns startptr, tli, backup_labelJOIN_BACKUP ‘unique_backup_id’- attaches a shared state identified by ‘unique_backup_id’ to a backend process.LIST_TABLESPACES [PROGRESS]LIST_FILES [TABLESPACE]LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']SEND_FILE '(' FILE ')' [NOVERIFY_CHECKSUMS]Hi,
rebased and updated to the current master (8128b0c1). v13 is attached.
- Fixes the above reported issues.
[edb@localhost bin]$
[edb@localhost bin]$ ./pg_basebackup -v -D /home/edb/Desktop/backup/ -j 8
pg_basebackup: warning: backup manifest is disabled in parallel backup mode
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/A0000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_16235"
pg_basebackup: backup worker (0) created
pg_basebackup: backup worker (1) created
pg_basebackup: backup worker (2) created
pg_basebackup: backup worker (3) created
pg_basebackup: backup worker (4) created
pg_basebackup: backup worker (5) created
pg_basebackup: backup worker (6) created
pg_basebackup: backup worker (7) created
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
pg_basebackup: removing contents of data directory "/home/edb/Desktop/backup/"
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$
[edb@localhost bin]$ ls /home/edb/Desktop/backup
base pg_hba.conf pg_logical pg_notify pg_serial pg_stat pg_subtrans pg_twophase pg_xact postgresql.conf
pg_dynshmem pg_ident.conf pg_multixact pg_replslot pg_snapshots pg_stat_tmp pg_tblspc PG_VERSION postgresql.auto.conf
[edb@localhost bin]$
[edb@localhost bin]$
- Added progress-reporting support for parallel backup:
For this, 'backup_streamed' is moved to a shared structure (BackupState) as a pg_atomic_uint64 variable, which the worker processes keep incrementing.
While files are being transferred from server to client, the main process remains in an idle state, so after each increment the worker process signals the master to update the stats in the pg_stat_progress_basebackup view. The 'tablespace_streamed' column is not updated and will remain empty, because multiple workers may be copying files from different tablespaces.
- Added backup manifest:
The backend workers maintain their own manifest file, which contains a list of the files that are being transferred by that worker. Once all backup files are transferred, the workers create a temp file as ('pg_tempdir/temp_file_prefix_backupid.workerid') to write the content of the manifest file from the BufFile. The workers won't add the header, nor the WAL information, to their manifest. These two will be added by the main process while merging all worker manifest files (a rough sketch of this merge step is shown below).
The main process will read these individual files and concatenate them into a single file, which is then sent back to the client.
The manifest file is created when the following command is received:
BUILD_MANIFEST 'backupid'
This is a new replication command. It is sent when pg_basebackup has copied all the $PGDATA files, including WAL files.
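To make the merge step concrete, a minimal sketch of how the main process could concatenate the per-worker manifest temp files is given below. The path layout, helper steps, and function name are assumptions for illustration only, not the patch's actual code:

/* Illustrative sketch only; names and error handling are simplified. */
#include <stdio.h>
#include <unistd.h>

static void
merge_worker_manifests(const char *tmpdir, const char *backupid,
                       int nworkers, FILE *out)
{
    /* the main process writes the manifest header itself (omitted here) */
    for (int i = 0; i < nworkers; i++)
    {
        char    path[1024];
        char    buf[8192];
        size_t  nread;
        FILE   *in;

        /* one temp file per worker: <tmpdir>/temp_file_prefix_<backupid>.<workerid> */
        snprintf(path, sizeof(path), "%s/temp_file_prefix_%s.%d", tmpdir, backupid, i);
        in = fopen(path, "rb");
        if (in == NULL)
            continue;               /* simplified error handling */

        while ((nread = fread(buf, 1, sizeof(buf), in)) > 0)
            fwrite(buf, 1, nread, out);

        fclose(in);
        unlink(path);
    }
    /* WAL entries and the manifest checksum would be appended by the main process here */
}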
--

I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.

Have we done any performance testing with this patch to see the benefits? If so, can you point me to the results? If not, then can we perform some tests on large backups to see the benefits of this patch/idea?
Hi,

I did some tests a while back, and here are the results. The tests were done to simulate a live database environment using pgbench.

Machine configuration used for this test:
Instance Type: t2.xlarge
Volume Type : io1
Memory (MiB) : 16384
vCPU # : 4
Architecture : X86_64
IOP : 16000
Database Size (GB) : 102

The setup consists of 3 machines:
- one for the database instance
- one for the pg_basebackup client and
- one for pgbench with some parallel workers, simulating SELECT loads.

basebackup | 4 workers | 8 Workers | 16 workers
Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15
(pgbench running with 50 parallel clients simulating SELECT load)

Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35
(pgbench running with 100 parallel clients simulating SELECT load)

On Tue, Apr 21, 2020 at 9:27 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 14, 2020 at 8:07 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> > I forgot to make a check for no-manifest. Fixed. Attached is the updated patch.
>
> Have we done any performance testing with this patch to see the benefits? If so, can you point me to the results? If not, then can we perform some tests on large backups to see the benefits of this patch/idea?

--
Asif Rehman
Hi Asif,

On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> basebackup | 4 workers | 8 Workers | 16 workers
> Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15
> (pgbench running with 50 parallel client simulating SELECT load)
>
> Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35
> (pgbench running with 100 parallel client simulating SELECT load)

Well that looks a bit strange. All 4, 8 and 16 worker backup configurations seem to have taken the same time. Is it because the machine CPUs are only 4? In that case, did you try to run with 2 workers and compare that with the 4-worker time?

Also, just to clarify and be sure - was there anything else running on any of these 3 machines while the backup was in progress?

Regards,
Jeevan Ladhe
On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote: > > I did some tests a while back, and here are the results. The tests were done to simulate > a live database environment using pgbench. > > machine configuration used for this test: > Instance Type: t2.xlarge > Volume Type : io1 > Memory (MiB) : 16384 > vCPU # : 4 > Architecture : X86_64 > IOP : 16000 > Database Size (GB) : 102 > > The setup consist of 3 machines. > - one for database instances > - one for pg_basebackup client and > - one for pgbench with some parallel workers, simulating SELECT loads. > > basebackup | 4 workers | 8 Workers | 16 workers > Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15 > (pgbench running with 50 parallel client simulating SELECT load) > > Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35 > (pgbench running with 100 parallel client simulating SELECT load) > Thanks for sharing the results, these show nice speedup! However, I think we should try to find what exactly causes this speed up. If you see the recent discussion on another thread related to this topic, Andres, pointed out that he doesn't think that we can gain much by having multiple connections[1]. It might be due to some internal limitations (like small buffers) [2] due to which we are seeing these speedups. It might help if you can share the perf reports of the server-side and pg_basebackup side. We don't need pgbench type workload to see what caused speed up. [1] - https://www.postgresql.org/message-id/20200420201922.55ab7ovg6535suyz%40alap3.anarazel.de [2] - https://www.postgresql.org/message-id/20200421064420.z7eattzqbunbutz3%40alap3.anarazel.de -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> >
> > I did some tests a while back, and here are the results. The tests were done to simulate
> > a live database environment using pgbench.
> >
> > machine configuration used for this test:
> > Instance Type: t2.xlarge
> > Volume Type : io1
> > Memory (MiB) : 16384
> > vCPU # : 4
> > Architecture : X86_64
> > IOP : 16000
> > Database Size (GB) : 102
> >
> > The setup consist of 3 machines.
> > - one for database instances
> > - one for pg_basebackup client and
> > - one for pgbench with some parallel workers, simulating SELECT loads.
> >
> > basebackup | 4 workers | 8 Workers | 16 workers
> > Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15
> > (pgbench running with 50 parallel client simulating SELECT load)
> >
> > Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35
> > (pgbench running with 100 parallel client simulating SELECT load)
> >
>
> Thanks for sharing the results, these show nice speedup! However, I
> think we should try to find what exactly causes this speed up. If you
> see the recent discussion on another thread related to this topic,
> Andres, pointed out that he doesn't think that we can gain much by
> having multiple connections[1]. It might be due to some internal
> limitations (like small buffers) [2] due to which we are seeing these
> speedups. It might help if you can share the perf reports of the
> server-side and pg_basebackup side.
>
Just to be clear, we need perf reports both with and without patch-set.
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote: > > On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> > >> > On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote: >> > > >> > > I did some tests a while back, and here are the results. The tests were done to simulate >> > > a live database environment using pgbench. >> > > >> > > machine configuration used for this test: >> > > Instance Type: t2.xlarge >> > > Volume Type : io1 >> > > Memory (MiB) : 16384 >> > > vCPU # : 4 >> > > Architecture : X86_64 >> > > IOP : 16000 >> > > Database Size (GB) : 102 >> > > >> > > The setup consist of 3 machines. >> > > - one for database instances >> > > - one for pg_basebackup client and >> > > - one for pgbench with some parallel workers, simulating SELECT loads. >> > > >> > > basebackup | 4 workers | 8 Workers | 16 workers >> > > Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15 >> > > (pgbench running with 50 parallel client simulating SELECT load) >> > > >> > > Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35 >> > > (pgbench running with 100 parallel client simulating SELECT load) >> > > >> > >> > Thanks for sharing the results, these show nice speedup! However, I >> > think we should try to find what exactly causes this speed up. If you >> > see the recent discussion on another thread related to this topic, >> > Andres, pointed out that he doesn't think that we can gain much by >> > having multiple connections[1]. It might be due to some internal >> > limitations (like small buffers) [2] due to which we are seeing these >> > speedups. It might help if you can share the perf reports of the >> > server-side and pg_basebackup side. >> > >> >> Just to be clear, we need perf reports both with and without patch-set. > > > These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallelbackup and share the results and perf reports. > Sounds good. I think we should also try to run the test with 1 worker as well. The reason it will be good to see the results with 1 worker is that we can know if the technique to send file by file as is done in this patch is better or worse than the current HEAD code. So, it will be good to see the results of an unpatched code, 1 worker, 2 workers, 4 workers, etc. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi Asif, I am reviewing your recent patch and found the patch is not applicable on latest master. Could you please resolve the conflicts and update a new patch? Thanks, Dipesh EnterpriseDB: http://www.enterprisedb.com
+typedef struct
+{
...
+} BackupFile;
+
+typedef struct
+{
...
+} BackupState;
These structures need comments.
+list_wal_files_opt_list:
+ SCONST SCONST
{
- $$ = makeDefElem("manifest_checksums",
-
(Node *)makeString($2), -1);
+ $$ = list_make2(
+ makeDefElem("start_wal_location",
+ (Node *)makeString($2), -1),
+ makeDefElem("end_wal_location",
+ (Node *)makeString($2), -1));
+
}
This seems like an unnecessarily complicated parse representation. The
DefElems seem to be completely unnecessary here.
@@ -998,7 +1110,37 @@ SendBaseBackup(BaseBackupCmd *cmd)
set_ps_display(activitymsg);
}
- perform_base_backup(&opt);
+ switch (cmd->cmdtag)
So the design here is that SendBaseBackup() is now going to do a bunch
of things that are NOT sending a base backup? With no updates to the
comments of that function and no change to the process title it sets?
- return (manifest->buffile != NULL);
+ return (manifest && manifest->buffile != NULL);
Heck no. It appears that you didn't even bother reading the function
header comment.
object is always available. Anyways I have removed this check for 003 patch
as well.
+ * Send a single resultset containing XLogRecPtr record (in text format)
+ * TimelineID and backup label.
*/
static void
-SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
+SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli,
+ StringInfo label, char *backupid)
This just casually breaks wire protocol compatibility, which seems
completely unacceptable.
returns startptr, tli, backup label and backupid. So I had extended this result set.
+ if (strlen(opt->tablespace) > 0)
+ sendTablespace(opt->tablespace, NULL, true, NULL, &files);
+ else
+ sendDir(".", 1, true, NIL, true, NULL, NULL, &files);
+
+ SendFilesHeader(files);
So I guess the idea here is that we buffer the entire list of files in
memory, regardless of size, and then we send it out afterwards. That
doesn't seem like a good idea. The list of files might be very large.
We probably need some code refactoring here rather than just piling
more and more different responsibilities onto sendTablespace() and
sendDir().
Currently, basebackup only does compression on the client-side. So, I suggest we stick with
+ if (state->parallel_mode)
+ SpinLockAcquire(&state->lock);
+
+ state->throttling_counter += increment;
+
+ if (state->parallel_mode)
+ SpinLockRelease(&state->lock);
I don't like this much. It seems to me that we would do better to use
atomics here all the time, instead of conditional spinlocks.
Okay, I have added throttling_counter as an atomic; however, a lock is still required
for throttling_counter %= throttling_sample.
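For illustration, the combination being described might look roughly like the sketch below. It assumes the shared state holds a pg_atomic_uint64 counter and a spinlock; the struct and function names are placeholders, not the patch's actual code:

/* Rough sketch of the approach described above, not the patch's actual code. */
#include "postgres.h"
#include "port/atomics.h"
#include "storage/spin.h"

typedef struct ParallelThrottleState        /* illustrative */
{
    pg_atomic_uint64 throttling_counter;
    slock_t          lock;
} ParallelThrottleState;

static void
throttle_parallel(ParallelThrottleState *state, int64 increment, uint64 throttling_sample)
{
    /* each worker adds its transferred bytes without taking the lock ... */
    pg_atomic_fetch_add_u64(&state->throttling_counter, increment);

    if (pg_atomic_read_u64(&state->throttling_counter) >= throttling_sample)
    {
        /*
         * ... but the wrap-around (counter %= throttling_sample) rewrites the
         * whole value, so it is still done under the shared lock.
         */
        SpinLockAcquire(&state->lock);
        pg_atomic_write_u64(&state->throttling_counter,
                            pg_atomic_read_u64(&state->throttling_counter) % throttling_sample);
        SpinLockRelease(&state->lock);

        /* sleep here until the next throttling interval, as basebackup.c does */
    }
}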
+static void
+send_file(basebackup_options *opt, char *file, bool missing_ok)
...
+ if (file == NULL)
+ return;
That seems totally inappropriate.
+ sendFile(file, file + basepathlen, &statbuf,
true, InvalidOid, NULL, NULL);
Maybe I'm misunderstanding, but this looks like it's going to write a
tar header, even though we're not writing a tarfile.
sendFile() always sends files with tar header included, even if the backup mode
is plain. pg_basebackup also expects the same. That's the current behavior of
the system.
Otherwise, we will have to duplicate this function which would be doing the pretty
much same thing, except the tar header.
+ else
+ ereport(WARNING,
+ (errmsg("skipping special file
or directory \"%s\"", file)));
So, if the user asks for a directory or symlink, what's going to
happen is that they're going to receive an empty file, and get a
warning. That sounds like terrible behavior.
+ /*
+ * Check for checksum failures. If there are failures across multiple
+ * processes it may not report total checksum count, but it will error
+ * out,terminating the backup.
+ */
In other words, the patch breaks the feature. Not that the feature in
question works particularly well as things stand, but this makes it
worse.
the total count across workers, So it will have the same behavior as current.
Attachment
On Wed, Apr 22, 2020 at 10:18 AM Asif Rehman <asifr.rehman@gmail.com> wrote: > I don't foresee memory to be a challenge here. Assuming a database containing 10240 > relation files (that max reach to 10 TB of size), the list will occupy approximately 102MB > of space in memory. This obviously can be reduced, but it doesn’t seem too bad either. > One way of doing it is by fetching a smaller set of files and clients can result in the next > set if the current one is processed; perhaps fetch initially per table space and request for > next one once the current one is done with. The more concerning case is when someone has a lot of small files. > Okay have added throttling_counter as atomic. however a lock is still required > for throttling_counter%=throttling_sample. Well, if you can't get rid of the lock, using a atomics is pointless. >> + sendFile(file, file + basepathlen, &statbuf, >> true, InvalidOid, NULL, NULL); >> >> Maybe I'm misunderstanding, but this looks like it's going to write a >> tar header, even though we're not writing a tarfile. > > sendFile() always sends files with tar header included, even if the backup mode > is plain. pg_basebackup also expects the same. That's the current behavior of > the system. > > Otherwise, we will have to duplicate this function which would be doing the pretty > much same thing, except the tar header. Well, as I said before, the solution to that problem is refactoring, not crummy interfaces. You're never going to persuade any committer who understands what that code actually does to commit it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Hi Dipesh,

The rebased and updated patch is attached. It's rebased to (9f2c4ede).
On Wed, Apr 22, 2020 at 7:48 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
> Hi Dipesh,
> The rebased and updated patch is attached. Its rebased to (9f2c4ede).

Make is failing for v15 patch.

gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -g -g -O0 -I. -I. -I../../../src/include -D_GNU_SOURCE -c -o basebackup.o basebackup.c -MMD -MP -MF .deps/basebackup.Po
In file included from basebackup.c:33:
../../../src/include/replication/backup_manifest.h:37: error: redefinition of typedef ‘manifest_info’
../../../src/include/replication/basebackup.h:35: note: previous declaration of ‘manifest_info’ was here
make[3]: *** [basebackup.o] Error 1
make[3]: Leaving directory `/home/edb/WORKDB/PG2/postgresql/src/backend/replication'
make[2]: *** [replication-recursive] Error 2
On Thu, Apr 23, 2020 at 11:43 AM Rajkumar Raghuwanshi <rajkumar.raghuwanshi@enterprisedb.com> wrote:
> Make is failing for v15 patch.
> ../../../src/include/replication/backup_manifest.h:37: error: redefinition of typedef ‘manifest_info’
> ../../../src/include/replication/basebackup.h:35: note: previous declaration of ‘manifest_info’ was here

I just compiled on a clean source and it's compiling fine. Can you see if you have a clean source tree?
Hi, Here is the parallel backup performance test results with and without the patch "parallel_backup_v15" on AWS cloud environment. Two "t2.xlarge" machines were used: one for Postgres server and the other one for pg_basebackup with the same machine configuration showing below. Machine configuration: Instance Type :t2.xlarge Volume type :io1 Memory (MiB) :16GB vCPU # :4 Architecture :x86_64 IOP :6000 Database Size (GB) :108 Performance test results: without patch: real 18m49.346s user 1m24.178s sys 7m2.966s 1 worker with patch: real 18m43.201s user 1m55.787s sys 7m24.724s 2 worker with patch: real 18m47.373s user 2m22.970s sys 11m23.891s 4 worker with patch: real 18m46.878s user 2m26.791s sys 13m14.716s As required, I didn't have the pgbench running in parallel like we did in the previous benchmark. The perf report files for both Postgres server and pg_basebackup sides are attached. The files are listed like below. i.e. without patch 1 worker, and with patch 1, 2, 4 workers. perf report on Postgres server side: perf.data-postgres-without-parallel_backup_v15.txt perf.data-postgres-with-parallel_backup_v15-j1.txt perf.data-postgres-with-parallel_backup_v15-j2.txt perf.data-postgres-with-parallel_backup_v15-j4.txt perf report on pg_basebackup side: perf.data-pg_basebackup-without-parallel_backup_v15.txt perf.data-pg_basebackup-with-parallel_backup_v15-j1.txt perf.data-pg_basebackup-with-parallel_backup_v15-j2.txt perf.data-pg_basebackup-with-parallel_backup_v15-j4.txt If any more information required please let me know. On 2020-04-21 7:12 a.m., Amit Kapila wrote: > On Tue, Apr 21, 2020 at 5:26 PM Ahsan Hadi <ahsan.hadi@gmail.com> wrote: >> On Tue, Apr 21, 2020 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>> On Tue, Apr 21, 2020 at 5:18 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> On Tue, Apr 21, 2020 at 1:00 PM Asif Rehman <asifr.rehman@gmail.com> wrote: >>>>> I did some tests a while back, and here are the results. The tests were done to simulate >>>>> a live database environment using pgbench. >>>>> >>>>> machine configuration used for this test: >>>>> Instance Type: t2.xlarge >>>>> Volume Type : io1 >>>>> Memory (MiB) : 16384 >>>>> vCPU # : 4 >>>>> Architecture : X86_64 >>>>> IOP : 16000 >>>>> Database Size (GB) : 102 >>>>> >>>>> The setup consist of 3 machines. >>>>> - one for database instances >>>>> - one for pg_basebackup client and >>>>> - one for pgbench with some parallel workers, simulating SELECT loads. >>>>> >>>>> basebackup | 4 workers | 8 Workers | 16 workers >>>>> Backup Duration(Min): 69.25 | 20.44 | 19.86 | 20.15 >>>>> (pgbench running with 50 parallel client simulating SELECT load) >>>>> >>>>> Backup Duration(Min): 154.75 | 49.28 | 45.27 | 20.35 >>>>> (pgbench running with 100 parallel client simulating SELECT load) >>>>> >>>> Thanks for sharing the results, these show nice speedup! However, I >>>> think we should try to find what exactly causes this speed up. If you >>>> see the recent discussion on another thread related to this topic, >>>> Andres, pointed out that he doesn't think that we can gain much by >>>> having multiple connections[1]. It might be due to some internal >>>> limitations (like small buffers) [2] due to which we are seeing these >>>> speedups. It might help if you can share the perf reports of the >>>> server-side and pg_basebackup side. >>>> >>> Just to be clear, we need perf reports both with and without patch-set. 
>> >> These tests were done a while back, I think it would be good to run the benchmark again with the latest patches of parallelbackup and share the results and perf reports. >> > Sounds good. I think we should also try to run the test with 1 worker > as well. The reason it will be good to see the results with 1 worker > is that we can know if the technique to send file by file as is done > in this patch is better or worse than the current HEAD code. So, it > will be good to see the results of an unpatched code, 1 worker, 2 > workers, 4 workers, etc. > -- David Software Engineer Highgo Software Inc. (Canada) www.highgo.ca
Attachment
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote: > > Hi, > > Here is the parallel backup performance test results with and without > the patch "parallel_backup_v15" on AWS cloud environment. Two > "t2.xlarge" machines were used: one for Postgres server and the other > one for pg_basebackup with the same machine configuration showing below. > > Machine configuration: > Instance Type :t2.xlarge > Volume type :io1 > Memory (MiB) :16GB > vCPU # :4 > Architecture :x86_64 > IOP :6000 > Database Size (GB) :108 > > Performance test results: > without patch: > real 18m49.346s > user 1m24.178s > sys 7m2.966s > > 1 worker with patch: > real 18m43.201s > user 1m55.787s > sys 7m24.724s > > 2 worker with patch: > real 18m47.373s > user 2m22.970s > sys 11m23.891s > > 4 worker with patch: > real 18m46.878s > user 2m26.791s > sys 13m14.716s > > As required, I didn't have the pgbench running in parallel like we did > in the previous benchmark. > So, there doesn't seem to be any significant improvement in this scenario. Now, it is not clear why there was a significant improvement in the previous run where pgbench was also running simultaneously. I am not sure but maybe it is because when a lot of other backends were running (performing read-only workload) the backend that was responsible for doing backup was getting frequently scheduled out and it slowed down the overall backup process. And when we start using multiple backends for backup one or other backup process is always running making the overall backup faster. One idea to find this out is to check how much time backup takes when we run it with and without pgbench workload on HEAD (aka unpatched code). Even if what I am saying is true or there is some other reason due to which we are seeing speedup in some cases (where there is a concurrent workload), it might not make the case for using multiple backends for backup but still, it is good to find that information as it might help in designing this feature better. > The perf report files for both Postgres server and pg_basebackup sides > are attached. > It is not clear which functions are taking more time or for which functions time is reduced as function symbols are not present in the reports. I think you can refer "https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to take profiles and additionally use -fno-omit-frame-pointer during configure (you can use CFLAGS="-fno-omit-frame-pointer during configure). -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Hi,

We at EnterpriseDB did some performance testing around this parallel backup to check how beneficial it is, and below are the results. In this testing, we run the backup:
1) Without Asif's patch
2) With Asif's patch and a combination of 1, 2, 4, 8 workers.

We ran those tests on two setups:
1) Client and Server both on the same machine (Local backups)
2) Client and server on a different machine (remote backups)
Machine details:
1: Server (on which local backups performed and used as server for remote backups)
2: Client (Used as a client for remote backups)
Server:
RAM: 500 GB
CPU details: Architecture: x86_64, CPU op-mode(s): 32-bit, 64-bit, Byte Order: Little Endian, CPU(s): 128, On-line CPU(s) list: 0-127, Thread(s) per core: 2, Core(s) per socket: 8, Socket(s): 8, NUMA node(s): 8
Filesystem: ext4

Client:
RAM: 490 GB
CPU details: Architecture: ppc64le, Byte Order: Little Endian, CPU(s): 192, On-line CPU(s) list: 0-191, Thread(s) per core: 8, Core(s) per socket: 1, Socket(s): 24
Filesystem: ext4
Below are the results for the local test:

Data size | without parallel backup patch | parallel backup with 1 worker | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 2 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 4 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 8 workers | % performance increased/decreased compared to normal backup (without patch) |
10 GB (10 tables - each table around 1.05 GB) | real 0m27.016s user 0m3.378s sys 0m23.059s | real 0m30.314s user 0m3.575s sys 0m22.946s | 12% performance decreased | real 0m20.400s user 0m3.622s sys 0m29.670s | 27% performance increased | real 0m15.331s user 0m3.706s sys 0m39.189s | 43% performance increased | real 0m15.094s user 0m3.915s sys 1m23.350s | 44% performance increased. |
50GB (50 tables - each table around 1.05 GB) | real 2m11.049s user 0m16.464s sys 2m1.757s | real 2m26.621s user 0m18.497s sys 2m4.792s | 21% performance decreased | real 1m9.581s user 0m18.298s sys 2m12.030s | 46% performance increased | real 0m53.894s user 0m18.588s sys 2m47.390s | 58% performance increased. | real 0m55.373s user 0m18.423s sys 5m57.470s | 57% performance increased. |
100GB (100 tables - each table around 1.05 GB) | real 4m4.776s user 0m33.699s sys 3m27.777s | real 4m20.862s user 0m35.753s sys 3m28.262s | 6% performance decreased | real 2m37.411s user 0m36.440s sys 4m16.424s | 35% performance increased | real 1m49.503s user 0m37.200s sys 5m58.077s | 55% performance increased | real 1m36.762s user 0m36.987s sys 9m36.906s | 60% performance increased. |
200GB (200 tables - each table around 1.05 GB) | real 10m34.998s user 1m8.471s sys 7m21.520s | real 11m30.899s user 1m12.933s sys 8m14.496s | 8% performance decreased | real 6m8.481s user 1m13.771s sys 9m31.216s | 41% performance increased | real 4m2.403s user 1m18.331s sys 12m29.661s | 61% performance increased | real 4m3.768s user 1m24.547s sys 15m21.421s | 61% performance increased |
Results for the remote test:

Data size | without parallel backup patch | parallel backup with 1 worker | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 2 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 4 workers | % performance increased/decreased compared to normal backup (without patch) | parallel backup with 8 workers | % performance increased/decreased compared to normal backup (without patch) |
10 GB (10 tables - each table around 1.05 GB) | real 1m36.829s user 0m2.124s sys 0m14.004s | real 1m37.598s user 0m3.272s sys 0m11.110s | 0.8% performance decreased | real 1m36.753s user 0m2.627s sys 0m15.312s | 0.08% performance increased. | real 1m37.212s user 0m3.835s sys 0m13.221s | 0.3% performance decreased. | real 1m36.977s user 0m4.475s sys 0m17.937s | 0.1% performance decreased. |
50GB (50 tables - each table around 1.05 GB) | real 7m54.211s user 0m10.826s sys 1m10.435s | real 7m55.603s user 0m16.535s sys 1m8.147s | 0.2% performance decreased | real 7m53.499s user 0m18.131s sys 1m8.822s | 0.1% performance increased. | real 7m54.687s user 0m15.818s sys 1m30.991s | 0.1% performance decreased | real 7m54.658s user 0m20.783s sys 1m34.460s | 0.1% performance decreased |
100GB (100 tables - each table around 1.05 GB) | real 15m45.776s user 0m21.802s sys 2m59.006s | real 15m46.315s user 0m32.499s sys 2m47.245s | 0.05% performance decreased | real 15m46.065s user 0m28.877s sys 2m21.181s | 0.03% performance decreased | real 15m47.793s user 0m30.932s sys 2m36.708s | 0.2% performance decreased | real 15m47.129s user 0m35.151s sys 3m23.572s | 0.14% performance decreased. |
200GB (200 tables - each table around 1.05 GB) | real 32m55.720s user 0m50.602s sys 5m38.875s | real 31m30.602s user 0m45.377s sys 4m57.405s | 4% performance increased | real 31m30.214s user 0m55.023s sys 5m8.689s | 4% performance increased | real 31m31.187s user 1m13.390s sys 5m40.861s | 4% performance increased | real 31m31.729s user 1m4.955s sys 6m35.774s | 4% performance decreased |
Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
Attaching captured output for:
1) Top command output on the server every 5 seconds
2) pg_stat_activity output every 5 seconds
3) Top command output on the client every 5 seconds
Do let me know if anyone has further questions/inputs for the benchmarking. Thanks to Rushabh Lathia for helping me with this testing.
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> Here is the parallel backup performance test results with and without
> the patch "parallel_backup_v15" on AWS cloud environment. Two
> "t2.xlarge" machines were used: one for Postgres server and the other
> one for pg_basebackup with the same machine configuration showing below.
>
> Machine configuration:
> Instance Type :t2.xlarge
> Volume type :io1
> Memory (MiB) :16GB
> vCPU # :4
> Architecture :x86_64
> IOP :6000
> Database Size (GB) :108
>
> Performance test results:
> without patch:
> real 18m49.346s
> user 1m24.178s
> sys 7m2.966s
>
> 1 worker with patch:
> real 18m43.201s
> user 1m55.787s
> sys 7m24.724s
>
> 2 worker with patch:
> real 18m47.373s
> user 2m22.970s
> sys 11m23.891s
>
> 4 worker with patch:
> real 18m46.878s
> user 2m26.791s
> sys 13m14.716s
>
> As required, I didn't have the pgbench running in parallel like we did
> in the previous benchmark.
>
So, there doesn't seem to be any significant improvement in this
scenario. Now, it is not clear why there was a significant
improvement in the previous run where pgbench was also running
simultaneously. I am not sure but maybe it is because when a lot of
other backends were running (performing read-only workload) the
backend that was responsible for doing backup was getting frequently
scheduled out and it slowed down the overall backup process. And when
we start using multiple backends for backup one or other backup
process is always running making the overall backup faster. One idea
to find this out is to check how much time backup takes when we run it
with and without pgbench workload on HEAD (aka unpatched code). Even
if what I am saying is true or there is some other reason due to which
we are seeing speedup in some cases (where there is a concurrent
workload), it might not make the case for using multiple backends for
backup but still, it is good to find that information as it might help
in designing this feature better.
> The perf report files for both Postgres server and pg_basebackup sides
> are attached.
>
It is not clear which functions are taking more time or for which
functions time is reduced as function symbols are not present in the
reports. I think you can refer
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles and additionally use -fno-omit-frame-pointer during
configure (you can use CFLAGS="-fno-omit-frame-pointer during
configure).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
Hi,
Thanks a lot for sharing the test results. Here are our test results using perf on three AWS t2.xlarge instances with the configuration below.
Machine configuration:
Instance Type :t2.xlarge
Volume type :io1
Memory (MiB) :16GB
vCPU # :4
Architecture :x86_64
IOP :6000
Database Size (GB) :45 (Server)
case 1: postgres server: without patch and without load
* Disk I/O:
# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.65% postgres [kernel.kallsyms] [k] __elv_add_request
2.27% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... .............................................
#
64.73% swapper [kernel.kallsyms] [k] native_safe_halt
10.89% postgres [vdso] [.] __vdso_gettimeofday
5.64% postgres [kernel.kallsyms] [k] do_syscall_64
5.43% postgres libpthread-2.26.so [.] __libc_recv
1.72% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
* Network:
# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
91.58% swapper [kernel.kallsyms] [k] consume_skb
7.09% postgres [kernel.kallsyms] [k] consume_skb
0.61% kswapd0 [kernel.kallsyms] [k] consume_skb
0.44% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
case 1: pg_basebackup client: without patch and without load
* Disk I/O:
# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
96.78% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
2.82% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.29% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.09% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................. .............................................
#
87.99% swapper [kernel.kallsyms] [k] native_safe_halt
3.14% swapper [kernel.kallsyms] [k] __lock_text_start
0.48% swapper [kernel.kallsyms] [k] __softirqentry_text_start
0.37% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.35% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
95.12% swapper [kernel.kallsyms] [k] consume_skb
3.23% pg_basebackup [kernel.kallsyms] [k] consume_skb
0.83% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.45% kswapd0 [kernel.kallsyms] [k] consume_skb
case 2: postgres server: with patch and with load, 4 backup workers on client side
* Disk I/O:
# Samples: 3M of event 'block:block_rq_insert'
# Event count (approx.): 3634542
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
98.88% postgres [kernel.kallsyms] [k] __elv_add_request
0.66% perf [kernel.kallsyms] [k] __elv_add_request
0.42% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.01% sshd [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 9M of event 'cpu-clock'
# Event count (approx.): 2299129250000
#
# Overhead Command Shared Object Symbol
# ........ ............... ..................... .............................................
#
52.73% swapper [kernel.kallsyms] [k] native_safe_halt
8.31% postgres [vdso] [.] __vdso_gettimeofday
4.46% postgres [kernel.kallsyms] [k] do_syscall_64
4.16% postgres libpthread-2.26.so [.] __libc_recv
1.58% postgres [kernel.kallsyms] [k] __lock_text_start
1.52% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
0.81% postgres [kernel.kallsyms] [k] copy_user_enhanced_fast_string
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6048795
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
85.81% postgres [kernel.kallsyms] [k] consume_skb
12.03% swapper [kernel.kallsyms] [k] consume_skb
0.97% postgres [kernel.kallsyms] [k] __consume_stateless_skb
0.85% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
0.24% perf [kernel.kallsyms] [k] consume_skb
case 2: pg_basebackup 4 workers: with patch and with load
* Disk I/O:
# Samples: 372K of event 'block:block_rq_insert'
# Event count (approx.): 372360
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.26% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
1.45% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.95% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.14% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 4M of event 'cpu-clock'
# Event count (approx.): 1234071000000
#
# Overhead Command Shared Object Symbol
# ........ ............... ........................ .................................................
#
89.25% swapper [kernel.kallsyms] [k] native_safe_halt
0.93% pg_basebackup [kernel.kallsyms] [k] __lock_text_start
0.91% swapper [kernel.kallsyms] [k] __lock_text_start
0.69% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.45% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 6M of event 'skb:consume_skb'
# Event count (approx.): 6449013
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
90.28% pg_basebackup [kernel.kallsyms] [k] consume_skb
9.09% swapper [kernel.kallsyms] [k] consume_skb
0.29% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.21% sshd [kernel.kallsyms] [k] consume_skb
The detailed perf report is attached, with different scenarios, i.e. without the patch (with and without load, for server and client), and with the patch (with and without load, for 1, 2, 4, 8 workers, for both server and client). The file names should be self-explanatory.
Let me know if more information is required.
Best regards,
David
Attachment
On 2020-04-29 5:41 a.m., Suraj Kharage wrote:Hi,We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -1) Without Asif’s patch2) With Asif’s patch and combination of workers 1,2,4,8.We run those test on two setup
1) Client and Server both on the same machine (Local backups)
2) Client and server on a different machine (remote backups)
Machine details:
1: Server (on which local backups performed and used as server for remote backups)
2: Client (Used as a client for remote backups)
Server:
RAM: 500 GB CPU details:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 8
NUMA node(s): 8
Filesystem: ext4
Client:
RAM: 490 GB CPU details:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 192
On-line CPU(s) list: 0-191
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 24
Filesystem: ext4
Below are the results for the local test:
For each data size, the real/user/sys times are listed for the normal backup (without the patch) and for parallel backup with 1, 2, 4 and 8 workers; percentages are relative to the normal backup without the patch.

10 GB (10 tables - each table around 1.05 GB)
  without patch : real 0m27.016s, user 0m3.378s, sys 0m23.059s
  1 worker      : real 0m30.314s, user 0m3.575s, sys 0m22.946s  (12% performance decrease)
  2 workers     : real 0m20.400s, user 0m3.622s, sys 0m29.670s  (27% performance increase)
  4 workers     : real 0m15.331s, user 0m3.706s, sys 0m39.189s  (43% performance increase)
  8 workers     : real 0m15.094s, user 0m3.915s, sys 1m23.350s  (44% performance increase)

50 GB (50 tables - each table around 1.05 GB)
  without patch : real 2m11.049s, user 0m16.464s, sys 2m1.757s
  1 worker      : real 2m26.621s, user 0m18.497s, sys 2m4.792s  (21% performance decrease)
  2 workers     : real 1m9.581s, user 0m18.298s, sys 2m12.030s  (46% performance increase)
  4 workers     : real 0m53.894s, user 0m18.588s, sys 2m47.390s  (58% performance increase)
  8 workers     : real 0m55.373s, user 0m18.423s, sys 5m57.470s  (57% performance increase)

100 GB (100 tables - each table around 1.05 GB)
  without patch : real 4m4.776s, user 0m33.699s, sys 3m27.777s
  1 worker      : real 4m20.862s, user 0m35.753s, sys 3m28.262s  (6% performance decrease)
  2 workers     : real 2m37.411s, user 0m36.440s, sys 4m16.424s  (35% performance increase)
  4 workers     : real 1m49.503s, user 0m37.200s, sys 5m58.077s  (55% performance increase)
  8 workers     : real 1m36.762s, user 0m36.987s, sys 9m36.906s  (60% performance increase)

200 GB (200 tables - each table around 1.05 GB)
  without patch : real 10m34.998s, user 1m8.471s, sys 7m21.520s
  1 worker      : real 11m30.899s, user 1m12.933s, sys 8m14.496s  (8% performance decrease)
  2 workers     : real 6m8.481s, user 1m13.771s, sys 9m31.216s  (41% performance increase)
  4 workers     : real 4m2.403s, user 1m18.331s, sys 12m29.661s  (61% performance increase)
  8 workers     : real 4m3.768s, user 1m24.547s, sys 15m21.421s  (61% performance increase)

Results for the remote test:
10 GB (10 tables - each table around 1.05 GB)
  without patch : real 1m36.829s, user 0m2.124s, sys 0m14.004s
  1 worker      : real 1m37.598s, user 0m3.272s, sys 0m11.110s  (0.8% performance decrease)
  2 workers     : real 1m36.753s, user 0m2.627s, sys 0m15.312s  (0.08% performance increase)
  4 workers     : real 1m37.212s, user 0m3.835s, sys 0m13.221s  (0.3% performance decrease)
  8 workers     : real 1m36.977s, user 0m4.475s, sys 0m17.937s  (0.1% performance decrease)

50 GB (50 tables - each table around 1.05 GB)
  without patch : real 7m54.211s, user 0m10.826s, sys 1m10.435s
  1 worker      : real 7m55.603s, user 0m16.535s, sys 1m8.147s  (0.2% performance decrease)
  2 workers     : real 7m53.499s, user 0m18.131s, sys 1m8.822s  (0.1% performance increase)
  4 workers     : real 7m54.687s, user 0m15.818s, sys 1m30.991s  (0.1% performance decrease)
  8 workers     : real 7m54.658s, user 0m20.783s, sys 1m34.460s  (0.1% performance decrease)

100 GB (100 tables - each table around 1.05 GB)
  without patch : real 15m45.776s, user 0m21.802s, sys 2m59.006s
  1 worker      : real 15m46.315s, user 0m32.499s, sys 2m47.245s  (0.05% performance decrease)
  2 workers     : real 15m46.065s, user 0m28.877s, sys 2m21.181s  (0.03% performance decrease)
  4 workers     : real 15m47.793s, user 0m30.932s, sys 2m36.708s  (0.2% performance decrease)
  8 workers     : real 15m47.129s, user 0m35.151s, sys 3m23.572s  (0.14% performance decrease)

200 GB (200 tables - each table around 1.05 GB)
  without patch : real 32m55.720s, user 0m50.602s, sys 5m38.875s
  1 worker      : real 31m30.602s, user 0m45.377s, sys 4m57.405s  (4% performance increase)
  2 workers     : real 31m30.214s, user 0m55.023s, sys 5m8.689s  (4% performance increase)
  4 workers     : real 31m31.187s, user 1m13.390s, sys 5m40.861s  (4% performance increase)
  8 workers     : real 31m31.729s, user 1m4.955s, sys 6m35.774s  (4% performance increase)

With client and server on the same machine, the results show around 50% improvement in the parallel runs with 4 and 8 workers. We don’t see a huge additional performance improvement as more workers are added.
Whereas, when the client and server are on different machines, we don’t see any major benefit in performance. This result matches the testing results posted by David Zhang up-thread.
We ran the test for a 100GB backup with 4 parallel workers to see the CPU usage and other information. What we noticed is that the server is consuming almost 100% CPU the whole time, and pg_stat_activity shows that the server is busy with ClientWrite most of the time.
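(The real/user/sys figures above are in the format produced by time(1); presumably each run was along the lines of the command below, where the host, port, target directory and the -j worker-count option are illustrative rather than the exact invocation used.)
  time pg_basebackup -h $PG_SERVER -p 5432 -D /mnt/backup/data -j 4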
Attaching captured output for
1) Top command output on the server every 5 seconds
2) pg_stat_activity output every 5 seconds
3) Top command output on the client every 5 seconds
Do let me know if anyone has further questions/inputs for the benchmarking.
Thanks to Rushabh Lathia for helping me with this testing.
On Tue, Apr 28, 2020 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Apr 27, 2020 at 10:23 PM David Zhang <david.zhang@highgo.ca> wrote:
>
> Hi,
>
> Here is the parallel backup performance test results with and without
> the patch "parallel_backup_v15" on AWS cloud environment. Two
> "t2.xlarge" machines were used: one for Postgres server and the other
> one for pg_basebackup with the same machine configuration showing below.
>
> Machine configuration:
> Instance Type :t2.xlarge
> Volume type :io1
> Memory (MiB) :16GB
> vCPU # :4
> Architecture :x86_64
> IOP :6000
> Database Size (GB) :108
>
> Performance test results:
> without patch:
> real 18m49.346s
> user 1m24.178s
> sys 7m2.966s
>
> 1 worker with patch:
> real 18m43.201s
> user 1m55.787s
> sys 7m24.724s
>
> 2 worker with patch:
> real 18m47.373s
> user 2m22.970s
> sys 11m23.891s
>
> 4 worker with patch:
> real 18m46.878s
> user 2m26.791s
> sys 13m14.716s
>
> As required, I didn't have the pgbench running in parallel like we did
> in the previous benchmark.
>
So, there doesn't seem to be any significant improvement in this
scenario. Now, it is not clear why there was a significant
improvement in the previous run where pgbench was also running
simultaneously. I am not sure but maybe it is because when a lot of
other backends were running (performing read-only workload) the
backend that was responsible for doing backup was getting frequently
scheduled out and it slowed down the overall backup process. And when
we start using multiple backends for backup one or other backup
process is always running making the overall backup faster. One idea
to find this out is to check how much time backup takes when we run it
with and without pgbench workload on HEAD (aka unpatched code). Even
if what I am saying is true or there is some other reason due to which
we are seeing speedup in some cases (where there is a concurrent
workload), it might not make the case for using multiple backends for
backup but still, it is good to find that information as it might help
in designing this feature better.
> The perf report files for both Postgres server and pg_basebackup sides
> are attached.
>
It is not clear which functions are taking more time or for which
functions time is reduced as function symbols are not present in the
reports. I think you can refer
"https://wiki.postgresql.org/wiki/Profiling_with_perf" to see how to
take profiles and additionally use -fno-omit-frame-pointer during
configure (you can use CFLAGS="-fno-omit-frame-pointer" during
configure).
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
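(For reference, the rebuild-and-profile sequence Amit suggests above amounts to roughly the sketch below, following the wiki page he links; the prefix, output file names and the 60-second sampling window are illustrative, not the exact commands used in these runs.)
  ./configure CFLAGS="-fno-omit-frame-pointer" --prefix=/home/ec2-user/after
  perf record -g -a -o postgres_profile.data -- sleep 60
  perf report -g -i postgres_profile.data --stdio > postgres_profile.txt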
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
> ...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>

Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 2020-04-30 2:18 a.m., Sumanta Mukherjee wrote:
Hi,
Would it be possible to put in the absolute numbers of the perf so that it is easier to understand the amount of improvement with and without the patch and different loads and workers.
Here are the parameters used to record the perf data on both the server and client side, for example, after applying the v15 patch, using 4 workers with load:
perf record -o postgres_patch_j4_load -e block:block_rq_insert -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- /home/ec2-user/after/bin/postgres -D /mnt/test/data
perf record -o backup_patch_j4_load -e block:block_rq_insert -e cpu-clock -e cycles:k -e skb:consume_skb -aR -s -- /home/ec2-user/after/bin/pg_basebackup -h ${PG_SERVER} -p 5432 -D /mnt/backup/data -v
And this is how the report is generated.
perf report -i postgres_patch_j4_load --stdio > postgres_patch_j4_load.txt
The original perf data files are still available. Can you please clarify which parameter you would like to be added for regenerating the report, or whether any other parameters need to be added to recreate the perf.data and then generate the report?
I am also unsure why the swapper is taking such a huge percentage of the absolute time in the base run of just the postgres server and pg_basebackup client.
On Thu, Apr 30, 2020 at 1:18 PM David Zhang <david.zhang@highgo.ca> wrote:
Hi,
Thanks a lot for sharing the test results. Here are our test results using perf on three AWS t2.xlarge instances with the configuration below.
Machine configuration:
Instance Type :t2.xlarge
Volume type :io1
Memory (MiB) :16GB
vCPU # :4
Architecture :x86_64
IOP :6000
Database Size (GB) :45 (Server)
case 1: postgres server: without patch and without load
* Disk I/O:
# Samples: 342K of event 'block:block_rq_insert'
# Event count (approx.): 342834
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
97.65% postgres [kernel.kallsyms] [k] __elv_add_request
2.27% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 6M of event 'cpu-clock'
# Event count (approx.): 1559444750000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................... .............................................
#
64.73% swapper [kernel.kallsyms] [k] native_safe_halt
10.89% postgres [vdso] [.] __vdso_gettimeofday
5.64% postgres [kernel.kallsyms] [k] do_syscall_64
5.43% postgres libpthread-2.26.so [.] __libc_recv
1.72% postgres [kernel.kallsyms] [k] pvclock_clocksource_read
* Network:
# Samples: 2M of event 'skb:consume_skb'
# Event count (approx.): 2739785
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
91.58% swapper [kernel.kallsyms] [k] consume_skb
7.09% postgres [kernel.kallsyms] [k] consume_skb
0.61% kswapd0 [kernel.kallsyms] [k] consume_skb
0.44% ksoftirqd/3 [kernel.kallsyms] [k] consume_skb
case 1: pg_basebackup client: without patch and without load
* Disk I/O:
# Samples: 371K of event 'block:block_rq_insert'
# Event count (approx.): 371362
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. .....................
#
96.78% kworker/u30:0 [kernel.kallsyms] [k] __elv_add_request
2.82% pg_basebackup [kernel.kallsyms] [k] __elv_add_request
0.29% kworker/u30:1 [kernel.kallsyms] [k] __elv_add_request
0.09% xfsaild/xvda1 [kernel.kallsyms] [k] __elv_add_request
* CPU:
# Samples: 3M of event 'cpu-clock'
# Event count (approx.): 903527000000
#
# Overhead Command Shared Object Symbol
# ........ ............... .................. .............................................
#
87.99% swapper [kernel.kallsyms] [k] native_safe_halt
3.14% swapper [kernel.kallsyms] [k] __lock_text_start
0.48% swapper [kernel.kallsyms] [k] __softirqentry_text_start
0.37% pg_basebackup [kernel.kallsyms] [k] copy_user_enhanced_fast_string
0.35% swapper [kernel.kallsyms] [k] do_csum
* Network:
# Samples: 12M of event 'skb:consume_skb'
# Event count (approx.): 12260713
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ...........................
#
95.12% swapper [kernel.kallsyms] [k] consume_skb
3.23% pg_basebackup [kernel.kallsyms] [k] consume_skb
0.83% ksoftirqd/1 [kernel.kallsyms] [k] consume_skb
0.45% kswapd0 [kernel.kallsyms] [k] consume_skb
...
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>
Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?
During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows? Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?
[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>
Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?
In the particular setup, the client and server were on different machines.
During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows? Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?
[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
URL : http://www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
EMAIL: mailto: ahsan.hadi@highgo.ca
On Mon, May 4, 2020 at 6:22 PM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
On Thu, Apr 30, 2020 at 4:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Apr 29, 2020 at 6:11 PM Suraj Kharage
<suraj.kharage@enterprisedb.com> wrote:
>
> Hi,
>
> We at EnterpriseDB did some performance testing around this parallel backup to check how this is beneficial and below are the results. In this testing, we run the backup -
> 1) Without Asif’s patch
> 2) With Asif’s patch and combination of workers 1,2,4,8.
>
> We run those test on two setup
>
> 1) Client and Server both on the same machine (Local backups)
>
> 2) Client and server on a different machine (remote backups)
>
>
> Machine details:
>
> 1: Server (on which local backups performed and used as server for remote backups)
>
> 2: Client (Used as a client for remote backups)
>
>
...
>
>
> Client & Server on the same machine, the result shows around 50% improvement in parallel run with worker 4 and 8. We don’t see the huge performance improvement with more workers been added.
>
>
> Whereas, when the client and server on a different machine, we don’t see any major benefit in performance. This testing result matches the testing results posted by David Zhang up thread.
>
>
>
> We ran the test for 100GB backup with parallel worker 4 to see the CPU usage and other information. What we noticed is that server is consuming the CPU almost 100% whole the time and pg_stat_activity shows that server is busy with ClientWrite most of the time.
>
>
Was this for a setup where the client and server were on the same
machine or where the client was on a different machine? If it was for
the case where both are on the same machine, then ideally, we should
see ClientRead events in a similar proportion?
In the particular setup, the client and server were on different machines.
During an offlist discussion with Robert, he pointed out that current
basebackup's code doesn't account for the wait event for the reading
of files which can change what pg_stat_activity shows? Can you please
apply his latest patch to improve basebackup.c's code [1] which will
take care of that waitevent before getting the data again?
[1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
Have you had the chance to try this out?
1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_j4_100GB.txt)
The total number of events (pg_stat_activity) captured during above runs:
- 314 events for normal backups
- 316 events for parallel backups (-j 4)
BaseBackupRead wait event numbers: (newly added)
37 - in normal backups
25 - in the parallel backup (-j 4)
ClientWrite wait event numbers:
175 - in normal backup
1098 - in parallel backups
ClientRead wait event numbers:
0 - ClientRead in normal backup
326 - ClientRead in parallel backups for diff processes. (all in idle state)
Thanks,
Attachment
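(For context, the wait-event counts above come from periodically sampling pg_stat_activity while the backup runs; a minimal sketch of such a sampling loop is below. The interval, output file, and absence of any filtering are illustrative assumptions, not the exact script used.)
  while true; do
    psql -X -At -c "SELECT now(), pid, state, wait_event_type, wait_event FROM pg_stat_activity;" >> pg_stat_activity_samples.txt
    sleep 3
  done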
On Thu, May 21, 2020 at 11:36 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
>
> On Thu, May 21, 2020 at 10:47 AM Ahsan Hadi <ahsan.hadi@gmail.com> wrote:
>>
>>>>
>>>> During an offlist discussion with Robert, he pointed out that current
>>>> basebackup's code doesn't account for the wait event for the reading
>>>> of files which can change what pg_stat_activity shows? Can you please
>>>> apply his latest patch to improve basebackup.c's code [1] which will
>>>> take care of that waitevent before getting the data again?
>>>>
>>>> [1] - https://www.postgresql.org/message-id/CA%2BTgmobBw-3573vMosGj06r72ajHsYeKtksT_oTxH8XvTL7DxA%40mail.gmail.com
>>>
>>>
>>> Sure, we can try out this and do a similar run to collect the pg_stat_activity output.
>>
>>
>> Have you had the chance to try this out?
>
>
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events after every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_j4_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)
>
It might be interesting to see why ClientRead/ClientWrite has increased
so much and can we reduce it?

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Thu, May 21, 2020 at 2:06 AM Rushabh Lathia <rushabh.lathia@gmail.com> wrote:
> Yes. My colleague Suraj tried this and here are the pg_stat_activity output files.
>
> Captured wait events after every 3 seconds during the backup for -
> 1: parallel backup for 100GB data with 4 workers (pg_stat_activity_normal_backup_100GB.txt)
> 2: Normal backup (without parallel backup patch) for 100GB data (pg_stat_activity_j4_100GB.txt)
>
> Here is the observation:
>
> The total number of events (pg_stat_activity) captured during above runs:
> - 314 events for normal backups
> - 316 events for parallel backups (-j 4)
>
> BaseBackupRead wait event numbers: (newly added)
> 37 - in normal backups
> 25 - in the parallel backup (-j 4)
>
> ClientWrite wait event numbers:
> 175 - in normal backup
> 1098 - in parallel backups
>
> ClientRead wait event numbers:
> 0 - ClientRead in normal backup
> 326 - ClientRead in parallel backups for diff processes. (all in idle state)

So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.

Do we have the same results for the local backup case, where the patch helped?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.
Do we have the same results for the local backup case, where the patch helped?
- 82 events for normal backups
- 31 events for parallel backups (-j 4)
BaseBackupRead wait event numbers: (newly added)
24 - in normal backups
14 - in parallel backup (-j 4)
ClientWrite wait event numbers:
8 - in normal backup
43 - in parallel backups
ClientRead wait event numbers:
0 - ClientRead in normal backup
32 - ClientRead in parallel backups for diff processes.
Attachment
Taking a backup is an I/O-intensive workload, so parallelizing it through multiple worker threads/processes creates an overhead of its own. So what precisely are we optimizing here? Looking at a running database system in any environment, I see the following potential scenarios playing out. These are probably clear to everyone here, but I'm listing them for completeness and clarity.
Locally Running Backup:
(1) Server has no clients connected other than base backup.
(2) Server has other clients connected which are actively performing operations causing disk I/O.
Remotely Running Backup:
(3) Server has no clients connected other than remote base backup.
(4) Server has other clients connected which are actively performing operations causing disk I/O.
Others:
(5) Server or the system running base backup has other processes competing for disk or network bandwidth.
Generally speaking, I see that parallelization could potentially benefit in scenarios (2), (4) and (5) with the reason being that having more than one thread increases the likelihood that backup will now get a bigger time slice for disk I/O and network bandwidth. With (1) and (3), since there are no competing processes, addition of multiple threads or processes will only increase CPU overhead whilst still getting the same network and disk time slice. In this particular case, the performance will degrade.
IMHO, that’s why adding other load on the server, perhaps by running pgbench simultaneously, may show improved performance for parallel backup. Also, running parallel backup on a local laptop more often than not yields improved performance.
There are obviously other factors that may impact the performance, like the type of I/O scheduler being used, whether CFQ or some other.
IMHO, parallel backup has obvious performance benefits, but we need to ensure that users understand that there is potential for slower backup if there is no competition for resources.
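(On the I/O scheduler point above: on Linux the scheduler in effect can be checked per block device, e.g. as below; the device name is illustrative and will differ per system.)
  cat /sys/block/xvda/queue/scheduler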
On Thu, May 21, 2020 at 7:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
So, basically, when we go from 1 process to 4, the additional
processes spend all of their time waiting rather than doing any useful
work, and that's why there is no performance benefit. Presumably, the
reason they spend all their time waiting for ClientRead/ClientWrite is
because the network between the two machines is saturated, so adding
more processes that are trying to use it at maximum speed just leads
to spending more time waiting for it to be available.
Do we have the same results for the local backup case, where the patch helped?
Here is the result for local backup case (100GB data). Attaching the captured logs.
The total number of events (pg_stat_activity) captured during local runs:
- 82 events for normal backups
- 31 events for parallel backups (-j 4)
BaseBackupRead wait event numbers: (newly added)
24 - in normal backups
14 - in parallel backup (-j 4)
ClientWrite wait event numbers:
8 - in normal backup
43 - in parallel backups
ClientRead wait event numbers:
0 - ClientRead in normal backup
32 - ClientRead in parallel backups for diff processes.
--
Thanks & Regards,
Suraj kharage,
EnterpriseDB Corporation,
The Postgres Database Company.
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca
On Thu, Jun 11, 2020 at 1:41 PM Hamid Akhtar <hamid.akhtar@gmail.com> wrote:
> As far I understand, parallel backup is not a mandatory performance feature, rather, one at user's discretion. This IMHO indicates that it will benefit some users and it may not others.
>
> IMHO, parallel backup has obvious performance benefits, but we need to ensure that users understand that there is potential for slower backup if there is no competition for resources.

I am sure that nobody is arguing that the patch has to be beneficial in all cases in order to justify applying it. However, there are several good arguments against proceeding with this patch:

* Every version of the patch that has been reviewed by anybody has been riddled with errors. Over and over again, testers have found serious bugs, and code reviewers have noticed lots of problems, too.

* This approach requires rewriting a lot of current functionality, either by moving it to the client side or by restructuring it to work with parallelism. That's a lot of work, and it seems likely to generate more work in the future as people continue to add features. It's one thing to add a feature that doesn't benefit everybody; it's another thing to add a feature that doesn't benefit everybody and also hinders future development. See http://postgr.es/m/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com for more discussion of these issues.

* The scenarios in which the patch delivers a performance benefit are narrow and somewhat contrived. In remote backup scenarios, AIUI, the patch hasn't been shown to help. In local backups, it does, but how likely is it that you are going to do your local backups over the wire protocol instead of by direct file copy, which is probably much faster? I agree that if your server is overloaded, having multiple processes competing for the server resources will allow backup to get a larger slice relative to other things, but that seems like a pretty hackish and inefficient solution to that problem. You could also argue that we could provide a feature to prioritize some queries over other queries by running them with tons of parallel workers just to convince the OS to give them more resources, and I guess that would work, but it would also waste tons of resources and possibly cripple or even crash your system if you used it enough. The same argument applies here.

* Even when the patch does provide a benefit, it seems to max out at about 2.5X. Clearly it's nice to have something go 2.5X faster, but the point is that it doesn't scale beyond that no matter how many workers you add. That doesn't automatically mean that something is a bad idea, but it is a concern. At the very least, we should be able to say why it doesn't scale any better than that.

* Actually, we have some hints about that. Over at http://postgr.es/m/20200503174922.mfzzdafa5g4rlhez@alap3.anarazel.de Andres has shown that too much concurrency when copying files results in a dramatic performance reduction, and that a lot of the reason why concurrency helps in the first place has to do with the fact that pg_basebackup does not have any cache control (no fallocate, sync_file_range(WRITE), posix_fadvise(DONTNEED)). When those things are added the performance gets better and the benefits of concurrency are reduced. I suspect that would also be true for this patch. It would be unreasonable to commit a large patch, especially one that would hinder future development, if we could get the same benefits from a small patch that would not do so.
I am not in a position to tell you how to spend your time, so you can certainly pursue this patch if you wish. However, I think it's probably not the best use of time. Even if you fixed all the bugs and reimplemented all of the functionality that needs reimplementing in order to make this approach work, it still doesn't make sense to commit the patch if either (a) we can obtain the same benefit, or most of it, from a much simpler patch or (b) the patch is going to make it significantly harder to develop other features that we want to have, especially if those features seem likely to be more beneficial than what this patch offers. I think both of those are likely true here.

For an example of (b), consider compression of tar files on the server side before transmission to the client. If you take the approach this patch does and move tarfile construction to the client, that is impossible. Now you can argue (and perhaps you will) that this would just mean someone has to choose between using this feature and using that feature, and why should users not have such a choice? That is a fair argument, but my counter-argument is that users shouldn't be forced into making that choice. If the parallel feature is beneficial enough to justify having it, then it ought to be designed in such a way that it works with the other features we also want to have rather than forcing users to choose between them. Since I have already proposed (on the other thread linked above) a design that would make that possible, and this design does not, I have a hard time understanding why we would pick this one, especially given all of the other disadvantages which it seems to have.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 12 Jun 2020, at 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
> I am sure that nobody is arguing that the patch has to be beneficial
> in all cases in order to justify applying it. However, there are
> several good arguments against proceding with this patch:
This thread has stalled with no resolution to the raised issues, and the latest
version of the patch (v15) posted no longer applies (I only tried 0001 which
failed, the green tick in the CFBot is due to it mistakenly thinking an attached
report is a patch). I'm marking this patch Returned with Feedback. Please
open a new CF entry when there is a new version of the patch.
cheers ./daniel
URL : www.highgo.ca
ADDR: 10318 WHALLEY BLVD, Surrey, BC
CELL:+923335449950 EMAIL: mailto:hamid.akhtar@highgo.ca