Home > mailing lists

Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

From	Rajkumar Raghuwanshi
Subject	Re: WIP/PoC for parallel backup
Date	March 11, 2020 09:38:20
Msg-id	CAKcux6mMk-F2LmUy9arVu0QQiZfVCuBWGZbSxjn=dAjUWMSvew@mail.gmail.com Whole thread Raw
In response to	Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>)
Responses	Re: WIP/PoC for parallel backup
List	pgsql-hackers

Tree view

Hi Asif

I have started testing this feature. I have applied v6 patch on commit a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).

I got few observations, please take a look.

--if backup failed, backup directory is not getting removed.
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty

--giving large number of jobs leading segmentation fault.
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new process for connection: Resource temporarily unavailable

could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily unavailable
Segmentation fault (core dumped)

--stack-trace
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D /tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
46 if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
#1 0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
#2 0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
#3 0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4 exit (status=1) at exit.c:100
#5 0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at pg_basebackup.c:2713
#6 0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
#7 0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at pg_basebackup.c:2668

--with tablespace is in the same directory as data, parallel_backup crashed
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.

postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)

--stack-trace
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
#1 0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at pg_basebackup.c:2739
#2 0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
#3 0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at pg_basebackup.c:2668
(gdb)

Thanks & Regards,

Rajkumar Raghuwanshi

On Tue, Feb 25, 2020 at 7:49 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Hi,

I have created a commitfest entry.
https://commitfest.postgresql.org/27/2472/

On Mon, Feb 17, 2020 at 1:39 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Thanks Jeevan. Here is the documentation patch.

On Mon, Feb 10, 2020 at 6:49 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi Asif,

On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com> wrote:

Here are the the updated patches, taking care of the issues pointed
earlier. This patch adds the following commands (with specified option):

START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X']
[NOVERIFY_CHECKSUMS]

Parallel backup is not making any use of tablespace map, so I have
removed that option from the above commands. There is a patch pending
to remove the exclusive backup; we can further refactor the do_pg_start_backup
function at that time, to remove the tablespace information and move the
creation of tablespace_map file to the client.

I have disabled the maxrate option for parallel backup. I intend to send
out a separate patch for it. Robert previously suggested to implement
throttling on the client-side. I found the original email thread [1]
where throttling was proposed and added to the server. In that thread,
it was originally implemented on the client-side, but per many suggestions,
it was moved to server-side.

So, I have a few suggestions on how we can implement this:

1- have another option for pg_basebackup (i.e. per-worker-maxrate) where
the user could choose the bandwidth allocation for each worker. This approach
can be implemented on the client-side as well as on the server-side.

2- have the maxrate, be divided among workers equally at first. and the
let the main thread keep adjusting it whenever one of the workers finishes.
I believe this would only be possible if we handle throttling on the client.
Also, as I understand it, implementing this will introduce additional mutex
for handling of bandwidth consumption data so that rate may be adjusted
according to data received by threads.

[1] https://www.postgresql.org/message-id/flat/521B4B29.20009%402ndquadrant.com#189bf840c87de5908c0b4467d31b50af

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

The latest changes look good to me. However, the patch set is missing the documentation.
Please add those.

Thanks

--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

--
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

--
--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca

pgsql-hackers by date:

From: Amit Kapila
Date: 11 March 2020, 09:06:23
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

From: Dilip Kumar
Date: 11 March 2020, 09:51:50
Subject: Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers

Previous

Next