Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers
From | Rajkumar Raghuwanshi |
---|---|
Subject | Re: WIP/PoC for parallel backup |
Date | |
Msg-id | CAKcux6mMk-F2LmUy9arVu0QQiZfVCuBWGZbSxjn=dAjUWMSvew@mail.gmail.com |
In response to | Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>) |
Responses | Re: WIP/PoC for parallel backup |
List | pgsql-hackers |
Hi Asif,

I have started testing this feature. I applied the v6 patch on commit a069218163704c44a8996e7e98e765c56e2b9c8e (30 Jan).
I have a few observations; please take a look.

--If a backup fails, the backup directory is not removed.
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=9 -D /tmp/test_bkp/bkp6
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
[edb@localhost bin]$ ./pg_basebackup -p 5432 --jobs=8 -D /tmp/test_bkp/bkp6
pg_basebackup: error: directory "/tmp/test_bkp/bkp6" exists but is not empty
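
One way to avoid the leftover directory is to remember whether the current run created the target directory and to remove it from an exit handler unless the backup completed. The following is only a minimal sketch of that idea, not code from the patch: `prepare_target_dir`, `cleanup_target_dir`, `remove_tree`, `created_dir` and `backup_done` are all hypothetical names introduced here.

```c
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical state for this sketch; none of it is taken from the patch. */
static char created_dir[4096] = "";	/* non-empty only if this run created it */
static bool backup_done = false;	/* set to true once the backup completes */

/* Recursively remove a file or directory tree. Returns 0 on success. */
static int
remove_tree(const char *path)
{
	struct stat st;
	struct dirent *de;
	DIR		   *dir;
	int			rc = 0;

	if (lstat(path, &st) != 0)
		return -1;
	if (!S_ISDIR(st.st_mode))
		return unlink(path);	/* plain file or symlink */

	dir = opendir(path);
	if (dir == NULL)
		return -1;
	while (rc == 0 && (de = readdir(dir)) != NULL)
	{
		char		child[4096];

		if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
			continue;
		if (snprintf(child, sizeof(child), "%s/%s", path, de->d_name)
			>= (int) sizeof(child))
			rc = -1;			/* path too long for this sketch */
		else
			rc = remove_tree(child);
	}
	closedir(dir);
	return (rc == 0) ? rmdir(path) : rc;
}

/* Create the target directory; remember it only if this run created it. */
static int
prepare_target_dir(const char *path)
{
	if (strlen(path) >= sizeof(created_dir))
		return -1;
	if (mkdir(path, 0700) == 0)
	{
		strcpy(created_dir, path);
		return 0;
	}
	/* A pre-existing directory is not ours to delete later. */
	return (errno == EEXIST) ? 0 : -1;
}

/* Exit-time hook: remove what this run created if the backup failed. */
static void
cleanup_target_dir(void)
{
	if (!backup_done && created_dir[0] != '\0')
	{
		if (remove_tree(created_dir) != 0)
			fprintf(stderr, "could not remove \"%s\"\n", created_dir);
		created_dir[0] = '\0';
	}
}
```

In such a sketch the caller would register the hook with `atexit(cleanup_target_dir)` right after `prepare_target_dir()` succeeds, and set `backup_done = true` as the last step of a successful backup. A directory that already existed before the run is deliberately left alone.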
--Giving a large number of jobs leads to a segmentation fault.
./pg_basebackup -p 5432 --jobs=1000 -D /tmp/t3
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
.
.
.
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 10)
pg_basebackup: error: could not connect to server: could not fork new process for connection: Resource temporarily unavailable
could not fork new process for connection: Resource temporarily unavailable
pg_basebackup: error: failed to create thread: Resource temporarily unavailable
Segmentation fault (core dumped)
--stack-trace
gdb -q -c core.11824 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -p 5432 --jobs=1000 -D /tmp/test_bkp/bkp10'.
Program terminated with signal 11, Segmentation fault.
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
46 if (INVALID_NOT_TERMINATED_TD_P (pd))
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 pthread_join (threadid=140503120623360, thread_return=0x0) at pthread_join.c:46
#1 0x0000000000408e21 in cleanup_workers () at pg_basebackup.c:2840
#2 0x0000000000403846 in disconnect_atexit () at pg_basebackup.c:316
#3 0x0000003921235a02 in __run_exit_handlers (status=1) at exit.c:78
#4 exit (status=1) at exit.c:100
#5 0x0000000000408aa6 in create_parallel_workers (backupinfo=0x1a4b8c0) at pg_basebackup.c:2713
#6 0x0000000000407946 in BaseBackup () at pg_basebackup.c:2127
#7 0x000000000040895c in main (argc=6, argv=0x7ffd566f4718) at pg_basebackup.c:2668
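
The trace shows `exit()` being called from `create_parallel_workers` and the exit handler then crashing in `pthread_join`. One plausible reading, which the trace alone does not prove, is that the cleanup code joins thread slots that were never successfully created. A minimal sketch of the usual defence is to count the threads that were actually started and join only those. `WorkerPool`, `start_workers`, `join_started_workers` and `worker_main` are hypothetical names for illustration, not the patch's own structures.

```c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping for this sketch; not the patch's data structure. */
typedef struct WorkerPool
{
	pthread_t  *threads;		/* one slot per requested job */
	int			nstarted;		/* slots holding a successfully created thread */
} WorkerPool;

/* Placeholder worker body; a real worker would fetch files here. */
static void *
worker_main(void *arg)
{
	(void) arg;
	return NULL;
}

/* Start up to njobs workers; stop at the first failure. Returns 0 or -1. */
static int
start_workers(WorkerPool *pool, int njobs)
{
	pool->nstarted = 0;
	pool->threads = calloc((size_t) njobs, sizeof(pthread_t));
	if (pool->threads == NULL)
		return -1;

	for (int i = 0; i < njobs; i++)
	{
		int			rc = pthread_create(&pool->threads[i], NULL, worker_main, NULL);

		if (rc != 0)
		{
			fprintf(stderr, "failed to create thread: %s\n", strerror(rc));
			return -1;			/* nstarted still counts only valid threads */
		}
		pool->nstarted++;
	}
	return 0;
}

/* Join only the threads that were really created; safe to call twice. */
static void
join_started_workers(WorkerPool *pool)
{
	for (int i = 0; i < pool->nstarted; i++)
		pthread_join(pool->threads[i], NULL);
	free(pool->threads);
	pool->threads = NULL;
	pool->nstarted = 0;
}
```

With this shape, a cleanup routine running from an exit handler after a partial start-up only touches valid thread IDs, because `nstarted` is incremented strictly after `pthread_create` succeeds.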
--When the tablespace location is in the same directory as the data directory (both under /tmp here), parallel backup crashes.
[edb@localhost bin]$ ./initdb -D /tmp/data
[edb@localhost bin]$ ./pg_ctl -D /tmp/data -l /tmp/logfile start
[edb@localhost bin]$ mkdir /tmp/ts
[edb@localhost bin]$ ./psql postgres
psql (13devel)
Type "help" for help.
postgres=# create tablespace ts location '/tmp/ts';
CREATE TABLESPACE
postgres=# create table tx (a int) tablespace ts;
CREATE TABLE
postgres=# \q
[edb@localhost bin]$ ./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1
Segmentation fault (core dumped)
--stack-trace
[edb@localhost bin]$ gdb -q -c core.15778 pg_basebackup
Loaded symbols for /lib64/libnss_files.so.2
Core was generated by `./pg_basebackup -j 2 -D /tmp/tts -T /tmp/ts=/tmp/ts1'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
3000 backupInfo->curr->next = file;
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-24.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-58.el6_10.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0 0x0000000000409442 in get_backup_filelist (conn=0x140cb20, backupInfo=0x14210a0) at pg_basebackup.c:3000
#1 0x0000000000408b56 in parallel_backup_run (backupinfo=0x14210a0) at pg_basebackup.c:2739
#2 0x0000000000407955 in BaseBackup () at pg_basebackup.c:2128
#3 0x000000000040895c in main (argc=7, argv=0x7ffca2910c58) at pg_basebackup.c:2668
(gdb)
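
The faulting statement is `backupInfo->curr->next = file;`, which would dereference a null pointer if `curr` had not been set when the first file of a list is appended. Whether that is the actual cause here is an assumption; the core dump would have to confirm it. A minimal sketch of a list append that handles the empty-list case explicitly is shown below, with hypothetical `FileEntry`, `FileList` and `file_list_append` names that are not taken from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical list types for this sketch; not the patch's structures. */
typedef struct FileEntry
{
	char	   *path;
	struct FileEntry *next;
} FileEntry;

typedef struct FileList
{
	FileEntry  *head;			/* first entry, NULL when the list is empty */
	FileEntry  *tail;			/* last entry, NULL when the list is empty */
} FileList;

/* Append a copy of path to the list. Returns 0 on success, -1 on OOM. */
static int
file_list_append(FileList *list, const char *path)
{
	size_t		len = strlen(path) + 1;
	FileEntry  *entry = malloc(sizeof(FileEntry));

	if (entry == NULL)
		return -1;
	entry->path = malloc(len);
	if (entry->path == NULL)
	{
		free(entry);
		return -1;
	}
	memcpy(entry->path, path, len);
	entry->next = NULL;

	if (list->tail == NULL)
		list->head = entry;		/* first entry: there is no previous node */
	else
		list->tail->next = entry;
	list->tail = entry;
	return 0;
}

/* Release every entry and reset the list to empty. */
static void
file_list_free(FileList *list)
{
	FileEntry  *entry = list->head;

	while (entry != NULL)
	{
		FileEntry  *next = entry->next;

		free(entry->path);
		free(entry);
		entry = next;
	}
	list->head = list->tail = NULL;
}
```

The point of the sketch is only the `tail == NULL` branch: the first append never touches a `next` pointer through a null node, regardless of which tablespace the first file belongs to.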
Thanks & Regards,
Rajkumar Raghuwanshi
On Tue, Feb 25, 2020 at 7:49 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Hi,

I have created a commitfest entry.

On Mon, Feb 17, 2020 at 1:39 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Thanks Jeevan. Here is the documentation patch.

On Mon, Feb 10, 2020 at 6:49 PM Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote:
Hi Asif,

On Thu, Jan 30, 2020 at 7:10 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Here are the updated patches, taking care of the issues pointed out earlier. This patch adds the following commands (with the specified options):

START_BACKUP [LABEL '<label>'] [FAST]
STOP_BACKUP [NOWAIT]
LIST_TABLESPACES [PROGRESS]
LIST_FILES [TABLESPACE]
LIST_WAL_FILES [START_WAL_LOCATION 'X/X'] [END_WAL_LOCATION 'X/X']
SEND_FILES '(' FILE, FILE... ')' [START_WAL_LOCATION 'X/X'] [NOVERIFY_CHECKSUMS]

Parallel backup is not making any use of the tablespace map, so I have removed that option from the above commands. There is a patch pending to remove the exclusive backup; we can further refactor the do_pg_start_backup function at that time, to remove the tablespace information and move the creation of the tablespace_map file to the client.

I have disabled the maxrate option for parallel backup. I intend to send out a separate patch for it. Robert previously suggested implementing throttling on the client side. I found the original email thread [1] where throttling was proposed and added to the server. In that thread, it was originally implemented on the client side, but per many suggestions, it was moved to the server side.

So, I have a few suggestions on how we can implement this:

1- Have another option for pg_basebackup (i.e. per-worker-maxrate) where the user could choose the bandwidth allocation for each worker. This approach can be implemented on the client side as well as on the server side.

2- Have the maxrate divided equally among the workers at first, and then let the main thread keep adjusting it whenever one of the workers finishes. I believe this would only be possible if we handle throttling on the client. Also, as I understand it, implementing this will introduce an additional mutex for handling the bandwidth consumption data, so that the rate may be adjusted according to the data received by the threads.

--
Asif Rehman

The latest changes look good to me. However, the patch set is missing the documentation. Please add those.

Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

--
Asif Rehman

--
Asif Rehman
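
The second throttling suggestion in the quoted mail (split maxrate equally, then readjust as workers finish, with a mutex protecting the shared data) can be illustrated with a small client-side sketch. This is not from any posted patch: `RateBudget` and the `rate_budget_*` functions are hypothetical, the rate is a plain number in whatever unit maxrate uses, and the sketch covers only the share computation, not the actual sleeping or byte accounting.

```c
#include <pthread.h>

/* Hypothetical shared throttling state; not taken from any posted patch. */
typedef struct RateBudget
{
	pthread_mutex_t lock;		/* protects the fields below */
	long		maxrate;		/* total allowed rate across all workers */
	int			active_workers; /* workers still transferring data */
} RateBudget;

/* Set up the budget for nworkers workers sharing maxrate. */
static void
rate_budget_init(RateBudget *budget, long maxrate, int nworkers)
{
	pthread_mutex_init(&budget->lock, NULL);
	budget->maxrate = maxrate;
	budget->active_workers = nworkers;
}

/* Current per-worker share: the total split equally among active workers. */
static long
rate_budget_share(RateBudget *budget)
{
	long		share = 0;

	pthread_mutex_lock(&budget->lock);
	if (budget->active_workers > 0)
		share = budget->maxrate / budget->active_workers;
	pthread_mutex_unlock(&budget->lock);
	return share;
}

/* Called when a worker finishes; the remaining workers get a larger share. */
static void
rate_budget_worker_done(RateBudget *budget)
{
	pthread_mutex_lock(&budget->lock);
	if (budget->active_workers > 0)
		budget->active_workers--;
	pthread_mutex_unlock(&budget->lock);
}
```

In this sketch each worker would call `rate_budget_share()` periodically to pick up its current limit, so a finished worker's bandwidth is redistributed without the main thread having to push new values. Integer division drops any remainder, so the sum of the shares can be slightly below maxrate.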