Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers
From | Jeevan Chalke |
---|---|
Subject | Re: WIP/PoC for parallel backup |
Date | |
Msg-id | CAM2+6=UYH799S2qZ4V=f7EK1Y1HzFQKx8GnQWw2gEET1J=mZhQ@mail.gmail.com Whole thread Raw |
In response to | Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>) |
Responses |
Re: WIP/PoC for parallel backup
Re: WIP/PoC for parallel backup |
List | pgsql-hackers |
On Wed, Nov 13, 2019 at 7:04 PM Asif Rehman <asifr.rehman@gmail.com> wrote:
Sorry, I sent the wrong patches. Please see the correct version of the patches (_v6).
Review comments on these patches:
1.
+ XLogRecPtr wal_location;
Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.
2.
+ int32 size;
Should we use size_t here?
3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?
5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c
6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
Please update these comments as we fetch one file at a time.
7.
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.
8.
Please indent code within 80 char limit at all applicable places.
9.
Please fix following typos:
identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories
=====
The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.
Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.
[1] https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
1.
+ XLogRecPtr wal_location;
Looking at the other field names in basebackup_options structure, let's use
wallocation instead. Or better startwallocation to be precise.
2.
+ int32 size;
Should we use size_t here?
3.
I am still not sure why we need SEND_BACKUP_FILELIST as a separate command.
Can't we return the file list with START_BACKUP itself?
4.
+ else if (
+#ifndef WIN32
+ S_ISLNK(statbuf.st_mode)
+#else
+ pgwin32_is_junction(pathbuf)
+#endif
+ )
+ {
+ /*
+ * If symlink, write it as a directory. file symlinks only allowed
+ * in pg_tblspc
+ */
+ statbuf.st_mode = S_IFDIR | pg_dir_create_mode;
+ _tarWriteHeader(pathbuf + basepathlen + 1, NULL, &statbuf, false);
+ }
In normal backup mode, we skip the special file which is not a regular file or
a directory or a symlink inside pg_tblspc. But in your patch, above code,
treats it as a directory. Should parallel backup too skip such special files?
5.
Please keep header file inclusions in alphabetical order in basebackup.c and
pg_basebackup.c
6.
+ /*
+ * build query in form of: SEND_BACKUP_FILES ('base/1/1245/32683',
+ * 'base/1/1245/32683', ...) [options]
+ */
Please update these comments as we fetch one file at a time.
7.
+backup_file:
+ SCONST { $$ = (Node *) makeString($1); }
+ ;
+
Instead of having this rule with only one constant terminal, we can use
SCONST directly in backup_files_list. However, I don't see any issue with
this approach either, just trying to reduce the rules.
8.
Please indent code within 80 char limit at all applicable places.
9.
Please fix following typos:
identifing => identifying
optionaly => optionally
structre => structure
progrsss => progress
Retrive => Retrieve
direcotries => directories
=====
The other mail thread related to backup manifest [1], is creating a
backup_manifest file and sends that to the client which has optional
checksum and other details including filename, file size, mtime, etc.
There is a patch on the same thread which is then validating the backup too.
Since this patch too gets a file list from the server and has similar
details (except checksum), can somehow parallel backup use the backup-manifest
infrastructure from that patch?
When the parallel backup is in use, will there be a backup_manifest file
created too? I am just visualizing what will be the scenario when both these
features are checked-in.
[1] https://www.postgresql.org/message-id/CA+TgmoZV8dw1H2bzZ9xkKwdrk8+XYa+DC9H=F7heO2zna5T6qg@mail.gmail.com
--Asif Rehman
Thanks
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
pgsql-hackers by date: