Thread: More issues with pg_verify_checksums and checksum verification in base backups
More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
Hi all,

This is a follow-up of the following thread:
https://www.postgresql.org/message-id/20181012010411.re53cwcistcpip5j@alap3.anarazel.de

In a nutshell, b34e84f1 has added TAP tests for pg_verify_checksums, and the buildfarm has immediately complained about files generated by EXEC_BACKEND missing, causing pg_verify_checksums to fail for such builds. An extra issue has been noticed by Andres in the tool: it fails to check for temporary files, hence files like base/pgsql_tmp/pgsql_tmp$PID.NN.sharedfileset/1.1 can also cause the tool to fail. After a crash, those files would not be removed, and stopping the instance would still leave them around.

pg_verify_checksums first used a blacklist to decide if files have checksums or not. With this approach all files were assumed to have checksums except files like pg_control, pg_filenode.map, PG_VERSION, etc.

d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND builds. After discussion, Andres has pointed out that some extensions may want to drop files within global/ or base/ as part of their logic. cstore_fdw was mentioned, but if you look at their README that's not the case. However, I think that Andres' argument is pretty good, as with pluggable storage we should allow extensions to put custom files in the different tablespace paths. So this commit has changed the logic of pg_verify_checksums to use a whitelist, which assumes that only normal relation files have checksums:
<digits>
<digits>.<segment>
<digits>_<forkname>
<digits>_<forkname>.<segment>

After more discussion on the thread mentioned above, Stephen has pointed out that base backups use the same blacklist logic when verifying checksums. This has the same problem with EXEC_BACKEND-specific files, resulting in spurious warnings which are confusing for the user, for example:
$ pg_basebackup -D popo
WARNING: cannot verify checksum in file "./global/config_exec_params", block 0: read buffer size 5 and page size 8192 differ

Folks on this thread agreed that both pg_verify_checksums and checksum verification for base backups should use the same logic. It seems to me that using a whitelist-based approach for both is easier to maintain, as the patterns of files supporting checksums are more limited than the patterns of files which do not. This way, extensions bundling custom files still work with pg_verify_checksums and with checksum verification in base backups.

Something else which has been discussed on this thread is that we should have a dedicated API to decide if a file has checksums or not, and whether it should be skipped in both cases. That's work for HEAD only though, so we still need something simple for both HEAD and v11.

Attached is a patch I have cooked for this purpose. I have studied the file patterns opened by the backend, and we actually need to skip only temporary files and folders, as those can include legit relation file names (like 1.1 for example). At the same time I have found out about parse_filename_for_nontemp_relation(), which is a copycat of the recently-added isRelFileName(), so we can shave some code by reusing it in both cases. This also takes care of the issues for temporary files, and it fixes an extra bug coming from the original implementation, which checked the file patterns without looking at the file type first, so the tool could miss some cascading paths.

This should be applied to v11 and HEAD. Please feel free to comment.

Thanks for reading.
--
Michael
Attachment
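[Editor's note: for readers who want to see the whitelist shape concretely, the pattern above can be sketched as a small standalone checker. This is a simplified take on what isRelFileName() does, not the actual PostgreSQL code: the real implementation uses forkname_chars() from src/common/relpath.c, while the sketch below hard-codes the three non-main fork names for illustration.]

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if fn looks like a checksummed relation file, i.e. matches
 * one of:  <digits>  <digits>.<segment>  <digits>_<forkname>
 * <digits>_<forkname>.<segment>.  Temporary files (starting with 't')
 * fail the leading-digits test and are therefore rejected.
 */
bool
looks_like_relation_file(const char *fn)
{
	static const char *const forks[] = {"vm", "fsm", "init", NULL};
	int			pos;

	/* a non-empty run of digits is mandatory */
	for (pos = 0; isdigit((unsigned char) fn[pos]); pos++)
		;
	if (pos == 0)
		return false;
	if (fn[pos] == '\0')
		return true;			/* plain <digits> */

	/* optional _<forkname> */
	if (fn[pos] == '_')
	{
		int			i;
		int			len = 0;

		for (i = 0; forks[i]; i++)
		{
			len = strlen(forks[i]);
			if (strncmp(fn + pos + 1, forks[i], len) == 0 &&
				(fn[pos + 1 + len] == '\0' || fn[pos + 1 + len] == '.'))
				break;
		}
		if (forks[i] == NULL)
			return false;
		pos += len + 1;
	}

	/* optional .<segment>, which must be a non-empty run of digits */
	if (fn[pos] == '.')
	{
		int			seg;

		for (seg = 1; isdigit((unsigned char) fn[pos + seg]); seg++)
			;
		if (seg <= 1)
			return false;
		pos += seg;
	}

	/* nothing may follow */
	return fn[pos] == '\0';
}
```

Note how this rejects the problem files from the thread ("config_exec_params", temporary files such as "t12_16384") while accepting names like "16384_vm.2".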
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> This is a follow-up of the following thread:
> https://www.postgresql.org/message-id/20181012010411.re53cwcistcpip5j@alap3.anarazel.de

Thanks for starting this thread Michael.

> pg_verify_checksums used first a blacklist to decide if files have
> checksums or not. So with this approach all files should have checksums
> except files like pg_control, pg_filenode.map, PG_VERSION, etc.

So, this is a great example- pg_control actually *does* have a CRC and we really should be checking it in tools like pg_verify_checksums and pg_basebackup. Similarly, we should be trying to get to a point where we have checksums for anything else that we actually care about.

> d55241a has added a first fix to silence the buildfarm for EXEC_BACKEND
> builds. After discussion, Andres has pointed out that some extensions
> may want to drop files within global/ or base/ as part of their logic.
> cstore_fdw was mentioned but if you look at their README that's not the
> case.

So the one example that's been put forward doesn't actually do this..?

> However, I think that Andres argument is pretty good as with
> pluggable storage we should allow extensions to put custom files for
> different tablespaces.

Andres wasn't arguing, as far as I saw, that pluggable storage would result in random files showing up in tablespace directories that the core code has no knowledge of. In fact, he seemed to be saying in 20181019205724.ewnuhvrsanacihzq@alap3.anarazel.de that pluggable storage results in the core code being aware of the files that those other storage mechanisms create, so this entire line of argument seems to be without merit.
> So this commit has changed the logic of
> pg_verify_checksums to use a whitelist, which assumes that only normal
> relation files have checksums:
> <digits>
> <digits>.<segment>
> <digits>_<forkname>
> <digits>_<forkname>.<segment>
> After more discussion on the thread mentioned above, Stephen has pointed
> out that base backups use the same blacklist logic when verifying
> checksums. This has the same problem with EXEC_BACKEND-specific files,
> resulting in spurious warnings which are confusing for the user, for
> example:
> $ pg_basebackup -D popo
> WARNING: cannot verify checksum in file "./global/config_exec_params",
> block 0: read buffer size 5 and page size 8192 differ

Stephen also extensively argued that extensions shouldn't be dropping random files into PG's database and tablespace directories, and that we shouldn't be writing code which attempts to work around that (and, indeed, ultimately can't, since technically extension authors could drop files into those directories which match these "whitelist patterns").

Further, using a whitelist risks missing files that should be included, unlike having an exclude list, which ensures that we actually check everything and complain about anything found that's out of the ordinary.

Additionally, having these checks would hopefully make it clear that people shouldn't be dropping random files into our database and tablespace directories- something we didn't try to do for the root of the data directory, resulting in, frankly, a big mess. Allowing random files to exist in the data directory has led to quite a few issues, including things like pg_basebackup breaking because of a root-owned file or similar that can't be read, and this extends that.

I thought the point of this new thread was to encourage discussion of that point and the pros and cons seen for each side, not to only include one side of it, so I'm rather disappointed.
> Folks on this thread agreed that both pg_verify_checksums and checksum
> verification for base backups should use the same logic. It seems to me
> that using a whitelist-based approach for both is easier to maintain as
> the patterns of files supporting checksums is more limited than files
> which do not support checksums. So this way we allow extensions
> bundling custom files to still work with pg_verify_checksums and
> checksum verification in base backups.

This is not an accurate statement- those random files which some extension drops into these directories don't "work" with pg_verify_checksums, this just makes pg_verify_checksums ignore them. That is not the same thing at all. Worse, we end up with things like pg_basebackup copying/backing up those files, but skipping them for validation checking, when we have no reason to expect that those files will be at all valid when they're copied and no way to see if they are valid in the resulting restore. Other parts of the system will continue to similarly do things that people might not expect- DROP DATABASE will happily nuke any file it finds, no matter whether it matches these patterns or not.

Basically, the way I see it, at least, is that we should either maintain that PG's database and tablespace directories are under the purview of PG and other things shouldn't be putting files there without the core code's knowledge, or we decide that it's ok for random things to drop files into these directories and we'll just ignore them- across the board. I don't like this half-and-half approach where *some* things will operate on files we don't recognize (including removing them in some cases!), but other parts of the system will ignore them.

> Something else which has been discussed on this thread is that we should
> have a dedicated API to decide if a file has checksums or not, and if it
> should be skipped in both cases. That's work only for HEAD though, so
> we need to do something for HEAD and v11, which is simple.
Yes, some kind of API, which is then in libpgcommon for other tools to use, would be good. I agree that should go into HEAD though.

> Attached is a patch I have cooked for this purpose. I have studied the
> file patterns opened by the backend, and we actually need to only skip
> temporary files and folders as those can include legit relation file
> names (like 1.1 for example). At the same time I have found out about
> parse_filename_for_nontemp_relation() which is a copycat of the
> recently-added isRelFileName(), so we can easily shave some code by
> reusing it in both cases. This also takes care of the issues for
> temporary files, and it also fixes an extra bug coming from the original
> implementation which checked the file patterns without looking at the
> file type first, so the tool could miss some cascading paths.

Using some existing code is certainly better than writing new code. This doesn't change my opinion of the bigger question though, which is to what extent we should be implicitly supporting extensions and whatever else putting files into the database and tablespace directories.

Even if we go with this proposed change to look at the relation filename, I'd be happier with some kind of warning being thrown when we come across files we don't recognize in directories that aren't intended to have random files showing up.

Thanks!

Stephen
Attachment
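[Editor's note: the exclude-list behaviour Stephen argues for can be made concrete with a hypothetical sketch: files known to carry no checksums are skipped, recognized relation files are verified, and anything else draws a warning rather than being silently ignored. The skip list below mirrors the noChecksumFiles array in src/backend/replication/basebackup.c; the classify() helper, its action enum, and the warning are illustrative assumptions, not actual PostgreSQL code.]

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Known files without checksums, as in basebackup.c's noChecksumFiles. */
static const char *const no_checksum_files[] = {
	"pg_control",
	"pg_filenode.map",
	"pg_internal.init",
	"PG_VERSION",
	NULL
};

typedef enum
{
	ACTION_VERIFY,				/* recognized relation file: check it */
	ACTION_SKIP,				/* known file without checksums */
	ACTION_WARN					/* unrecognized: complain, don't ignore */
} file_action;

/*
 * Decide what to do with a file found in a database directory.
 * is_relation_file stands in for a name check such as isRelFileName().
 */
file_action
classify(const char *filename, bool is_relation_file)
{
	const char *const *f;

	for (f = no_checksum_files; *f; f++)
		if (strcmp(*f, filename) == 0)
			return ACTION_SKIP;

	if (is_relation_file)
		return ACTION_VERIFY;

	/* neither a known skip nor a relation file: make some noise */
	fprintf(stderr, "WARNING: unrecognized file \"%s\"\n", filename);
	return ACTION_WARN;
}
```

The design point being illustrated: with an exclude list, every file falls into exactly one bucket, so nothing is skipped without the tool saying so.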
Hello,
I have been using psql for quite a few days, and I have accidentally pressed the Ctrl + \ keys, which send the signal QUIT with a core dump (Ctrl + 4 also sends the same signal).
I hope it's relevant to more people. (This has bothered me.)
This patch makes pressing Ctrl + \ in the psql CLI behave the same as Ctrl + C, instead of quitting with a core dump.
Renato dos Santos
Attachment
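[Editor's note: the patch itself is attached rather than shown inline. As a rough illustration of the idea, ignoring SIGQUIT is a one-liner with the standard signal API; the sketch below is a guess at the approach, not the actual patch, and is POSIX-only.]

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/*
 * Ignore SIGQUIT (the terminal quit character, Ctrl-\) for the rest of
 * the process lifetime, so it no longer terminates the process with a
 * core dump.  Returns false if the handler could not be installed.
 */
bool
ignore_sigquit(void)
{
	return signal(SIGQUIT, SIG_IGN) != SIG_ERR;
}
```

After calling ignore_sigquit(), a delivered SIGQUIT is simply discarded and the process keeps running.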
Renato dos Santos <shazaum@gmail.com> writes:
> I have been using psql for quite a few days. And I have accidentally pressed the CTRL + \ keys that sends the signal QUIT+Coredump (CTRL+4 also sends the same signal).
> I hope it's relevant to more people. (This has bothered me.)
> this patch avoids the output of the CLI using ctrl + \ is the same as ctrl + c

I'm fairly confused about why this would be a good idea, for several reasons:

* If you have a tendency to make that typo, why would you want a fix that only affects psql and not any of your other applications? (If you do want the latter, there are already ways to do it, eg you could remap SIGQUIT to some other key via stty, or disable core dumps via ulimit.)

* If we put this in, what becomes of people who actually want a core dump, eg for debugging?

* SIGQUIT is a fairly well-known way to get out of an application when all else fails. People who aren't familiar with psql's exit commands might find it pretty unfriendly of us to block this off.

regards, tom lane
I agree with your arguments. What if, instead, we added an option to disable this at compile time, or a setting in the psql CLI?
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Sun, Oct 21, 2018 at 11:03:30AM -0400, Stephen Frost wrote:
> This doesn't change my opinion of the bigger question though, which is
> to what extent we should be implicitly supporting extensions and
> whatever else putting files into the database and tablespace
> directories.

Well, the whole point is that I have never seen it stated anywhere that it is forbidden for extensions to drop files into global/ and/or base/. I am pretty sure that I'd actually want to be able to do that myself, by the way. If I had pluggable storage APIs and the possibility to write a storage engine myself out of the box, I would like to be able to rely on the default tablespace path and to use other tablespace paths.

> Even if we go with this proposed change to look at the relation
> filename, I'd be happier with some kind of warning being thrown when we
> come across files we don't recognize in directories that aren't intended
> to have random files showing up.

Yes, that is something we could do, as an option I would guess, as this does not match what v10 does. I'll wait for more people to provide input on this thread before answering more, but if possible I think that we should focus on fixing v11 and HEAD first, then try to figure out what we'd want to do later on.
--
Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Sun, Oct 21, 2018 at 11:03:30AM -0400, Stephen Frost wrote:
> > This doesn't change my opinion of the bigger question though, which is
> > to what extent we should be implicitly supporting extensions and
> > whatever else putting files into the database and tablespace
> > directories.
>
> Well, the whole point is that I have never seen either that it is
> forbidden for extensions to drop files into global/ and/or base/. I am
> pretty sure that I'd actually want to be able to do that myself by the
> way. If I had pluggable storage APIs and the possibility to write by
> myself a storage engine out-of-the-box, I would like to be able to rely
> on the default tablespace path and use other tablespace paths.

All of this pie-in-the-sky about what pluggable storage might have is just hand-waving, in my opinion, and not worth much more than that. I hope (and suspect..) that the actual pluggable storage that's being worked on doesn't do any of this "just drop a file somewhere", because there are a lot of downsides to it- and if it did, it wouldn't be much more than what we can do with an FDW, so why go through and add it? Considering the example given doesn't do that today (maybe it once did?), as you mentioned up-thread, it seems like maybe there was a realization that it's not a good idea even without this issue around pg_basebackup and pg_verify_checksums.

> > Even if we go with this proposed change to look at the relation
> > filename, I'd be happier with some kind of warning being thrown when we
> > come across files we don't recognize in directories that aren't intended
> > to have random files showing up.
>
> Yes, that could be something we could do, as an option I would guess as
> this does not match with what v10 does. I'll wait for more people to
> provide input on this thread before answering more, but if possible I
> think that we should focus on fixing v11 and HEAD first, then try to
> figure out what we'd want to do later on.

pg_basebackup works the way it does in v10 because we've accepted letting users drop files into the root of PGDATA and have even encouraged it to some extent. I don't think there was ever a serious intent that it would be used to back up an extension's self-created files on the filesystem in tablespaces, since there's no way for it to know how to do so in a way that would ensure that they're consistent or useful or sensible to back up online.

What are you thinking the 'option' would look like? Everything I come up with seems awfully confusing and not very sensible.

Thanks!

Stephen
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Sun, Oct 21, 2018 at 08:56:32PM -0400, Stephen Frost wrote:
> All of this pie-in-the-sky about what pluggable storage might have is
> just hand-waving, in my opinion, and not worth much more than that. I
> hope (and suspect..) that the actual pluggable storage that's being
> worked on doesn't do any of this "just drop a file somewhere" because
> there's a lot of downsides to it- and if it did, it wouldn't be much
> more than what we can do with an FDW, so why go through and add it?

Well, there is no point in enforcing a rule that something is forbidden if it was never implied and never documented (the rule here being that one cannot drop custom files into global/, base/ or pg_tblspc). Postgres is highly customizable, and I would prefer it if features in core were designed so that we keep things extensible; the checksum verification for base backups, on the contrary, restricts things.

So, do we have other opinions about this thread?
--
Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Kyotaro HORIGUCHI
Date:
Mmm. It took longer than expected because I was repeatedly tripped up by git..

At Wed, 24 Oct 2018 14:31:37 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181024053137.GL1658@paquier.xyz>
> On Sun, Oct 21, 2018 at 08:56:32PM -0400, Stephen Frost wrote:
> > All of this pie-in-the-sky about what pluggable storage might have is
> > just hand-waving, in my opinion, and not worth much more than that. I
> > hope (and suspect..) that the actual pluggable storage that's being
> > worked on doesn't do any of this "just drop a file somewhere" because
> > there's a lot of downsides to it- and if it did, it wouldn't be much
> > more than what we can do with an FDW, so why go through and add it?
>
> Well, there is no point in enforcing a rule that something is forbidden
> if it was never implied and never documented (the rule here is to be
> able to drop custom files into global/, base/ or pg_tblspc). Postgres
> is highly-customizable, I would prefer if features in core are designed
> so as we keep things extensible, the checksum verification for base
> backup on the contrary restricts things.
>
> So, do we have other opinions about this thread?

I believe anyone is free, for the time being, to drop any file anywhere in $PGDATA if they think it safe. Changes in core sometimes break some extensions, and such breakage used to be 'fixed' on the extension side, or a replacement was requested from core. The same goes for changes to (undocumented) APIs. I think things have been going around in this manner for years, and I don't think the core side can define a definite border of what extensions are allowed to do, since we are not knowledgeable of all extensions that will be created in the future or that have been created already. So I'm +1 for Michael's current patch, as (I think) we can't make visible or large changes.

That said, I agree with Stephen's concern that we could omit required files in the future, but on the other hand I don't want random files to be simply rejected.
So I propose to sort files into roughly three categories. One is files we know of but skip. The second is files we know of and that need checksum verification. The last is files we just don't know of. In the attached PoC, (checksum_)find_file_type categorizes a file into 6 (or 7) categories:

ENTRY_TO_IGNORE: to be always ignored, for "." and "..".
FILE_TO_SKIP: files we know of and skip; files in the black list.
DIR_TO_SKIP: directories, same as the above.
DIR_TO_SCAN: we know any file to scan may be in it.
HEAP_TO_SCAN: we know it has checksums in heap format.
FILE_UNKNOWN: files we don't know of.
(STAT_FAILED: stat failed for the file)

Among these types, the ones related to the discussion above are FILE_TO_SKIP, HEAP_TO_SCAN and FILE_UNKNOWN. The others are for controlling the search loop in pg_verify_checksums. Based on this categorization, pg_verify_checksum's result is shown as:

> Checksum scan completed
> Data checksum version: 1
> Files scanned:  1095
> Blocks scanned: 2976
> Bad checksums:  0
+ Files skipped: 8
+ Unknown files skipped: 1
+ Skipped directories: 1

(Data checksum version is tied to heap checksums..)

If this sort of change is acceptable for us, I believe it conforms to the wishes of everyone here. If the skipped-unknown count is not zero on a buildfarm animal, it is a sign that something has been forgotten.

The second attached patch (also a PoC) common'izes the file sorter. The function is moved to src/common/file_checksums.c and both pg_verify_checksums.c and basebackup.c use it. With log_min_messages=debug1, you will see the following message for unknown files during basebackup:

> DEBUG: checksum verification was skipped for unknown file: ./base/hoge

This changes one behavior of the tool: -r can now take 'dbid/relid' in addition to just 'relid'.

I think we could have a .verify_checksum.exclude file so that extensions can declare their files.

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center

From 187d84a0ffc94fb9d5c9c0f6708227cc8f47fa3c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 25 Oct 2018 16:48:47 +0900
Subject: [PATCH 1/2] Make pg_verify_checksums conscious of unknown files

---
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 225 +++++++++++++++++-----
 1 file changed, 179 insertions(+), 46 deletions(-)

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index f0e09bea20..4b527913c1 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -26,6 +26,9 @@
 static int64 files = 0;
 static int64 blocks = 0;
 static int64 badblocks = 0;
+static int64 skipped_known = 0;
+static int64 skipped_unknown = 0;
+static int64 skipped_dirs = 0;
 static ControlFileData *ControlFile;

 static char *only_relfilenode = NULL;
@@ -33,6 +36,46 @@
 static bool verbose = false;

 static const char *progname;

+/* struct for checksum verification paremter*/
+typedef struct
+{
+	union
+	{
+		struct
+		{
+			BlockNumber segmentno;
+		} heap_param;
+	} params;
+} checksum_scan_context;
+
+/* enum for return value of find_file_type */
+typedef enum
+{
+	ENTRY_TO_IGNORE,
+	DIR_TO_SCAN,
+	HEAP_TO_SCAN,
+	FILE_TO_SKIP,
+	DIR_TO_SKIP,
+	FILE_UNKNOWN
+} checksum_file_types;
+
+/* black (explisit exclusion) list for checksum verification */
+static const char *const checksum_known_to_skip[] = {
+	"pg_control",
+	"pg_internal.init",
+	"pg_filenode.map",
+	"PG_VERSION",
+	"config_exec_params",
+	"config_exec_params.new",
+	"pgsql_tmp",			/* this is a directory */
+	NULL
+};
+
+static checksum_file_types find_file_type(const char *fn,
+										  const char *relfilenode,
+										  checksum_scan_context *ctx);
+
+
 static void
 usage(void)
 {
@@ -116,11 +159,12 @@
 }

 static void
-scan_file(const char *fn, BlockNumber segmentno)
+scan_heap_file(const char *fn, checksum_scan_context *ctx)
 {
 	PGAlignedBlock buf;
 	PageHeader	header = (PageHeader) buf.data;
 	int			f;
+	BlockNumber segmentno = ctx->params.heap_param.segmentno;
 	BlockNumber blockno;

 	f = open(fn, O_RDONLY | PG_BINARY, 0);
@@ -187,63 +231,147 @@
 	while ((de = readdir(dir)) != NULL)
 	{
 		char		fn[MAXPGPATH];
-		struct stat st;
-
-		if (!isRelFileName(de->d_name))
-			continue;
+		checksum_scan_context ctx;

 		snprintf(fn, sizeof(fn), "%s/%s", path, de->d_name);
-		if (lstat(fn, &st) < 0)
+		switch (find_file_type(fn, only_relfilenode, &ctx))
 		{
-			fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
-					progname, fn, strerror(errno));
-			exit(1);
+			case ENTRY_TO_IGNORE:
+				continue;		/* ignore completely silently */
+			case FILE_TO_SKIP:
+				if (verbose)
+					fprintf(stderr, "skipped file: %s/%s/%s\n",
+							basedir, subdir, de->d_name);
+				skipped_known++;
+				continue;
+			case DIR_TO_SKIP:
+				if (verbose)
+					fprintf(stderr, "skipped directory: %s/%s/%s\n",
+							basedir, subdir, de->d_name);
+				skipped_dirs++;
+				continue;
+			case FILE_UNKNOWN:
+				if (verbose)
+					fprintf(stderr, "skipped unknown file: %s/%s/%s\n",
+							basedir, subdir, de->d_name);
+				skipped_unknown++;
+				continue;
+			case HEAP_TO_SCAN:
+				scan_heap_file(fn, &ctx);
+				break;
+			case DIR_TO_SCAN:
+				scan_directory(path, de->d_name);
+				break;
 		}
-		if (S_ISREG(st.st_mode))
+	}
+
+	closedir(dir);
+}
+
+/*
+ * find_file_type: identify what to do on a file
+ *
+ * fn is a file path in full path or relative down from the current directory.
+ * relfilenode is filter string of file. Only specified files of node number or
+ * databaseid/filenodenum will be verified checksum.
+ * ctx is the parameter needed for following checksum scan.
+ */
+static checksum_file_types
+find_file_type(const char *fn, const char *relfilenode,
+			   checksum_scan_context *ctx)
+{
+	struct stat st;
+	char		fnonly[MAXPGPATH];
+	const char *fname;
+	char	   *forkpath;
+	char	   *segmentpath;
+	const char *const *p;
+	bool		is_subdir = false;
+
+	/* find file name the full path */
+	fname = strrchr(fn, '/');
+	if (fname)
+		fname++;
+	else
+		fname = fn;
+
+	if (strcmp(fname, ".") == 0 ||
+		strcmp(fname, "..") == 0)
+		return ENTRY_TO_IGNORE;
+
+	if (lstat(fn, &st) < 0)
+	{
+		fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
+				progname, fn, strerror(errno));
+		exit(1);
+	}
+
+#ifndef WIN32
+	if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
+#else
+	if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn))
+#endif
+		is_subdir = true;
+
+	/* exluded by blacklist */
+
+	for (p = checksum_known_to_skip ; *p ; p++)
+	{
+		if (strcmp(*p, fname) != 0)
+			continue;
+
+		if (!is_subdir)
+			return FILE_TO_SKIP;
+		else
+			return DIR_TO_SKIP;
+	}
+
+	if (is_subdir)
+		return DIR_TO_SCAN;
+
+	/* we now know only of relfiles */
+	if (isRelFileName(fname))
+	{
+		/* copy the path so that we can scribble on it */
+		strlcpy(fnonly, fn, sizeof(fnonly));
+		ctx->params.heap_param.segmentno = 0;
+		segmentpath = strchr(fnonly, '.');
+
+		/* make sure that the dot is in the last segment in the path */
+		if (segmentpath != NULL && strchr(segmentpath, '/') == NULL)
 		{
-			char		fnonly[MAXPGPATH];
-			char	   *forkpath,
-					   *segmentpath;
-			BlockNumber segmentno = 0;
+			*segmentpath++ = '\0';
+			ctx->params.heap_param.segmentno = atoi(segmentpath);

-			/*
-			 * Cut off at the segment boundary (".") to get the segment number
-			 * in order to mix it into the checksum. Then also cut off at the
-			 * fork boundary, to get the relfilenode the file belongs to for
-			 * filtering.
-			 */
-			strlcpy(fnonly, de->d_name, sizeof(fnonly));
-			segmentpath = strchr(fnonly, '.');
-			if (segmentpath != NULL)
-			{
-				*segmentpath++ = '\0';
-				segmentno = atoi(segmentpath);
-				if (segmentno == 0)
-				{
-					fprintf(stderr, _("%s: invalid segment number %d in file name \"%s\"\n"),
-							progname, segmentno, fn);
-					exit(1);
-				}
-			}
+			/* something's wrong, treat it as unknown file */
+			if (ctx->params.heap_param.segmentno == 0)
+				return FILE_UNKNOWN;
+		}
+
+		if (only_relfilenode)
+		{
+			char	   *p;

-			forkpath = strchr(fnonly, '_');
-			if (forkpath != NULL)
+			/* find file suffix if any */
+			forkpath = strrchr(fnonly, '_');
+
+			/* the underscore must be in the last segment in the path */
+			if (forkpath != NULL && strchr(forkpath, '/') == NULL)
 				*forkpath++ = '\0';

-			if (only_relfilenode && strcmp(only_relfilenode, fnonly) != 0)
+			/* make a tail match with only_relfilenode */
+			p = fnonly + strlen(fnonly) - strlen(relfilenode);
+			if (fnonly > p ||					/* cannot match*/
+				(fnonly < p && *(p-1) != '/') ||	/* avoid false match */
+				strcmp(relfilenode, p) != 0)	/* Relfilenode not to be included */
-				continue;
-
-			scan_file(fn, segmentno);
+				return FILE_TO_SKIP;
 		}
-#ifndef WIN32
-		else if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode))
-#else
-		else if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn))
-#endif
-			scan_directory(path, de->d_name);
+
+		return HEAP_TO_SCAN;
 	}
-	closedir(dir);
+
+	return FILE_UNKNOWN;
 }

 int
@@ -359,6 +487,11 @@
 	printf(_("Files scanned: %s\n"), psprintf(INT64_FORMAT, files));
 	printf(_("Blocks scanned: %s\n"), psprintf(INT64_FORMAT, blocks));
 	printf(_("Bad checksums: %s\n"), psprintf(INT64_FORMAT, badblocks));
+	printf(_("Files skipped: %s\n"), psprintf(INT64_FORMAT, skipped_known));
+	printf(_("Unknown files skipped: %s\n"),
+		   psprintf(INT64_FORMAT, skipped_unknown));
+	printf(_("Skipped directories: %s\n"),
+		   psprintf(INT64_FORMAT, skipped_dirs));

 	if (badblocks > 0)
 		return 1;
--
2.16.3

From 87baf73f56f688f4532ef78f6684934a47be3ba2 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyotaro@lab.ntt.co.jp>
Date: Thu, 25 Oct 2018 16:49:46 +0900
Subject: [PATCH 2/2] Common'ize file type checker for checksums

pg_verify_checksums.c and basebackup.c has the same notion of 'files
that have checksums'. This patch moves the core logic so that
src/common and the both files share the logic.
---
 src/backend/replication/basebackup.c              |  43 +++--
 src/bin/pg_verify_checksums/Makefile              |   3 +-
 src/bin/pg_verify_checksums/pg_verify_checksums.c | 220 +---------------------
 src/common/Makefile                               |   3 +-
 src/common/file_checksums.c                       | 197 +++++++++++++++++++
 src/include/common/file_checksums.h               |  42 +++++
 6 files changed, 273 insertions(+), 235 deletions(-)
 create mode 100644 src/common/file_checksums.c
 create mode 100644 src/include/common/file_checksums.h

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index b20f6c379c..4ebc969f3d 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -19,6 +19,7 @@
 #include "access/xlog_internal.h"	/* for pg_start/stop_backup */
 #include "catalog/pg_type.h"
 #include "common/file_perm.h"
+#include "common/file_checksums.h"
 #include "lib/stringinfo.h"
 #include "libpq/libpq.h"
 #include "libpq/pqformat.h"
@@ -187,18 +188,6 @@ static const char *excludeFiles[] =
 	NULL
 };

-/*
- * List of files excluded from checksum validation.
- */
-static const char *const noChecksumFiles[] = {
-	"pg_control",
-	"pg_filenode.map",
-	"pg_internal.init",
-	"PG_VERSION",
-	NULL,
-};
-
-
 /*
  * Called when ERROR or FATAL happens in perform_base_backup() after
  * we have started the backup - make sure we end it!
@@ -1321,22 +1310,36 @@ sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
 static bool
 is_checksummed_file(const char *fullpath, const char *filename)
 {
-	const char *const *f;
+	checksum_scan_context ctx;

 	/* Check that the file is in a tablespace */
 	if (strncmp(fullpath, "./global/", 9) == 0 ||
 		strncmp(fullpath, "./base/", 7) == 0 ||
 		strncmp(fullpath, "/", 1) == 0)
 	{
-		/* Compare file against noChecksumFiles skiplist */
-		for (f = noChecksumFiles; *f; f++)
-			if (strcmp(*f, filename) == 0)
-				return false;
+		/* check if the file has checksums */
+		switch (checksum_find_file_type(fullpath, NULL, &ctx))
+		{
+			case HEAP_TO_SCAN:
+				return true;
+			case STAT_FAILED:
+				ereport(ERROR,
+						(errcode_for_file_access(),
+						 errmsg("failed to stat \"%s\": %m",
+								fullpath)));

-		return true;
+			case ENTRY_TO_IGNORE:
+			case FILE_TO_SKIP:
+			case DIR_TO_SKIP:
+			case DIR_TO_SCAN:
+				break;
+			case FILE_UNKNOWN:
+				elog(DEBUG1, "checksum verification was skipped for unknown file: %s", fullpath);
+				break;
+		}
 	}
-	else
-		return false;
+
+	return false;
 }

 /*****
diff --git a/src/bin/pg_verify_checksums/Makefile b/src/bin/pg_verify_checksums/Makefile
index cfe4ab1b8b..3d0a9baf24 100644
--- a/src/bin/pg_verify_checksums/Makefile
+++ b/src/bin/pg_verify_checksums/Makefile
@@ -15,7 +15,8 @@
 subdir = src/bin/pg_verify_checksums
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global

-OBJS= pg_verify_checksums.o $(WIN32RES)
+OBJS= pg_verify_checksums.o $(top_builddir)/src/common/file_checksums.o \
+	$(WIN32RES)

 all: pg_verify_checksums

diff --git a/src/bin/pg_verify_checksums/pg_verify_checksums.c b/src/bin/pg_verify_checksums/pg_verify_checksums.c
index 4b527913c1..dc2143ea65 100644
--- a/src/bin/pg_verify_checksums/pg_verify_checksums.c
+++ b/src/bin/pg_verify_checksums/pg_verify_checksums.c
@@ -15,7 +15,7 @@
 #include "catalog/pg_control.h"
 #include "common/controldata_utils.h"
-#include "common/relpath.h"
+#include "common/file_checksums.h"
 #include "getopt_long.h"
 #include "pg_getopt.h"
 #include "storage/bufpage.h"
@@ -36,46 +36,6 @@
 static bool verbose = false;

 static const char *progname;

-/* struct for checksum verification paremter*/
-typedef struct
-{
-	union
-	{
-		struct
-		{
-			BlockNumber segmentno;
-		} heap_param;
-	} params;
-} checksum_scan_context;
-
-/* enum for return value of find_file_type */
-typedef enum
-{
-	ENTRY_TO_IGNORE,
-	DIR_TO_SCAN,
-	HEAP_TO_SCAN,
-	FILE_TO_SKIP,
-	DIR_TO_SKIP,
-	FILE_UNKNOWN
-} checksum_file_types;
-
-/* black (explisit exclusion) list for checksum verification */
-static const char *const checksum_known_to_skip[] = {
-	"pg_control",
-	"pg_internal.init",
-	"pg_filenode.map",
-	"PG_VERSION",
-	"config_exec_params",
-	"config_exec_params.new",
-	"pgsql_tmp",			/* this is a directory */
-	NULL
-};
-
-static checksum_file_types find_file_type(const char *fn,
-										  const char *relfilenode,
-										  checksum_scan_context *ctx);
-
-
 static void
 usage(void)
 {
@@ -93,71 +53,6 @@
 	printf(_("Report bugs to <pgsql-bugs@postgresql.org>.\n"));
 }

-/*
- * isRelFileName
- *
- * Check if the given file name is authorized for checksum verification.
- */
-static bool
-isRelFileName(const char *fn)
-{
-	int			pos;
-
-	/*----------
-	 * Only files including data checksums are authorized for verification.
-	 * This is guessed based on the file name by reverse-engineering
-	 * GetRelationPath() so make sure to update both code paths if any
-	 * updates are done. The following file name formats are allowed:
-	 * <digits>
-	 * <digits>.<segment>
-	 * <digits>_<forkname>
-	 * <digits>_<forkname>.<segment>
-	 *
-	 * Note that temporary files, beginning with 't', are also skipped.
-	 *
-	 *----------
-	 */
-
-	/* A non-empty string of digits should follow */
-	for (pos = 0; isdigit((unsigned char) fn[pos]); ++pos)
-		;
-	/* leave if no digits */
-	if (pos == 0)
-		return false;
-	/* good to go if only digits */
-	if (fn[pos] == '\0')
-		return true;
-
-	/* Authorized fork files can be scanned */
-	if (fn[pos] == '_')
-	{
-		int			forkchar = forkname_chars(&fn[pos + 1], NULL);
-
-		if (forkchar <= 0)
-			return false;
-
-		pos += forkchar + 1;
-	}
-
-	/* Check for an optional segment number */
-	if (fn[pos] == '.')
-	{
-		int			segchar;
-
-		for (segchar = 1; isdigit((unsigned char) fn[pos + segchar]); ++segchar)
-			;
-
-		if (segchar <= 1)
-			return false;
-		pos += segchar;
-	}
-
-	/* Now this should be the end */
-	if (fn[pos] != '\0')
-		return false;
-	return true;
-}
-
 static void
 scan_heap_file(const char *fn, checksum_scan_context *ctx)
 {
@@ -234,7 +129,7 @@
 		checksum_scan_context ctx;

 		snprintf(fn, sizeof(fn), "%s/%s", path, de->d_name);
-		switch (find_file_type(fn, only_relfilenode, &ctx))
+		switch (checksum_find_file_type(fn, only_relfilenode, &ctx))
 		{
 			case ENTRY_TO_IGNORE:
 				continue;		/* ignore completely silently */
@@ -262,118 +157,16 @@
 			case DIR_TO_SCAN:
 				scan_directory(path, de->d_name);
 				break;
+			case STAT_FAILED:
+				fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"),
+						progname, fn, strerror(errno));
+				exit(1);
 		}
 	}

 	closedir(dir);
 }

-/*
- * find_file_type: identify what to do on a file
- *
- * fn is a file path in full path or relative down from the current directory.
- * relfilenode is filter string of file. Only specified files of node number or - * databaseid/filenodenum will be verified checksum. - * ctx is the parameter needed for following checksum scan. - */ -static checksum_file_types -find_file_type(const char *fn, const char *relfilenode, - checksum_scan_context *ctx) -{ - struct stat st; - char fnonly[MAXPGPATH]; - const char *fname; - char *forkpath; - char *segmentpath; - const char *const *p; - bool is_subdir = false; - - /* find file name the full path */ - fname = strrchr(fn, '/'); - if (fname) - fname++; - else - fname = fn; - - if (strcmp(fname, ".") == 0 || - strcmp(fname, "..") == 0) - return ENTRY_TO_IGNORE; - - if (lstat(fn, &st) < 0) - { - fprintf(stderr, _("%s: could not stat file \"%s\": %s\n"), - progname, fn, strerror(errno)); - exit(1); - } - -#ifndef WIN32 - if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode)) -#else - if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn)) -#endif - is_subdir = true; - - /* exluded by blacklist */ - - for (p = checksum_known_to_skip ; *p ; p++) - { - if (strcmp(*p, fname) != 0) - continue; - - if (!is_subdir) - return FILE_TO_SKIP; - else - return DIR_TO_SKIP; - } - - if (is_subdir) - return DIR_TO_SCAN; - - /* we now know only of relfiles */ - if (isRelFileName(fname)) - { - /* copy the path so that we can scribble on it */ - strlcpy(fnonly, fn, sizeof(fnonly)); - ctx->params.heap_param.segmentno = 0; - segmentpath = strchr(fnonly, '.'); - - /* make sure that the dot is in the last segment in the path */ - if (segmentpath != NULL && strchr(segmentpath, '/') == NULL) - { - *segmentpath++ = '\0'; - ctx->params.heap_param.segmentno = atoi(segmentpath); - - /* something's wrong, treat it as unknown file */ - if (ctx->params.heap_param.segmentno == 0) - return FILE_UNKNOWN; - } - - if (only_relfilenode) - { - char *p; - - /* find file suffix if any */ - forkpath = strrchr(fnonly, '_'); - - /* the underscore must be in the last segment in the path */ - if (forkpath != NULL && 
strchr(forkpath, '/') == NULL) - *forkpath++ = '\0'; - - /* make a tail match with only_relfilenode */ - p = fnonly + strlen(fnonly) - strlen(relfilenode); - if (fnonly > p || /* cannot match*/ - (fnonly < p && *(p-1) != '/') || /* avoid false match */ - strcmp(relfilenode, p) != 0) - /* Relfilenode not to be included */ - return FILE_TO_SKIP; - } - - return HEAP_TO_SCAN; - } - - return FILE_UNKNOWN; -} - int main(int argc, char *argv[]) { @@ -397,6 +190,7 @@ main(int argc, char *argv[]) if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0) { usage(); + exit(0); } if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0) diff --git a/src/common/Makefile b/src/common/Makefile index ec8139f014..54b7c9f440 100644 --- a/src/common/Makefile +++ b/src/common/Makefile @@ -44,7 +44,8 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\"" override CPPFLAGS := -DFRONTEND $(CPPFLAGS) LIBS += $(PTHREAD_LIBS) -OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \ +OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o \ + file_checksums.o file_perm.o \ ip.o keywords.o link-canary.o md5.o pg_lzcompress.o \ pgfnames.o psprintf.o relpath.o \ rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \ diff --git a/src/common/file_checksums.c b/src/common/file_checksums.c new file mode 100644 index 0000000000..f83bb52c1d --- /dev/null +++ b/src/common/file_checksums.c @@ -0,0 +1,197 @@ +/*------------------------------------------------------------------------- + * file_checksums.c + * checksumming files + * + * This implements the common logic used to decide which on-disk files + * are expected to carry data checksums.
+ * + * Portions Copyright (c) 2018, PostgreSQL Global Development Group + * + * IDENTIFICATION + * src/common/file_checksums.c + * + *------------------------------------------------------------------------- + */ +#include <sys/stat.h> + +#include "c.h" +#include "common/file_checksums.h" +#include "common/relpath.h" + +/* black (explisit exclusion) list for checksum verification */ +static const char *const checksum_known_to_skip[] = { + "pg_control", + "pg_internal.init", + "pg_filenode.map", + "PG_VERSION", + "config_exec_params", + "config_exec_params.new", + "pgsql_tmp", /* directory */ + NULL +}; + +/* + * isRelFileName + * + * Check if the given file name is authorized for checksum verification. + */ +static bool +isRelFileName(const char *fn) +{ + int pos; + + /*---------- + * Only files including data checksums are authorized for verification. + * This is guessed based on the file name by reverse-engineering + * GetRelationPath() so make sure to update both code paths if any + * updates are done. The following file name formats are allowed: + * <digits> + * <digits>.<segment> + * <digits>_<forkname> + * <digits>_<forkname>.<segment> + * + * Note that temporary files, beginning with 't', are also skipped. 
+ * + *---------- + */ + + /* A non-empty string of digits should follow */ + for (pos = 0; isdigit((unsigned char) fn[pos]); ++pos) + ; + /* leave if no digits */ + if (pos == 0) + return false; + /* good to go if only digits */ + if (fn[pos] == '\0') + return true; + + /* Authorized fork files can be scanned */ + if (fn[pos] == '_') + { + int forkchar = forkname_chars(&fn[pos + 1], NULL); + + if (forkchar <= 0) + return false; + + pos += forkchar + 1; + } + + /* Check for an optional segment number */ + if (fn[pos] == '.') + { + int segchar; + + for (segchar = 1; isdigit((unsigned char) fn[pos + segchar]); ++segchar) + ; + + if (segchar <= 1) + return false; + pos += segchar; + } + + /* Now this should be the end */ + if (fn[pos] != '\0') + return false; + return true; +} + +/* + * checksum_find_file_type: identify a file from the viewpoint of checksum + * + * fn is file name with full path to check + * relfilenode is relfilenode in string to exclude files other than that. + * ctx is the context to scan checksum, which contains parameters for scanners. 
+ */ +checksum_file_types +checksum_find_file_type(const char *fn, + const char *relfilenode, checksum_scan_context *ctx) +{ + struct stat st; + char fnonly[MAXPGPATH]; + char *fname; + char *forkpath; + char *segmentpath; + const char *const *p; + bool is_subdir = false; + + fname = strrchr(fn, '/'); + + if (fname == NULL) + return ENTRY_TO_IGNORE; + + fname++; + + if (strcmp(fname, ".") == 0 || + strcmp(fname, "..") == 0) + return ENTRY_TO_IGNORE; + + if (lstat(fn, &st) < 0) + return STAT_FAILED; + +#ifndef WIN32 + if (S_ISDIR(st.st_mode) || S_ISLNK(st.st_mode)) +#else + if (S_ISDIR(st.st_mode) || pgwin32_is_junction(fn)) +#endif + is_subdir = true; + + /* exluded by blacklist */ + + for (p = checksum_known_to_skip ; *p ; p++) + { + if (strcmp(*p, fname) != 0) + continue; + + if (is_subdir) + return DIR_TO_SKIP; + + return FILE_TO_SKIP; + } + + if (is_subdir) + return DIR_TO_SCAN; + + /* we now know only of relfiles */ + if (isRelFileName(fname)) + { + /* copy the path so that we can scribble on it */ + strlcpy(fnonly, fn, sizeof(fnonly)); + ctx->params.heap_param.segmentno = 0; + segmentpath = strchr(fnonly, '.'); + + /* make sure that the dot is in the last segment in the path */ + if (segmentpath != NULL && strchr(segmentpath, '/') == NULL) + { + *segmentpath++ = '\0'; + ctx->params.heap_param.segmentno = atoi(segmentpath); + + /* something's wrong, treat it as unknown file */ + if (ctx->params.heap_param.segmentno == 0) + return FILE_UNKNOWN; + } + + if (relfilenode) + { + char *p; + + /* find file suffix if any */ + forkpath = strrchr(fnonly, '_'); + + /* the underscore must be in the last segment in the path */ + if (forkpath != NULL && strchr(forkpath, '/') == NULL) + *forkpath++ = '\0'; + + /* make a tail match with only_relfilenode */ + p = fnonly + strlen(fnonly) - strlen(relfilenode); + if (fnonly > p || /* cannot match*/ + (fnonly < p && *(p-1) != '/') || /* avoid false match */ + strcmp(relfilenode, p) != 0) + /* Relfilenode not to be included */ + 
return FILE_TO_SKIP; + } + + return HEAP_TO_SCAN; + } + + return FILE_UNKNOWN; +} diff --git a/src/include/common/file_checksums.h b/src/include/common/file_checksums.h new file mode 100644 index 0000000000..3ead25c97f --- /dev/null +++ b/src/include/common/file_checksums.h @@ -0,0 +1,42 @@ +/* + * file_checksums.h + * checksumming files + * + * Copyright (c) 2018, PostgreSQL Global Development Group + * + * src/include/common/file_checksums.h + */ +#ifndef FILE_CHECKSUMS_H +#define FILE_CHECKSUMS_H + +#include "storage/block.h" + +/* struct for checksum verification paremter*/ +typedef struct +{ + union + { + struct + { + BlockNumber segmentno; + } heap_param; + } params; +} checksum_scan_context; + +/* enum for return value of find_file_type */ +typedef enum +{ + ENTRY_TO_IGNORE, + DIR_TO_SCAN, + HEAP_TO_SCAN, + FILE_TO_SKIP, + DIR_TO_SKIP, + FILE_UNKNOWN, + STAT_FAILED +} checksum_file_types; + +checksum_file_types checksum_find_file_type(const char *fn, + const char *relfilenode, + checksum_scan_context *ctx); + +#endif /* FILE_CHECKSUMS_H */ -- 2.16.3
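For readers who want to exercise the whitelist logic carried in the patch above outside of a build tree, here is a minimal standalone sketch of the isRelFileName() check. It is an illustration, not the patch's code: fork_name_chars() is a simplified stand-in for the backend's forkname_chars() that only knows the three standard fork names ("fsm", "vm", "init").

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for forkname_chars(): returns the length of a
 * recognized fork name at the start of s, or 0 if none matches. */
int
fork_name_chars(const char *s)
{
    static const char *const forks[] = {"fsm", "vm", "init", NULL};

    for (int i = 0; forks[i]; i++)
    {
        size_t len = strlen(forks[i]);

        if (strncmp(s, forks[i], len) == 0 &&
            (s[len] == '\0' || s[len] == '.'))
            return (int) len;
    }
    return 0;
}

/* Whitelist check: accept only <digits>, <digits>.<segment>,
 * <digits>_<forkname> and <digits>_<forkname>.<segment>. */
bool
is_rel_file_name(const char *fn)
{
    int pos;

    /* a non-empty run of digits must come first */
    for (pos = 0; isdigit((unsigned char) fn[pos]); ++pos)
        ;
    if (pos == 0)
        return false;
    if (fn[pos] == '\0')
        return true;

    /* optional fork suffix */
    if (fn[pos] == '_')
    {
        int forkchar = fork_name_chars(&fn[pos + 1]);

        if (forkchar <= 0)
            return false;
        pos += forkchar + 1;
    }

    /* optional segment number */
    if (fn[pos] == '.')
    {
        int segchar;

        for (segchar = 1; isdigit((unsigned char) fn[pos + segchar]); ++segchar)
            ;
        if (segchar <= 1)
            return false;
        pos += segchar;
    }

    return fn[pos] == '\0';
}
```

Note how anything not matching the four documented patterns, including temporary files whose names begin with 't', falls through to rejection.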
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * Kyotaro HORIGUCHI (horiguchi.kyotaro@lab.ntt.co.jp) wrote: > At Wed, 24 Oct 2018 14:31:37 +0900, Michael Paquier <michael@paquier.xyz> wrote in <20181024053137.GL1658@paquier.xyz> > > On Sun, Oct 21, 2018 at 08:56:32PM -0400, Stephen Frost wrote: > > > All of this pie-in-the-sky about what pluggable storage might have is > > > just hand-waving, in my opinion, and not worth much more than that. I > > > hope (and suspect..) that the actual pluggable storage that's being > > > worked on doesn't do any of this "just drop a file somewhere" because > > > there's a lot of downsides to it- and if it did, it wouldn't be much > > > more than what we can do with an FDW, so why go through and add it? > > > > Well, there is no point in enforcing a rule that something is forbidden > > if it was never implied and never documented (the rule here is to be > > able to drop custom files into global/, base/ or pg_tblspc). Postgres > > is highly customizable, and I would prefer if features in core are designed > > so that we keep things extensible; the checksum verification for base > > backups on the contrary restricts things. > > > > So, do we have other opinions about this thread? > > I believe in the freedom for anyone to drop any files > anywhere in $PGDATA if he thinks it safe for the time > being. Changes in core sometimes break some extensions, and they > used to be 'fixed' in that manner or claim for a replacement in > core. This is the same for changes of (undocumented) APIs. I > think things have been going around in this manner for years, and > I don't think the core side is able to define a definite border of > what extensions are allowed to do, since we are not knowledgeable > of all extensions that will be created in the future or that have > been created. > > So I'm +1 for Michael's current patch as (I think) we can't > make visible or large changes. 
> > > That said, I agree with Stephen's concern on the point that we could > omit required files in the future, but on the other hand I don't want > random files to be simply rejected. They aren't rejected- there's a warning thrown about them. > So I propose to sort files into roughly three categories. One is > files we know of but skip. The second is files we know of and that need > checksum verification. The last is the files we just don't know of. That last set really should be empty though. > In the attached PoC, (checksum_)find_file_type categorizes a file > into 6 (or 7) categories. I'm certainly not thrilled with the idea of adding a bunch more code to v11 for this. > ENTRY_TO_IGNORE: always ignored, for "." and "..". > FILE_TO_SKIP: files we know of and skip; files in the blacklist. > DIR_TO_SKIP: directories, same as the above. > DIR_TO_SCAN: we know any file to scan may be in it. > HEAP_TO_SCAN: we know it has checksums in heap format. > FILE_UNKNOWN: we don't know of it. > (STAT_FAILED: stat failed for the file) > > Among these types, the ones related to the discussion above are > FILE_TO_SKIP, HEAP_TO_SCAN and FILE_UNKNOWN. The others are for > controlling the search loop in pg_verify_checksums. > > Based on this categorization, the pg_verify_checksums result is shown as: > > > Checksum scan completed > > Data checksum version: 1 > > Files scanned: 1095 > > Blocks scanned: 2976 > > Bad checksums: 0 > + Files skipped: 8 > + Unknown files skipped: 1 > + Skipped directories: 1 I'd rather those files be shown as a warning than just listed as 'Unknown'. Previously, throwing a warning is exactly what we did, and it seems like the most sensible thing to do when we come across random files which have shown up in the data directory that we don't recognize or understand. > (Data checksum version is bonded with heap checksums..) > > If this sort of change is acceptable for us, I believe it > conforms to everyone's wishes here. 
If the skipped-unknown count is not zero > on a buildfarm animal, it is a sign that something has been forgotten. Having a buildfarm check would certainly be good, but I'm really not sure that the added complexity here makes sense. If we're going to go down this road, it also seems like it'd make sense to complain about things that we don't recognize in other directories too. > The second attachment (also a PoC) common'izes the file sorter. The > function is moved to src/common/file_checksums.c and both > pg_verify_checksums.c and basebackup.c use it. With > log_min_messages=debug1, you will see the following message for > unknown files during basebackup. I'm definitely on-board with figuring out a way to provide more inspection of the data directory through src/common functions that can be leveraged by the frontend and backend tools, as well as other tools. > > DEBUG: checksum verification was skipped for unknown file: ./base/hoge > > This changed one behavior of the tool. -r can now take > 'dbid/relid' in addition to just 'relid'. I think we could have > a .verify_checksum.exclude file so that extensions can declare > files. Generally speaking, if we really want to support this, then we would need to have some amount of documented "this is what extensions are allowed to do, and what they aren't" and I'd probably say we would want to have a directory for extensions to drop their files into, and then have that directory be split up by extension name (and maybe version..?) and then we could just skip the entire extension directory. Having flags or parameters around for users to provide a list of files to ignore because some extension created those files seems like a really poor approach. I didn't really look at the patches you sent much, just to be clear. I did discuss this with some folks at PGConf EU and encouraged them to comment on this thread too. Hopefully some will. Thanks! Stephen
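The categorization quoted above can be pictured as a dispatch loop over the proposed enum. The following is a simplified, standalone illustration (not the patch's actual code): the counters mirror the report fields from the proposal, and the real scanner would verify page checksums or recurse where the comments indicate.

```c
#include <stdio.h>

/* File categories from the proposed patch. */
typedef enum
{
    ENTRY_TO_IGNORE,            /* "." and ".." */
    DIR_TO_SCAN,                /* directory that may contain relation files */
    HEAP_TO_SCAN,               /* relation file carrying checksums */
    FILE_TO_SKIP,               /* known file without checksums */
    DIR_TO_SKIP,                /* known directory to skip, e.g. pgsql_tmp */
    FILE_UNKNOWN,               /* not recognized: warn, then skip */
    STAT_FAILED                 /* lstat() failed */
} checksum_file_types;

/* Counters mirroring the proposed report fields. */
long files_scanned;
long files_skipped;
long unknown_skipped;
long dirs_skipped;

/* Dispatch one classified entry, updating the report counters. */
void
dispatch(checksum_file_types type, const char *fn)
{
    switch (type)
    {
        case ENTRY_TO_IGNORE:
            break;              /* ignore completely silently */
        case HEAP_TO_SCAN:
            files_scanned++;    /* real code verifies page checksums here */
            break;
        case FILE_TO_SKIP:
            files_skipped++;
            break;
        case DIR_TO_SKIP:
            dirs_skipped++;
            break;
        case DIR_TO_SCAN:
            break;              /* real code recurses into the directory */
        case FILE_UNKNOWN:
            fprintf(stderr, "warning: unknown file skipped: %s\n", fn);
            unknown_skipped++;
            break;
        case STAT_FAILED:
            fprintf(stderr, "could not stat \"%s\"\n", fn);
            break;
    }
}
```

Whether FILE_UNKNOWN should merely be counted or loudly warned about is exactly the point under debate in this thread.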
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
David Steele
Date:
On 10/30/18 11:59 AM, Stephen Frost wrote: > > * Kyotaro HORIGUCHI (horiguchi.kyotaro@lab.ntt.co.jp) wrote: >> >> So I'm +1 for Michael's current patch as (I think) we can't >> make visible or large changes. >> >> That said, I agree with Stephen's concern on the point that we could >> omit required files in the future, but on the other hand I don't want >> random files to be simply rejected. > > They aren't rejected- there's a warning thrown about them. pgBackRest has been using a whitelist/blacklist method for identifying checksummable files for almost 2 years and we haven't seen any issues. The few times a "random" file appeared in the logs with checksum warnings it was later identified as having been mistakenly copied into $PGDATA. The backup still completed successfully in these cases. So to be clear, we whitelist the global, base, and pg_tblspc dirs and blacklist PG_VERSION, pg_filenode.map, pg_internal.init, and pg_control (just for global) when deciding which files to checksum. Recently we added logic to exclude unlogged and temporary relations as well, though that's not required. For PG11 I would recommend just adding the param file generated by EXEC_BACKEND to the blacklist for both pg_basebackup and pg_verify_checksums, then creating a common facility for blacklisting for PG12. I'm not very excited about the idea of encouraging extensions to drop files in the postgres relation directories (base, global, pg_tblspc). If we don't say we support it, then in my mind that means we don't. There are lots of ways extension authors could make naming mistakes that would lead to their files being cleaned up by Postgres at startup or included in a DROP DATABASE. I am OK with allowing an extension directory for each tablespace/db dir where extensions can safely drop files for PG12, if we decide that's something worth doing. Regards, -- -David david@pgmasters.net
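The pgBackRest rule described above can be sketched as a single predicate. This is a hypothetical, simplified rendering of the described whitelist/blacklist, not pgBackRest's actual code; the function name and the leading-'t' test for temporary relations are assumptions for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Sketch of the described rule: checksum files under global/, base/ and
 * pg_tblspc/, except a short blacklist of non-checksummed files. */
bool
should_verify_checksum(const char *dir, const char *filename)
{
    static const char *const dirs[] = {"global", "base", "pg_tblspc", NULL};
    static const char *const skip[] = {
        "PG_VERSION", "pg_filenode.map", "pg_internal.init", NULL
    };
    bool        in_dir = false;

    /* whitelist: only these top-level directories are considered */
    for (int i = 0; dirs[i]; i++)
        if (strcmp(dir, dirs[i]) == 0)
            in_dir = true;
    if (!in_dir)
        return false;

    /* pg_control is blacklisted for global/ only */
    if (strcmp(dir, "global") == 0 && strcmp(filename, "pg_control") == 0)
        return false;

    /* blacklist: known non-checksummed files */
    for (int i = 0; skip[i]; i++)
        if (strcmp(filename, skip[i]) == 0)
            return false;

    /* temporary relation files begin with 't' */
    if (filename[0] == 't')
        return false;

    return true;
}
```

Under this rule an unrecognized file still gets checksummed (and warned about), which is the behavior David describes as harmless in practice.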
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Amit Kapila
Date:
On Sun, Oct 21, 2018 at 7:12 PM Michael Paquier <michael@paquier.xyz> wrote: > [...] > > Folks on this thread agreed that both pg_verify_checksums and checksum > verification for base backups should use the same logic. It seems to me > that using a whitelist-based approach for both is easier to maintain as > the patterns of files supporting checksums is more limited than files > which do not support checksums. So this way we allow extensions > bundling custom files to still work with pg_verify_checksums and > checksum verification in base backups. > This sounds like a good argument for having a whitelist approach, but is it really a big problem if a user gets a warning for files that the utility is not able to verify checksums for? I think in some sense this message can be useful to the user as it can allow him to know which files are not verified by the utility for some form of corruption. I guess one can say that users might not be interested in this information, in which case such a check could be optional, as you seem to be suggesting in the following paragraph. > Something else which has been discussed on this thread is that we should > have a dedicated API to decide if a file has checksums or not, and if it > should be skipped in both cases. That's work only for HEAD though, so > we need to do something for HEAD and v11, which is simple. > +1. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Thu, Nov 01, 2018 at 04:44:40PM +0530, Amit Kapila wrote: > This sounds like a good argument for having a whitelist approach, but > is it really a big problem if a user gets a warning for files that the > utility is not able to verify checksums for? I think in some sense > this message can be useful to the user as it can allow him to know > which files are not verified by the utility for some form of > corruption. I guess one can say that users might not be interested in > this information, in which case such a check could be optional, as you > seem to be suggesting in the following paragraph. The replication protocol supports NOVERIFY_CHECKSUMS to avoid the warnings, so they are enabled by default and can be disabled at will. pg_basebackup supports the same interface. -- Michael
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * David Steele (david@pgmasters.net) wrote: > [...] > For PG11 I would recommend just adding the param file generated by > EXEC_BACKEND to the blacklist for both pg_basebackup and > pg_verify_checksums, then creating a common facility for blacklisting > for PG12. Michael, this obviously didn't happen and instead we ended up releasing 11.1 with your changes, but I don't feel like this issue is closed and I'm a bit disappointed that there haven't been any further responses or discussion on this. I discussed this at length with David and Amit, both of whom have now also commented on this issue, at PGConf.EU, but still there hasn't been a response from you. Is your thought here that your lack of response should be taken to mean that I should simply revert your commit and then commit your earlier patch to just add the param file? 
While we typically take silence as acceptance, it's a bit different when it comes to reverting someone else's change, at least to my mind. I'm happy to go ahead and make those changes if there's no disagreement regarding it. Also, just to be clear, I don't intend this with any animosity and I certainly understand if it just has fallen through the cracks or been lost in the shuffle but I really don't like the implication put forward that we're happy to not throw any kind of warning or notice about random files showing up in the PG data directories. Thanks! Stephen
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Mon, Nov 19, 2018 at 08:45:29PM -0500, Stephen Frost wrote: > Michael, this obviously didn't happen and instead we ended up releasing > 11.1 with your changes, but I don't feel like this issue is closed and > I'm a bit disappointed that there haven't been any further responses or > discussion on this. This issue is not closed. > I discussed this at length with David and Amit, both of whom have now > also commented on this issue, at PGConf.EU, but still there hasn't been > a response from you. Is your thought here that your lack of response > should be taken to mean that I should simply revert your commit and then > commit your earlier patch to just add the param file? While we > typically take silence as acceptance, it's a bit different when it comes > to reverting someone else's change, at least to my mind. > > I'm happy to go ahead and make those changes if there's no disagreement > regarding it. Well, I did not feel any particular need to offer a response, as it seems to me that the topic of how to handle things has not moved an iota toward an agreement. From what I can see, there are still two schools of thought: - Use a whitelist. The individuals who have expressed an interest in this approach, based on the status of this thread, are myself and Kyotaro Horiguchi. A third person who I think would prefer the whitelist approach is Andres, but he did not reply directly to this thread. - Use a blacklist, which at least three people have expressed an opinion for on this thread: Amit Kapila, David Steele and yourself. > Also, just to be clear, I don't intend this with any animosity and I > certainly understand if it just has fallen through the cracks or been > lost in the shuffle but I really don't like the implication put forward > that we're happy to not throw any kind of warning or notice about random > files showing up in the PG data directories. Don't worry about that! Thanks for trying to keep this thread moving. 
I am still a fan of the whitelist approach, as there is no actual point in restricting what people can do with Postgres in terms of extensibility (relying on tablespace paths for storage plugins looks like an important thing to me, and we would close doors with a blacklist, causing warnings to be generated for basically everything which is not from heap). What worries me the most is actually the fact that we have not heard from the original authors of pg_verify_checksums what they think on the matter and how we ought to do things, because their opinion is important. If there is a clear agreement on the direction to take, I am of course perfectly fine if the conclusion is the opposite of what I think, but with a 3-vs-2 split (3-vs-3 if I count Andres) it is kind of hard to conclude that we have an actual agreement. -- Michael
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings Michael, * Michael Paquier (michael@paquier.xyz) wrote: > I am still a fan of the whitelist approach as there is no actual point > in restricting what people can do with Postgres in terms of > extensibility [...] What worries me the most is actually the fact that > we have not heard from the original authors of pg_verify_checksums what > they think on the matter and how we ought to do things, because their > opinion is important. I can understand that we want PostgreSQL to be extensible, but as David pointed out up-thread, what we've actually seen in the wild are cases where random files have mistakenly ended up in the data directory, and in those cases it's been quite good to have the warnings thrown to indicate that there's been some mistake. I don't think we do our users any service by simply ignoring random files showing up in the data directories. As has been mentioned elsewhere, there's really a 'right' way to do things, and allowing PG to be 'extensible' by simply ignoring random files showing up isn't that- if we want PG to be extensible in this way then we need to provide a mechanism for that to happen. While I'd also like to hear from the authors of pg_verify_checksums as to their thoughts, I'm guessing that they're mostly watching from the sidelines while we discuss, not wanting to end up picking the wrong side. 
When it comes to what we typically do, at least imv, when there's an impasse or a disagreement over approaches, it is to not move forward with one side of it over what was in place previously. David, in particular, was certainly involved in the verify checksums work and in the changes for pg_basebackup, having had quite a bit of experience implementing that same mechanism in pgbackrest quite a while before it got into PG proper. That real-world experience with exactly this feature is really quite relevant, imv. Thanks! Stephen
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Andres Freund
Date:
Hi, On 2018-11-19 21:18:43 -0500, Stephen Frost wrote: > As has been mentioned elsewhere, there's really a 'right' way to do > things and allowing PG to be 'extensible' by simply ignoring random > files showing up isn't that- if we want PG to be extensible in this way > then we need to provide a mechanism for that to happen. I still don't buy this argument. I'm giving up here, as I just don't have enough energy to keep up with this discussion. FWIW, I think it's bad that we don't error out on checksum failures in basebackups by default. And that's only really feasible with a whitelisting approach. Greetings, Andres Freund
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2018-11-19 21:18:43 -0500, Stephen Frost wrote: > > As has been mentioned elsewhere, there's really a 'right' way to do > > things and allowing PG to be 'extensible' by simply ignoring random > > files showing up isn't that- if we want PG to be extensible in this way > > then we need to provide a mechanism for that to happen. > > I still don't buy this argument. I'm giving up here, as I just don't > have enough energy to keep up with this discussion. > > FWIW, I think it's bad, that we don't error out on checksum failures in > basebackups by default. And that's only really feasible with a > whitelisting approach. No, we could error out on checksum failures in either approach, but we explicitly don't with good reason: if you're doing a backup, you probably want to actually capture the current data. This is something we've thought quite a bit about. In fact, as I recall, the original pg_basebackup code actually *did* error out, even with the blacklist approach, and we made a solid argument which was ultimately agreed to by those involved at the time that erroring out half-way through was a bad idea. What we do instead is exit with a non-zero exit code to make it clear that there was an issue, to allow the user to capture that and raise alarms, but to still have all of the data which we were able to capture in the hopes that the backup is at least salvageable. In addition, at least in pgbackrest, we don't consider such a backup to be pristine and therefore we don't expire out the prior backups- we don't do any backup expiration in pg_basebackup, so it's up to the user to make sure that if pg_basebackup exits with a non-zero exit code that they capture and handle that and *don't* blow away a previously good backup.
The very last thing *any* backup tool should do is give up half-way through and throw a nasty error, leaving you with the knowledge that your system is hosed *and* no backup of what was there exists, making it extremely likely that whatever corruption exists is being propagated further. Let's try to not conflate these two issues though, they're quite independent. Thanks! Stephen
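The exit-status contract described above can be sketched as follows. This is a hedged sketch only: the command is parameterized because "pg_basebackup" may not be available where this runs, and nothing here is the actual pgbackrest implementation.

```python
import subprocess

def take_backup(dest: str, cmd=("pg_basebackup", "-D")) -> bool:
    """Run the backup tool; return True only if the backup is pristine.

    The tool itself warns and keeps going on checksum failures; the
    non-zero exit status is the signal for the caller.
    """
    result = subprocess.run([*cmd, dest])
    if result.returncode != 0:
        # A checksum issue was reported: keep the possibly salvageable
        # backup, alert the operator, and do NOT expire prior good backups.
        return False
    return True
```

With this shape, a monitoring wrapper can raise an alarm on the non-zero status without throwing away the data that was captured.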
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Mon, Nov 19, 2018 at 10:17:19PM -0500, Stephen Frost wrote: > Let's try to not conflate these two issues though, they're quite > independent. This is a poke about a recent issue raised by Michael Banck here: https://www.postgresql.org/message-id/f1543332405.17247.9.camel@credativ.de And for which I am suggesting a minimal fix, which is extracted from a patch at the top of this thread: https://www.postgresql.org/message-id/20181127213625.GM1716@paquier.xyz If there are no objections, I would like to fix the actual issue. Then I'll rebase a patch on top of what has been fixed for this thread for what I proposed in the base backup code. -- Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Mon, Nov 19, 2018 at 10:17:19PM -0500, Stephen Frost wrote: > > Let's try to not conflate these two issues though, they're quite > > independent. > > This is a poke about a recent issue raised by Michael Banck here: > https://www.postgresql.org/message-id/f1543332405.17247.9.camel@credativ.de > And for which I am suggesting a minimal fix, which is extracted from a > patch at the most-top of this thread: > https://www.postgresql.org/message-id/20181127213625.GM1716@paquier.xyz > > If there are no objections, I would like to fix the actual issue. Then > I'll rebase a patch on top of what has been fixed for this thread for > what I proposed in the base backup code. So the as-committed "whitelist" approach did, in fact, end up excluding huge portions of the database from being checked, that is, everything inside of tablespaces. This doesn't exactly change my opinion regarding this discussion and I'd rather we revert the "whitelist" patch and use the very minimal patch from here: https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz Thanks! Stephen
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Tue, Nov 27, 2018 at 05:45:41PM -0500, Stephen Frost wrote: > This doesn't exactly change my opinion regarding this discussion and I'd > rather we revert the "whitelist" patch and use the very minimal patch > from here: > > https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz Believe it or not, we have spent so much energy on this stuff that I am ready to give up on the whitelist patch and focus on other business. This doesn't change a couple of things though, so it is not *just* a simple revert with the patch you mention applied:
- Adding a test for tablespaces makes sense.
- skipfile should be called after making sure that we work on a file.
- Temporary files and temporary paths should be ignored.
- It is necessary to exclude EXEC_BACKEND files.
-- Michael
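The skip rules above can be modeled as a small check. This is a sketch under the assumption that the exclude list matches the file names quoted in this thread; the authoritative lists live in basebackup.c and pg_verify_checksums, not here.

```python
import os

# Exclude list, including the EXEC_BACKEND-only files that caused
# the spurious warnings on those builds.
EXCLUDE = {
    "pg_control", "pg_filenode.map", "pg_internal.init", "PG_VERSION",
    "config_exec_params", "config_exec_params.new",  # EXEC_BACKEND builds
}
TEMP_PREFIX = "pgsql_tmp"

def should_skip(path: str) -> bool:
    if os.path.basename(path) in EXCLUDE:
        return True
    # Temporary directories can hold legit-looking relation names
    # (e.g. ".../pgsql_tmp123.1.sharedfileset/1.1"), so check every
    # path component, not just the file name.
    return any(p.startswith(TEMP_PREFIX) for p in path.split("/"))
```

Checking path components rather than only the final file name is what lets a file literally named "1.1" under a temporary directory be skipped while an ordinary "base/1/12345" is still verified.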
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Tue, Nov 27, 2018 at 05:45:41PM -0500, Stephen Frost wrote: > > This doesn't exactly change my opinion regarding this discussion and I'd > > rather we revert the "whitelist" patch and use the very minimal patch > > from here: > > > > https://www.postgresql.org/message-id/20181012013918.GA30064%40paquier.xyz > > Believe me or not, but we have spent so much energy on this stuff that I > am ready to give up on the whitelist patch and focus on other business. I would have hoped that you'd see why I was concerned from the start about this now that we have a released version of pg_verify_checksums in 11.1 that is paper-bag-worthy. > This doesn't change a couple of things though, so it is not *just* a > simple revert with the patch you mention applied: > - Adding a test for tablespaces makes sense. I agree with this, of course. > - skipfile should be called after making sure that we work on a file. It's not clear to me what this is referring to, exactly...? Can you clarify? > - temporary files and temporary paths should be ignored. There's example code for how to do this in basebackup.c > - it is necessary to exclude EXEC_BACKEND files. Agreed, they should be added to the exclude list. Thanks! Stephen
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Tue, Nov 27, 2018 at 06:27:57PM -0500, Stephen Frost wrote: > * Michael Paquier (michael@paquier.xyz) wrote: >> Believe it or not, we have spent so much energy on this stuff that I >> am ready to give up on the whitelist patch and focus on other business. > > I would have hoped that you'd see why I was concerned from the start > about this now that we have a released version of pg_verify_checksums > in 11.1 that is paper-bag-worthy. That's unrelated. Let's be clear, I still don't like the fact that we don't use a whitelist approach, per the arguments raised upthread. The tool also clearly lacked testing and coverage from the beginning and has never been able to work on Windows, which was a very bad decision. Still we need to live with that, and I take my share of the blame; I am willing to make this stuff work better for all. >> - skipfile should be called after making sure that we work on a file. > > It's not clear to me what this is referring to, exactly...? Can you > clarify? Please see 0002 attached, which moves the call to skipfile() where I think it should go. >> - it is necessary to exclude EXEC_BACKEND files. > > Agreed, they should be added to the exclude list. Base backups are impacted as well, as this causes spurious warnings. Attached are two patches to fix all the mess: - 0001 is a revert of the whitelist, minus the set of regression tests checking after corrupted files and empty files. - 0002 is a fix for all the issues reported on this thread, with tests added (including the tablespace test from Michael Banck): -- Base backups gain EXEC_BACKEND files in their warning filters. -- pg_verify_checksums gains the same files. -- temporary files are filtered out. -- pg_verify_checksums performs filtering checks only on regular files, not on paths. 0001 and 0002 need to be merged as 0001 would cause the buildfarm to turn red on Windows if applied alone. Can you now see my point? -- Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Tue, Nov 27, 2018 at 06:27:57PM -0500, Stephen Frost wrote: > > * Michael Paquier (michael@paquier.xyz) wrote: > >> Believe me or not, but we have spent so much energy on this stuff that I > >> am ready to give up on the whitelist patch and focus on other business. > > > > I would have hoped that you'd see why I was concerned from the start > > about this now that we have a released version of pg_verify_checksums > > in 11.1 that is paper-bag-worthy. > > That's unrelated. Let's be clear, I still don't like the fact that we > don't use a whitelist approach, per the arguments raised upthread. The > tool also clearly lacked testing and coverage from the beginning and has > never been able to work on Windows, which was a very bad decision. > Still we need to live with that and I take my share of the blame, > willing to make this stuff work better for all. I don't agree that it's unrelated and I'm disappointed that you feel it is, but I'm not going to belabor the point. > >> - skipfile should be called after making sure that we work on a file. > > > > It's not clear to me what this is referring to, exactly...? Can you > > clarify? > > Please see 0002 attached, which moves the call to skipfile() where I > think it should go. Alright, on a quick glance that seems ok. > >> - it is necessary to exclude EXEC_BACKEND files. > > > > Agreed, they should be added to the exclude list. > > Base backups are impacted as well as this causes spurious warnings. Right- could you please also add comments around the two lists to make other hackers aware that the list shows up in two places and that any changes should be made in both places..? > Attached are two patches to fix all the mess: > - 0001 is a revert of the whitelist, minus the set of regression tests > checking after corrupted files and empty files. 
> - 0002 is a fix for all the issues reported on this thread, with tests > added (including the tablespace test from Michael Banck): > -- Base backups gain EXEC_BACKEND files in their warning filters. > -- pg_verify_checksums gains the same files. > -- temporary files are filtered out. > -- pg_verify_checksums performs filtering checks only on regular files, > not on paths. > > 0001 and 0002 need to be merged as 0001 would cause the buildfarm to > turn red on Windows if applied alone. Can you now see my point? Yes, I think they could be merged to address that, though I'm not sure that it's necessarily a huge deal either, if they're going to be pushed together. Thanks! Stephen
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Tue, Nov 27, 2018 at 08:17:12PM -0500, Stephen Frost wrote: > * Michael Paquier (michael@paquier.xyz) wrote: >> Please see 0002 attached, which moves the call to skipfile() where I >> think it should go. > > Alright, on a quick glance that seems ok. Thanks. >> Base backups are impacted as well as this causes spurious warnings. > > Right- could you please also add comments around the two lists to make > other hackers aware that the list shows up in two places and that any > changes should be made in both places..? Good point. Added in my local branch. >> Attached are two patches to fix all the mess: >> - 0001 is a revert of the whitelist, minus the set of regression tests >> checking after corrupted files and empty files. >> - 0002 is a fix for all the issues reported on this thread, with tests >> added (including the tablespace test from Michael Banck): >> -- Base backups gain EXEC_BACKEND files in their warning filters. >> -- pg_verify_checksums gains the same files. >> -- temporary files are filtered out. >> -- pg_verify_checksums performs filtering checks only on regular files, >> not on paths. >> >> 0001 and 0002 need to be merged as 0001 would cause the buildfarm to >> turn red on Windows if applied alone. Can you now see my point? > > Yes, I think they could be merged to address that, though I'm not sure > that it's necessarily a huge deal either, if they're going to be pushed > together. This avoids noise failures when bisecting a regression, which matters in some cases. Perhaps you are right that splitting into two commits would keep the history cleaner. Let's wait a bit and see if others have extra opinions to offer. -- Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
David Steele
Date:
On 11/27/18 8:32 PM, Michael Paquier wrote: > >>> Attached are two patches to fix all the mess: >>> - 0001 is a revert of the whitelist, minus the set of regression tests >>> checking after corrupted files and empty files. >>> - 0002 is a fix for all the issues reported on this thread, with tests >>> added (including the tablespace test from Michael Banck): >>> -- Base backups gain EXEC_BACKEND files in their warning filters. >>> -- pg_verify_checksums gains the same files. >>> -- temporary files are filtered out. >>> -- pg_verify_checksums performs filtering checks only on regular files, >>> not on paths. >>> >>> 0001 and 0002 need to be merged as 0001 would cause the buildfarm to >>> turn red on Windows if applied alone. Can you now see my point? >> >> Yes, I think they could be merged to address that, though I'm not sure >> that it's necessarily a huge deal either, if they're going to be pushed >> together. > > This avoids noise failures when bisecting a regression, which matters in > some cases. To keep the history cleaner perhaps you are right and it > would be cleaner to split into two commits. > > Let's wait a bit and see if others have extra opinions to offer. Looks good to me. -- -David david@pgmasters.net
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Wed, Nov 28, 2018 at 12:32:29PM -0500, David Steele wrote: > Looks good to me. Thanks David for the input. I think that I will be able to finish wrapping and commit this stuff tomorrow. -- Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Thu, Nov 29, 2018 at 11:03:54AM +0900, Michael Paquier wrote: > Thanks David for the input. I think that I will be able to finish > wrapping and commit this stuff tomorrow. And done. I kept the split into two commits for clarity as suggested by Stephen, as bisect would actually complain only when using EXEC_BACKEND, and only for the test suite of pg_verify_checksums. -- Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Banck
Date:
Hi, first off, a bit of a meta-question: did the whitelist approach die completely, or are we going to tackle it again for v13 or later? On Fri, Nov 30, 2018 at 10:44:33AM +0900, Michael Paquier wrote: > On Thu, Nov 29, 2018 at 11:03:54AM +0900, Michael Paquier wrote: > > Thanks David for the input. I think that I will be able to finish > > wrapping and commit this stuff tomorrow. > > And done. I kept the split into two commits for clarity as suggested by > Stephen, as bisect would actually complain only when using EXEC_BACKEND, > and only for the test suite of pg_verify_checksums. So if I do 'touch $PGDATA/global/123.', current pg_checksums errors out with |pg_checksums: error: invalid segment number 0 in file name |"<<$PGDATA>>/global/123." This is something I still have in the test suite of my pg_checksums fork, because I never reverted that one from isRelFile() back to skipfile() (so it doesn't fail on the above because `123.' is not considered a relation file worth checksumming). Independently of the whitelist/blacklist question, I believe pg_checksums should not error out as soon as it encounters a weird-looking file, but either (i) still checksum it or (ii) skip it. Or is that to be considered a pilot error, and it's fine for pg_checksums to fold? Thoughts? Michael -- Michael Banck Projektleiter / Senior Berater, credativ GmbH
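For illustration, the failure mode quoted above can be modeled like this. This is a hypothetical reconstruction of the parsing, not the actual pg_checksums code: everything after the first "." is read as a segment number, so a name like "123." yields segment 0, which is never valid as a suffix.

```python
def parse_segment(filename: str):
    """Model: return the segment number, or None for an un-suffixed file."""
    if "." not in filename:
        return None  # first segment, no suffix
    _, _, suffix = filename.partition(".")
    try:
        segno = int(suffix)
    except ValueError:
        segno = 0  # an empty suffix reads as 0, like atoi("") in C
    if segno == 0:
        # Segment 0 never carries a ".N" suffix, so this name is bogus.
        raise ValueError(
            f'invalid segment number 0 in file name "{filename}"')
    return segno
```

Under a whitelist, "123." would simply not match a relation-file shape and be skipped; under the committed behavior, the parse is attempted and the error is raised.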
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Sat, Aug 03, 2019 at 06:47:48PM +0200, Michael Banck wrote: > first off, a bit of a meta-question: Did the whitelist approach die > completely, or are we going to tackle it again for v13 or later? At this stage, it is buried. Amen. > This is something I still have in the test suite of my pg_checksums > fork, because I never reverted that one from isRelFile() back to > skipfile() (so it doesn't fail on the above because `123.' is not > considered a relation file worth checksumming). We could actually fix this one. It is confusing to have pg_checksums generate a report about a segment number which is actually incorrect. > Independently of the whitelist/blacklist question, I believe > pg_checksums should not error out as soon as it encounters a weird-looking > file, but either (i) still checksum it or (ii) skip it? Or is that to be > considered a pilot error and it's fine for pg_checksums to fold? That's actually the distinction between the blacklist and the whitelist which would have handled that. -- Michael
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * Michael Banck (michael.banck@credativ.de) wrote: > Independently of the whitelist/blacklist question, I believe > pg_checksums should not error out as soon as it encounters a weird looking > file, but either (i) still checksum it or (ii) skip it? Or is that to be > considered a pilot error and it's fine for pg_checksums to fold? imv, random files that we don't know about are exactly 'pilot error' to be complained about.. This is exactly why the whitelist idea falls over. Thanks, Stephen
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Andres Freund
Date:
Hi, On 2019-08-06 10:58:15 -0400, Stephen Frost wrote: > * Michael Banck (michael.banck@credativ.de) wrote: > > Independently of the whitelist/blacklist question, I believe > > pg_checksums should not error out as soon as it encounters a weird looking > > file, but either (i) still checksum it or (ii) skip it? Or is that to be > > considered a pilot error and it's fine for pg_checksums to fold? > > imv, random files that we don't know about are exactly 'pilot error' to > be complained about.. This is exactly why the whitelist idea falls > over. I still think this whole assumption is bad, and that you're fixing non-problems, and creating serious usability issues with zero benefits. Greetings, Andres Freund
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings, * Andres Freund (andres@anarazel.de) wrote: > On 2019-08-06 10:58:15 -0400, Stephen Frost wrote: > > * Michael Banck (michael.banck@credativ.de) wrote: > > > Independently of the whitelist/blacklist question, I believe > > > pg_checksums should not error out as soon as it encounters a weird looking > > > file, but either (i) still checksum it or (ii) skip it? Or is that to be > > > considered a pilot error and it's fine for pg_checksums to fold? > > > > imv, random files that we don't know about are exactly 'pilot error' to > > be complained about.. This is exactly why the whitelist idea falls > > over. > > I still think this whole assumption is bad, and that you're fixing > non-problems, and creating serious usability issues with zero benefits. I doubt we're going to get to agreement on this, unfortunately. Thanks, Stephen
Attachment
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Magnus Hagander
Date:
On Tue, Aug 6, 2019 at 6:07 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
> On 2019-08-06 10:58:15 -0400, Stephen Frost wrote:
> > * Michael Banck (michael.banck@credativ.de) wrote:
> > > Independently of the whitelist/blacklist question, I believe
> > > pg_checksums should not error out as soon as it encounters a weird looking
> > > file, but either (i) still checksum it or (ii) skip it? Or is that to be
> > > considered a pilot error and it's fine for pg_checksums to fold?
> >
> > imv, random files that we don't know about are exactly 'pilot error' to
> > be complained about.. This is exactly why the whitelist idea falls
> > over.
>
> I still think this whole assumption is bad, and that you're fixing
> non-problems, and creating serious usability issues with zero benefits.
I doubt we're going to get to agreement on this, unfortunately.
When agreement cannot be found, perhaps a parameter is in order?
That is, have the tool complain about such files by default but with a HINT that it may or may not be a problem, and a switch that makes it stop complaining?
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Stephen Frost
Date:
Greetings,
On Tue, Aug 6, 2019 at 15:45 Magnus Hagander <magnus@hagander.net> wrote:
On Tue, Aug 6, 2019 at 6:07 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Andres Freund (andres@anarazel.de) wrote:
> On 2019-08-06 10:58:15 -0400, Stephen Frost wrote:
> > * Michael Banck (michael.banck@credativ.de) wrote:
> > > Independently of the whitelist/blacklist question, I believe
> > > pg_checksums should not error out as soon as it encounters a weird looking
> > > file, but either (i) still checksum it or (ii) skip it? Or is that to be
> > > considered a pilot error and it's fine for pg_checksums to fold?
> >
> > imv, random files that we don't know about are exactly 'pilot error' to
> > be complained about.. This is exactly why the whitelist idea falls
> > over.
>
> I still think this whole assumption is bad, and that you're fixing
> non-problems, and creating serious usability issues with zero benefits.
I doubt we're going to get to agreement on this, unfortunately.
When agreement cannot be found, perhaps a parameter is in order?
That is, have the tool complain about such files by default but with a HINT that it may or may not be a problem, and a switch that makes it stop complaining?
WFM.
Thanks!
Stephen
Re: More issues with pg_verify_checksums and checksum verification in base backups
From
Michael Paquier
Date:
On Tue, Aug 06, 2019 at 04:20:43PM -0400, Stephen Frost wrote: > On Tue, Aug 6, 2019 at 15:45 Magnus Hagander <magnus@hagander.net> wrote: >> When agreement cannot be found, perhaps a parameter is in order? >> >> That is, have the tool complain about such files by default but with a >> HINT that it may or may not be a problem, and a switch that makes it stop >> complaining? > > WFM. Fine by me. I'd also rather not change the behavior that we have now without the switch. -- Michael