Thread: [Patch] Make pg_checksums skip foreign tablespace directories
Hi, we had a customer who ran pg_checksums on their v12 cluster and got a confusing error: |pg_checksums: error: invalid segment number 0 in file name "/PG- |Data/foo_12_data/pg_tblspc/16402/PG_10_201707211/16390/pg_internal.init |.10028" Turns out the customer ran a pg_ugprade in copy mode before and started up the old cluster again which pg_checksums decided to checked as well - note the PG_10_201707211 in the error message. The attached patch is a stab at teaching pg_checksums to only check its own TABLESPACE_VERSION_DIRECTORY directory. I guess this implies that it would ignore tablespace directories of outdated catversion instances during development, which I think should be ok, but others might not agree? The other question is whether it is possible to end up with a pg_internal.init.$PID file in a running cluster. E.g. if an instance crashes and gets started up again - are those cleaned up during crash recovery, or should pg_checksums ignore them? Right now pg_checksums only checks against a list of filenames and only skips on exact matches not prefixes so that might take a bit of work. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Attachment
On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote: > The other question is whether it is possible to end up with a > pg_internal.init.$PID file in a running cluster. E.g. if an instance > crashes and gets started up again - are those cleaned up during crash > recovery, or should pg_checksums ignore them? Right now pg_checksums > only checks against a list of filenames and only skips on exact matches > not prefixes so that might take a bit of work. Indeed, with a bad timing and a crash in the middle of write_relcache_init_file(), it could be possible to have such a file left around in the data folder. Having a past tablespace version left around after an upgrade is a pilot error in my opinion because pg_upgrade generates a script to cleanup past tablespaces, no? So your patch does not look like a good idea to me. And now that I look at it, if we happen to leave behind a temporary file for pg_internal.init, backups fail with the following error: 2020-01-31 13:26:18.345 JST [102076] 010_pg_basebackup.pl ERROR: invalid segment number 0 in file "pg_internal.init.123" So, I think that it would be better to change basebackup.c, pg_checksums and pg_rewind so as files are excluded if there is a prefix match with the exclude lists. Please see the attached. -- Michael
Attachment
At Fri, 31 Jan 2020 13:53:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote: > > The other question is whether it is possible to end up with a > > pg_internal.init.$PID file in a running cluster. E.g. if an instance > > crashes and gets started up again - are those cleaned up during crash > > recovery, or should pg_checksums ignore them? Right now pg_checksums > > only checks against a list of filenames and only skips on exact matches > > not prefixes so that might take a bit of work. > > Indeed, with a bad timing and a crash in the middle of > write_relcache_init_file(), it could be possible to have such a file > left around in the data folder. Having a past tablespace version left > around after an upgrade is a pilot error in my opinion because > pg_upgrade generates a script to cleanup past tablespaces, no? So > your patch does not look like a good idea to me. And now that I look > at it, if we happen to leave behind a temporary file for > pg_internal.init, backups fail with the following error: > 2020-01-31 13:26:18.345 JST [102076] 010_pg_basebackup.pl ERROR: > invalid segment number 0 in file "pg_internal.init.123" Agreed. > So, I think that it would be better to change basebackup.c, > pg_checksums and pg_rewind so as files are excluded if there is a > prefix match with the exclude lists. Please see the attached. Agreed that the tools should ignore such files. Looking excludeFile, it seems to me that basebackup makes sure to exclude only files that should harm. I'm not sure whether it's explicitly, but tablespace_map.old and backup_label.old are not excluded. The patch excludes harmless files such like "backup_label.20200131" and the two files above. I don't think that is a problem right away, of course. It looks good to me except for the possible excessive exclusion. So, I don't object it if we don't mind that. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 31 Jan 2020 17:30:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > I don't think that is a problem right away, of course. It looks good > to me except for the possible excessive exclusion. So, I don't object > it if we don't mind that. That's a bit wrong. All the discussion is only on excludeFiles. I think we should refrain from letting more files match to nohecksumFiles. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi, Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier: > On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote: > Having a past tablespace version left > around after an upgrade is a pilot error in my opinion because > pg_upgrade generates a script to cleanup past tablespaces, no? So > your patch does not look like a good idea to me. Not sure I agree with it, but if that (i.e. after pg_upgrade in copy mode, you have no business to use the old cluster as well as the new one) is project policy, fair enough. However, Postgres does not disallow to just create tablespaces in the same location from two different versions, so you don't need the pg_upgade scenario to get into this (pg_checksums checking the wrong cluster's data) problem: postgres@kohn:~$ psql -p 5437 -c "CREATE TABLESPACE bar LOCATION '/var/lib/postgresql/bar'" CREATE TABLESPACE postgres@kohn:~$ psql -p 5444 -c "CREATE TABLESPACE bar LOCATION '/var/lib/postgresql/bar'" CREATE TABLESPACE postgres@kohn:~$ ls bar PG_11_201809051 PG_12_201909212 postgres@kohn:~$ touch bar/PG_11_201809051/pg_internal.init.123 postgres@kohn:~$ pg_ctlcluster 12 main stop sudo systemctl stop postgresql@12-main postgres@kohn:~$ LANG=C /usr/lib/postgresql/12/bin/pg_checksums -D /var/lib/postgresql/12/main pg_checksums: error: invalid segment number 0 in file name "/var/lib/postgresql/12/main/pg_tblspc/16396/PG_11_201809051/pg_internal .init.123" I believe this is in order to allow pg_upgrade to run in the first place. But is this pilot error as well? In any case, it is a situation we allow to happen so IMO we should fix pg_checksums to skip the foreign tablespaces. As an aside, I would advocate to just skip files which fail the segment number determination step (and maybe log a warning), not bail out. Michael -- Michael Banck Projektleiter / Senior Berater Tel.: +49 2166 9901-171 Fax: +49 2166 9901-100 Email: michael.banck@credativ.de credativ GmbH, HRB Mönchengladbach 12080 USt-ID-Nummer: DE204566209 Trompeterallee 108, 41189 Mönchengladbach Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer Unser Umgang mit personenbezogenen Daten unterliegt folgenden Bestimmungen: https://www.credativ.de/datenschutz
Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier: > Indeed, with a bad timing and a crash in the middle of > write_relcache_init_file(), it could be possible to have such a file > left around in the data folder. Having a past tablespace version > left > around after an upgrade is a pilot error in my opinion because > pg_upgrade generates a script to cleanup past tablespaces, no? I'm suprised, why should that be a problem in copy mode? For me this is a fair use case to test upgrades, e.g. for development purposes, if someone want's to still have application tests against the current old version, for fallback and whatever. And people might not want such upgrades as a "fire-and-forget" task. We even have the --clone feature now, making this even faster. If our project policy is to never ever touch an pg_upgrade'd PostgreSQL instance again in copy mode, i wasn't aware of it. And to be honest, even PostgreSQL itself allows you to reuse tablespace locations for multiple instances as well, so the described problem should exist not in upgraded clusters only. Bernd
On Fri, Jan 31, 2020 at 11:33:34AM +0100, Bernd Helmle wrote: > And to be honest, even PostgreSQL itself allows you to reuse tablespace > locations for multiple instances as well, so the described problem > should exist not in upgraded clusters only. Fair point. Now, while the proposed patch is right to use TABLESPACE_VERSION_DIRECTORY, shouldn't we use strncmp based on the length of TABLESPACE_VERSION_DIRECTORY instead of de->d_name? It seems also to me that the code as proposed is rather fragile, and that we had better be sure that the check only happens when we are scanning entries within pg_tblspc. The issue with pg_internal.init.XX is quite different, so I think that it would be better to commit that separately first. -- Michael
Attachment
On Fri, Jan 31, 2020 at 05:39:36PM +0900, Kyotaro Horiguchi wrote: > At Fri, 31 Jan 2020 17:30:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in >> I don't think that is a problem right away, of course. It looks good >> to me except for the possible excessive exclusion. So, I don't object >> it if we don't mind that. > > That's a bit wrong. All the discussion is only on excludeFiles. I > think we should refrain from letting more files match to > nohecksumFiles. I am not sure what you are saying here. Are you saying that we should not use a prefix matching for that part? Or are you saying that we should not touch this list at all? Please note that pg_internal.init is listed within noChecksumFiles in basebackup.c, so we would miss any temporary pg_internal.init.PID if we don't check after the file prefix and the base backup would issue extra WARNING messages, potentially masking messages that could matter. So let's fix that as well. I agree that a side effect of this change would be to discard anything prefixed with "backup_label" or "tablespace_map", including any old, renamed files. Do you know of any backup solutions which could be impacted by that? I am adding David Steele and Stephen Frost in CC so as they can comment based on their experience in this area. I recall that backrest stuff uses the replication protocol, but I may be wrong. -- Michael
Attachment
On 2/19/20 2:13 AM, Michael Paquier wrote: > On Fri, Jan 31, 2020 at 05:39:36PM +0900, Kyotaro Horiguchi wrote: >> At Fri, 31 Jan 2020 17:30:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in >>> I don't think that is a problem right away, of course. It looks good >>> to me except for the possible excessive exclusion. So, I don't object >>> it if we don't mind that. >> >> That's a bit wrong. All the discussion is only on excludeFiles. I >> think we should refrain from letting more files match to >> nohecksumFiles. > > I am not sure what you are saying here. Are you saying that we should > not use a prefix matching for that part? Or are you saying that we > should not touch this list at all? Perhaps he is saying that if it is already excluded it won't be checksummed. So, if pg_internal.init* is excluded from the backup, that is all that is needed. If so, I agree. This might not help pg_verify_checksums, though, except that it should be applying the same rules. > Please note that pg_internal.init is listed within noChecksumFiles in > basebackup.c, so we would miss any temporary pg_internal.init.PID if > we don't check after the file prefix and the base backup would issue > extra WARNING messages, potentially masking messages that could > matter. So let's fix that as well. Agreed. Though, I think pg_internal.init.XX should be excluded from the backup as well. As far as I can see, the pg_internal.init.XX will not be cleaned up by PostgreSQL on startup. I've only tested this in 9.6 so far, but I don't see any differences in the code since then that would lead to better behavior. Perhaps that's also something we should fix? > I agree that a side effect of this change would be to discard anything > prefixed with "backup_label" or "tablespace_map", including any old, > renamed files. Do you know of any backup solutions which could be > impacted by that? I am adding David Steele and Stephen Frost in CC so > as they can comment based on their experience in this area. I recall > that backrest stuff uses the replication protocol, but I may be > wrong. I'm really not a fan of a blind prefix match. I think we should stick with only excluding files that are created by Postgres. So backup_label.old and tablespace_map.old should just be added to the exclude list. That's how we have it in pgBackRest. Regards, -- -David david@pgmasters.net
On 1/31/20 3:59 AM, Michael Banck wrote: > Hi, > > Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier: >> On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote: >> Having a past tablespace version left >> around after an upgrade is a pilot error in my opinion because >> pg_upgrade generates a script to cleanup past tablespaces, no? So >> your patch does not look like a good idea to me. > > Not sure I agree with it, but if that (i.e. after pg_upgrade in copy > mode, you have no business to use the old cluster as well as the new > one) is project policy, fair enough. I don't see how this is project policy. The directories for other versions of Postgres should be ignored as they are in other utilities, e.g. pg_basebackup. > However, Postgres does not disallow to just create tablespaces in the > same location from two different versions, so you don't need the > pg_upgade scenario to get into this (pg_checksums checking the wrong > cluster's data) problem: Exactly. Regards, -- -David david@pgmasters.net
On Wed, Feb 19, 2020 at 12:37:00PM -0600, David Steele wrote: > On 2/19/20 2:13 AM, Michael Paquier wrote: >> Please note that pg_internal.init is listed within noChecksumFiles in >> basebackup.c, so we would miss any temporary pg_internal.init.PID if >> we don't check after the file prefix and the base backup would issue >> extra WARNING messages, potentially masking messages that could >> matter. So let's fix that as well. > > Agreed. Though, I think pg_internal.init.XX should be excluded from the > backup as well. Sure. That's the intention. pg_rewind, pg_checksums and basebackup.c are all the things on my list. > As far as I can see, the pg_internal.init.XX will not be cleaned up by > PostgreSQL on startup. I've only tested this in 9.6 so far, but I don't see > any differences in the code since then that would lead to better behavior. > Perhaps that's also something we should fix? Not sure that it is worth spending cycles on that at the beginning of recovery as when a mapping file is written its temporary entry truncates any existing one present matching its name. > I'm really not a fan of a blind prefix match. I think we should stick with > only excluding files that are created by Postgres. Thinking more on that, excluding any backup_label with a custom suffix worries me as it could cause a potential breakage for exiting backup solutions. So attached is an updated patch making things in a smarter way: I have added to the exclusion lists the possibility to match an entry based on its prefix, or not, the choice being optional. This solves the problem with pg_internal.PID and is careful to not exclude unnecessary entries like suffixed backup labels or such. This leads to some extra duplication within pg_rewind, basebackup.c and pg_checksums but I think we can live with that, and that makes back-patching simpler. Refactoring is still tricky though as it relates to the use of paths across the backend and the frontend.. > So backup_label.old and > tablespace_map.old should just be added to the exclude list. That's how we > have it in pgBackRest. That would be a behavior change. We could change that on HEAD, but I don't think that this can be back-patched as this does not cause an actual problem. For now, my proposal is to fix the prefix first, and then let's look at the business with tablespaces where needed. -- Michael
Attachment
On 2/20/20 12:55 AM, Michael Paquier wrote: > On Wed, Feb 19, 2020 at 12:37:00PM -0600, David Steele wrote: > >> As far as I can see, the pg_internal.init.XX will not be cleaned up by >> PostgreSQL on startup. I've only tested this in 9.6 so far, but I don't see >> any differences in the code since then that would lead to better behavior. >> Perhaps that's also something we should fix? > > Not sure that it is worth spending cycles on that at the beginning of > recovery as when a mapping file is written its temporary entry > truncates any existing one present matching its name. But since the name includes the backend's pid you would need to get lucky and have a new backend with the same pid create the file after a restart. I tried it and the old temp file was left behind after restart and first connection to the database. I doubt this is a big issue in the field, but it seems like it would be nice to do something about it. >> I'm really not a fan of a blind prefix match. I think we should stick with >> only excluding files that are created by Postgres. > > Thinking more on that, excluding any backup_label with a custom suffix > worries me as it could cause a potential breakage for exiting backup > solutions. So attached is an updated patch making things in a > smarter way: I have added to the exclusion lists the possibility to > match an entry based on its prefix, or not, the choice being optional. > This solves the problem with pg_internal.PID and is careful to not > exclude unnecessary entries like suffixed backup labels or such. This > leads to some extra duplication within pg_rewind, basebackup.c and > pg_checksums but I think we can live with that, and that makes > back-patching simpler. Refactoring is still tricky though as it > relates to the use of paths across the backend and the frontend.. I'm not excited about the amount of code duplication between these three tools. I know this was because of back-patching various issues in the past, but I really think we need to unify these data structures/functions in HEAD. >> So backup_label.old and >> tablespace_map.old should just be added to the exclude list. That's how we >> have it in pgBackRest. > > That would be a behavior change. We could change that on HEAD, but I > don't think that this can be back-patched as this does not cause an > actual problem. Right, that should be in HEAD. > For now, my proposal is to fix the prefix first, and then let's look > at the business with tablespaces where needed. OK. Regards, -- -David david@pgmasters.net
Am Dienstag, den 18.02.2020, 15:15 +0900 schrieb Michael Paquier: > Fair point. Now, while the proposed patch is right to use > TABLESPACE_VERSION_DIRECTORY, shouldn't we use strncmp based on the > length of TABLESPACE_VERSION_DIRECTORY instead of de->d_name? It > seems also to me that the code as proposed is rather fragile, and > that > we had better be sure that the check only happens when we are > scanning > entries within pg_tblspc. > Yes, after thinking and playing around with it a little i share your position. You can still easily cause pg_checksums to error out by just having arbitrary files around in the reference tablespace locations. Though i don't think this is something of a big issue, it looks strange and misleading if pg_checksums just complains about files not belonging to the scanned PostgreSQL data directory (even we explicitly note in the docs that even tablespace locations are somehow taboo for DBAs to put other files and/or directories in there). So i propose a different approach like the attached patch tries to implement: instead of just blindly iterating over directory contents and filter them out, reference the tablespace location and TABLESPACE_VERSION_DIRECTORY directly. This is done by a new function scan_tablespaces() which is specialized in just follow the symlinks/junctions in pg_tblspc and call scan_directory() with just what it has found there. It will also honour directories, just in case an experienced DBA has copied over the tablespace into pg_tblspc directly. > The issue with pg_internal.init.XX is quite different, so I think > that > it would be better to commit that separately first. Agreed. Thanks, Bernd
Attachment
On Thu, Feb 20, 2020 at 07:37:15AM -0600, David Steele wrote: > But since the name includes the backend's pid you would need to get lucky > and have a new backend with the same pid create the file after a restart. I > tried it and the old temp file was left behind after restart and first > connection to the database. > > I doubt this is a big issue in the field, but it seems like it would be nice > to do something about it. The natural area to do that would be around ResetUnloggedRelations(). Still that would require combine both operations to not do any unnecessary lookups at the data folder paths. > I'm not excited about the amount of code duplication between these three > tools. I know this was because of back-patching various issues in the past, > but I really think we need to unify these data structures/functions in HEAD. The lists are duplicated because we have never really figured out how to combine this code in one place. The idea was to have all the data folder path logic and the lists within one header shared between the frontend and backend but there was not much support for that on HEAD. >> For now, my proposal is to fix the prefix first, and then let's look >> at the business with tablespaces where needed. > > OK. I'll let this patch round for a couple of extra day, and revisit it at the beginning of next week. -- Michael
Attachment
On Thu, Feb 20, 2020 at 05:38:15PM +0100, Bernd Helmle wrote: > So i propose a different approach like the attached patch tries to > implement: instead of just blindly iterating over directory contents > and filter them out, reference the tablespace location and > TABLESPACE_VERSION_DIRECTORY directly. This is done by a new function > scan_tablespaces() which is specialized in just follow the > symlinks/junctions in pg_tblspc and call scan_directory() with just > what it has found there. It will also honour directories, just in case > an experienced DBA has copied over the tablespace into pg_tblspc > directly. + if (S_ISREG(st.st_mode)) + { + pg_log_debug("ignoring file %s in pg_tblspc", de->d_name); + continue; + } We don't do that for the normal directory scan path, so it does not strike me as a good idea on consistency ground. As a whole, I don't see much point in having a separate routine which is just roughly a duplicate of scan_directory(), and I think that we had better just add the check looking for matches with TABLESPACE_VERSION_DIRECTORY directly when having a directory, if subdir is "pg_tblspc". That also makes the patch much shorter. + * the direct path to it and check via lstat wether it exists. s/wether/whether/, repeated three times. We should have some TAP tests for that. The first patch of this thread from Michael had some, but I would just have added a dummy tablespace with an empty file in 002_actions.pl, triggering an error if pg_checksums is not fixed. Dummy entries around the place where dummy temp files are added would be fine. -- Michael
Attachment
Thank you David for decrypting my previous mail.., and your translation was correct. At Fri, 21 Feb 2020 15:07:12 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Thu, Feb 20, 2020 at 07:37:15AM -0600, David Steele wrote: > > But since the name includes the backend's pid you would need to get lucky > > and have a new backend with the same pid create the file after a restart. I > > tried it and the old temp file was left behind after restart and first > > connection to the database. > > > > I doubt this is a big issue in the field, but it seems like it would be nice > > to do something about it. > > The natural area to do that would be around ResetUnloggedRelations(). > Still that would require combine both operations to not do any > unnecessary lookups at the data folder paths. > > > I'm not excited about the amount of code duplication between these three > > tools. I know this was because of back-patching various issues in the past, > > but I really think we need to unify these data structures/functions in HEAD. > > The lists are duplicated because we have never really figured out how > to combine this code in one place. The idea was to have all the data > folder path logic and the lists within one header shared between the > frontend and backend but there was not much support for that on HEAD. > > >> For now, my proposal is to fix the prefix first, and then let's look > >> at the business with tablespaces where needed. > > > > OK. > > I'll let this patch round for a couple of extra day, and revisit it at > the beginning of next week. Thank you for the version. I didn't look it closer bat it looks in the direction I wanted. At a quick look, the following section attracted my eyes. + if (strncmp(de->d_name, excludeFiles[excludeIdx].name, + strlen(excludeFiles[excludeIdx].name)) == 0) + { + elog(DEBUG1, "file \"%s\" matching prefix \"%s\" excluded from backup", + de->d_name, excludeFiles[excludeIdx].name); + excludeFound = true; + break; + } + } + else + { + if (strcmp(de->d_name, excludeFiles[excludeIdx].name) == 0) + { + elog(DEBUG1, "file \"%s\" excluded from backup", de->d_name); + excludeFound = true; + break; + } The two str[n]cmps are different only in matching length. I don't think we don't need to differentiate the two message there, so we could reduce the code as: | cmplen = strlen(excludeFiles[].name); | if (!prefix_patch) | cmplen++; | if (strncmp(d_name, excludeFilep.name, cmplen) == 0) | ... regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Hi Michael, On 2/20/20 11:07 PM, Michael Paquier wrote: > On Thu, Feb 20, 2020 at 07:37:15AM -0600, David Steele wrote: >> But since the name includes the backend's pid you would need to get lucky >> and have a new backend with the same pid create the file after a restart. I >> tried it and the old temp file was left behind after restart and first >> connection to the database. >> >> I doubt this is a big issue in the field, but it seems like it would be nice >> to do something about it. > > The natural area to do that would be around ResetUnloggedRelations(). > Still that would require combine both operations to not do any > unnecessary lookups at the data folder paths. Yeah, that's what I was thinking as well, since there is already a directory scan there and doing the check would be very cheap. It's not obvious how to combine these in the right way without moving a lot of code around to non-obvious places. One solution might be to have each subsystem register a function that does checks/cleanup as each path/file is found in a common scan function. That's a pretty major rework though, and I doubt there would be much appetite for it to solve such a minor problem. >> I'm not excited about the amount of code duplication between these three >> tools. I know this was because of back-patching various issues in the past, >> but I really think we need to unify these data structures/functions in HEAD. > > The lists are duplicated because we have never really figured out how > to combine this code in one place. The idea was to have all the data > folder path logic and the lists within one header shared between the > frontend and backend but there was not much support for that on HEAD. Do you have the thread? I'd like to see what was proposed and what the objections were. Regards, -- -David david@pgmasters.net
On 2/21/20 1:36 AM, Michael Paquier wrote: > On Thu, Feb 20, 2020 at 05:38:15PM +0100, Bernd Helmle wrote: >> So i propose a different approach like the attached patch tries to >> implement: instead of just blindly iterating over directory contents >> and filter them out, reference the tablespace location and >> TABLESPACE_VERSION_DIRECTORY directly. This is done by a new function >> scan_tablespaces() which is specialized in just follow the >> symlinks/junctions in pg_tblspc and call scan_directory() with just >> what it has found there. It will also honour directories, just in case >> an experienced DBA has copied over the tablespace into pg_tblspc >> directly. > > + if (S_ISREG(st.st_mode)) > + { > + pg_log_debug("ignoring file %s in pg_tblspc", de->d_name); > + continue; > + } > We don't do that for the normal directory scan path, so it does not > strike me as a good idea on consistency ground. As a whole, I don't > see much point in having a separate routine which is just roughly a > duplicate of scan_directory(), and I think that we had better just add > the check looking for matches with TABLESPACE_VERSION_DIRECTORY > directly when having a directory, if subdir is "pg_tblspc". That > also makes the patch much shorter. +1. This is roughly what pg_basebackup does and it seems simpler to me. Regards, -- -David david@pgmasters.net
On Fri, Feb 21, 2020 at 05:37:15PM +0900, Kyotaro Horiguchi wrote: > The two str[n]cmps are different only in matching length. I don't > think we don't need to differentiate the two message there, so we > could reduce the code as: > > | cmplen = strlen(excludeFiles[].name); > | if (!prefix_patch) > | cmplen++; > | if (strncmp(d_name, excludeFilep.name, cmplen) == 0) > | ... Good idea. Let's do things as you suggest. -- Michael
Attachment
On Fri, Feb 21, 2020 at 08:13:34AM -0500, David Steele wrote: > Do you have the thread? I'd like to see what was proposed and what the > objections were. Here you go: https://www.postgresql.org/message-id/20180205071022.GA17337@paquier.xyz -- Michael
Attachment
On Fri, 2020-02-21 at 15:36 +0900, Michael Paquier wrote: > We don't do that for the normal directory scan path, so it does not > strike me as a good idea on consistency ground. As a whole, I don't > see much point in having a separate routine which is just roughly a > duplicate of scan_directory(), and I think that we had better just > add > the check looking for matches with TABLESPACE_VERSION_DIRECTORY > directly when having a directory, if subdir is "pg_tblspc". That > also makes the patch much shorter. To be honest, i dislike both: The other doubles logic (note: i don't see it necessarily as 100% code duplication, since the semantic of scan_tablespaces() is different: it serves as a driver for scan_directories() and just resolves entries in pg_tblspc directly). The other makes scan_directories() complicater to read and special cases just a single directory in an otherwise more or less generic function. E.g. it makes me uncomfortable if we get a pg_tblspc somewhere else than PGDATA (if someone managed to create such a directory in a foreign tablespace location for example), so we should maintain an additional check if we really operate on the pg_tblspc we have to. That was the reason(s) i've moved it into a separate function. That said, i'll provide an updated patch with your ideas. Bernd
On Sun, Feb 23, 2020 at 04:08:58PM +0900, Michael Paquier wrote: > Good idea. Let's do things as you suggest. Applied and back-patched this one down to 11. -- Michael
Attachment
Hi Michael, On 2/24/20 7:26 AM, Michael Paquier wrote: > On Sun, Feb 23, 2020 at 04:08:58PM +0900, Michael Paquier wrote: >> Good idea. Let's do things as you suggest. > > Applied and back-patched this one down to 11. FWIW, we took a slightly narrower approach to this issue in the pgBackRest patch (attached). I don't have an issue with the prefix approach since it works and the Postgres project is very likely to catch it if there is a change in behavior. For third-party projects, though, it might pay to be more conservative in case the behavior changes in the future, i.e. pg_internal.init[something] (but not pg_internal\.init[0-9]+) becomes valid. Regards, -- -David david@pgmasters.net
Attachment
On Mon, Feb 24, 2020 at 01:11:10PM +0100, Bernd Helmle wrote: > The other makes scan_directories() complicated to read and special > cases just a single directory in an otherwise more or less generic > function. E.g. it makes me uncomfortable if we get a pg_tblspc > somewhere else than PGDATA (if someone managed to create such a > directory in a foreign tablespace location for example), so we should > maintain an additional check if we really operate on the pg_tblspc we > have to. That was the reason(s) i've moved it into a separate function. We are just discussing about the code path involving scanning a directory, so that does not seem that bad to me. I really think that we should avoid duplicating the same logic around, and that we should remain consistent with non-directory entries in those paths, complaining with a proper failure if extra, unwanted files are present. -- Michael
Attachment
Am Dienstag, den 25.02.2020, 11:33 +0900 schrieb Michael Paquier: > I really think that > we should avoid duplicating the same logic around, and that we should > remain consistent with non-directory entries in those paths, > complaining with a proper failure if extra, unwanted files are > present. Okay, please find an updated patch attached. My feeling is that in the case we cannot successfully resolve a tablespace location from pg_tblspc, we should error out, but i could imagine that people would like to have just a warning instead. I've updated the TAP test for pg_checksums by adding a dummy subdirectory into the tablespace directory already created for the corrupted relfilenode test, containing a file to process in case an unpatched pg_checksums is run. With the patch attached, these directories simply won't be considered to check. Thanks, Bernd
Attachment
On Wed, Feb 26, 2020 at 06:02:22PM +0100, Bernd Helmle wrote: > My feeling is that in the case we cannot successfully resolve a > tablespace location from pg_tblspc, we should error out, but i could > imagine that people would like to have just a warning instead. Thanks, this patch is much cleaner in its approach, and I don't have much to say about it except that the error message for lstat() should be more consistent with the one above in scan_directory(). The version for v11 has required a bit of rework, but nothing huge either. > I've updated the TAP test for pg_checksums by adding a dummy > subdirectory into the tablespace directory already created for the > corrupted relfilenode test, containing a file to process in case an > unpatched pg_checksums is run. With the patch attached, these > directories simply won't be considered to check. What you have here is much more simple than the original proposal, so I kept it. Applied and back-patched down to 11. -- Michael