Thread: [Patch] Make pg_checksums skip foreign tablespace directories

[Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Banck

Date:

30 January 2020, 17:11:22

Hi,

we had a customer who ran pg_checksums on their v12 cluster and got a
confusing error:

|pg_checksums: error: invalid segment number 0 in file name "/PG-
|Data/foo_12_data/pg_tblspc/16402/PG_10_201707211/16390/pg_internal.init
|.10028"

Turns out the customer ran a pg_ugprade in copy mode before and started
up the old cluster again which pg_checksums decided to checked as well -
note the PG_10_201707211 in the error message. The attached patch is a
stab at teaching pg_checksums to only check its own
TABLESPACE_VERSION_DIRECTORY directory. I guess this implies that it
would ignore tablespace directories of outdated catversion instances
during development, which I think should be ok, but others might not
agree?

The other question is whether it is possible to end up with a
pg_internal.init.$PID file in a running cluster. E.g. if an instance
crashes and gets started up again - are those cleaned up during crash
recovery, or should pg_checksums ignore them? Right now pg_checksums
only checks against a list of filenames and only skips on exact matches
not prefixes so that might take a bit of work.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Attachment

0001-Make-pg_checksums-skip-foreign-tablespace-directorie.patch

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

31 January 2020, 04:53:52

On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote:
> The other question is whether it is possible to end up with a
> pg_internal.init.$PID file in a running cluster. E.g. if an instance
> crashes and gets started up again - are those cleaned up during crash
> recovery, or should pg_checksums ignore them? Right now pg_checksums
> only checks against a list of filenames and only skips on exact matches
> not prefixes so that might take a bit of work.

Indeed, with a bad timing and a crash in the middle of
write_relcache_init_file(), it could be possible to have such a file
left around in the data folder.  Having a past tablespace version left
around after an upgrade is a pilot error in my opinion because
pg_upgrade generates a script to cleanup past tablespaces, no?  So
your patch does not look like a good idea to me.  And now that I look
at it, if we happen to leave behind a temporary file for
pg_internal.init, backups fail with the following error:
2020-01-31 13:26:18.345 JST [102076] 010_pg_basebackup.pl ERROR:
invalid segment number 0 in file "pg_internal.init.123"

So, I think that it would be better to change basebackup.c,
pg_checksums and pg_rewind so as files are excluded if there is a
prefix match with the exclude lists.  Please see the attached.
--
Michael

Attachment

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Kyotaro Horiguchi

Date:

31 January 2020, 08:30:43

At Fri, 31 Jan 2020 13:53:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote:
> > The other question is whether it is possible to end up with a
> > pg_internal.init.$PID file in a running cluster. E.g. if an instance
> > crashes and gets started up again - are those cleaned up during crash
> > recovery, or should pg_checksums ignore them? Right now pg_checksums
> > only checks against a list of filenames and only skips on exact matches
> > not prefixes so that might take a bit of work.
> 
> Indeed, with a bad timing and a crash in the middle of
> write_relcache_init_file(), it could be possible to have such a file
> left around in the data folder.  Having a past tablespace version left
> around after an upgrade is a pilot error in my opinion because
> pg_upgrade generates a script to cleanup past tablespaces, no?  So
> your patch does not look like a good idea to me.  And now that I look
> at it, if we happen to leave behind a temporary file for
> pg_internal.init, backups fail with the following error:
> 2020-01-31 13:26:18.345 JST [102076] 010_pg_basebackup.pl ERROR:
> invalid segment number 0 in file "pg_internal.init.123"

Agreed.

> So, I think that it would be better to change basebackup.c,
> pg_checksums and pg_rewind so as files are excluded if there is a
> prefix match with the exclude lists.  Please see the attached.

Agreed that the tools should ignore such files.  Looking excludeFile,
it seems to me that basebackup makes sure to exclude only files that
should harm.  I'm not sure whether it's explicitly, but
tablespace_map.old and backup_label.old are not excluded.

The patch excludes harmless files such like "backup_label.20200131"
and the two files above.

I don't think that is a problem right away, of course. It looks good
to me except for the possible excessive exclusion.  So, I don't object
it if we don't mind that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Kyotaro Horiguchi

Date:

31 January 2020, 08:39:36

At Fri, 31 Jan 2020 17:30:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> I don't think that is a problem right away, of course. It looks good
> to me except for the possible excessive exclusion.  So, I don't object
> it if we don't mind that.

That's a bit wrong.  All the discussion is only on excludeFiles.  I
think we should refrain from letting more files match to
nohecksumFiles.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Banck

Date:

31 January 2020, 09:59:30

Hi,

Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier:
> On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote:
> Having a past tablespace version left
> around after an upgrade is a pilot error in my opinion because
> pg_upgrade generates a script to cleanup past tablespaces, no?  So
> your patch does not look like a good idea to me. 

Not sure I agree with it, but if that (i.e. after pg_upgrade in copy
mode, you have no business to use the old cluster as well as the new
one) is project policy, fair enough.

However, Postgres does not disallow to just create tablespaces in the
same location from two different versions, so you don't need the
pg_upgade scenario to get into this (pg_checksums checking the wrong
cluster's data) problem:

postgres@kohn:~$ psql -p 5437 -c "CREATE TABLESPACE bar LOCATION '/var/lib/postgresql/bar'"
CREATE TABLESPACE
postgres@kohn:~$ psql -p 5444 -c "CREATE TABLESPACE bar LOCATION '/var/lib/postgresql/bar'"
CREATE TABLESPACE
postgres@kohn:~$ ls bar
PG_11_201809051  PG_12_201909212
postgres@kohn:~$ touch bar/PG_11_201809051/pg_internal.init.123
postgres@kohn:~$ pg_ctlcluster 12 main stop
  sudo systemctl stop postgresql@12-main
postgres@kohn:~$ LANG=C /usr/lib/postgresql/12/bin/pg_checksums -D /var/lib/postgresql/12/main
pg_checksums: error: invalid segment number 0 in file name
"/var/lib/postgresql/12/main/pg_tblspc/16396/PG_11_201809051/pg_internal
.init.123"

I believe this is in order to allow pg_upgrade to run in the first
place. But is this pilot error as well? In any case, it is a situation
we allow to happen so IMO we should fix pg_checksums to skip the foreign
tablespaces.

As an aside, I would advocate to just skip files which fail the segment
number determination step (and maybe log a warning), not bail out.

Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Bernd Helmle

Date:

31 January 2020, 10:33:34

Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier:
> Indeed, with a bad timing and a crash in the middle of
> write_relcache_init_file(), it could be possible to have such a file
> left around in the data folder.  Having a past tablespace version
> left
> around after an upgrade is a pilot error in my opinion because
> pg_upgrade generates a script to cleanup past tablespaces, no?

I'm suprised, why should that be a problem in copy mode? For me this is
a fair use case to test upgrades, e.g. for development purposes, if
someone want's to still have application tests against the current old
version, for fallback and whatever. And people might not want such
upgrades as a "fire-and-forget" task. We even have the --clone feature
now, making this even faster.

If our project policy is to never ever touch an pg_upgrade'd PostgreSQL
instance again in copy mode, i wasn't aware of it.

And to be honest, even PostgreSQL itself allows you to reuse tablespace
locations for multiple instances as well, so the described problem
should exist not in upgraded clusters only.


    Bernd

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

18 February 2020, 06:15:34

On Fri, Jan 31, 2020 at 11:33:34AM +0100, Bernd Helmle wrote:
> And to be honest, even PostgreSQL itself allows you to reuse tablespace
> locations for multiple instances as well, so the described problem
> should exist not in upgraded clusters only.

Fair point.  Now, while the proposed patch is right to use
TABLESPACE_VERSION_DIRECTORY, shouldn't we use strncmp based on the
length of TABLESPACE_VERSION_DIRECTORY instead of de->d_name?  It
seems also to me that the code as proposed is rather fragile, and that
we had better be sure that the check only happens when we are scanning
entries within pg_tblspc.

The issue with pg_internal.init.XX is quite different, so I think that
it would be better to commit that separately first.
--
Michael

Attachment

signature.asc

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

19 February 2020, 08:13:16

On Fri, Jan 31, 2020 at 05:39:36PM +0900, Kyotaro Horiguchi wrote:
> At Fri, 31 Jan 2020 17:30:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
>> I don't think that is a problem right away, of course. It looks good
>> to me except for the possible excessive exclusion.  So, I don't object
>> it if we don't mind that.
>
> That's a bit wrong.  All the discussion is only on excludeFiles.  I
> think we should refrain from letting more files match to
> nohecksumFiles.

I am not sure what you are saying here.  Are you saying that we should
not use a prefix matching for that part?  Or are you saying that we
should not touch this list at all?

Please note that pg_internal.init is listed within noChecksumFiles in
basebackup.c, so we would miss any temporary pg_internal.init.PID if
we don't check after the file prefix and the base backup would issue
extra WARNING messages, potentially masking messages that could
matter.  So let's fix that as well.

I agree that a side effect of this change would be to discard anything
prefixed with "backup_label" or "tablespace_map", including any old,
renamed files.  Do you know of any backup solutions which could be
impacted by that?  I am adding David Steele and Stephen Frost in CC so
as they can comment based on their experience in this area.  I recall
that backrest stuff uses the replication protocol, but I may be
wrong.
--
Michael

Attachment

signature.asc

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

David Steele

Date:

19 February 2020, 18:37:00

On 2/19/20 2:13 AM, Michael Paquier wrote:
> On Fri, Jan 31, 2020 at 05:39:36PM +0900, Kyotaro Horiguchi wrote:
>> At Fri, 31 Jan 2020 17:30:43 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
>>> I don't think that is a problem right away, of course. It looks good
>>> to me except for the possible excessive exclusion.  So, I don't object
>>> it if we don't mind that.
>>
>> That's a bit wrong.  All the discussion is only on excludeFiles.  I
>> think we should refrain from letting more files match to
>> nohecksumFiles.
> 
> I am not sure what you are saying here.  Are you saying that we should
> not use a prefix matching for that part?  Or are you saying that we
> should not touch this list at all?

Perhaps he is saying that if it is already excluded it won't be 
checksummed.  So, if pg_internal.init* is excluded from the backup, that 
is all that is needed.  If so, I agree.  This might not help 
pg_verify_checksums, though, except that it should be applying the same 
rules.

> Please note that pg_internal.init is listed within noChecksumFiles in
> basebackup.c, so we would miss any temporary pg_internal.init.PID if
> we don't check after the file prefix and the base backup would issue
> extra WARNING messages, potentially masking messages that could
> matter.  So let's fix that as well.

Agreed.  Though, I think pg_internal.init.XX should be excluded from the 
backup as well.

As far as I can see, the pg_internal.init.XX will not be cleaned up by 
PostgreSQL on startup.  I've only tested this in 9.6 so far, but I don't 
see any differences in the code since then that would lead to better 
behavior.  Perhaps that's also something we should fix?

> I agree that a side effect of this change would be to discard anything
> prefixed with "backup_label" or "tablespace_map", including any old,
> renamed files.  Do you know of any backup solutions which could be
> impacted by that?  I am adding David Steele and Stephen Frost in CC so
> as they can comment based on their experience in this area.  I recall
> that backrest stuff uses the replication protocol, but I may be
> wrong.

I'm really not a fan of a blind prefix match.  I think we should stick 
with only excluding files that are created by Postgres.  So 
backup_label.old and tablespace_map.old should just be added to the 
exclude list.  That's how we have it in pgBackRest.

Regards,
-- 
-David
david@pgmasters.net

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

David Steele

Date:

19 February 2020, 18:42:53

On 1/31/20 3:59 AM, Michael Banck wrote:
> Hi,
> 
> Am Freitag, den 31.01.2020, 13:53 +0900 schrieb Michael Paquier:
>> On Thu, Jan 30, 2020 at 06:11:22PM +0100, Michael Banck wrote:
>> Having a past tablespace version left
>> around after an upgrade is a pilot error in my opinion because
>> pg_upgrade generates a script to cleanup past tablespaces, no?  So
>> your patch does not look like a good idea to me.
> 
> Not sure I agree with it, but if that (i.e. after pg_upgrade in copy
> mode, you have no business to use the old cluster as well as the new
> one) is project policy, fair enough.

I don't see how this is project policy.  The directories for other 
versions of Postgres should be ignored as they are in other utilities, 
e.g. pg_basebackup.

> However, Postgres does not disallow to just create tablespaces in the
> same location from two different versions, so you don't need the
> pg_upgade scenario to get into this (pg_checksums checking the wrong
> cluster's data) problem:

Exactly.

Regards,
-- 
-David
david@pgmasters.net

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

20 February 2020, 06:55:32

On Wed, Feb 19, 2020 at 12:37:00PM -0600, David Steele wrote:
> On 2/19/20 2:13 AM, Michael Paquier wrote:
>> Please note that pg_internal.init is listed within noChecksumFiles in
>> basebackup.c, so we would miss any temporary pg_internal.init.PID if
>> we don't check after the file prefix and the base backup would issue
>> extra WARNING messages, potentially masking messages that could
>> matter.  So let's fix that as well.
>
> Agreed.  Though, I think pg_internal.init.XX should be excluded from the
> backup as well.

Sure.  That's the intention.  pg_rewind, pg_checksums and basebackup.c
are all the things on my list.

> As far as I can see, the pg_internal.init.XX will not be cleaned up by
> PostgreSQL on startup.  I've only tested this in 9.6 so far, but I don't see
> any differences in the code since then that would lead to better behavior.
> Perhaps that's also something we should fix?

Not sure that it is worth spending cycles on that at the beginning of
recovery as when a mapping file is written its temporary entry
truncates any existing one present matching its name.

> I'm really not a fan of a blind prefix match.  I think we should stick with
> only excluding files that are created by Postgres.

Thinking more on that, excluding any backup_label with a custom suffix
worries me as it could cause a potential breakage for exiting backup
solutions.  So attached is an updated patch making things in a
smarter way: I have added to the exclusion lists the possibility to
match an entry based on its prefix, or not, the choice being optional.
This solves the problem with pg_internal.PID and is careful to not
exclude unnecessary entries like suffixed backup labels or such.  This
leads to some extra duplication within pg_rewind, basebackup.c and
pg_checksums but I think we can live with that, and that makes
back-patching simpler.  Refactoring is still tricky though as it
relates to the use of paths across the backend and the frontend..

> So backup_label.old and
> tablespace_map.old should just be added to the exclude list.  That's how we
> have it in pgBackRest.

That would be a behavior change.  We could change that on HEAD, but I
don't think that this can be back-patched as this does not cause an
actual problem.

For now, my proposal is to fix the prefix first, and then let's look
at the business with tablespaces where needed.
--
Michael

Attachment

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

David Steele

Date:

20 February 2020, 13:37:15

On 2/20/20 12:55 AM, Michael Paquier wrote:
> On Wed, Feb 19, 2020 at 12:37:00PM -0600, David Steele wrote:
> 
>> As far as I can see, the pg_internal.init.XX will not be cleaned up by
>> PostgreSQL on startup.  I've only tested this in 9.6 so far, but I don't see
>> any differences in the code since then that would lead to better behavior.
>> Perhaps that's also something we should fix?
> 
> Not sure that it is worth spending cycles on that at the beginning of
> recovery as when a mapping file is written its temporary entry
> truncates any existing one present matching its name.

But since the name includes the backend's pid you would need to get 
lucky and have a new backend with the same pid create the file after a 
restart.  I tried it and the old temp file was left behind after restart 
and first connection to the database.

I doubt this is a big issue in the field, but it seems like it would be 
nice to do something about it.

>> I'm really not a fan of a blind prefix match.  I think we should stick with
>> only excluding files that are created by Postgres.
> 
> Thinking more on that, excluding any backup_label with a custom suffix
> worries me as it could cause a potential breakage for exiting backup
> solutions.  So attached is an updated patch making things in a
> smarter way: I have added to the exclusion lists the possibility to
> match an entry based on its prefix, or not, the choice being optional.
> This solves the problem with pg_internal.PID and is careful to not
> exclude unnecessary entries like suffixed backup labels or such.  This
> leads to some extra duplication within pg_rewind, basebackup.c and
> pg_checksums but I think we can live with that, and that makes
> back-patching simpler.  Refactoring is still tricky though as it
> relates to the use of paths across the backend and the frontend..

I'm not excited about the amount of code duplication between these three 
tools.  I know this was because of back-patching various issues in the 
past, but I really think we need to unify these data 
structures/functions in HEAD.

>> So backup_label.old and
>> tablespace_map.old should just be added to the exclude list.  That's how we
>> have it in pgBackRest.
> 
> That would be a behavior change.  We could change that on HEAD, but I
> don't think that this can be back-patched as this does not cause an
> actual problem.

Right, that should be in HEAD.

> For now, my proposal is to fix the prefix first, and then let's look
> at the business with tablespaces where needed.

OK.

Regards,
-- 
-David
david@pgmasters.net

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Bernd Helmle

Date:

20 February 2020, 16:38:15

Am Dienstag, den 18.02.2020, 15:15 +0900 schrieb Michael Paquier:
> Fair point.  Now, while the proposed patch is right to use
> TABLESPACE_VERSION_DIRECTORY, shouldn't we use strncmp based on the
> length of TABLESPACE_VERSION_DIRECTORY instead of de->d_name?  It
> seems also to me that the code as proposed is rather fragile, and
> that
> we had better be sure that the check only happens when we are
> scanning
> entries within pg_tblspc.
> 

Yes, after thinking and playing around with it a little i share your
position. You can still easily cause pg_checksums to error out by just
having arbitrary files around in the reference tablespace locations.
Though i don't think this is something of a big issue, it looks strange
and misleading if pg_checksums just complains about files not belonging
to the scanned PostgreSQL data directory (even we explicitly note in
the docs that even tablespace locations are somehow taboo for DBAs to
put other files and/or directories in there).

So i propose a different approach like the attached patch tries to
implement: instead of just blindly iterating over directory contents
and filter them out, reference the tablespace location and
TABLESPACE_VERSION_DIRECTORY directly. This is done by a new function
scan_tablespaces() which is specialized in just follow the
symlinks/junctions in pg_tblspc and call scan_directory() with just
what it has found there. It will also honour directories, just in case
an experienced DBA has copied over the tablespace into pg_tblspc
directly.

> The issue with pg_internal.init.XX is quite different, so I think
> that
> it would be better to commit that separately first.

Agreed.

Thanks,
    Bernd

Attachment

0001-Make-scanning-pg_tblspc-more-robust.patch

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

21 February 2020, 06:07:12

 On Thu, Feb 20, 2020 at 07:37:15AM -0600, David Steele wrote:
> But since the name includes the backend's pid you would need to get lucky
> and have a new backend with the same pid create the file after a restart.  I
> tried it and the old temp file was left behind after restart and first
> connection to the database.
>
> I doubt this is a big issue in the field, but it seems like it would be nice
> to do something about it.

The natural area to do that would be around ResetUnloggedRelations().
Still that would require combine both operations to not do any
unnecessary lookups at the data folder paths.

> I'm not excited about the amount of code duplication between these three
> tools.  I know this was because of back-patching various issues in the past,
> but I really think we need to unify these data structures/functions in HEAD.

The lists are duplicated because we have never really figured out how
to combine this code in one place.  The idea was to have all the data
folder path logic and the lists within one header shared between the
frontend and backend but there was not much support for that on HEAD.

>> For now, my proposal is to fix the prefix first, and then let's look
>> at the business with tablespaces where needed.
>
> OK.

I'll let this patch round for a couple of extra day, and revisit it at
the beginning of next week.
--
Michael

Attachment

signature.asc

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

21 February 2020, 06:36:11

On Thu, Feb 20, 2020 at 05:38:15PM +0100, Bernd Helmle wrote:
> So i propose a different approach like the attached patch tries to
> implement: instead of just blindly iterating over directory contents
> and filter them out, reference the tablespace location and
> TABLESPACE_VERSION_DIRECTORY directly. This is done by a new function
> scan_tablespaces() which is specialized in just follow the
> symlinks/junctions in pg_tblspc and call scan_directory() with just
> what it has found there. It will also honour directories, just in case
> an experienced DBA has copied over the tablespace into pg_tblspc
> directly.

+       if (S_ISREG(st.st_mode))
+       {
+           pg_log_debug("ignoring file %s in pg_tblspc", de->d_name);
+           continue;
+       }
We don't do that for the normal directory scan path, so it does not
strike me as a good idea on consistency ground.  As a whole, I don't
see much point in having a separate routine which is just roughly a
duplicate of scan_directory(), and I think that we had better just add
the check looking for matches with TABLESPACE_VERSION_DIRECTORY
directly when having a directory, if subdir is "pg_tblspc".  That
also makes the patch much shorter.

+     * the direct path to it and check via lstat wether it exists.
s/wether/whether/, repeated three times.

We should have some TAP tests for that.  The first patch of this
thread from Michael had some, but I would just have added a dummy
tablespace with an empty file in 002_actions.pl, triggering an error
if pg_checksums is not fixed.  Dummy entries around the place where
dummy temp files are added would be fine.
--
Michael

Attachment

signature.asc

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Kyotaro Horiguchi

Date:

21 February 2020, 08:37:15

Thank you David for decrypting my previous mail.., and your
translation was correct.

At Fri, 21 Feb 2020 15:07:12 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
>  On Thu, Feb 20, 2020 at 07:37:15AM -0600, David Steele wrote:
> > But since the name includes the backend's pid you would need to get lucky
> > and have a new backend with the same pid create the file after a restart.  I
> > tried it and the old temp file was left behind after restart and first
> > connection to the database.
> > 
> > I doubt this is a big issue in the field, but it seems like it would be nice
> > to do something about it.
> 
> The natural area to do that would be around ResetUnloggedRelations().
> Still that would require combine both operations to not do any
> unnecessary lookups at the data folder paths.
> 
> > I'm not excited about the amount of code duplication between these three
> > tools.  I know this was because of back-patching various issues in the past,
> > but I really think we need to unify these data structures/functions in HEAD.
> 
> The lists are duplicated because we have never really figured out how
> to combine this code in one place.  The idea was to have all the data
> folder path logic and the lists within one header shared between the
> frontend and backend but there was not much support for that on HEAD.
> 
> >> For now, my proposal is to fix the prefix first, and then let's look
> >> at the business with tablespaces where needed.
> > 
> > OK.
> 
> I'll let this patch round for a couple of extra day, and revisit it at
> the beginning of next week.


Thank you for the version.
I didn't look it closer bat it looks in the direction I wanted.
At a quick look, the following section attracted my eyes.

+                if (strncmp(de->d_name, excludeFiles[excludeIdx].name,
+                            strlen(excludeFiles[excludeIdx].name)) == 0)
+                {
+                    elog(DEBUG1, "file \"%s\" matching prefix \"%s\" excluded from backup",
+                         de->d_name, excludeFiles[excludeIdx].name);
+                    excludeFound = true;
+                    break;
+                }
+            }
+            else
+            {
+                if (strcmp(de->d_name, excludeFiles[excludeIdx].name) == 0)
+                {
+                    elog(DEBUG1, "file \"%s\" excluded from backup", de->d_name);
+                    excludeFound = true;
+                    break;
+                }

The two str[n]cmps are different only in matching length. I don't
think we don't need to differentiate the two message there, so we
could reduce the code as:

| cmplen = strlen(excludeFiles[].name);
| if (!prefix_patch)
|   cmplen++;
| if (strncmp(d_name, excludeFilep.name, cmplen) == 0)
|   ...
  
regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

David Steele

Date:

21 February 2020, 13:13:34

Hi Michael,

On 2/20/20 11:07 PM, Michael Paquier wrote:
 >   On Thu, Feb 20, 2020 at 07:37:15AM -0600, David Steele wrote:
 >> But since the name includes the backend's pid you would need to get 
lucky
 >> and have a new backend with the same pid create the file after a 
restart.  I
 >> tried it and the old temp file was left behind after restart and first
 >> connection to the database.
 >>
 >> I doubt this is a big issue in the field, but it seems like it would 
be nice
 >> to do something about it.
 >
 > The natural area to do that would be around ResetUnloggedRelations().
 > Still that would require combine both operations to not do any
 > unnecessary lookups at the data folder paths.

Yeah, that's what I was thinking as well, since there is already a 
directory scan there and doing the check would be very cheap.  It's not 
obvious how to combine these in the right way without moving a lot of 
code around to non-obvious places.

One solution might be to have each subsystem register a function that 
does checks/cleanup as each path/file is found in a common scan 
function.  That's a pretty major rework though, and I doubt there would 
be much appetite for it to solve such a minor problem.

 >> I'm not excited about the amount of code duplication between these three
 >> tools.  I know this was because of back-patching various issues in 
the past,
 >> but I really think we need to unify these data structures/functions 
in HEAD.
 >
 > The lists are duplicated because we have never really figured out how
 > to combine this code in one place.  The idea was to have all the data
 > folder path logic and the lists within one header shared between the
 > frontend and backend but there was not much support for that on HEAD.

Do you have the thread?  I'd like to see what was proposed and what the 
objections were.

Regards,
-- 
-David
david@pgmasters.net

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

David Steele

Date:

21 February 2020, 13:18:51

On 2/21/20 1:36 AM, Michael Paquier wrote:
 > On Thu, Feb 20, 2020 at 05:38:15PM +0100, Bernd Helmle wrote:
 >> So i propose a different approach like the attached patch tries to
 >> implement: instead of just blindly iterating over directory contents
 >> and filter them out, reference the tablespace location and
 >> TABLESPACE_VERSION_DIRECTORY directly. This is done by a new function
 >> scan_tablespaces() which is specialized in just follow the
 >> symlinks/junctions in pg_tblspc and call scan_directory() with just
 >> what it has found there. It will also honour directories, just in case
 >> an experienced DBA has copied over the tablespace into pg_tblspc
 >> directly.
 >
 > +       if (S_ISREG(st.st_mode))
 > +       {
 > +           pg_log_debug("ignoring file %s in pg_tblspc", de->d_name);
 > +           continue;
 > +       }
 > We don't do that for the normal directory scan path, so it does not
 > strike me as a good idea on consistency ground.  As a whole, I don't
 > see much point in having a separate routine which is just roughly a
 > duplicate of scan_directory(), and I think that we had better just add
 > the check looking for matches with TABLESPACE_VERSION_DIRECTORY
 > directly when having a directory, if subdir is "pg_tblspc".  That
 > also makes the patch much shorter.

+1.  This is roughly what pg_basebackup does and it seems simpler to me.

Regards,
-- 
-David
david@pgmasters.net

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

23 February 2020, 07:08:58

On Fri, Feb 21, 2020 at 05:37:15PM +0900, Kyotaro Horiguchi wrote:
> The two str[n]cmps are different only in matching length. I don't
> think we don't need to differentiate the two message there, so we
> could reduce the code as:
>
> | cmplen = strlen(excludeFiles[].name);
> | if (!prefix_patch)
> |   cmplen++;
> | if (strncmp(d_name, excludeFilep.name, cmplen) == 0)
> |   ...

Good idea.  Let's do things as you suggest.
--
Michael

Attachment

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

23 February 2020, 07:12:57

On Fri, Feb 21, 2020 at 08:13:34AM -0500, David Steele wrote:
> Do you have the thread?  I'd like to see what was proposed and what the
> objections were.

Here you go:
https://www.postgresql.org/message-id/20180205071022.GA17337@paquier.xyz
--
Michael

Attachment

signature.asc

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Bernd Helmle

Date:

24 February 2020, 12:11:10

On Fri, 2020-02-21 at 15:36 +0900, Michael Paquier wrote:
> We don't do that for the normal directory scan path, so it does not
> strike me as a good idea on consistency ground.  As a whole, I don't
> see much point in having a separate routine which is just roughly a
> duplicate of scan_directory(), and I think that we had better just
> add
> the check looking for matches with TABLESPACE_VERSION_DIRECTORY
> directly when having a directory, if subdir is "pg_tblspc".  That
> also makes the patch much shorter.

To be honest, i dislike both: The other doubles logic (note: i don't
see it necessarily as 100% code duplication, since the semantic of
scan_tablespaces() is different: it serves as a driver for
scan_directories() and just resolves entries in pg_tblspc directly).

The other makes scan_directories() complicater to read and special
cases just a single directory in an otherwise more or less generic
function. E.g. it makes me uncomfortable if we get a pg_tblspc
somewhere else than PGDATA (if someone managed to create such a
directory in a foreign tablespace location for example), so we should
maintain an additional check if we really operate on the pg_tblspc we
have to. That was the reason(s) i've moved it into a separate function.

That said, i'll provide an updated patch with your ideas.

    Bernd

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

24 February 2020, 12:26:42

On Sun, Feb 23, 2020 at 04:08:58PM +0900, Michael Paquier wrote:
> Good idea.  Let's do things as you suggest.

Applied and back-patched this one down to 11.
--
Michael

Attachment

signature.asc

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

David Steele

Date:

25 February 2020, 00:44:04

Hi Michael,

On 2/24/20 7:26 AM, Michael Paquier wrote:
> On Sun, Feb 23, 2020 at 04:08:58PM +0900, Michael Paquier wrote:
>> Good idea.  Let's do things as you suggest.
> 
> Applied and back-patched this one down to 11.

FWIW, we took a slightly narrower approach to this issue in the 
pgBackRest patch (attached).

I don't have an issue with the prefix approach since it works and the 
Postgres project is very likely to catch it if there is a change in 
behavior.

For third-party projects, though, it might pay to be more conservative 
in case the behavior changes in the future, i.e. 
pg_internal.init[something] (but not pg_internal\.init[0-9]+) becomes valid.

Regards,
-- 
-David
david@pgmasters.net

Attachment

pginternalinit-pid.patch

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

25 February 2020, 02:33:09

On Mon, Feb 24, 2020 at 01:11:10PM +0100, Bernd Helmle wrote:
> The other makes scan_directories() complicated to read and special
> cases just a single directory in an otherwise more or less generic
> function.  E.g. it makes me uncomfortable if we get a pg_tblspc
> somewhere else than PGDATA (if someone managed to create such a
> directory in a foreign tablespace location for example), so we should
> maintain an additional check if we really operate on the pg_tblspc we
> have to. That was the reason(s) i've moved it into a separate function.

We are just discussing about the code path involving scanning a
directory, so that does not seem that bad to me.  I really think that
we should avoid duplicating the same logic around, and that we should
remain consistent with non-directory entries in those paths,
complaining with a proper failure if extra, unwanted files are
present.
--
Michael

Attachment

signature.asc

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Bernd Helmle

Date:

26 February 2020, 17:02:22

Am Dienstag, den 25.02.2020, 11:33 +0900 schrieb Michael Paquier:
> I really think that
> we should avoid duplicating the same logic around, and that we should
> remain consistent with non-directory entries in those paths,
> complaining with a proper failure if extra, unwanted files are
> present.

Okay, please find an updated patch attached.

My feeling is that in the case we cannot successfully resolve a
tablespace location from pg_tblspc, we should error out, but i could
imagine that people would like to have just a warning instead.

I've updated the TAP test for pg_checksums by adding a dummy
subdirectory into the tablespace directory already created for the
corrupted relfilenode test, containing a file to process in case an
unpatched pg_checksums is run. With the patch attached, these
directories simply won't be considered to check.

Thanks,

    Bernd

Attachment

Re: [Patch] Make pg_checksums skip foreign tablespace directories

From

Michael Paquier

Date:

27 February 2020, 06:48:21

On Wed, Feb 26, 2020 at 06:02:22PM +0100, Bernd Helmle wrote:
> My feeling is that in the case we cannot successfully resolve a
> tablespace location from pg_tblspc, we should error out, but i could
> imagine that people would like to have just a warning instead.

Thanks, this patch is much cleaner in its approach, and I don't have
much to say about it except that the error message for lstat() should
be more consistent with the one above in scan_directory().  The
version for v11 has required a bit of rework, but nothing huge
either.

> I've updated the TAP test for pg_checksums by adding a dummy
> subdirectory into the tablespace directory already created for the
> corrupted relfilenode test, containing a file to process in case an
> unpatched pg_checksums is run. With the patch attached, these
> directories simply won't be considered to check.

What you have here is much more simple than the original proposal, so
I kept it.  Applied and back-patched down to 11.
--
Michael

Attachment

signature.asc