Re: making the backend's json parser work in frontend code - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: making the backend's json parser work in frontend code
Msg-id A7971FA1-D8A2-4A0A-BFDD-496FEBF9DE25@enterprisedb.com
In response to Re: making the backend's json parser work in frontend code  (David Steele <david@pgmasters.net>)
Responses Re: making the backend's json parser work in frontend code
List pgsql-hackers

> On Jan 24, 2020, at 8:36 AM, David Steele <david@pgmasters.net> wrote:
>
>> I don't entirely follow why we're discussing this at all, if the
>> requirement is backing up a PG data directory.  There are not, and
>> are never likely to be, any legitimate files with non-ASCII names
>> in that context.  Why can't we just skip any such files?
>
> It's not uncommon in my experience for users to drop odd files into PGDATA (usually versioned copies of
> postgresql.conf, etc.), but I agree that it should be discouraged.  Even so, I don't recall ever seeing any non-ASCII
> filenames.
>
> Skipping files sounds scary, I'd prefer an error or a warning (and then base64 encode the filename).

I tend to agree with Tom.  We know that postgres doesn’t write any such files now, and if we ever decided to change
that, we could change this, too.  So for now, we can assume any such files are not ours.  Either the user manually
scribbled in this directory, or had a tool (antivirus checksum file, vim .WHATEVER.swp file, etc.) that did so.  Raising
an error would break any automated backup process that hit this issue, and base64 encoding the file name and backing up
the file contents could grab data that the user would not reasonably expect in the backup.  But this argument applies
equally well to such files regardless of filename encoding.  It would be odd to back them up when they happen to be
valid UTF-8/ASCII/whatever, but not do so when they are not valid.  I would expect, therefore, that we only back up
files which match our expected file name pattern and ignore (perhaps with a warning) everything else.
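
As a rough illustration of that policy (not code from any patch, and in Python only for brevity), the selection could look something like the sketch below.  The pattern list is an assumption made for the example: it covers numeric relation files with optional fork and segment suffixes plus two well-known names, and is nowhere near a complete description of what belongs in a data directory.  The function name select_for_backup is likewise made up here.

```python
import logging
import re

# Illustrative, incomplete allowlist (an assumption for this sketch):
# numeric relation files with an optional fork suffix and segment number,
# plus a couple of well-known fixed names.
EXPECTED_NAME = re.compile(
    rb"(?:\d+(?:_(?:fsm|vm|init))?(?:\.\d+)?|PG_VERSION|postgresql\.conf)"
)


def select_for_backup(names):
    """Split raw directory entries into (kept, skipped).

    Names are handled as bytes so that entries that are not valid in any
    particular encoding are treated exactly like any other unexpected name.
    """
    kept, skipped = [], []
    for name in names:
        if EXPECTED_NAME.fullmatch(name):
            kept.append(name)
        else:
            # Warn and move on rather than failing the whole backup.
            logging.warning("skipping unexpected file %r", name)
            skipped.append(name)
    return kept, skipped
```

With this shape, an editor swap file and a file with a non-ASCII name both end up in the skipped list for the same reason: neither matches the expected pattern.  Encoding never enters into the decision.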

Quoting from Robert’s email about why we want a backup manifest seems to support this idea, at least as I see it:

> So, let's suppose we invent a backup manifest. What should it contain?
> I imagine that it would consist of a list of files, and the lengths of
> those files, and a checksum for each file. I think you should have a
> choice of what kind of checksums to use, because algorithms that used
> to seem like good choices (e.g. MD5) no longer do; this trend can
> probably be expected to continue. Even if we initially support only
> one kind of checksum -- presumably SHA-something since we have code
> for that already for SCRAM -- I think that it would also be a good
> idea to allow for future changes. And maybe it's best to just allow a
> choice of SHA-224, SHA-256, SHA-384, and SHA-512 right out of the
> gate, so that we can avoid bikeshedding over which one is secure
> enough. I guess we'll still have to argue about the default. I also
> think that it should be possible to build a manifest with no
> checksums, so that one need not pay the overhead of computing
> checksums if one does not wish. Of course, such a manifest is of much
> less utility for checking backup integrity, but you can still check
> that you've got the right files, which is noticeably better than
> nothing.  The manifest should probably also contain a checksum of its
> own contents so that the integrity of the manifest itself can be
> verified. And maybe a few other bits of metadata, but I'm not sure
> exactly what.  Ideas?
>
> Once we invent the concept of a backup manifest, what do we need to do
> with them? I think we'd want three things initially:
>
> (1) When taking a backup, have the option (perhaps enabled by default)
> to include a backup manifest.
> (2) Given an existing backup that has not got a manifest, construct one.
> (3) Cross-check a manifest against a backup and complain about extra
> files, missing files, size differences, or checksum mismatches.
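
To make the quoted description concrete, here is a minimal sketch of such a manifest and the cross-check in (3), again in Python for brevity.  Everything about the layout is an assumption for illustration: the JSON serialization, the key names, the use of SHA-256 for the manifest's own checksum, and the function names build_manifest and verify do not come from the quoted proposal.

```python
import hashlib
import json
import os

ALLOWED = {"sha224", "sha256", "sha384", "sha512"}


def _digest(body):
    # Checksum over a canonical serialization of the manifest body.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def build_manifest(root, algorithm="sha256"):
    """List every file under root with its size and, optionally, a checksum.

    Passing algorithm=None produces a manifest without per-file checksums.
    """
    if algorithm is not None and algorithm not in ALLOWED:
        raise ValueError(f"unsupported checksum algorithm: {algorithm}")
    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                data = f.read()  # a real tool would stream large files
            entry = {"path": os.path.relpath(full, root), "size": len(data)}
            if algorithm is not None:
                entry["checksum"] = hashlib.new(algorithm, data).hexdigest()
            files.append(entry)
    manifest = {"algorithm": algorithm, "files": files}
    manifest["manifest_checksum"] = _digest(manifest)
    return manifest


def verify(root, manifest):
    """Return a list of complaints; an empty list means the backup matches."""
    problems = []
    body = {k: v for k, v in manifest.items() if k != "manifest_checksum"}
    if _digest(body) != manifest["manifest_checksum"]:
        problems.append("manifest checksum mismatch")
    expected = {e["path"]: e for e in manifest["files"]}
    current = build_manifest(root, manifest["algorithm"])
    actual = {e["path"]: e for e in current["files"]}
    for path in sorted(expected.keys() - actual.keys()):
        problems.append(f"missing file: {path}")
    for path in sorted(actual.keys() - expected.keys()):
        problems.append(f"extra file: {path}")
    for path in sorted(expected.keys() & actual.keys()):
        if expected[path]["size"] != actual[path]["size"]:
            problems.append(f"size difference: {path}")
        elif expected[path].get("checksum") != actual[path].get("checksum"):
            problems.append(f"checksum mismatch: {path}")
    return problems
```

Note that in this sketch any file the manifest does not list is reported as an extra file, so whatever rule decides which files go into the manifest also decides what the cross-check complains about.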


Nothing in there sounds to me like it needs to include random cruft.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





