Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 |
Date | |
Msg-id | CA+TgmoaUw3QaHoj_QnkyFVMFLnfsDX32ctOz_zLPReQ-RjAioQ@mail.gmail.com Whole thread Raw |
In response to | Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3 (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3
|
List | pgsql-hackers |
On Wed, Apr 24, 2019 at 10:48 AM Antonin Houska <ah@cybertec.at> wrote: > Attached is the next version. It tries to address various problems pointed out > upthread, including documentation. It's not immediately evident to me, perhaps because I haven't looked at this stuff in a while, what the purpose of 0001 and 0002 is. I think it'd be helpful if you would generate these patches with 'git format-patch' and include a meaningful commit message in each patch of the series. + <para> + This setting specifies path to executable file that returns + either <literal>encryption_key=</literal> for key + and <literal>encryption_password=</literal> for password. + </para> This is incomprehensible. Executables don't return strings. Also, you can return "either a or b" or you can return "both a and b," but you can't return "either a and b". - Build with support for <acronym>SSL</acronym> (encrypted) - connections. This requires the <productname>OpenSSL</productname> - package to be installed. <filename>configure</filename> will check - for the required header files and libraries to make sure that + Build with support for on-disk data encryption + or <acronym>SSL</acronym> (encrypted) connections. This requires + the <productname>OpenSSL</productname> package to be + installed. <filename>configure</filename> will check for the + required header files and libraries to make sure that I think we should instead just remove the word 'connections'. We don't want to have multiple places in the documentation listing all the things for which OpenSSL is required. Generally, the documentation changes here have a lot of stuff about key-or-password, which is quite confusing. It seems like the idea is that you accept either a key or a password from which a key is derived, but I don't think that's explained super-clearly, and I wonder whether it's unnecessary complexity. Why not just require a key always? Another general point is that the documentation, comments, and README need some work on the English. They contain lots of information, but they read somewhat awkwardly and are hard to understand in some places. Wherever you are adding a large hunk of code into the middle of an existing function (e.g. XLogWrite), please consider putting the logic in a new function and calling it from the original function, instead of just inlining the entire thing. xloginsert.c adds a new #include even though there are no other changes to that file. That should not be necessary. If it is, you've broken the principle the header files are compilable independently of other header files (except postgres.h, which is a prerequisite for every header file). -XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr, - Size count) +XLogRead(char *buf, int segsize, TimeLineID tli, XLogRecPtr startptr, Size count) Spurious hunk. Please try to avoid unnecessary changes. + * XXX Currently the function is only called with startptr at page + * boundary. If it should change, encryption needs to reflect this fact, + * i.e. read data from the beginning of the page. Actually this is a + * reason to adjust and use walsender.c:XLogRead(). This comment and the similar comment at the end of the function are not very clear. It sorta sounds, though, like maybe the decryption logic should move from here to the caller or something. You seem to be saying that, after your changes, this function depends on the caller using it in a certain way, which is often a sign that you haven't abstracted things in the right way. And if walsender.c's version of the logic is better, why not also do that stuff here? @@ -52,7 +54,6 @@ uint32 bootstrap_data_checksum_version = 0; /* No checksum */ - #define ALLOC(t, c) \ ((t *) MemoryContextAllocZero(TopMemoryContext, (unsigned)(c) * sizeof(t))) Another useless hunk. + { + RelFileNode fromNode = {srctablespace, src_dboid, InvalidOid}; + RelFileNode toNode = {dsttablespace, dboid, InvalidOid}; + + copydir(srcpath, dstpath, &fromNode, &toNode); + } I suggest instead declaring these variables in the surrounding scope and initializing them here. This style is unlike most PostgreSQL code. + case WAIT_EVENT_KDF_FILE_READ: + event_name = "KDFFileRead"; + break; New wait events need documentation. +are encrypted differently. Furthermore, as the Postgres page starts with LSN, +even if only the last encryption block of the page changes, the whole cipher +text of the page will be different after the next encryption. But for a hint-bit-only change, this would not be true, unless we force wal_log_hints. Sounds like we're doing that, but it might be worth mentioning it here also. There's also the question of unlogged tables, which do not have an LSN. This logic wouldn't apply there, which might be a problem. +If the encryption is enabled, the following requirements need to be taken into +account: + +1. The file buffer cannot be positioned at arbitrary offset of the file. If I think it's no good at all to think about enforcing requirements like this only when data_encryption is used. That's just going to lead to bugs. I think we need to either find a way to mask this restriction from the users of the buffile facility, or else revise all those users so that they respect this new rule and enforce it with an unconditional assert. That brings me to another point, which is that a complex patch like this really needs to be divided up into smaller patches for review. I see you've started that a little bit (but see comments on commit messages above) but you should try to go further with it. For example, changing the buffile contract so that there is some new rule about positioning should be done as a separate patch (or not done at all). That patch should be written so that it applies before the main encryption patch, have comments and asserts explaining the new rules, etc. +To store other kinds of data encrypted than the ones above, developers are +advised to use BufFileWriteTransient() and BufFileReadTransient() functions The previous paragraph is about BufFileWrite() and BufFileRead(), and that section is titled "temporary files." Now here you say that for other kinds of data (not relations, xlog, or temporary files) we should use BufFileWriteTransient(). That seems pretty weird, because data that it is not temporary files would presumably be, uh, permanent files? So then why is Transient in the name? I imagine there are good reasons here but it needs to be made clearer. +pg_upgrade. (Therefore the encrypted cluster does not support pg_upgrade with +the --link option.) That seems pretty unfortunate. Any way to do better? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: