Re: Transparent column encryption - Mailing list pgsql-hackers
From | Jacob Champion |
---|---|
Subject | Re: Transparent column encryption |
Date | |
Msg-id | 5003d222-5975-38c1-e471-888e642f23aa@timescale.com |
In response to | Re: Transparent column encryption (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Responses | Re: Transparent column encryption |
List | pgsql-hackers |
On 12/31/22 06:17, Peter Eisentraut wrote:
> On 21.12.22 06:46, Peter Eisentraut wrote:
>> And another update. The main changes are that I added an 'unspecified'
>> CMK algorithm, which indicates that the external KMS knows what it is
>> but the database system doesn't. This was discussed a while ago. I
>> also changed some details about how the "cmklookup" works in libpq. Also
>> added more code comments and documentation and rearranged some code.

Trying to delay a review until I had "completed it" has only led to me not reviewing, so here's a partial one. Let me know what pieces of the implementation and/or architecture you're hoping to get more feedback on.

I like the existing "caveats" documentation, and I've attached a sample patch with some more caveats documented, based on some of the upthread conversation:

- text format makes fixed-length columns leak length information too
- you only get partial protection against the Evil DBA
- RSA-OAEP public key safety

(Feel free to use/remix/discard as desired.)

When writing the paragraph on RSA-OAEP I was reminded that we didn't really dig into the asymmetric/symmetric discussion. Assuming that most first-time users will pick the builtin CMK encryption method, do we still want to have an asymmetric scheme implemented first instead of a symmetric keywrap? I'm still concerned about that public key, since it can't really be made public. (And now that "unspecified" is available, I think an asymmetric CMK could be easily created by users that have a niche use case, and then we wouldn't have to commit to supporting it forever.)

For the padding caveat:

> + There is no concern if all values are of the same length (e.g., credit
> + card numbers).

I nodded along to this statement last year, and then this year I learned that CCNs aren't fixed-length. So with a 16-byte block, you're probably going to be able to figure out who has an American Express card.
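
To make the block arithmetic concrete, here is a minimal Python sketch. It assumes PKCS#7-style padding to a 16-byte block and ignores any constant per-value overhead (IV, MAC); `padded_len` is a hypothetical helper, and the two numbers are the usual public test card numbers, not real data.

```python
BLOCK_SIZE = 16  # assumed cipher block size in bytes


def padded_len(plaintext_len: int, block: int = BLOCK_SIZE) -> int:
    """Length after PKCS#7-style padding, which always adds 1..block bytes."""
    return (plaintext_len // block + 1) * block


# Well-known public test numbers, used only for their lengths.
amex = "378282246310005"    # 15 digits
visa = "4111111111111111"   # 16 digits

for name, ccn in (("amex", amex), ("visa", visa)):
    # The padded length differs by a whole block between the two card types.
    print(name, len(ccn), padded_len(len(ccn)))
```

Under those assumptions the 15-digit number fits in one block and the 16-digit number spills into a second, so the ciphertext length alone separates the two groups.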

The column encryption algorithm is set per-column -- but isn't it tightly coupled to the CEK, since the key length has to match? From a layperson perspective, using the same key to encrypt the same plaintext under two different algorithms (if they happen to have the same key length) seems like it might be cryptographically risky. Is there a reason I should be encouraged to do that?

With the loss of \gencr it looks like we also lost a potential way to force encryption from within psql. Any plans to add that for v1?

While testing, I forgot how the new option worked and connected with `column_encryption=on` -- and then I accidentally sent unencrypted data to the server, since `on` means "not enabled". :( The server errors out after the damage is done, of course, but would it be okay to strictly validate that option's values?

Are there plans to document client-side implementation requirements, to ensure cross-client compatibility? Things like the "PG\x00\x01" associated data are buried at the moment (or else I've missed them in the docs). If you're holding off until the feature is more finalized, that's fine too.

Speaking of cross-client compatibility, I'm still disconcerted by the ability to write the value "hello world" into an encrypted integer column. Should clients be required to validate the text format, using the attrealtypid?

It occurred to me when looking at the "unspecified" CMK scheme that the CEK doesn't really have to be an encryption key at all. In that case it can function more like a (possibly signed?) cookie for lookup, or even be ignored altogether if you don't want to use a wrapping scheme (similar to JWE's "direct" mode, maybe?). So now you have three ways to look up or determine a column encryption key (CMK realm, CMK name, CEK cookie)... is that a concept worth exploring in v1 and/or the documentation?

Thanks,
--Jacob
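
P.S. For the `column_encryption=on` point above, strict validation could be as small as the following sketch. It is written in Python rather than libpq's C purely for brevity. The function name, and the assumption that the only accepted values are "0" and "1", are illustrative and not taken from the patch.

```python
# Assumed set of accepted values; the real option may define others.
_ALLOWED = {"0": False, "1": True}


def parse_column_encryption(value: str) -> bool:
    """Return the parsed setting, rejecting anything not explicitly allowed.

    Failing at parse time means no connection is made, so no plaintext
    can be sent under a setting the user only thought was enabled.
    """
    try:
        return _ALLOWED[value]
    except KeyError:
        raise ValueError(
            f"invalid column_encryption value: {value!r}"
        ) from None
```

With this shape, `parse_column_encryption("on")` raises before any data leaves the client, instead of silently meaning "not enabled".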
Attachment