Re: Transparent column encryption - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Transparent column encryption |
Date | |
Msg-id | 163b3295-7046-a12f-055e-55b68c594fb9@enterprisedb.com Whole thread Raw |
In response to | Re: Transparent column encryption (Jacob Champion <pchampion@vmware.com>) |
Responses |
Re: Transparent column encryption
|
List | pgsql-hackers |
On 12/9/21 01:12, Jacob Champion wrote: > On Wed, 2021-12-08 at 02:58 +0100, Tomas Vondra wrote: >> >> On 12/8/21 00:26, Jacob Champion wrote: >>> On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote: >>>> IMO it's impossible to solve this attack within TCE, because it requires >>>> ensuring consistency at the row level, but TCE obviously works at column >>>> level only. >>> >>> I was under the impression that clients already had to be modified to >>> figure out how to encrypt the data? If part of that process ends up >>> including enforcement of encryption for a specific column set, then the >>> addition of AEAD data could hypothetically be part of that hand- >>> waviness. >> >> I think "transparency" here means the client just uses the regular >> prepared-statement API without having to explicitly encrypt/decrypt any >> data. The problem is we can't easily tie this to other columns in the >> table, because the client may not even know what values are in those >> columns. > > The way I originally described my request -- "I'd like to be able to > tie an encrypted value to other column (or external) data" -- was not > very clear. > > With my proposed model -- where the DBA (and the server) are completely > untrusted, and the DBA needs to be prevented from using the encrypted > value -- I don't think there's a useful way for the client to use > associated data that comes from the server. The client has to know what > the AD should be beforehand, because otherwise the DBA can make it so > the server returns whatever is correct. > True. With untrusted server the additional data would have to come from some other source. Say, an isolated auth system or so. >> Imagine you do this >> >> UPDATE t SET encrypted_column = $1 WHERE another_column = $2; >> >> but you want to ensure the encrypted value belongs to a particular row >> (which may or may not be identified by the another_column value). How >> would the client do that? Should it fetch the value or what? >> >> Similarly, what if the client just does >> >> SELECT encrypted_column FROM t; >> >> How would it verify the values belong to the row, without having all the >> data for the row (or just the required columns)? > > So with my (hopefully more clear) model above, it wouldn't. The client > would already have the AD, and somehow tell libpq what that data was > for the query. > > The rabbit hole I led you down is one where we use the rest of the row > as AD, to try to freeze pieces of it in place. That might(?) have some > useful security properties (if the client defines its use and doesn't > defer to the server). But it's not what I intended to propose and I'd > have to think about that case some more. > OK > In my credit card example, I'm imagining something like (forgive the > contrived syntax): > > SELECT address, :{aead(users.credit_card, 'user@example.com')} > FROM users WHERE email = 'user@example.com'; > > UPDATE users > SET :{aead(users.credit_card, 'user@example.com')} = '1234-...' > WHERE email = 'user@example.com'; > > The client explicitly links a table's column to its AD for the duration > of the query. This approach can't scale to > > SELECT credit_card FROM users; > > because in this case the AD for each row is different, but I'd argue > that's ideal for this particular case. The client doesn't need to (and > probably shouldn't) grab everyone's credit card details all at once, so > there's no reason to optimize for it. > Maybe, but it seems like a rather annoying limitation, as it restricts the client to single-row queries (or at least it looks like that to me). Yes, it may be fine for some use cases, but I'd bet a DBA who can modify data can do plenty other things - swapping "old" values, which will have the right AD, for example. >>> Unless "transparent" means that the client completely defers to the >>> server on whether to encrypt or not, and silently goes along with it if >>> the server tells it not to encrypt? >> I think that's probably a valid concern - a "bad DBA" could alter the >> table definition to not contain the "ENCRYPTED" bits, and then peek at >> the plaintext values. >> >> But it's not clear to me how exactly would the AEAD prevent this? >> Wouldn't that be also specified on the server, somehow? In which case >> the DBA could just tweak that too, no? >> >> In other words, this issue seems mostly orthogonal to the AEAD, and the >> right solution would be to allow the client to define which columns have >> to be encrypted (in which case altering the server definition would not >> be enough). > > Right, exactly. When I mentioned AEAD I had assumed that "allow the > client to define which columns have to be encrypted" was already > planned or in the works; I just misunderstood pieces of Peter's email. > It's that piece where a client would probably have to add details > around AEAD and its use. > >>> That would only protect against a >>> _completely_ passive DBA, like someone reading unencrypted backups, >>> etc. And that still has a lot of value, certainly. But it seems like >>> this prototype is very close to a system where the client can reliably >>> secure data even if the server isn't trustworthy, if that's a use case >>> you're interested in. >> >> Right. IMHO the "passive attacker" is a perfectly fine model for use >> cases that would be fine with e.g. pgcrypto if there was no risk of >> leaking plaintext values to logs, system catalogs, etc. >> >> If we can improve it to provide (at least some) protection against >> active attackers, that'd be a nice bonus. > > I agree that resistance against offline attacks is a useful step > forward (it seems to be a strict improvement over pgcrypto). I have a > feeling that end users will *expect* some protection against online > attacks too, since an evil DBA is going to be well-positioned to do > exactly that. > Yeah. >>>> It's probably possible to get something like this (row-level AEAD) by >>>> encrypting enriched data, i.e. not just the card number, but {user ID, >>>> card number} or something like that, and verify that in the webapp. The >>>> problem of course is that the "user ID" is just another column in the >>>> table, and there's nothing preventing the DBA from modifying that too. >>> >>> Right. That's why the client has to be able to choose AD according to >>> the application. In my previous example, the victim's email address can >>> be copied by the DBA, but they wouldn't be able to authenticate as that >>> user and couldn't convince the client to use the plaintext on their >>> behalf. >> >> Well, yeah. But I'm not sure how to make that work easily, because the >> client may not have the data :-( >> >> I was thinking about using a composite data type combining the data with >> the extra bits - that'd not be all that transparent as it'd require the >> client to build this manually and then also cross-check it after loading >> the data. So the user would be responsible for having all the data. >> >> But doing that automatically/transparently seems hard, because how would >> you deal e.g. with SELECT queries reading data through a view or CTE? >> >> How would you declare this, either at the client or server? > > I'll do some more thinking on the case you're talking about here, where > pieces of the row are transparently tied together. > OK. In any case, I think we shouldn't require this capability from the get go - it's fine to get the simple version done first, which gives us privacy / protects against passive attacker. And then sometime in the future improve this further. >> Do any other databases have this capability? How do they do it? > > BigQuery advertises AEAD support. I don't think their model is the same > as ours, though; from the docs it looks like it's essentially pgcrypto, > where you tell the server to encrypt stuff for you. > Pretty sure it's server-side. The docs say it's for encryption at rest, all the examples do the encryption/decryption in SQL, etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: