Home > mailing lists
Re: Transparent column encryption - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Transparent column encryption
Date	December 9, 2021 10:04:26
Msg-id	163b3295-7046-a12f-055e-55b68c594fb9@enterprisedb.com Whole thread Raw
In response to	Re: Transparent column encryption (Jacob Champion <pchampion@vmware.com>)
Responses	Re: Transparent column encryption
List	pgsql-hackers
Tree view

On 12/9/21 01:12, Jacob Champion wrote:
> On Wed, 2021-12-08 at 02:58 +0100, Tomas Vondra wrote:
>>
>> On 12/8/21 00:26, Jacob Champion wrote:
>>> On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote:
>>>> IMO it's impossible to solve this attack within TCE, because it requires 
>>>> ensuring consistency at the row level, but TCE obviously works at column 
>>>> level only.
>>>
>>> I was under the impression that clients already had to be modified to
>>> figure out how to encrypt the data? If part of that process ends up
>>> including enforcement of encryption for a specific column set, then the
>>> addition of AEAD data could hypothetically be part of that hand-
>>> waviness.
>>
>> I think "transparency" here means the client just uses the regular
>> prepared-statement API without having to explicitly encrypt/decrypt any
>> data. The problem is we can't easily tie this to other columns in the
>> table, because the client may not even know what values are in those
>> columns.
> 
> The way I originally described my request -- "I'd like to be able to
> tie an encrypted value to other column (or external) data" -- was not
> very clear.
> 
> With my proposed model -- where the DBA (and the server) are completely
> untrusted, and the DBA needs to be prevented from using the encrypted
> value -- I don't think there's a useful way for the client to use
> associated data that comes from the server. The client has to know what
> the AD should be beforehand, because otherwise the DBA can make it so
> the server returns whatever is correct.
> 

True. With untrusted server the additional data would have to come from
some other source. Say, an isolated auth system or so.

>> Imagine you do this
>>
>>   UPDATE t SET encrypted_column = $1 WHERE another_column = $2;
>>
>> but you want to ensure the encrypted value belongs to a particular row
>> (which may or may not be identified by the another_column value). How
>> would the client do that? Should it fetch the value or what?
>>
>> Similarly, what if the client just does
>>
>>   SELECT encrypted_column FROM t;
>>
>> How would it verify the values belong to the row, without having all the
>> data for the row (or just the required columns)?
> 
> So with my (hopefully more clear) model above, it wouldn't. The client
> would already have the AD, and somehow tell libpq what that data was
> for the query.
> 
> The rabbit hole I led you down is one where we use the rest of the row
> as AD, to try to freeze pieces of it in place. That might(?) have some
> useful security properties (if the client defines its use and doesn't
> defer to the server). But it's not what I intended to propose and I'd
> have to think about that case some more.
> 

OK

> In my credit card example, I'm imagining something like (forgive the
> contrived syntax):
> 
>     SELECT address, :{aead(users.credit_card, 'user@example.com')}
>       FROM users WHERE email = 'user@example.com';
> 
>     UPDATE users
>        SET :{aead(users.credit_card, 'user@example.com')} = '1234-...'
>      WHERE email = 'user@example.com';
> 
> The client explicitly links a table's column to its AD for the duration
> of the query. This approach can't scale to
> 
>     SELECT credit_card FROM users;
> 
> because in this case the AD for each row is different, but I'd argue
> that's ideal for this particular case. The client doesn't need to (and
> probably shouldn't) grab everyone's credit card details all at once, so
> there's no reason to optimize for it.
> 

Maybe, but it seems like a rather annoying limitation, as it restricts
the client to single-row queries (or at least it looks like that to me).
Yes, it may be fine for some use cases, but I'd bet a DBA who can modify
data can do plenty other things - swapping "old" values, which will have
the right AD, for example.

>>> Unless "transparent" means that the client completely defers to the
>>> server on whether to encrypt or not, and silently goes along with it if
>>> the server tells it not to encrypt?
>> I think that's probably a valid concern - a "bad DBA" could alter the
>> table definition to not contain the "ENCRYPTED" bits, and then peek at
>> the plaintext values.
>>
>> But it's not clear to me how exactly would the AEAD prevent this?
>> Wouldn't that be also specified on the server, somehow? In which case
>> the DBA could just tweak that too, no?
>>
>> In other words, this issue seems mostly orthogonal to the AEAD, and the
>> right solution would be to allow the client to define which columns have
>> to be encrypted (in which case altering the server definition would not
>> be enough).
> 
> Right, exactly. When I mentioned AEAD I had assumed that "allow the
> client to define which columns have to be encrypted" was already
> planned or in the works; I just misunderstood pieces of Peter's email.
> It's that piece where a client would probably have to add details
> around AEAD and its use.
> 
>>> That would only protect against a
>>> _completely_ passive DBA, like someone reading unencrypted backups,
>>> etc. And that still has a lot of value, certainly. But it seems like
>>> this prototype is very close to a system where the client can reliably
>>> secure data even if the server isn't trustworthy, if that's a use case
>>> you're interested in.
>>
>> Right. IMHO the "passive attacker" is a perfectly fine model for use
>> cases that would be fine with e.g. pgcrypto if there was no risk of
>> leaking plaintext values to logs, system catalogs, etc.
>>
>> If we can improve it to provide (at least some) protection against
>> active attackers, that'd be a nice bonus.
> 
> I agree that resistance against offline attacks is a useful step
> forward (it seems to be a strict improvement over pgcrypto). I have a
> feeling that end users will *expect* some protection against online
> attacks too, since an evil DBA is going to be well-positioned to do
> exactly that.
> 

Yeah.

>>>> It's probably possible to get something like this (row-level AEAD) by 
>>>> encrypting enriched data, i.e. not just the card number, but {user ID, 
>>>> card number} or something like that, and verify that in the webapp. The 
>>>> problem of course is that the "user ID" is just another column in the 
>>>> table, and there's nothing preventing the DBA from modifying that too.
>>>
>>> Right. That's why the client has to be able to choose AD according to
>>> the application. In my previous example, the victim's email address can
>>> be copied by the DBA, but they wouldn't be able to authenticate as that
>>> user and couldn't convince the client to use the plaintext on their
>>> behalf.
>>
>> Well, yeah. But I'm not sure how to make that work easily, because the
>> client may not have the data :-(
>>
>> I was thinking about using a composite data type combining the data with
>> the extra bits - that'd not be all that transparent as it'd require the
>> client to build this manually and then also cross-check it after loading
>> the data. So the user would be responsible for having all the data.
>>
>> But doing that automatically/transparently seems hard, because how would
>> you deal e.g. with SELECT queries reading data through a view or CTE?
>>
>> How would you declare this, either at the client or server?
> 
> I'll do some more thinking on the case you're talking about here, where
> pieces of the row are transparently tied together.
> 

OK. In any case, I think we shouldn't require this capability from the
get go - it's fine to get the simple version done first, which gives us
privacy / protects against passive attacker. And then sometime in the
future improve this further.

>> Do any other databases have this capability? How do they do it?
> 
> BigQuery advertises AEAD support. I don't think their model is the same
> as ours, though; from the docs it looks like it's essentially pgcrypto,
> where you tell the server to encrypt stuff for you.
> 

Pretty sure it's server-side. The docs say it's for encryption at rest,
all the examples do the encryption/decryption in SQL, etc.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date:
From: Peter Eisentraut
Date: 09 December 2021, 09:26:01
Subject: Re: Replace uses of deprecated Python module distutils.sysconfig
From: Amit Kapila
Date: 09 December 2021, 10:05:38
Subject: Re: parallel vacuum comments
Re: Transparent column encryption - Mailing list pgsql-hackers

Previous

Next