Re: Internal key management system - Mailing list pgsql-hackers

From	Stephen Frost
Subject	Re: Internal key management system
Date	October 28, 2020 17:22:25
Msg-id	20201028172224.GF16415@tamriel.snowman.net
In response to	Re: Internal key management system (Craig Ringer <craig.ringer@enterprisedb.com>)
Responses	Re: Internal key management system
List	pgsql-hackers
Greetings,

* Craig Ringer (craig.ringer@enterprisedb.com) wrote:
> On Mon, Oct 26, 2020 at 11:02 PM Stephen Frost <sfrost@snowman.net> wrote:
>
> TL;DR:
>
> * Important to check that key rotation is possible on a replica, i.e.
> primary and standby can have different cluster passphrase and KEK
> encrypting the same WAL and heap keys;

I agree that key rotation would certainly be good to have.

> * with a HSM we can't read the key out, so a pluggable KEK operations
> context or a configurable URI for the KEK is necessary

There's a lot of options around HSMs, the Linux crypto API, potential
different encryption libraries, et al.  One thing that I'm not sure
we're being clear enough on here is when we're talking about a KEK (key
encryption key) vs. when we're talking about actually off-loading all of
the encryption to an HSM or to an OpenSSL engine (which might in turn
use the Linux crypto API...), etc.

Agreed that, with some HSMs, we aren't able to actually pull out the
key.  Depending on the HSM, it may or may not be able to perform
encryption and decryption with any kind of speed and therefore we should
have options which don't require that.  This would be the typical case
where we'd have a KEK which encrypts a key we have stored and then that
key is what's actually used for the encryption/decryption of the data.
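
The arrangement described above, a KEK on the device protecting a stored data key, is ordinary envelope encryption. Below is a minimal sketch of its shape, assuming a 32-byte data key. The names `device`, `toy_wrap`, `device_unwrap` and `load_data_key` are hypothetical, and the XOR "wrap" is a deliberately trivial placeholder for a real key-wrap algorithm such as AES Key Wrap (RFC 3394). What the sketch shows is where each key ends up: the KEK stays inside the device stand-in, while the unwrapped data key lives in the server's own memory.

```c
#include <stddef.h>
#include <string.h>

#define KEY_LEN 32

/* Hypothetical stand-in for a token such as a YubiKey: it holds the KEK
 * and only exposes an unwrap operation, never the KEK itself. */
typedef struct {
    unsigned char kek[KEY_LEN];   /* sealed inside the device */
} device;

/* PLACEHOLDER ONLY: XOR is not a secure key-wrap. It is its own inverse,
 * so the same routine serves for wrap and unwrap in this sketch. */
static void toy_wrap(const unsigned char *kek, const unsigned char *in,
                     unsigned char *out)
{
    for (size_t i = 0; i < KEY_LEN; i++)
        out[i] = in[i] ^ kek[i];
}

/* Device-side operation: wrapped data key in, plaintext data key out. */
static void device_unwrap(const device *dev, const unsigned char *wrapped,
                          unsigned char *plain)
{
    toy_wrap(dev->kek, wrapped, plain);
}

/* Server side: the wrapped data key is stored with the cluster. At startup
 * it is handed to the device, and the plaintext data key that comes back
 * stays in process memory for all later block encryption/decryption. */
static unsigned char data_key[KEY_LEN];

static void load_data_key(const device *dev, const unsigned char *wrapped_on_disk)
{
    device_unwrap(dev, wrapped_on_disk, data_key);
}
```

The consequence spelled out in the first use case further down follows directly: anyone who can read `data_key` from the server process no longer needs the device.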

> * I want the SQL key and SQL wrap/unwrap part in a separate patch, I
> don't think it's fully baked and oppose its inclusion in its current
> form

I'm generally a fan of having something at the SQL level, but I agree
that it doesn't need to be part of this initial capability and could be
done later as a separate patch.

> Most importantly - I don't think the SQL key adds anything really
> crucial that we cannot do at the SQL level with an extension.  An
> extension "pg_wrap" could provide pg_wrap() and pg_unwrap() already,
> using a single master key much like the SQL key proposed in this
> patch. To store the master key it could:

Lots of things can be done in extensions but, at least for my part, I'd
much rather see us build in an SQL key capability (with things like
grammar support and being able to tie it to a role cleanly) than to try
and figure out how to make this work as an extension.

> That way we haven't baked some sort of limited wrap/unwrap into Pg's
> long term user visible API. I'd be totally happy for such a SQL key
> wrap/unwrap to become part of pgcrypto, or a separate extension that
> uses pgcrypto, if you're worried about having it available to users. I
> just don't really want it in src/backend in its current form.

There's no shortage of interfaces that exist in other database systems
for this that we can look at to help guide us in coming up with a good
API here.  All that said, we can debate that on another thread and
independently of this discussion around TDE.

> OTHER TRANSPARENT ENCRYPTION USE CASES
> ----
>
> Does this patch get in the way of supporting other kinds of
> transparent encryption that are frequently requested and are in use on
> other systems already?
>
> I don't think so. Whole-cluster encryption is quite separate and the
> proposed patch doesn't seem to do anything that'd make table-, row- or
> column-level encryption, per-user key management, etc any harder.
>
> Specific use cases I looked at:
>
> * Finer grained keying than whole-cluster for transparent
> encryption-at-rest. As soon as we have relations that require user
> session supplied information to allow the backend to read the relation
> we get into a real mess with autovacuum, logical decoding, etc. So if
> anyone wants to implement that sort of thing they're probably going
> to want to do so separately to block-level whole-cluster encryption,
> in a way that preserves the normal page and page item structure and
> encrypts the row data only.

I tend to agree with this.

> * Client-driver-assisted transparently encrypted
> at-rest-and-in-transit data, where the database engine doesn't have
> the encrypt/decrypt keys at all. Again in this case they're going to
> have to do that at the row level or column level, not the block
> (relfilenode extents and WAL) level, otherwise we can't provide
> autovacuum etc.

+100 to having client-driver-assisted encryption, this solves real
attack vectors which traditional TDE simply doesn't, compared to
filesystem or block device level encryption (even though lots of
people seem to think it does, which is bizarre to me).

> > That
> > said- I don't think we necessarily want to throw out the command-based
> > option, as users may wish to use a vaulting solution or similar instead
> > of an HSM.
>
> I agree. I wasn't proposing to throw out the command based approach,
> just provide a way to inform postgres that it should do operations
> with the KEK using an external engine instead of deriving its own KEK
> from a passphrase and other inputs.

I would think we'd want to enable admins to control whether what
is being provided is a KEK (where the key is then decrypted by PG and PG
then uses whatever libraries it's built with to perform the encryption
and decryption in PG process space), or an engine/offloading
configuration (where PG doesn't ever see the actual key and all
encryption and decryption is done outside of PG's control by an HSM or
the Linux kernel through the crypto API or whatever).

The use-cases I'm thinking about:

- User has a Yubikey, but would like PG to be able to write more than
  one block at a time.  In this case, the Yubikey would have a KEK which
  PG doesn't ever see.  PG would have an encrypted blob that it then
  asks the yubikey to decrypt which contains the actual key that's then
  kept in PG's memory to perform the encryption/decryption.  Naturally,
  if that key is stolen then an attacker could decrypt the entire
  database, even if they don't have the yubikey.  An attacker could
  acquire that key by having sufficient access on the PG server to be
  able to read PG's memory.

- User has a Thales Luna PCIe HSM, or similar.  In this case, the user
  wants *all* of the encryption/decryption happening on the HSM and none
  of it happening in PG space, making it impossible for an attacker to
  acquire the actual key.

- User has a yubikey, similar to the first case, but would like to have the Linux
  kernel used to safe-guard the actual key used.  This is a bit of an
  in-between area between the first case above and the second-
  specifically, a yubikey could have the KEK but then the actual data
  encryption key isn't given to PG, it's put into the Linux kernel's
  keyring and PG uses (perhaps through OpenSSL) the Linux crypto API to
  off-load the actual encryption and decryption to have that happening
  outside of PG's process space.  This would make it much more difficult
  for an attacker to acquire the key if they only have control over PG
  or the postgres unix account, since the Linux kernel would prevent
  access to it, but it wouldn't require an HSM crypto accelerator.  Of
  course, should an attacker gain root or direct physical access to the
  system somehow, they might be able to acquire the actual data
  encryption key that way.

- User has a vaulting solution, and perhaps wants to store the actual
  encryption/decryption key there, or perhaps the user wants to store a
  passphrase in the vault and have PG derive the actual key from that.
  Either seems like it could be reasonable.

- User hasn't got anything special and just wants to keep it simple by
  using a passphrase that's entered when PG is started up.
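
To compare these cases side by side, the sketch below tabulates, for each one, where the KEK, the plaintext data key, and the encryption operations live. It illustrates the reasoning above and is not a proposed PG data structure; the labels and the `data_key_exposed_to_pg` helper are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Where a secret lives, or where an operation runs. */
typedef enum { IN_PG_MEMORY, IN_KERNEL, IN_DEVICE, IN_VAULT } location;

typedef struct {
    const char *use_case;  /* short label for one of the cases above */
    location kek;          /* what protects (or derives) the data key */
    location data_key;     /* where the plaintext data key lives at runtime */
    location crypto_ops;   /* where block encryption/decryption runs */
} key_setup;

static const key_setup setups[] = {
    /* Yubikey holds the KEK; PG unwraps the data key and encrypts in-process. */
    {"yubikey-kek",    IN_DEVICE,    IN_PG_MEMORY, IN_PG_MEMORY},
    /* PCIe HSM does all of the encryption; PG never sees a key. */
    {"hsm-offload",    IN_DEVICE,    IN_DEVICE,    IN_DEVICE},
    /* Yubikey KEK; data key goes into the kernel keyring, crypto via the kernel. */
    {"yubikey-kernel", IN_DEVICE,    IN_KERNEL,    IN_KERNEL},
    /* Vault holds the key, or a passphrase from which PG derives it. */
    {"vault",          IN_VAULT,     IN_PG_MEMORY, IN_PG_MEMORY},
    /* Passphrase entered at startup; PG derives everything itself. */
    {"passphrase",     IN_PG_MEMORY, IN_PG_MEMORY, IN_PG_MEMORY},
};

/* True if someone able to read PG's memory could recover the data key. */
static bool data_key_exposed_to_pg(const char *use_case)
{
    for (size_t i = 0; i < sizeof setups / sizeof setups[0]; i++)
        if (strcmp(setups[i].use_case, use_case) == 0)
            return setups[i].data_key == IN_PG_MEMORY;
    return true;   /* unknown setup: assume the worst */
}
```

Only the HSM-offload and kernel-keyring rows keep the data key out of PG's address space, which is why those cases need a different kind of configuration than "give PG a KEK".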

> > What I am curious about though- what are the thoughts around
> > using a vaulting solution's command-line tool vs. writing code to work
> > with an API?
>
> I think the code that fetches the cluster passphrase from a command
> should be interceptable by a hook, so organisations with Wacky
> Security Policies Written By People Who Have Heard About Computers But
> Never Used One can jump through the necessary hoops. I am of course
> absolutely not speaking from experience here, no, not at all... see
> ssl_passphrase_function in src/backend/libpq/be-secure-openssl.c, and
> see src/test/modules/ssl_passphrase_callback/ssl_passphrase_func.c .
>
> So I suggest something like that - a hook that by default calls an
> external command but can be overridden by an extension. It wouldn't be
> openssl specific like the server key passphrase example though. That
> was done with an openssl specific hook because we don't know if we're
> going to need a passphrase at all until openssl has opened the key. In
> the cluster encryption case we'll know if we're doing our own KEK+HMAC
> generation or not without having to ask the SSL library.
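
A minimal sketch of the hook described in the quoted text, loosely following the "function pointer that defaults to built-in behaviour" pattern, might look like the following. All names here (`cluster_passphrase_hook`, `fetch_cluster_passphrase`, `run_passphrase_command`, `example_vault_hook`) are hypothetical. The default runs the configured command with POSIX `popen()` and takes the first line of its output; the example override stands in for code that would talk to a vault's API directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical hook type: fill buf with the passphrase, return its length or -1. */
typedef int (*cluster_passphrase_hook_type)(char *buf, size_t size,
                                            const char *command);

/* Default behaviour: run the configured command and read the first line it prints. */
static int run_passphrase_command(char *buf, size_t size, const char *command)
{
    FILE *fp = popen(command, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int) size, fp) == NULL) {
        pclose(fp);
        return -1;
    }
    if (pclose(fp) != 0)                 /* non-zero exit status: treat as failure */
        return -1;
    buf[strcspn(buf, "\r\n")] = '\0';    /* strip the trailing newline */
    return (int) strlen(buf);
}

/* NULL means "use the default"; an extension loaded via
 * shared_preload_libraries could set this from its _PG_init(). */
static cluster_passphrase_hook_type cluster_passphrase_hook = NULL;

static int fetch_cluster_passphrase(char *buf, size_t size, const char *command)
{
    if (cluster_passphrase_hook != NULL)
        return cluster_passphrase_hook(buf, size, command);
    return run_passphrase_command(buf, size, command);
}

/* Example override: a real one would call a vault API instead of a shell command. */
static int example_vault_hook(char *buf, size_t size, const char *command)
{
    (void) command;                      /* the configured command is ignored */
    return snprintf(buf, size, "%s", "secret-from-vault-api");
}
```

With no hook installed, `fetch_cluster_passphrase(buf, sizeof buf, "echo hunter2")` yields `hunter2`; after an extension sets `cluster_passphrase_hook = example_vault_hook`, the same call returns the hook's secret and never runs the command.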

What I'm wondering about here is whether we should make it an explicit
option, set by the user in the server configuration, saying whether
they're giving PG a direct key to use, a KEK that's actually meant to
decrypt the data key, a way to fetch the direct key or the KEK, or an
engine holding the KEK that PG can ask to decrypt the data key, etc.  If we can come
up with a way to configure PG that will support the different use cases
outlined above without being overly complicated, that'd be great.  I'm
not sure that I see that in what you've proposed here, but maybe by
going through each of the use-cases and showing how a user would
configure PG for each with this proposal, I will.
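
One way to picture such an explicit option is a setting validated in the spirit of a GUC check hook. The sketch below assumes a hypothetical setting with the values `passphrase`, `kek`, `direct` and `engine`, plus an optional key URI. The rule it enforces is also an assumption for illustration: `engine` requires a URI, and the other modes take their secret from the passphrase command, so a URI is rejected to avoid ambiguity about which source wins.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical setting: what the configured source hands to PG. */
typedef enum {
    KEY_SOURCE_PASSPHRASE,  /* PG derives the KEK from it */
    KEY_SOURCE_KEK,         /* PG uses it to unwrap the stored data key */
    KEY_SOURCE_DIRECT,      /* it is the data key itself */
    KEY_SOURCE_ENGINE       /* an engine holds the KEK; PG only asks it to unwrap */
} key_source;

typedef struct { const char *name; key_source value; } key_source_option;

static const key_source_option key_source_options[] = {
    {"passphrase", KEY_SOURCE_PASSPHRASE},
    {"kek",        KEY_SOURCE_KEK},
    {"direct",     KEY_SOURCE_DIRECT},
    {"engine",     KEY_SOURCE_ENGINE},
};

/* Accept a known mode name, and check it against whether a key URI is set. */
static bool check_key_source(const char *name, const char *key_uri, key_source *out)
{
    bool have_uri = key_uri != NULL && key_uri[0] != '\0';

    for (size_t i = 0; i < sizeof key_source_options / sizeof key_source_options[0]; i++) {
        if (strcmp(name, key_source_options[i].name) != 0)
            continue;
        key_source v = key_source_options[i].value;
        if (v == KEY_SOURCE_ENGINE && !have_uri)
            return false;               /* an engine needs to know which key to use */
        if (v != KEY_SOURCE_ENGINE && have_uri)
            return false;               /* secret comes from the command instead */
        *out = v;
        return true;
    }
    return false;                       /* unknown mode name */
}
```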

> > Between these various options, what are the risks of
> > having a script vs. using an API and would one or the other weaken the
> > overall solution?  Or is what's really needed here a way to tell us
> > if it's a passphrase we're getting or a proper key, regardless of the
> > method being used to fetch it?
>
> For various vault systems I don't think it matters at all whether the
> secret they manage is the key, input used to generate the key, or
> input used to decrypt a key stored elsewhere. Either way they have the
> pivotal secret. So I don't see much point allowing the command to
> return a fully formed key.

I hadn't really considered that to be a distinction either, so I'm glad
that it sounds like we agreed on that point.

> The point of a HSM is that you don't get to read the key. Pg can never
> read the key, it can only perform encrypt and decrypt operations on
> the key using the HSM via the SSL library:

This really depends on exactly what "key" is being referred to here, and
where the encryption/decryption is happening.  Hopefully the above use
cases help clarify.

> Pg -> openssl:
>   "this is the ciphertext of the wal_key. Please decrypt it for me."
> openssl -> engine layer
>   "engine, please decrypt this"
> pkcs#11 engine-> pkcs#11 provider:
>   "please decrypt this"
> pkcs#11 provider -> HSM-specific libraries, network proxies, whatever:
>   "please decrypt this"
>   "... here's the plaintext"
> <- flows back up

Right- in this case, ultimately, the actual key used for the encryption
and decryption ends up in PG's memory space as plaintext and could
therefore be acquired by an attacker with access to PG memory space.
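
The quoted round trip can be modelled as a chain of layers, which makes the reply's point concrete: the KEK stays in the bottom layer, but the decrypted WAL key is written into a buffer owned by the caller. Below is a minimal sketch with a trivial XOR placeholder instead of real cryptography and hypothetical names throughout (`layer`, `layer_decrypt`).

```c
#include <stddef.h>
#include <string.h>

#define KEY_LEN 16

/* One layer of the stack in the quoted flow (openssl, engine, pkcs#11
 * provider, HSM library). Each layer either forwards the request or, at
 * the bottom, performs it. */
typedef struct layer layer;
struct layer {
    const char *name;                   /* for readability only */
    const layer *next;                  /* NULL for the bottom layer */
    const unsigned char *sealed_kek;    /* only the bottom layer holds the KEK */
};

/* Decrypt ct into pt. pt belongs to the caller, i.e. to PG's memory. */
static void layer_decrypt(const layer *l, const unsigned char *ct, unsigned char *pt)
{
    if (l->next != NULL) {              /* "please decrypt this" */
        layer_decrypt(l->next, ct, pt);
        return;
    }
    /* PLACEHOLDER cipher: XOR stands in for whatever the HSM really does. */
    for (size_t i = 0; i < KEY_LEN; i++)
        pt[i] = ct[i] ^ l->sealed_kek[i];
}
```

However many layers sit in between, the last step writes the plaintext into `pt`, so an attacker who can read PG's memory gets the WAL key even though the KEK never left the device.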

> So the KEK used to encrypt the main cluster keys for heap and wal
> encryption is never readable by Pg. It usually never enters host
> memory - in the case of a HSM, the ciphertext is sent over USB or PCIe
> to the HSM and the cleartext comes back.

Agreed, the KEK isn't, but that isn't actually all that interesting
since the KEK isn't needed to decrypt the data.

> In openssl, the default engine is file-based with host software crypto
> implementations. You can specify alternate engines using various
> OpenSSL APIs, or you can specify them by supplying a URI where you'd
> usually supply a file path to a key.

Right.

> I'm proposing we make it easy to supply a key URI and let openssl
> handle the engine etc. It's far from perfect, and it's really meant as
> a fallback to allow apps that don't natively understand SSL engines
> etc to still use them in a limited capacity.

I agree that it doesn't seem like a bad approach to expose that URI, but
I'm not sure that's really the end of it since there's going to be cases
where people would like to have a KEK on a yubikey and there'll be other
cases where people would like to offload all of the encryption and
decryption to an HSM crypto accelerator and, ideally, we'd allow them to
be able to configure PG for either of those cases.

> What I'd *prefer* to do is make the function that sets up the KEK
> hookable. So by default we'd call a function that'd read the external
> passphrase from a command and use that to generate KEK+HMAC. But an
> extension hook installed at shared_preload_libraries time could
> override the behaviour completely and return its own implementation.

I don't see a problem with adding hooks, where they make sense, but we
should also make things work in a sensible way and a way that works with
at least the use-cases that I've outlined, ideally, without having to go
get an extension or write C code.

> > This really locks us into OpenSSL for this, which I don't particularly
> > like.
>
> We're pretty locked into openssl already. I don't like it either, it
> was just the option that has the least impact/delay on the main work
> on this patch.

There's an active patch that's been worked on for quite some time that's
getting some renewed interest in adding NSS support, something I
certainly support also, so we really shouldn't be taking steps that end
up making it more difficult to support alternatives.  Perhaps a generic
'key URI' type of option wouldn't be too bad, and each library we
support could parse that string out based on what information it needs
(eg: for NSS, a database + key nickname could be provided in some
specific format), but overall we certainly shouldn't be baking things in
which are very OpenSSL-specific and exposed to users.
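
A generic "key URI" option could stay library-neutral if PG only splits it into a scheme and attributes, leaving each crypto library to pick out what it needs (for example `token` and `id` for a PKCS#11 module, or a database path and key nickname for NSS). A minimal parser sketch follows; the names are hypothetical, and it ignores the percent-encoding and other details of the real PKCS#11 URI syntax (RFC 7512).

```c
#include <stdbool.h>
#include <string.h>

#define MAX_ATTRS 8

typedef struct { char key[32]; char value[128]; } uri_attr;

typedef struct {
    char scheme[16];
    int nattrs;
    uri_attr attrs[MAX_ATTRS];
} key_uri;

/* Split "scheme:attr=value;attr=value". Only the first ':' ends the scheme,
 * so values such as "0:0001" survive intact. */
static bool parse_key_uri(const char *s, key_uri *out)
{
    const char *colon = strchr(s, ':');
    if (colon == NULL || (size_t)(colon - s) >= sizeof out->scheme)
        return false;
    memcpy(out->scheme, s, (size_t)(colon - s));
    out->scheme[colon - s] = '\0';
    out->nattrs = 0;

    const char *p = colon + 1;
    while (*p != '\0') {
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        const char *eq = memchr(p, '=', len);
        if (eq == NULL || out->nattrs == MAX_ATTRS)
            return false;               /* malformed attribute, or too many */
        uri_attr *a = &out->attrs[out->nattrs++];
        size_t klen = (size_t)(eq - p), vlen = len - klen - 1;
        if (klen >= sizeof a->key || vlen >= sizeof a->value)
            return false;
        memcpy(a->key, p, klen);
        a->key[klen] = '\0';
        memcpy(a->value, eq + 1, vlen);
        a->value[vlen] = '\0';
        p = end ? end + 1 : p + len;
    }
    return true;
}

/* Look up one attribute; each backend asks only for the ones it understands. */
static const char *key_uri_get(const key_uri *u, const char *key)
{
    for (int i = 0; i < u->nattrs; i++)
        if (strcmp(u->attrs[i].key, key) == 0)
            return u->attrs[i].value;
    return NULL;
}
```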

> I'd rather abstract KEK operations behind a context object-like struct
> with function pointer members, like we do in many other places in Pg.
> Make the default one do the dance of reading the external passphrase
> and generating the KEK on the fly. Allow plugins to override it with
> their own, and let them set it up to delegate to a HSM or whatever
> else they want.
>
> Then ship a simple openssl based default implementation of HSM support
> that can be shoved in shared_preload_libraries. Or if we don't like
> using s_p_l, add a separate GUC for cluster_encryption_key_manager or
> whatever, and a different entrypoint, instead of having s_p_l call
> _PG_init() to register a hook.
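
The quoted context-object idea could look roughly like the sketch below: callers only see `wrap` and `unwrap`, and where the KEK actually lives is up to whichever implementation filled in the struct. Everything here is hypothetical (`kek_context`, `kek_provider_hook`, the simulated HSM). The XOR routine is a placeholder rather than real cryptography, and the passphrase "derivation" is a toy stand-in for a real KDF such as PBKDF2.

```c
#include <stddef.h>
#include <string.h>

#define KEY_LEN 32

/* Callers see only wrap/unwrap; private_state is whatever the provider needs. */
typedef struct kek_context kek_context;
struct kek_context {
    const char *name;
    void (*wrap)(const kek_context *ctx, const unsigned char *in, unsigned char *out);
    void (*unwrap)(const kek_context *ctx, const unsigned char *in, unsigned char *out);
    void *private_state;    /* a derived KEK, an HSM session handle, ... */
};

/* PLACEHOLDER cipher used by both example providers: XOR with a 32-byte secret. */
static void xor_with_secret(const kek_context *ctx, const unsigned char *in,
                            unsigned char *out)
{
    const unsigned char *secret = ctx->private_state;
    for (size_t i = 0; i < KEY_LEN; i++)
        out[i] = in[i] ^ secret[i];
}

/* Default provider: derive the KEK from the passphrase (toy derivation only). */
static unsigned char derived_kek[KEY_LEN];

static kek_context make_passphrase_context(const char *passphrase)
{
    size_t n = strlen(passphrase);
    for (size_t i = 0; i < KEY_LEN; i++)
        derived_kek[i] = (unsigned char)(n ? passphrase[i % n] + i : i);
    kek_context ctx = {"passphrase", xor_with_secret, xor_with_secret, derived_kek};
    return ctx;
}

/* Example alternative provider, pretending the secret sits inside a device. */
static unsigned char simulated_hsm_key[KEY_LEN] = {0x42};

static kek_context make_hsm_context(const char *config)
{
    (void) config;          /* would typically be a key URI */
    kek_context ctx = {"simulated-hsm", xor_with_secret, xor_with_secret, simulated_hsm_key};
    return ctx;
}

/* NULL means "use the default"; set by an extension or by a configuration option. */
typedef kek_context (*kek_context_provider)(const char *config);
static kek_context_provider kek_provider_hook = NULL;

static kek_context get_kek_context(const char *config)
{
    return kek_provider_hook ? kek_provider_hook(config) : make_passphrase_context(config);
}
```

Whether the provider is chosen by a built-in configuration option or installed by an extension does not change the struct itself, so the two approaches discussed here are not mutually exclusive.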

I definitely think we want to support things directly in PG and not
require an extension or something to be in s_p_l for this.

> > > For example if I want to lock my database with a YubiHSM I would configure
> > > something like:
> > >
> > >     cluster_encryption_key = 'pkcs11:token=YubiHSM;id=0:0001;type=private'
> > >
> > > The DB would be encrypted and decrypted using application keys unlocked by
> > > the HSM. Backups of the database, stolen disk images, etc, would be
> > > unreadable unless you have access to another HSM with the same key loaded.
> >
> > Well, you would surely just need the key, since you could change the PG
> > config to fetch the key from wherever you have it, you wouldn't need an
> > actual HSM.
>
> Right - if your HSM was programmed by generating a key and storing
> that into the HSM and you have that key backed up in file form
> somewhere, you could likely put it in a pem file and use that directly
> by pointing Pg at the file instead of an engine URI.

Sure.

> But you might not even have the key. In some HSM implementations the
> key is completely sealed - you can program new HSMs to have the same
> key by using the same configuration, but you cannot actually obtain
> the key short of attacks on the HSM hardware itself. That's very much
> by design - the HSM configuration is usually on an air-gapped system,
> and it isn't sufficient to decrypt anything unless you also have
> access to a copy of the HSM hardware itself. Obviously you accept the
> risks if you take that approach, and you must have an escape route
> where you can re-encrypt the material protected by the HSM against
> some other key. But it's not at all uncommon.

Right, but in such cases you'd need an HSM that's able to perform
encryption and decryption at some reasonable rate.

> Key rotation is obviously vital to make this vaguely sane. In Pg's
> case you'd have to change the key configuration, then trigger a key
> rotation step, which would decrypt with a context obtained from the
> old config then encrypt with a context obtained from the new config.
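
A rotation step of this kind could be sketched as follows, again with a placeholder XOR wrap and hypothetical names (`kek_ctx`, `xor_wrap`, `rotate_kek`). The design point is that rotation only re-wraps the small stored data key. Heap and WAL remain encrypted under the unchanged data key, which is also what would let a primary and a standby protect the same data keys with different KEKs, as asked for in the TL;DR.

```c
#include <stddef.h>
#include <string.h>

#define KEY_LEN 32

/* Minimal KEK holder for this sketch only. */
typedef struct {
    unsigned char kek[KEY_LEN];
} kek_ctx;

/* PLACEHOLDER: XOR is not a real key-wrap; being its own inverse, one
 * routine serves for both directions here. */
static void xor_wrap(const kek_ctx *c, const unsigned char *in, unsigned char *out)
{
    for (size_t i = 0; i < KEY_LEN; i++)
        out[i] = in[i] ^ c->kek[i];
}

/* Unwrap the stored data key with the old KEK, re-wrap it with the new one,
 * then clear the temporary plaintext copy. Only the wrapped blob changes. */
static void rotate_kek(const kek_ctx *old_ctx, const kek_ctx *new_ctx,
                       unsigned char wrapped[KEY_LEN])
{
    unsigned char plain[KEY_LEN];
    xor_wrap(old_ctx, wrapped, plain);
    xor_wrap(new_ctx, plain, wrapped);
    /* Real code should use a wipe the compiler cannot elide, such as
     * explicit_bzero() where available; a plain memset may be optimised out. */
    memset(plain, 0, sizeof plain);
}
```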

Yes, key rotation is an important part.

> > > If cluster_encryption_key is unset, Pg would perform its own KEK derivation
> > > based on cluster_passphrase_command as currently implemented.
> >
> > To what I was suggesting above- what if we just had a GUC that's
> > "kek_method" with options 'passphrase' and 'direct', where passphrase
> > goes through KEK and 'direct' doesn't, which just changes how we treat
> > the results of calling cluster_passphrase_command?
>
> That won't work for a HSM. It is not possible to extract the key.
> "direct" cannot be implemented.

Perhaps the above helps explain what I was getting at there.

Thanks,

Stephen
Attachment

signature.asc