WIP: Data at rest encryption - Mailing list pgsql-hackers

From	Ants Aasma
Subject	WIP: Data at rest encryption
Date	June 7, 2016 13:58:37
Msg-id	CA+CSw_tb3bk5i7if6inZFc3yyf+9HEVNTy51QFBoeUk7UE_V=w@mail.gmail.com
Responses	Re: WIP: Data at rest encryption; Re: WIP: Data at rest encryption; Re: [HACKERS] WIP: Data at rest encryption
List	pgsql-hackers

Hi all,

I have been working on data-at-rest encryption support for PostgreSQL.
In my experience this is a common customer request. In short, all
PostgreSQL data files are encrypted with a single master key and are
decrypted when read from the OS. The feature does not provide
column-level encryption, which is an almost orthogonal feature,
arguably better done client side.

Similar things can be achieved with filesystem-level encryption.
However, this is not always optimal, for various reasons. One of the
better reasons is the desire for HSM-based encryption in a storage
area network (SAN) setup.

Attached to this mail is a work in progress patch that adds an
extensible encryption mechanism. There are some loose ends left to tie
up, but the general concept and architecture is at a point where it's
ready for some feedback, fresh ideas and bikeshedding.

Usage
=====

Set up the database like so:

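    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     # -k enables data checksums; -K (new in this patch) sets the
     # encryption library used at bootstrap
     initdb -k -K pgcrypto "$PGDATA" )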

Start PostgreSQL:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     postgres -D "$PGDATA" )

Design
======

The patch adds a new GUC called encryption_library. When it is set, the
named library is loaded before shared_preload_libraries and is
expected to register its encryption routines. For now the API is
pretty narrow: one parameterless function that lets the extension do
key setup on its own terms, and two functions for
encrypting/decrypting an arbitrarily sized block of data with a tweak.
The tweak should alter the encryption function so that identical block
contents are encrypted differently based on their location. The GUC
needs to be set at bootstrap time, so it is set by a new initdb
option. During bootstrap an encryption sample is stored in the
control file, enabling useful error messages.

The library name is not stored in controldata. I'm not quite sure
about this decision. On one hand, storing it would make it possible to
tell the user what is needed to get at the data if the configuration
somehow goes missing, and it would get rid of the extra GUC. On the
other hand, I don't really want to bloat controldata, and the same
encryption algorithm could be provided by different implementations.

For now, encryption is done for everything that goes through md,
xlog and slru. Based on a review of read/write/fread/fwrite calls,
the following are still missing:

* BufFile - needs refactoring
* Logical reorder buffer serialization - probably needs a stream mode
cipher API addition.
* logical_heap_rewrite - can be encrypted as one big block
* 2PC state data - ditto
* pg_stat_statements - query texts get appended so a stream mode
cipher might be needed here too.

copydir needed some changes too: because the tablespace and database
OIDs are included in the tweak, copying also needs to decrypt each
block and re-encrypt it with the new tweak value.

For demonstration purposes I imported Brian Gladman's AES-128-XTS mode
implementation into pgcrypto and used an environment variable for key
setup. This part is not really in a reviewable state: the XTS code
needs heavy cleanup to bring it up to PostgreSQL coding standards, and
key setup needs something secure, like PBKDF2 or scrypt.

Performance with the current AES implementation is not great, but not
horrible either: I'm seeing around a 2x slowdown for workloads larger
than shared_buffers but smaller than free memory. However, I plan to
fix this; I have a prototype AES-NI implementation that does 3 GB/s
per core on my Haswell-based laptop (1.25 B/cycle).
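Working those numbers through, assuming GB means 10^9 bytes: the two throughput figures together imply a core clock of about 2.4 GHz (an inference, not a stated spec), and at that rate encrypting one default 8 KiB block would take roughly 2.7 µs.

```latex
\frac{3\ \text{GB/s}}{1.25\ \text{B/cycle}} = 2.4 \times 10^{9}\ \text{cycles/s} = 2.4\ \text{GHz}

\frac{8192\ \text{B}}{1.25\ \text{B/cycle}} \approx 6554\ \text{cycles},
\qquad
\frac{6554\ \text{cycles}}{2.4 \times 10^{9}\ \text{cycles/s}} \approx 2.7\ \mu\text{s}
```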

Open questions
==============

The main question is what to do about BufFile. It currently provides
both unaligned random access and a block-based interface. I wonder if
it would be a good idea to refactor it to be fully block-based under
the covers.

I would also like to incorporate some database identifier as a salt in
key setup. However, the system identifier stored in the control file
doesn't fit this role well. It gets initialized somewhat too late in
the bootstrap process and, more importantly, gets changed by
pg_upgrade. That would make link-mode upgrades impossible, which seems
like a no-go. I'm torn between adding a new value for this purpose
(perhaps stored outside the control file) and allowing the system
identifier to be set via initdb. The first seems like the better idea:
the file could double as a place to store additional encryption
parameters, like key length or a different cipher primitive.

Regards,
Ants Aasma

Attachment

data-at-rest-encryption-wip-2016.06.07.patch
