WIP: Data at rest encryption - Mailing list pgsql-hackers
From | Ants Aasma |
---|---|
Subject | WIP: Data at rest encryption |
Date | |
Msg-id | CA+CSw_tb3bk5i7if6inZFc3yyf+9HEVNTy51QFBoeUk7UE_V=w@mail.gmail.com |
List | pgsql-hackers |
Hi all,

I have been working on data-at-rest encryption support for PostgreSQL. In my experience this is a common request that customers make. In short, all PostgreSQL data files are encrypted with a single master key and are decrypted when read from the OS. The feature does not provide column-level encryption, which is an almost orthogonal feature and arguably better done client side.

Similar things can be achieved with filesystem-level encryption. However, this is not always optimal, for various reasons. One of the better reasons is the desire for HSM-based encryption in a setup built on a storage area network.

Attached to this mail is a work-in-progress patch that adds an extensible encryption mechanism. There are some loose ends left to tie up, but the general concept and architecture are at a point where they are ready for some feedback, fresh ideas and bikeshedding.

Usage
=====

Set up database like so:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     initdb -k -K pgcrypto $PGDATA )

Start PostgreSQL:

    (read -sp "Postgres passphrase: " PGENCRYPTIONKEY; echo;
     export PGENCRYPTIONKEY
     postgres -D $PGDATA )

Design
======

The patch adds a new GUC called encryption_library. When it is specified, the named library is loaded before shared_preload_libraries and is expected to register its encryption routines. For now the API is pretty narrow: one parameterless function that lets the extension do key setup on its own terms, and two functions for encrypting and decrypting an arbitrarily sized block of data with a tweak. The tweak should alter the encryption function so that identical block contents are encrypted differently based on their location.

The GUC needs to be set at bootstrap time, so it gets set by a new option for initdb. During bootstrap an encryption sample gets stored in the control file, enabling useful error messages. The library name is not stored in controldata. I'm not quite sure about this decision.
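To make the role of the tweak in the design above concrete, here is a minimal Python sketch. It is not the API from the patch and not XTS: the function names, the tweak layout (a relation identifier plus a block number) and the HMAC-based keystream are all assumptions chosen so that the example runs with the standard library alone. It only shows the property the tweak is meant to provide, namely that identical block contents encrypt differently depending on location.

```python
import hashlib
import hmac

BLOCK_SIZE = 8192  # PostgreSQL's default page size


def _keystream(key: bytes, tweak: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key and a fixed-size tweak."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        msg = tweak + counter.to_bytes(8, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])


def encrypt_block(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """Toy encryption of an arbitrarily sized block, bound to a location by the tweak."""
    stream = _keystream(key, tweak, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def decrypt_block(key: bytes, tweak: bytes, data: bytes) -> bytes:
    """XOR with the same keystream undoes the toy encryption."""
    return encrypt_block(key, tweak, data)


def make_tweak(relation_id: int, block_number: int) -> bytes:
    """Hypothetical tweak layout: 4-byte relation id followed by 4-byte block number."""
    return relation_id.to_bytes(4, "big") + block_number.to_bytes(4, "big")


# Demo key only; real key setup would be done by the encryption library.
key = hashlib.sha256(b"demo master key").digest()
page = b"\x00" * BLOCK_SIZE  # two identical all-zero pages

c0 = encrypt_block(key, make_tweak(16384, 0), page)  # block 0
c1 = encrypt_block(key, make_tweak(16384, 1), page)  # block 1, same contents
```

Here `c0` and `c1` differ even though both pages are all zeros, and each decrypts correctly only with its own tweak. Unlike a real tweakable block cipher mode, this toy construction reuses its keystream when a page is rewritten in place, so it should be read as an illustration of the interface shape only.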
On one hand, it would be very useful to tell the user what they need to get at their data if the configuration somehow goes missing, and it would get rid of the extra GUC. On the other hand, I don't really want to bloat control data, and the same encryption algorithm could be provided by different implementations.

For now the encryption is done for everything that goes through md, xlog and slru. Based on a review of read/write/fread/fwrite calls, this list is missing:

* BufFile - needs refactoring
* Logical reorder buffer serialization - probably needs a stream mode cipher API addition.
* logical_heap_rewrite - can be encrypted as one big block
* 2PC state data - ditto
* pg_stat_statements - query texts get appended, so a stream mode cipher might be needed here too.

copydir needed some changes too, because the tablespace and database oid are included in the tweak, so copying also needs to decrypt and then encrypt with the new tweak value.

For demonstration purposes I imported Brian Gladman's AES-128-XTS mode implementation into pgcrypto and used an environment variable for key setup. This part is not really in any reviewable state: the XTS code needs heavy cleanup to bring it up to PostgreSQL coding standards, and key setup needs something secure, like PBKDF2 or scrypt.

Performance with the current AES implementation is not great, but not horrible either. I'm seeing around 2x slowdown for workloads that are larger than shared_buffers but smaller than free memory. However, the plan is to fix this - I have a prototype AES-NI implementation that does 3 GB/s per core on my Haswell-based laptop (1.25 B/cycle).

Open questions
==============

The main question is what to do about BufFile. It currently provides both unaligned random access and a block-based interface. I wonder if it would be a good idea to refactor it to be fully block based under the covers.

I would also like to incorporate some database identifier as a salt in key setup.
However, the system identifier stored in the control file doesn't fit this role well. It gets initialized somewhat too late in the bootstrap process and, more importantly, gets changed on pg_upgrade. Using it as the salt would make link mode upgrades impossible, which seems like a no-go. I'm torn whether to add a new value for this purpose (perhaps stored outside the control file) or to allow setting the system identifier via initdb. The first seems like the better idea, since the file could double as a place to store additional encryption parameters, like key length or a different cipher primitive.

Regards,
Ants Aasma
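As an illustration of the key setup ideas discussed above (a PBKDF2-style derivation, a per-cluster salt, and a separate file that also holds other encryption parameters), here is a rough Python sketch. Nothing in it comes from the patch: the JSON file format, the file names, the field names and the iteration count are all assumptions made for the example. The 32-byte key length reflects the fact that AES-128-XTS takes two 128-bit keys.

```python
import hashlib
import json
import os
import tempfile


def create_params(path: str, key_bytes: int = 32, iterations: int = 100_000) -> dict:
    """Write a hypothetical parameter file with a fresh random salt (done once, at initdb time)."""
    params = {
        "kdf": "pbkdf2-hmac-sha256",
        "salt": os.urandom(16).hex(),
        "iterations": iterations,
        "key_bytes": key_bytes,  # 32 bytes: two 128-bit keys for AES-128-XTS
    }
    with open(path, "w") as f:
        json.dump(params, f)
    return params


def derive_key(passphrase: str, path: str) -> bytes:
    """Derive the master key from the passphrase and the stored parameters."""
    with open(path) as f:
        params = json.load(f)
    return hashlib.pbkdf2_hmac(
        "sha256",
        passphrase.encode("utf-8"),
        bytes.fromhex(params["salt"]),
        params["iterations"],
        dklen=params["key_bytes"],
    )


# Passphrase taken from the environment, as in the usage section; demo fallback otherwise.
passphrase = os.environ.get("PGENCRYPTIONKEY", "correct horse battery staple")

with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "cluster_a.params")  # hypothetical file names
    path_b = os.path.join(tmp, "cluster_b.params")
    create_params(path_a)
    create_params(path_b)
    key_a = derive_key(passphrase, path_a)
    key_a_again = derive_key(passphrase, path_a)
    key_b = derive_key(passphrase, path_b)
```

Because the salt lives in its own file rather than being tied to the system identifier, the same passphrase yields the same key for as long as that file is carried along unchanged, while two clusters initialized separately end up with different keys.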