Diving into the world of encryption and data security can be intimidating. While encryption traditionally renders data unusable, a new wave of function-preserving encryption schemes has emerged that aims to provide both security and usability. This guide is designed to orient you so that you can make an informed decision about the security and usability trade-offs of these new schemes.
Probabilistic Encryption
The gold standard in confidentiality. Probabilistic encryption is semantically secure: given a ciphertext, nothing about the plaintext (beyond its length) can be efficiently determined. Encrypting the same plaintext multiple times yields different ciphertexts, so an attacker cannot spot repeated values or mount statistical attacks.
Unfortunately, unstructured encryption also renders data completely unusable: until it is decrypted, it cannot be searched, sorted, or queried, which hides it from adversaries and legitimate users alike.
Let’s use pycryptodome (pip install pycryptodome) to define a probabilistic encryption function:
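```python
from Crypto.Cipher import AES
from Crypto.Random import get_random_bytes

key = get_random_bytes(32)  # 256-bit AES key

# Probabilistic encryption: a fresh random IV for every call
def encrypt(data: bytes) -> bytes:
    iv = get_random_bytes(AES.block_size)
    cipher = AES.new(key, AES.MODE_CFB, iv=iv)
    # Prepend the IV so the ciphertext can be decrypted later
    return iv + cipher.encrypt(data)
```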
And testing it out:
```
> encrypt(b'hello world')
b'\x15]Xc\x19\x11\xe1\x13\x83\xbbNyo\xd9~T\x15G\x0f\x9fU\x16\xbc\xc5.\xd6\xb2'
> encrypt(b'hello world')
b'\xf2\xd4\xceJ\xbbo\x9b\xbf\xfaV\x0e\xd2`\x82\xa5In\x17y\xcd\xb5\xd6M\xf5\xbf\x93\n'
```
As you can see, we get a different ciphertext every time.
Selective Encryption
Selective encryption encrypts only the sensitive parts of a larger piece of data. This is often done for compliance reasons or when sharing personally identifiable information. The surrounding context of the selectively encrypted data remains visible to an adversary, as in this plaintext and its selectively encrypted counterpart:
The secret to life is 42
The secret to life is uV7sQO1PCW5o6Rs9v4+JH+nBQXzvlewOgzFeaaFNBFQ=
Searchable (Deterministic) Encryption/Tokenization
Encrypting a given plaintext with deterministic encryption always yields the same ciphertext token. While this makes it trivial to do keyword searches against this token, it also opens your data set to statistical attacks. By comparing the frequency of ciphertext tokens to the frequency of words in natural language, an attacker can infer the plaintext inputs.
Let’s return to the probabilistic encryption function we defined previously and tweak it a bit:
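```python
# Deterministic encryption: a fixed all-zero IV instead of a random one
# (illustration only: a fixed IV also reveals when plaintexts share a prefix)
def encrypt(data: bytes) -> bytes:
    iv = bytes(AES.block_size)
    cipher = AES.new(key, AES.MODE_CFB, iv=iv)
    return cipher.encrypt(data)
```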
Testing it out:
```
> encrypt(b'hello world')
b'\x1c\xd9\xb4G\xa5\xf4\x80\xd6\xdbH\xb3'
> encrypt(b'hello world')
b'\x1c\xd9\xb4G\xa5\xf4\x80\xd6\xdbH\xb3'
```
Encrypted, but the same result each time!
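To see why identical tokens are dangerous, here is a toy frequency attack. Rather than AES, it uses keyed HMAC-SHA256 tokens (tokenization, not decryptable encryption) so it runs on the standard library alone; any deterministic scheme leaks the same repetition pattern. The key, the sample text, and the attacker's list of common words are all made up for illustration.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"hypothetical demo key"

def tokenize(word: str) -> str:
    # Deterministic: the same word always yields the same token
    return hmac.new(KEY, word.encode(), hashlib.sha256).hexdigest()[:16]

# What the attacker sees: a column of tokens, never the words themselves
stored = [tokenize(w) for w in "the cat and the dog and the bird".split()]

# Assumed public knowledge: the most common words in the language, by rank
common_words = ["the", "and"]

# Match tokens to words purely by frequency rank
ranked_tokens = [tok for tok, _ in Counter(stored).most_common(len(common_words))]
guess = dict(zip(ranked_tokens, common_words))

for tok, word in guess.items():
    print(tok, "->", word)
```

The attacker never touches the key: the most frequent token is simply assumed to be the most frequent word.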
Format Preserving Encryption (FPE)
Ciphertext produced by an FPE scheme keeps the length and format of the corresponding plaintext. FPE is a special case of deterministic encryption and is particularly useful when a database schema requires fields to follow certain validation rules. For this reason, dates, credit card numbers, and phone numbers are often encrypted using FPE.
However, because FPE is deterministic, encrypting the same plaintext multiple times yields the same ciphertext, so FPE leaks equality. Where plaintext values rarely repeat (e.g., credit card or phone numbers), this might not be an issue.
Jan 23, 1932
Oct 15, 2044
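How can a ciphertext keep the plaintext's format? One common construction, used by the NIST FF1 and FF3-1 modes, is a Feistel network whose rounds add pseudorandom values modulo a power of the radix, so every intermediate value stays a digit string of the same length. The sketch below is a toy in that spirit, not FF1 itself: it uses HMAC-SHA256 as the round function instead of AES, has no tweak, and the function names are hypothetical. It is meant only to illustrate the mechanism.

```python
import hashlib
import hmac

ROUNDS = 10  # even, so the two halves end up back in their original positions

def _round_value(key: bytes, i: int, half: str, modulus: int) -> int:
    # Keyed pseudorandom round function (HMAC-SHA256 standing in for AES)
    mac = hmac.new(key, f"{i}|{half}".encode(), hashlib.sha256).digest()
    return int.from_bytes(mac, "big") % modulus

def fpe_encrypt(key: bytes, digits: str) -> str:
    """Toy format-preserving encryption of a digit string (length >= 2)."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in range(ROUNDS):
        m = u if i % 2 == 0 else v  # length of the half being replaced
        c = (int(a) + _round_value(key, i, b, 10 ** m)) % 10 ** m
        a, b = b, str(c).zfill(m)   # zfill keeps leading zeros, so length is fixed
    return a + b

def fpe_decrypt(key: bytes, digits: str) -> str:
    """Inverse of fpe_encrypt: undo the rounds in reverse order."""
    u = len(digits) // 2
    v = len(digits) - u
    a, b = digits[:u], digits[u:]
    for i in reversed(range(ROUNDS)):
        m = u if i % 2 == 0 else v
        c = (int(b) - _round_value(key, i, a, 10 ** m)) % 10 ** m
        a, b = str(c).zfill(m), a
    return a + b

key = b"hypothetical demo key"
card = "4111111111111111"  # a well-known test card number
token = fpe_encrypt(key, card)
print(token, len(token), fpe_decrypt(key, token) == card)
```

The output is always another 16-digit string, so it passes a "16 digits" schema check; and since nothing random is involved, the same card always maps to the same token, which is the equality leak described above.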
Order Preserving Encryption (OPE)
OPE is a variant of searchable encryption in which ciphertexts retain the order of their corresponding plaintexts, so sort and range queries still work directly on the ciphertexts. However, OPE ciphertexts leak more than order: they also reveal the approximate relative distance between plaintexts, and with it their approximate values.
Let’s give it a try with pip install pyope:
```
> from pyope.ope import OPE
> cipher = OPE(b'secret key')
> cipher.encrypt(100)
6569784
> cipher.encrypt(1000)
63738157
> cipher.encrypt(10000)
651165852
```
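Why does preserving order tend to leak magnitude as well? A toy construction makes it concrete: the hypothetical toy_ope_encrypt below maps x to a running sum of keyed pseudorandom gaps, one gap per integer up to x. Every gap is positive, so ciphertexts are strictly increasing; but because the gaps average out, each ciphertext is roughly proportional to its plaintext, much like the pyope outputs above. This is not the scheme pyope implements, and it costs O(x) HMAC calls per encryption, so it only suits small non-negative integers.

```python
import hashlib
import hmac

def _gap(key: bytes, i: int, max_gap: int) -> int:
    # Keyed pseudorandom gap in [1, max_gap]
    mac = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
    return 1 + int.from_bytes(mac, "big") % max_gap

def toy_ope_encrypt(key: bytes, x: int, max_gap: int = 1000) -> int:
    """Toy order-preserving encryption for a non-negative integer x."""
    return sum(_gap(key, i, max_gap) for i in range(x + 1))

key = b"hypothetical demo key"
c100, c1000, c10000 = (toy_ope_encrypt(key, x) for x in (100, 1000, 10000))
print(c100 < c1000 < c10000)     # True: order survives encryption
print(round(c10000 / c1000, 1))  # close to 10, so relative magnitudes leak too
```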
ZeroDB takes a different approach to encryption in-use. Rather than working at the encryption layer, as in the schemes above, ZeroDB makes unstructured encryption usable by creating encrypted indexes for the data whose functionality needs to be preserved. These encrypted indexes are then traversed remotely from a client machine with access to the encryption keys, in such a way that the plaintexts are never revealed to the database server.
If you’re interested in building highly secure and performant applications with ZeroDB, check out the docs and pip install zerodb-server. You can find the GitHub repo here: https://github.com/zero-db.