Java - Client Side Field Level Encryption

Maxime Beugnet • 14 min read • Published Feb 01, 2022 • Updated Mar 01, 2024

Updates

The MongoDB Java quickstart repository is available on GitHub.

February 28th, 2024

  • Update to Java 21
  • Update Java Driver to 5.0.0
  • Update logback-classic to 1.2.13

November 14th, 2023

  • Update to Java 17
  • Update Java Driver to 4.11.1
  • Update mongodb-crypt to 1.8.0

March 25th, 2021

  • Update Java Driver to 4.2.2.
  • Added Client Side Field Level Encryption example.

October 21st, 2020

  • Update Java Driver to 4.1.1.
  • The Java Driver logging is now enabled via the popular SLF4J API, so I added logback to the pom.xml along with a logback.xml configuration file.

What's Client Side Field Level Encryption?

Client Side Field Level Encryption (CSFLE for short) is a feature introduced in MongoDB 4.2 that lets you encrypt some fields of your MongoDB documents before transmitting them over the wire to the cluster for storage.
It's a powerful safeguard against intrusion or snooping on your MongoDB cluster: only the application with the correct encryption keys can decrypt and read the protected data.
Let's check out the Java CSFLE API with a simple example.

Video

This content is also available in video format.

Getting Set Up

I will use the same repository as usual in this series. If you don't have a copy of it yet, you can clone it or just update it if you already have it:
If you haven't set up your free cluster on MongoDB Atlas yet, now is a great time to do so. You have all the instructions in this post.
For this CSFLE quickstart post, I will only use the Community Edition of MongoDB. In fact, the only enterprise-only part of CSFLE is the automatic encryption of fields, which is supported by mongocryptd or the Automatic Encryption Shared Library for Queryable Encryption.
The Automatic Encryption Shared Library for Queryable Encryption is a replacement for mongocryptd and should be the preferred solution. Both are optional and part of MongoDB Enterprise.
In this tutorial, I will be using explicit (or manual) encryption of fields, which requires neither mongocryptd nor the Automatic Encryption Shared Library, nor the enterprise edition of MongoDB or Atlas. If you would like to explore the enterprise version of CSFLE with Java, you can find out more in this documentation or in my more recent post: How to Implement Client-Side Field Level Encryption (CSFLE) in Java with Spring Data MongoDB.
Do not confuse mongocryptd or the Automatic Encryption Shared Library with the libmongocrypt library, which is the companion C library used by the drivers to encrypt and decrypt your data. We need this library to run CSFLE, so I added it to the pom.xml file of this project.
To keep the code samples short and sweet in the examples below, I will only share the most relevant parts. If you want to see the code working with all its context, please check the source code in the csfle package of the GitHub repository.

Run the Quickstart Code

In this quickstart tutorial, I will show you the CSFLE API using the MongoDB Java Driver. I will show you how to:
  • create and configure the MongoDB connections we need.
  • create a master key.
  • create Data Encryption Keys (DEK).
  • create and read encrypted documents.
To run my code from the above repository, check out the README.
In short, the following command should get you up and running in no time:
This is the output you should get:
Let's take an in-depth look at what is happening.

How it Works

CSFLE diagram with master key and DEK vault
CSFLE looks complicated, like any security and encryption feature, I guess. Let's try to make it simple in a few words.
  1. We need a master key which unlocks all the Data Encryption Keys (DEK for short) that we can use to encrypt one or more fields in our documents.
  2. You can use one DEK for your entire cluster or a different DEK for each field of each document in your cluster. It's up to you.
  3. The DEKs are stored in a collection in a MongoDB cluster, which does not have to be the same one that contains the encrypted data. The DEKs are stored encrypted. They are useless without the master key, which needs to be protected.
  4. You can use the manual (community edition) or the automated (Enterprise Advanced or Atlas) encryption of fields.
  5. The decryption can be manual or automated. Both are part of the community edition of MongoDB. In this post, I will use manual encryption and automated decryption to stick with the community edition of MongoDB.
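Points 1 to 3 describe envelope encryption: a master key protects the DEKs, and the DEKs protect the data. Below is a minimal JDK-only sketch of that idea. It assumes AES Key Wrap (the JDK's `AESWrap` cipher, RFC 3394) as the wrapping scheme, which illustrates the concept but is not the format libmongocrypt uses. `EnvelopeSketch` and its methods are hypothetical names.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Envelope encryption sketch: a master key wraps (encrypts) a Data Encryption Key.
// Uses the JDK's AES Key Wrap (RFC 3394), NOT the scheme libmongocrypt uses.
class EnvelopeSketch {

    static SecretKey newAes256Key() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // The wrapped DEK is what would be persisted in the key vault.
    static byte[] wrap(SecretKey masterKey, SecretKey dek) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKey);
            return cipher.wrap(dek);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Without the right master key, unwrapping fails its integrity check.
    static SecretKey unwrap(SecretKey masterKey, byte[] wrappedDek) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) cipher.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey masterKey = newAes256Key();
        SecretKey dek = newAes256Key();
        byte[] storedInVault = wrap(masterKey, dek);
        SecretKey recovered = unwrap(masterKey, storedInVault);
        System.out.println(Arrays.equals(dek.getEncoded(), recovered.getEncoded())); // true
    }
}
```

The stored, wrapped DEK alone is useless. Only code holding the master key can turn it back into a usable key.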

GDPR Compliance

European laws enforce data protection and privacy. Any oversight can result in massive fines.
CSFLE is a great way to save millions of dollars/euros.
For example, CSFLE could be a great way to enforce the "right-to-be-forgotten" policy of GDPR. If a user asks to be removed from your systems, the data must be erased from your production cluster, of course, but also the logs, the dev environment, and the backups... And let's face it: Nobody will ever remove this user's data from the backups. And if you ever restore or use these backups, this can cost you millions of dollars/euros.
But now... if you encrypt each user's data with a unique Data Encryption Key (DEK), all you have to do to "forget" a user forever is delete their key. So, saving the DEKs on a separate cluster and enforcing a low retention policy on this cluster will ensure that a user is truly forgotten forever once the key is deleted.
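To make this "crypto-shredding" idea concrete, here is a minimal in-memory sketch that uses only the JDK. It makes several simplifying assumptions: a `HashMap` stands in for the key vault, each user gets their own key, and AES-256-GCM is swapped in for brevity. Real CSFLE uses the driver's `ClientEncryption` and the AEAD_AES_256_CBC_HMAC_SHA_512 algorithms. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical crypto-shredding sketch: one key per user, held in a map that stands in
// for the key vault. Uses AES-256-GCM for brevity (CSFLE itself uses a different algorithm).
class CryptoShreddingSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LENGTH = 12;
    private final Map<String, SecretKey> keyVault = new HashMap<>();

    byte[] encryptFor(String user, String plaintext) {
        SecretKey dek = keyVault.computeIfAbsent(user, u -> newKey());
        try {
            byte[] iv = new byte[IV_LENGTH];
            RANDOM.nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] blob = new byte[IV_LENGTH + ciphertext.length]; // IV || ciphertext
            System.arraycopy(iv, 0, blob, 0, IV_LENGTH);
            System.arraycopy(ciphertext, 0, blob, IV_LENGTH, ciphertext.length);
            return blob;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns empty once the user's key is gone: every copy of the blob, backups included, is noise.
    Optional<String> decryptFor(String user, byte[] blob) {
        SecretKey dek = keyVault.get(user);
        if (dek == null) {
            return Optional.empty();
        }
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, dek, new GCMParameterSpec(128, blob, 0, IV_LENGTH));
            byte[] plaintext = cipher.doFinal(blob, IV_LENGTH, blob.length - IV_LENGTH);
            return Optional.of(new String(plaintext, StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Right to be forgotten": delete the key, not the (possibly backed-up) data.
    void forget(String user) {
        keyVault.remove(user);
    }

    private static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CryptoShreddingSketch app = new CryptoShreddingSketch();
        byte[] backupCopy = app.encryptFor("Alice", "blood type: O+");
        System.out.println(app.decryptFor("Alice", backupCopy)); // Optional[blood type: O+]
        app.forget("Alice");
        System.out.println(app.decryptFor("Alice", backupCopy)); // Optional.empty
    }
}
```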
Kenneth White, Security Principal at MongoDB who worked on CSFLE, explains this perfectly in this answer in the MongoDB Community Forum.
If the primary motivation is just to provably ensure that deleted plaintext user records remain deleted no matter what, then it becomes a simple timing and separation of concerns strategy, and the most straight-forward solution is to move the keyvault collection to a different database or cluster completely, configured with a much shorter backup retention; FLE does not assume your encrypted keyvault collection is co-resident with your active cluster or has the same access controls and backup history, just that the client can, when needed, make an authenticated connection to that keyvault database. Important to note though that with a shorter backup cycle, in the event of some catastrophic data corruption (malicious, intentional, or accidental), all keys for that db (and therefore all encrypted data) are only as recoverable to the point in time as the shorter keyvault backup would restore.
More trivially, in the event of an intrusion, any stolen encrypted data is completely worthless without the master key, which could spare you a ruinous fine.

The Master Key

When managed locally, the master key is an array of 96 bytes. It can also be stored in a cloud provider's Key Management Service instead of being locally managed (documentation). One way or another, you must secure it from any threat.
Generating a new one is as simple as this:
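A minimal sketch using `java.security.SecureRandom` to produce the 96 random bytes. The class and method names are hypothetical.

```java
import java.security.SecureRandom;

class LocalMasterKey {
    // A locally managed CSFLE master key is 96 cryptographically strong random bytes.
    static byte[] generate() {
        byte[] localMasterKey = new byte[96];
        new SecureRandom().nextBytes(localMasterKey);
        return localMasterKey;
    }

    public static void main(String[] args) {
        System.out.println(generate().length); // 96
    }
}
```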
But you most likely want to do this only once and then reuse the same key each time you restart your application.
Here is my implementation to store it in a local file the first time and then reuse it for each restart.
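My actual implementation is in the repository. As a stand-in, here is a hypothetical sketch of the same load-or-create pattern using `java.nio.file`. It assumes the key is stored as 96 raw bytes, and the repository's file format may differ. The `MasterKeyFile` class and `loadOrCreate` method are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.security.SecureRandom;

// Hypothetical helper: create the master key on first start, reuse it afterwards.
// Stores 96 raw bytes; protect this file (or better, use a KMS) in production.
class MasterKeyFile {
    private static final int KEY_SIZE = 96;

    static byte[] loadOrCreate(Path file) {
        try {
            if (Files.exists(file)) {
                byte[] key = Files.readAllBytes(file);
                if (key.length != KEY_SIZE) {
                    throw new IllegalStateException("Expected " + KEY_SIZE + " bytes in " + file + ", found " + key.length);
                }
                return key;
            }
            byte[] key = new byte[KEY_SIZE];
            new SecureRandom().nextBytes(key);
            // CREATE_NEW fails instead of silently overwriting a key created concurrently.
            Files.write(file, key, StandardOpenOption.CREATE_NEW);
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] masterKey = loadOrCreate(Path.of("master_key.txt"));
        System.out.println(masterKey.length); // 96
    }
}
```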
This is nowhere near safe for a production environment because leaving the master_key.txt directly in the application folder on your production server is like leaving the vault combination on a sticky note. Secure that file or please consider using a KMS in production.
In this simple quickstart, I will only use a single master key, but it's totally possible to use multiple master keys.

The Key Management Service (KMS) Provider

Whichever solution you choose for the master key, you need a KMS provider to set up the ClientEncryptionSettings and the AutoEncryptionSettings.
Here is the configuration for a local KMS:
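For the local provider, the driver expects a map from provider name to provider options. The `key` entry of the `local` provider holds the 96-byte master key. This matches the `Map<String, Map<String, Object>>` type accepted by `kmsProviders(...)` on both `ClientEncryptionSettings.Builder` and `AutoEncryptionSettings.Builder`. Here is a sketch. The helper name and the length check are my own additions, not part of the driver.

```java
import java.security.SecureRandom;
import java.util.Map;

class KmsProvidersSketch {
    // Provider name -> provider options. For the "local" provider, "key" holds the 96-byte master key.
    // This is the Map<String, Map<String, Object>> shape passed to kmsProviders(...) on
    // ClientEncryptionSettings.Builder and AutoEncryptionSettings.Builder.
    static Map<String, Map<String, Object>> localKmsProviders(byte[] masterKey) {
        if (masterKey.length != 96) {
            throw new IllegalArgumentException("A local master key must be 96 bytes, got " + masterKey.length);
        }
        return Map.of("local", Map.<String, Object>of("key", masterKey));
    }

    public static void main(String[] args) {
        byte[] masterKey = new byte[96];
        new SecureRandom().nextBytes(masterKey);
        Map<String, Map<String, Object>> kmsProviders = localKmsProviders(masterKey);
        System.out.println(kmsProviders.keySet()); // [local]
    }
}
```

The same map is then given to both settings builders, along with the key vault namespace (`csfle.vault` in this post).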

The Clients

We will need to set up two different clients:
  • The first one ─ ClientEncryption ─ will be used to create our Data Encryption Keys (DEK) and encrypt our fields manually.
  • The second one ─ MongoClient ─ will be the more conventional MongoDB connection that we will use to read and write our documents, with the difference that it will be configured to automatically decrypt the encrypted fields.

ClientEncryption

MongoClient

bypassAutoEncryption(true) is the ticket for the Community Edition. Without it, the driver would rely on mongocryptd or the Automatic Encryption Shared Library, plus a JSON schema that you would have to provide, to encrypt the documents automatically. See this example in the documentation.
You don't have to reuse the same connection string for both connections. It would actually be a lot more "GDPR-friendly" to use separate clusters, so you can enforce a low retention policy on the Data Encryption Keys.

Unique Index on Key Alternate Names

The first thing you should do before you create your first Data Encryption Key is to create a unique index on the key alternate names to make sure that you can't reuse the same alternate name on two different DEKs.
These names will help you "label" your keys to know what each one is used for ─ which is still totally up to you.
In my example, I choose to use one DEK per user. I will encrypt all the fields I want to secure in each user document with the same key. If I want to "forget" a user, I just need to drop that key. In my example, the names are unique so I'm using this for my keyAltNames. It's a great way to enforce GDPR compliance.

Create Data Encryption Keys

Let's create two Data Encryption Keys: one for Bobby and one for Alice. Each will be used to encrypt all the fields I want to keep safe in my respective user documents.
I use this small private helper method to make my code easier to read:
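The real helper talks to the key vault through `ClientEncryption`. To show just the naming logic, here is a hypothetical in-memory stand-in. A `HashMap` plays the role of the `csfle.vault` collection, and a duplicate `keyAltName` is rejected the same way the unique index would reject it. `KeyVaultSketch` and its methods are illustrative names, not driver APIs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory stand-in for the csfle.vault collection (not the driver API).
// A duplicate keyAltName is rejected, as the unique index on keyAltNames would do.
class KeyVaultSketch {
    private final Map<String, UUID> dekIdsByAltName = new HashMap<>();

    UUID createDataKey(String keyAltName) {
        if (dekIdsByAltName.containsKey(keyAltName)) {
            throw new IllegalStateException("Duplicate keyAltName: " + keyAltName);
        }
        UUID dekId = UUID.randomUUID(); // DEK ids are UUIDs in the key vault
        dekIdsByAltName.put(keyAltName, dekId);
        return dekId;
    }

    // Reuse the user's DEK after a restart instead of failing on the unique constraint.
    UUID createOrRetrieve(String keyAltName) {
        UUID existing = dekIdsByAltName.get(keyAltName);
        return existing != null ? existing : createDataKey(keyAltName);
    }

    public static void main(String[] args) {
        KeyVaultSketch vault = new KeyVaultSketch();
        UUID bobby = vault.createOrRetrieve("Bobby");
        UUID alice = vault.createOrRetrieve("Alice");
        System.out.println(bobby.equals(vault.createOrRetrieve("Bobby"))); // true
        System.out.println(bobby.equals(alice)); // false
    }
}
```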
Here is what Bobby's DEK looks like in my csfle.vault collection:
As you can see above, the keyMaterial (the DEK itself) is encrypted by the master key. Without the master key to decrypt it, it's useless. Also, you can identify that it's Bobby's key in the keyAltNames field.

Create Encrypted Documents

Now that we have an encryption key for Bobby and Alice, I can create their respective documents and insert them into MongoDB like so:
Here is what Bobby's and Alice's documents look like in my encrypted.users collection:
Bobby
Alice
Client Side Field Level Encryption currently provides two different algorithms to encrypt the data you want to secure.

AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic

With this algorithm, the result of the encryption is deterministic: given the same inputs (value and DEK), the output is always the same. This means that we can run equality queries on the encrypted field, but encrypted data with low cardinality is susceptible to frequency analysis attacks.
In my example, if I want to be able to retrieve users by phone numbers, I must use the deterministic algorithm. As a phone number is likely to be unique in my collection of users, it's safe to use this algorithm here.

AEAD_AES_256_CBC_HMAC_SHA_512-Random

With this algorithm, the result of the encryption is always different. That means that it provides the strongest guarantees of data confidentiality, even when the cardinality is low, but prevents read operations based on these fields.
In my example, the blood type has a low cardinality and it doesn't make sense to search in my user collection by blood type anyway, so it's safe to use this algorithm for this field.
Also, Bobby's medical record must be very safe. So, the entire subdocument containing all his medical records is encrypted with the random algorithm as well, and it won't be used to search for Bobby in my collection anyway.
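To see why the two algorithms behave differently, here is a simplified JDK-only sketch. It assumes AES-256-CBC for encryption. The IV is either derived from an HMAC-SHA-256 of the plaintext (deterministic) or drawn from `SecureRandom` (random). This mirrors the idea behind the two modes, but it omits the authentication tag and is not libmongocrypt's actual AEAD_AES_256_CBC_HMAC_SHA_512 format. All names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Simplified illustration of deterministic vs random encryption (AES-256-CBC, no auth tag).
// NOT libmongocrypt's AEAD_AES_256_CBC_HMAC_SHA_512 format.
class DeterministicVsRandomSketch {
    private static final SecureRandom RANDOM = new SecureRandom();
    private final SecretKeySpec aesKey;
    private final SecretKeySpec ivKey;

    // Expects two independent 32-byte keys: one for AES, one for deriving IVs.
    DeterministicVsRandomSketch(byte[] aesKeyBytes, byte[] ivKeyBytes) {
        this.aesKey = new SecretKeySpec(aesKeyBytes, "AES");
        this.ivKey = new SecretKeySpec(ivKeyBytes, "HmacSHA256");
    }

    // IV derived from the value: same value + same key -> same ciphertext (queryable, but leaks equality).
    byte[] encryptDeterministic(String value) {
        byte[] plaintext = value.getBytes(StandardCharsets.UTF_8);
        return encrypt(plaintext, Arrays.copyOf(hmac(plaintext), 16));
    }

    // Fresh random IV: the same value encrypts differently every time.
    byte[] encryptRandom(String value) {
        byte[] iv = new byte[16];
        RANDOM.nextBytes(iv);
        return encrypt(value.getBytes(StandardCharsets.UTF_8), iv);
    }

    String decrypt(byte[] blob) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.DECRYPT_MODE, aesKey, new IvParameterSpec(blob, 0, 16));
            return new String(cipher.doFinal(blob, 16, blob.length - 16), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private byte[] encrypt(byte[] plaintext, byte[] iv) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, aesKey, new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] blob = Arrays.copyOf(iv, 16 + ciphertext.length); // IV || ciphertext
            System.arraycopy(ciphertext, 0, blob, 16, ciphertext.length);
            return blob;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private byte[] hmac(byte[] data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(ivKey);
            return mac.doFinal(data);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] aesKeyBytes = new byte[32];
        byte[] ivKeyBytes = new byte[32];
        RANDOM.nextBytes(aesKeyBytes);
        RANDOM.nextBytes(ivKeyBytes);
        var crypto = new DeterministicVsRandomSketch(aesKeyBytes, ivKeyBytes);
        System.out.println(Arrays.equals(crypto.encryptDeterministic("A+"), crypto.encryptDeterministic("A+"))); // true
        System.out.println(Arrays.equals(crypto.encryptRandom("A+"), crypto.encryptRandom("A+"))); // false
    }
}
```

Because every user with blood type "A+" would produce the same deterministic ciphertext, the random mode is the safer choice for low-cardinality fields such as blood_type.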

Read Bobby's Document

As mentioned in the previous section, it's possible to search documents by fields encrypted with the deterministic algorithm.
Here is how:
I simply encrypt the phone number I'm looking for again, with the same key and the deterministic algorithm, and I can use the resulting BsonBinary in my query to find Bobby.
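Here is the same lookup pattern in miniature, as a hypothetical JDK-only sketch. Each document holds a deterministically encrypted phone number, using a simplified format (HMAC-derived IV plus AES-256-CBC, not libmongocrypt's). An equality search works by encrypting the search value with the same key and comparing bytes, which is effectively what the server does when it matches the BsonBinary in my query. The names and the phone numbers are made up.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical sketch of an equality lookup on a deterministically encrypted field.
class EncryptedLookupSketch {
    record UserDoc(String name, byte[] encryptedPhone) {}

    // Simplified deterministic encryption: a 64-byte key split into an HMAC key (IV derivation)
    // and an AES-256 key. Not libmongocrypt's exact format.
    static byte[] encryptDeterministic(byte[] key64, String value) {
        try {
            byte[] plaintext = value.getBytes(StandardCharsets.UTF_8);
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key64, 0, 32, "HmacSHA256"));
            byte[] iv = Arrays.copyOf(mac.doFinal(plaintext), 16);
            Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key64, 32, 32, "AES"), new IvParameterSpec(iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] blob = Arrays.copyOf(iv, 16 + ciphertext.length); // IV || ciphertext
            System.arraycopy(ciphertext, 0, blob, 16, ciphertext.length);
            return blob;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypt the search value again with the same key, then compare ciphertext bytes.
    static Optional<UserDoc> findByPhone(List<UserDoc> users, byte[] key64, String phone) {
        byte[] query = encryptDeterministic(key64, phone);
        return users.stream()
                .filter(user -> Arrays.equals(user.encryptedPhone(), query))
                .findFirst();
    }

    public static void main(String[] args) {
        byte[] key = new byte[64];
        new SecureRandom().nextBytes(key);
        List<UserDoc> users = List.of(new UserDoc("Bobby", encryptDeterministic(key, "01 23 45 67 89")));
        System.out.println(findByPhone(users, key, "01 23 45 67 89").map(UserDoc::name).orElse("not found")); // Bobby
    }
}
```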
If I output the doc string, I get:
As you can see, the automatic decryption worked as expected: I can see my document in clear text. To find this document, I could use the _id, the name, the age, or the phone number, but not the blood_type or the medical_record.

Read Alice's Document

Now let's put CSFLE to the test. I want to be sure that if Alice's DEK is destroyed, Alice's document is lost forever and can never be recovered, even if a backup is restored. That's why it's important to keep the DEKs and the encrypted documents in two different clusters that don't have the same backup retention policy.
Let's retrieve Alice's document by name, but let's protect my code in case something "bad" has happened to her key...
If her key still exists in the database, then I can decrypt her document:
Now, let's remove her key from the database:
In a real-life production environment, it wouldn't make sense to read her document again; and because we are all professional and organised developers who like to keep things tidy, we would also delete Alice's document along with her DEK, as this document is now completely worthless for us anyway.
In my example, I want to try to read this document anyway. But if I try to read it immediately after deleting her key, there is a good chance that I will still be able to do so because of the 60-second Data Encryption Key cache managed by libmongocrypt.
This cache is very important because, without it, multiple round trips to the key vault would be necessary to decrypt my document. It's critical to prevent CSFLE from hurting the performance of your MongoDB cluster.
So, to make sure I'm not using this cache anymore, I'm creating a brand new MongoClient (still with auto decryption settings) for the sake of this example. But of course, in production, it wouldn't make sense to do so.
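The effect of that cache can be modelled with a small TTL cache. The sketch below is hypothetical and is not libmongocrypt's implementation. It takes an injectable clock so the 60-second window can be tested without waiting. A key fetched within the TTL is served from memory even after it has been deleted from the vault, which is exactly why a fresh client is needed here.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical model of a client-side DEK cache with a fixed TTL (not libmongocrypt's code).
class DekCacheSketch {
    private record Entry(byte[] dek, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new HashMap<>();
    private final Function<String, Optional<byte[]>> keyVaultLookup; // the round trip we want to avoid
    private final LongSupplier clockMillis;
    private final long ttlMillis;

    DekCacheSketch(Function<String, Optional<byte[]>> keyVaultLookup, LongSupplier clockMillis, Duration ttl) {
        this.keyVaultLookup = keyVaultLookup;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttl.toMillis();
    }

    Optional<byte[]> get(String keyId) {
        long now = clockMillis.getAsLong();
        Entry cached = cache.get(keyId);
        if (cached != null && now < cached.expiresAtMillis()) {
            return Optional.of(cached.dek()); // served from memory, even if deleted from the vault meanwhile
        }
        cache.remove(keyId);
        Optional<byte[]> fresh = keyVaultLookup.apply(keyId);
        fresh.ifPresent(dek -> cache.put(keyId, new Entry(dek, now + ttlMillis)));
        return fresh;
    }

    public static void main(String[] args) {
        Map<String, byte[]> vault = new HashMap<>(Map.of("alice", new byte[] {42}));
        long[] now = {0L};
        DekCacheSketch dekCache = new DekCacheSketch(id -> Optional.ofNullable(vault.get(id)), () -> now[0], Duration.ofSeconds(60));
        dekCache.get("alice");      // fetched from the vault and cached
        vault.remove("alice");      // key deleted from the vault
        now[0] = 30_000;
        System.out.println(dekCache.get("alice").isPresent()); // true: still cached
        now[0] = 60_000;
        System.out.println(dekCache.get("alice").isPresent()); // false: TTL expired, vault has no key
    }
}
```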
Now if I try to access Alice's document again, I get the following MongoException, as expected:

Wrapping Up

In this quickstart tutorial, we have discovered how to use Client Side Field Level Encryption with the MongoDB Java Driver and only the community edition of MongoDB. You can learn more about automated encryption in our documentation.
CSFLE is a powerful security feature for protecting your most sensitive data. Not even your admins will be able to access the data in production if they don't have access to the master keys.
But it's not the only security measure you should use to protect your cluster. Preventing access to your cluster is, of course, the first security measure you should enforce, by enabling authentication and limiting network exposure.
If in doubt, check out the security checklist before launching a cluster in production to make sure that you didn't overlook any of the security options MongoDB has to offer to protect your data.
There is a lot of flexibility in the implementation of CSFLE: You can choose to use one or multiple master keys, same for the Data Encryption Keys. You can also choose to encrypt all your phone numbers in your collection with the same DEK or use a different one for each user. It's really up to you how you will organise your encryption strategy but, of course, make sure it fulfills all your legal obligations. There are multiple right ways to implement CSFLE, so make sure to find the most suitable one for your use case.
If you have questions, please head to our developer community website where the MongoDB engineers and the MongoDB community will help you build your next big idea with MongoDB.

Documentation

