How to Implement Client-Side Field Level Encryption (CSFLE) in Java with Spring Data MongoDB
Maxime Beugnet, Megha Arora • 11 min read • Published Nov 06, 2023 • Updated Jan 27, 2024
The source code of this template is available on GitHub:
To get started, you'll need:
- Java 17.
- MongoDB Automatic Encryption Shared Library v7.0.2 or higher.
This content is also available in video format.
This post will explain the key details of the integration of MongoDB Client-Side Field Level Encryption (CSFLE) with Spring Data MongoDB.
If you feel like you need a refresher on CSFLE before working on this more complicated piece, I can recommend a few
resources for CSFLE:
And for Spring Data MongoDB:
This template is significantly larger than most CSFLE templates you can find online. It aims to provide
reusable code for a real production environment, using:
- Multiple encrypted collections.
- Automated JSON Schema generation.
- Server-side JSON Schema.
- Separate clusters for data encryption keys (DEKs) and encrypted collections.
- Automated DEK generation or retrieval.
- SpEL evaluation extension.
- Auto-implemented repositories.
- OpenAPI 3.0.1 documentation.
While coding, I also tried to follow the SOLID principles as much
as possible to improve the code's readability, usability, and reusability.
Now that we are all on board, here is a high-level diagram of the different moving parts required to create a correctly configured, CSFLE-enabled MongoClient that can encrypt and decrypt fields automatically.
The arrows can mean different things in the diagram:
- "needs to be done before"
- "requires"
- "direct dependency of"
But hopefully it helps explain the dependencies, the orchestration, and the inner machinery of the CSFLE
configuration with Spring Data MongoDB.
Once the connection to MongoDB, capable of encrypting and decrypting the fields, is established with the correct
configuration and library, we are just using a classic three-tier architecture to expose a REST API and manage the
communication all the way down to the MongoDB database.
There is nothing tricky or fascinating in that part, so we won't cover it in this post.
Let's now focus on all the complicated bits of this template.
As this is a tutorial, the code can be started from a blank MongoDB cluster.
So the first order of business is to create the key vault collection and its unique index on the keyAltNames field.

In production, you could choose to create the key vault collection and its unique index on the keyAltNames field
manually, once, and remove this code, as it will never be executed again. I guess it only makes sense to keep it if
you are running this code in a CI/CD pipeline.

One important thing to note here is the dependency on a completely standard (i.e., not CSFLE-enabled) and ephemeral
MongoClient (created in a try-with-resources block), since all we do at this stage is create a collection and an index
in our MongoDB cluster. When it's done, we can close the standard MongoDB connection.
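The key vault setup described above could look like the following minimal sketch, assuming the MongoDB Java sync driver. The class KeyVaultSetup and its parameters are hypothetical names, not the template's code. The partial filter on keyAltNames follows the MongoDB recommendation for the key vault index, so that several DEKs without an alternate name don't collide on the unique index.

```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import org.bson.Document;

public class KeyVaultSetup {

    // Unique index restricted to documents that actually have keyAltNames,
    // so DEKs created without an alternate name don't conflict with each other.
    public static IndexOptions keyAltNamesIndexOptions() {
        return new IndexOptions()
                .unique(true)
                .partialFilterExpression(Filters.exists("keyAltNames"));
    }

    // Hypothetical helper: uses a plain (not CSFLE-enabled) client, closed by try-with-resources.
    public static void createKeyVault(String connectionString, String keyVaultDb, String keyVaultColl) {
        try (MongoClient client = MongoClients.create(connectionString)) {
            MongoCollection<Document> keyVault = client.getDatabase(keyVaultDb).getCollection(keyVaultColl);
            // createIndex also creates the collection implicitly if it doesn't exist yet.
            keyVault.createIndex(Indexes.ascending("keyAltNames"), keyAltNamesIndexOptions());
        }
    }
}
```

For example, KeyVaultSetup.createKeyVault("mongodb://localhost", "encryption", "__keyVault") would target the encryption.__keyVault namespace used throughout the MongoDB documentation; any namespace works as long as the CSFLE client settings point to the same one.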
We can now create the Data Encryption Keys (DEKs) using ClientEncryption.

We can directly instantiate a ClientEncryption bean using the KMS and use it to generate our DEKs (one for each
encrypted collection).

One thing to note here is that we are storing the DEKs in a map, so we don't have to retrieve them again later when we
need them for the JSON Schemas.
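A minimal sketch of this "retrieve or generate" logic, assuming Java driver 4.7 or later (for ClientEncryption.getKeyByAltName) and a KMS provider such as "local", which needs no master key in DataKeyOptions. DataEncryptionKeys, loadOrCreate, and the altNamesByTarget map are hypothetical names; the template's own service may be organized differently. The DEK ids are kept as base64 strings because the keyId of @Encrypted can resolve to a base64-encoded key id.

```java
import com.mongodb.client.model.vault.DataKeyOptions;
import com.mongodb.client.vault.ClientEncryption;
import org.bson.BsonBinary;
import org.bson.BsonDocument;

import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DataEncryptionKeys {

    // Base64 DEK ids, keyed by target (e.g. the entity name "PersonEntity").
    private final Map<String, String> dekIdsB64 = new HashMap<>();

    // For each target, reuse the DEK with that alternate name or create a new one.
    public void loadOrCreate(ClientEncryption clientEncryption, String kmsProvider,
                             Map<String, String> altNamesByTarget) {
        altNamesByTarget.forEach((target, altName) -> {
            BsonDocument existing = clientEncryption.getKeyByAltName(altName);
            String idB64 = existing != null
                    ? idToBase64(existing)
                    : toBase64(clientEncryption.createDataKey(kmsProvider,
                            new DataKeyOptions().keyAltNames(List.of(altName))));
            dekIdsB64.put(target, idB64);
        });
    }

    // Immutable view, reused later when the JSON Schemas are generated.
    public Map<String, String> getDekIdsB64() {
        return Map.copyOf(dekIdsB64);
    }

    // A key document stores its id as a binary UUID in "_id".
    public static String idToBase64(BsonDocument keyDocument) {
        return toBase64(keyDocument.getBinary("_id"));
    }

    public static String toBase64(BsonBinary binary) {
        return Base64.getEncoder().encodeToString(binary.getData());
    }
}
```

Because each DEK is looked up by its alternate name before being created, restarting the application reuses the existing keys instead of generating new ones.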
One of the key functional areas of Spring Data MongoDB is the POJO-centric model it relies on to implement the
repositories and map the documents to the MongoDB collections.
PersonEntity.java
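A sketch of what such an entity might look like, reconstructed from the description that follows. The collection name "persons" and the firstName and lastName fields are assumptions. It relies on the @Encrypted annotation that Spring Data MongoDB provides since version 3.3.

```java
import org.bson.types.ObjectId;
import org.springframework.data.annotation.Id;
import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.mongodb.core.mapping.Encrypted;

// Class-level @Encrypted: the DEK id is resolved by the "mongocrypt" SpEL extension,
// with #target resolving to "PersonEntity" for this class.
@Document("persons") // hypothetical collection name
@Encrypted(keyId = "#{mongocrypt.keyId(#target)}")
public class PersonEntity {

    @Id
    private ObjectId id;
    private String firstName;
    private String lastName;

    // Deterministic: equal plaintexts give equal ciphertexts, so equality queries on ssn work.
    @Encrypted(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic")
    private String ssn;

    // Random: the same plaintext gives a different ciphertext each time, so it can't be queried.
    @Encrypted(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Random")
    private String bloodType;

    public PersonEntity() {
    }

    public PersonEntity(String firstName, String lastName, String ssn, String bloodType) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.ssn = ssn;
        this.bloodType = bloodType;
    }

    public ObjectId getId() { return id; }
    public String getFirstName() { return firstName; }
    public String getLastName() { return lastName; }
    public String getSsn() { return ssn; }
    public String getBloodType() { return bloodType; }
}
```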
As you can see above, this entity contains all the information we need to fully automate CSFLE. We have the information
we need to generate the JSON Schema:
- Using the SpEL expression #{mongocrypt.keyId(#target)}, we can dynamically populate the DEK that was generated or
retrieved earlier.
- ssn is a String that requires a deterministic algorithm.
- bloodType is a String that requires a random algorithm.
The generated JSON Schema looks like this:
The evaluation of the SpEL expression is only possible because of this class we added in the configuration:
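A minimal sketch of such a class, modeled on the EvaluationContextExtension example in the Spring Data MongoDB reference documentation. The class name EntitySpelEvaluationExtension and its constructor, which takes a map of base64 DEK ids keyed by target, are assumptions. getExtensionId() provides the mongocrypt prefix and getFunctions() exposes computeKeyId under the name keyId, which is what makes #{mongocrypt.keyId(#target)} resolvable.

```java
import org.springframework.data.spel.spi.EvaluationContextExtension;
import org.springframework.data.spel.spi.Function;

import java.util.Map;

public class EntitySpelEvaluationExtension implements EvaluationContextExtension {

    // Hypothetical input: base64 DEK ids keyed by target, e.g. "PersonEntity" -> "<base64 id>".
    private final Map<String, String> dekIdsB64ByTarget;

    public EntitySpelEvaluationExtension(Map<String, String> dekIdsB64ByTarget) {
        this.dekIdsB64ByTarget = Map.copyOf(dekIdsB64ByTarget);
    }

    // Prefix used in the SpEL expression: #{mongocrypt.keyId(#target)}
    @Override
    public String getExtensionId() {
        return "mongocrypt";
    }

    // Exposes computeKeyId(String) to SpEL under the name "keyId".
    @Override
    public Map<String, Function> getFunctions() {
        try {
            return Map.of("keyId", new Function(
                    EntitySpelEvaluationExtension.class.getMethod("computeKeyId", String.class), this));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Matches the target with the DEK generated or retrieved at startup.
    public String computeKeyId(String target) {
        String keyId = dekIdsB64ByTarget.get(target);
        if (keyId == null) {
            throw new IllegalArgumentException("No DEK registered for target " + target);
        }
        return keyId;
    }
}
```

To be picked up during SpEL evaluation, the extension has to be registered as a Spring bean, for example returned from a @Bean method of type EvaluationContextExtension.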
Note that this is where we retrieve the DEKs and match them with the target: "PersonEntity", in this case.

JSON Schemas are actually not trivial to generate in a Spring Data MongoDB project.
As a matter of fact, to generate the JSON Schemas, we need the MappingContext (the entities, etc.), which is created by
the automatic configuration of Spring Data, which also creates the MongoClient connection and the MongoTemplate.
But to create the MongoClient, with automatic encryption enabled, you need the JSON Schemas!
It took me a significant amount of time to find a solution to this deadlock, and you can just enjoy the solution now!
The solution is to inject the JSON Schema creation into the auto-configuration process by declaring a
MongoClientSettingsBuilderCustomizer bean.

One thing to note here is the option to store the DEKs and the encrypted collections in two completely separate
MongoDB clusters. This isn't mandatory, but it can be a handy trick if you choose to have a different backup retention
policy for your two clusters. This can be useful for GDPR Article 17, the "right to erasure," for instance, as you
can then guarantee that a DEK completely disappears from your systems (backups included). I talk more about this
approach in my Java CSFLE post.
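Such a customizer could be sketched as follows, assuming Spring Boot's MongoClientSettingsBuilderCustomizer and the driver's AutoEncryptionSettings. The factory method, its parameters, and the Supplier-based schema map are hypothetical design choices: the supplier defers schema generation until Spring Boot calls customize() while building the MongoClient settings. keyVaultMongoClientSettings is what allows the key vault to live on a separate cluster, and the two extraOptions entries point the driver at the Automatic Encryption Shared Library.

```java
import com.mongodb.AutoEncryptionSettings;
import com.mongodb.ConnectionString;
import com.mongodb.MongoClientSettings;
import org.bson.BsonDocument;
import org.springframework.boot.autoconfigure.mongo.MongoClientSettingsBuilderCustomizer;

import java.util.Map;
import java.util.function.Supplier;

public class CsfleCustomizerFactory {

    // Hypothetical factory; in a Spring app this would typically be a @Bean method.
    public static MongoClientSettingsBuilderCustomizer csfleCustomizer(
            String keyVaultConnectionString,                       // cluster holding the DEKs
            String keyVaultNamespace,                              // e.g. "encryption.__keyVault"
            Map<String, Map<String, Object>> kmsProviders,         // e.g. "local" -> {key: 96 bytes}
            Supplier<Map<String, BsonDocument>> schemaMapSupplier, // namespace -> JSON Schema
            String cryptSharedLibPath) {                           // path to the shared library
        return builder -> {
            // The key vault may live on a different cluster than the encrypted collections.
            MongoClientSettings keyVaultSettings = MongoClientSettings.builder()
                    .applyConnectionString(new ConnectionString(keyVaultConnectionString))
                    .build();
            AutoEncryptionSettings autoEncryption = AutoEncryptionSettings.builder()
                    .keyVaultMongoClientSettings(keyVaultSettings)
                    .keyVaultNamespace(keyVaultNamespace)
                    .kmsProviders(kmsProviders)
                    // Schemas are only generated here, when Spring Boot builds the client settings.
                    .schemaMap(schemaMapSupplier.get())
                    .extraOptions(Map.of("cryptSharedLibPath", cryptSharedLibPath,
                                         "cryptSharedLibRequired", true))
                    .build();
            builder.autoEncryptionSettings(autoEncryption);
        };
    }
}
```

Building the settings this way does not contact any server; the encryption machinery is only involved once the MongoClient itself is created from these settings.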
Here is the JSON Schema service which stores the generated JSON Schemas in a map:
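A sketch of such a service, assuming Spring Data MongoDB 3.3 or later for MongoJsonSchemaCreator.encryptedOnly(). JsonSchemaService and its methods are hypothetical names. Building a MappingMongoConverter from the MappingContext alone, with NoOpDbRefResolver, is one way to generate schemas without depending on the MongoClient.

```java
import com.mongodb.MongoClientSettings;
import org.bson.BsonDocument;
import org.bson.Document;
import org.springframework.data.mongodb.core.MongoJsonSchemaCreator;
import org.springframework.data.mongodb.core.convert.MappingMongoConverter;
import org.springframework.data.mongodb.core.convert.MongoConverter;
import org.springframework.data.mongodb.core.convert.NoOpDbRefResolver;
import org.springframework.data.mongodb.core.mapping.MongoMappingContext;

import java.util.HashMap;
import java.util.Map;

public class JsonSchemaService {

    // Schemas keyed by namespace ("database.collection"), as AutoEncryptionSettings expects.
    private final Map<String, BsonDocument> schemasByNamespace = new HashMap<>();

    // Converter built only from the MappingContext: no DbRef resolution, so no MongoClient needed.
    public static MongoConverter converterFor(MongoMappingContext mappingContext) {
        return new MappingMongoConverter(NoOpDbRefResolver.INSTANCE, mappingContext);
    }

    // JSON Schema of an entity, keeping only its @Encrypted properties.
    public static Document encryptedOnlySchema(MongoConverter converter, Class<?> entityType) {
        return MongoJsonSchemaCreator.create(converter)
                .filter(MongoJsonSchemaCreator.encryptedOnly())
                .createSchemaFor(entityType)
                .schemaDocument();
    }

    // Stores a schema as a BsonDocument, converted with the driver's default codec registry.
    public void register(String namespace, Document schema) {
        schemasByNamespace.put(namespace,
                schema.toBsonDocument(BsonDocument.class, MongoClientSettings.getDefaultCodecRegistry()));
    }

    // Immutable snapshot, reusable for both the client-side schema map and the server-side validators.
    public Map<String, BsonDocument> getSchemaMap() {
        return Map.copyOf(schemasByNamespace);
    }
}
```

Typical usage would be one call per encrypted entity, e.g. service.register("mydb.persons", JsonSchemaService.encryptedOnlySchema(converter, PersonEntity.class)), where "mydb.persons" is a placeholder namespace.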
We are storing the JSON Schemas because this template also implements one of the good practices of CSFLE: server-side
JSON Schemas.
Strictly speaking, server-side JSON Schemas are not required for CSFLE's automatic encryption and decryption to work.
Only the client-side ones are required by the Automatic Encryption Shared Library. But then nothing would prevent
another misconfigured client, or an admin connected directly to the cluster, from inserting or updating documents
without encrypting the fields.
To enforce this, you can use a server-side JSON Schema, just as you would to enforce a field type in a document.
But since the JSON Schemas evolve with the different versions of your application, they need to be updated
accordingly each time you restart your application.
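Applying the schemas on the server side could be sketched like this with the Java sync driver: a $jsonSchema validator when the collection is created, and a collMod command when it already exists, so the validator is refreshed at every startup. ServerSideSchema and its methods are hypothetical, and the schema passed in is assumed to be the same one used in the client-side schema map. The MongoDB server accepts the encrypt keyword in $jsonSchema validators from version 4.2.

```java
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.model.CreateCollectionOptions;
import com.mongodb.client.model.ValidationOptions;
import org.bson.Document;

import java.util.ArrayList;

public class ServerSideSchema {

    // Validator document wrapping the JSON Schema: { $jsonSchema: <schema> }
    public static Document validator(Document jsonSchema) {
        return new Document("$jsonSchema", jsonSchema);
    }

    // collMod command replacing the validator of an existing collection.
    public static Document collModCommand(String collection, Document jsonSchema) {
        return new Document("collMod", collection).append("validator", validator(jsonSchema));
    }

    // Creates the collection with its validator, or refreshes the validator if the collection exists.
    public static void apply(MongoDatabase database, String collection, Document jsonSchema) {
        boolean exists = database.listCollectionNames()
                .into(new ArrayList<String>())
                .contains(collection);
        if (exists) {
            database.runCommand(collModCommand(collection, jsonSchema));
        } else {
            database.createCollection(collection, new CreateCollectionOptions()
                    .validationOptions(new ValidationOptions().validator(validator(jsonSchema))));
        }
    }
}
```

With such a validator in place, the server rejects writes in which a field declared with encrypt is sent as plaintext.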
Another big feature of this template is its support for multiple entities. As you probably noticed already, there is
a CompanyEntity and all its related components, but the code is generic enough to handle any number of entities,
which isn't usually the case in other online tutorials.

In this template, if you want to support a third type of entity, you just have to create the components of the
three-tier architecture as usual and add your entry in the EncryptedCollectionsConfiguration class.

Everything else, from the DEK generation to the encrypted collection creation with the server-side JSON Schema, is fully
automated and taken care of transparently. All you have to do is specify the
@Encrypted(algorithm = "AEAD_AES_256_CBC_HMAC_SHA_512-Deterministic") annotation in the entity class, and the field
will be encrypted and decrypted automatically for you when you are using the auto-implemented repositories (courtesy of
Spring Data MongoDB, of course!).

You may have noticed that this template implements the findFirstBySsn(ssn) method, which means that it's possible to
retrieve a person document by its SSN, even though this field is encrypted. Note that this only works because we are
using a deterministic encryption algorithm: it always produces the same ciphertext for a given value and DEK, so the
encrypted query value can be matched against the stored ciphertext, which the random algorithm doesn't allow.
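Registering entities in one place, as described above for the EncryptedCollectionsConfiguration class, could look like the following plain-Java sketch. The EncryptedEntity record, its fields, the "mydb" database, the DEK alternate names, and the nested placeholder classes (including the third ProductEntity) are all hypothetical; in the template, the real entity classes would be referenced instead. target() returns the entity's simple name, matching the "PersonEntity" target used by the SpEL expression.

```java
import java.util.List;

public class EncryptedCollectionsConfiguration {

    // One entry per encrypted collection (hypothetical shape).
    public record EncryptedEntity(String database, String collection, Class<?> entityClass, String dekAltName) {

        // Namespace used as the key of the client-side schema map.
        public String namespace() {
            return database + "." + collection;
        }

        // Simple class name, used as the target when resolving the DEK id.
        public String target() {
            return entityClass.getSimpleName();
        }
    }

    // Placeholders standing in for the real entity classes; ProductEntity is a hypothetical third entity.
    public static class PersonEntity {}
    public static class CompanyEntity {}
    public static class ProductEntity {}

    public static final List<EncryptedEntity> ENCRYPTED_ENTITIES = List.of(
            new EncryptedEntity("mydb", "persons", PersonEntity.class, "personDEK"),
            new EncryptedEntity("mydb", "companies", CompanyEntity.class, "companyDEK"),
            // Supporting a new entity only takes one more entry here.
            new EncryptedEntity("mydb", "products", ProductEntity.class, "productDEK"));
}
```

Keeping the namespace, entity class, and DEK alternate name together in one entry is what lets the DEK generation, schema generation, and collection creation steps iterate over the same list.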
Thanks for reading my post!
If you have any questions about it, please feel free to open a question in the GitHub repository or ask a question in
the MongoDB Community Forum.
Pull requests and improvement ideas are very welcome!