Improving Storage and Read Performance for Free: Flat vs Structured Schemas

Artur Costa • 5 min read • Published Jan 26, 2024 • Updated Jan 26, 2024
When developers or administrators who had previously only been "followers of the word of relational data modeling" start to use MongoDB, it is common to see documents with flat schemas. This behavior happens because relational data modeling makes you think about data and schemas in a flat, two-dimensional structure called tables.
In MongoDB, data is stored as BSON documents, which are close to a binary representation of JSON documents, with slight differences. Because of this, we can create schemas with more dimensions/levels. More details about the BSON implementation can be found in its specification. You can also learn more about its differences from JSON.
MongoDB documents are composed of one or more key/value pairs, where the value of a field can be any of the BSON data types, including other documents, arrays, or arrays of documents.
Using documents, arrays, or arrays of documents as values for fields enables the creation of a structured schema, where one field can represent a group of related information. This structured schema is an alternative to a flat schema.
Let's see an example of how to write the same user document using the two schemas:
[Figure: Comparing Flat vs Structured Schemas in a Document Database]
The two documents above contain the same data. The one on the left, flatUser, uses a flat schema where all the field-and-value pairs are on the same level. The one on the right, structuredUser, employs a structured schema where fields and values are nested in levels that group related information inside the document.
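As a rough illustration, the two shapes can be sketched as Python dictionaries. Only the name values and the field names mentioned in this article come from the original; the `_id` field, its position, and every address value are assumptions made up for this sketch, and `flatten` is a hypothetical helper, not part of any MongoDB driver.

```python
# Structured version: related fields are grouped into nested documents.
# Assumption: an _id field comes first; all address values are invented.
structured_user = {
    "_id": 1,
    "name": {"first": "john", "last": "smith", "middle": "oliver"},
    "address": {
        "home": {
            "street": "First Street",
            "number": "10",
            "zip": "10001",
            "state": "NY",
            "country": "US",
        },
        "work": {
            "street": "Second Street",
            "number": "20",
            "zip": "10002",
            "state": "NY",
            "country": "US",
        },
    },
}


def flatten(doc, prefix=""):
    """Collapse nested documents into a single level, joining names with '_'."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


# Flat version: the same data, with every pair on the top level.
flat_user = flatten(structured_user)
print(list(flat_user)[:4])  # ['_id', 'name_first', 'name_last', 'name_middle']
```

Both dictionaries carry exactly the same leaf values; only the grouping of the field names differs.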
So, what are the advantages of using a structured schema rather than a flat one? The quick answer for those in a hurry is that a structured schema may require less storage and be faster to traverse than a flat schema. For those who want to know why, we need a better understanding of BSON.
For the purpose of this article, a BSON document can be seen as a list of items, where each item represents a field-and-value pair of the document. An item is composed of the field’s type, name, length, and data in a serialized form. The field type is one byte long and indicates the data type in the data field. The field name is the field's name in a string form. The field length is four bytes long and indicates the length of the data field for those types where the size is not fixed. The data field is the actual data of the field-and-value pair. Putting this definition in a graphical representation, we have:
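The item layout described above can be sketched in a few lines of Python. This follows the article's simplified description (type, name, length, data) rather than the exact wire format; the actual BSON specification includes further details, such as terminator bytes, that are deliberately left out here. `pack_item` is a hypothetical helper name.

```python
import struct

TYPE_STRING = 0x02  # BSON type byte for a UTF-8 string


def pack_item(name: str, value: str) -> bytes:
    """Serialize one string field as type + name + length + data (simplified)."""
    data = value.encode("utf-8")
    return (
        bytes([TYPE_STRING])            # 1 byte: field type
        + name.encode("utf-8")          # field name in string form
        + struct.pack("<i", len(data))  # 4 bytes: little-endian length of the data
        + data                          # the field data itself
    )


item = pack_item("name_first", "john")
print(len(item))  # 19
```

The 19 bytes are 1 (type) + 10 (name) + 4 (length) + 4 (data), which is the accounting used in the storage tables of this article.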
[Figure: BSON Document Structure]
Let's see how a structured schema uses less storage than a flat schema by analyzing the field-and-value pair related to the user's name.
In the flatUser, we have the following table from a storage perspective:
| Field-and-value | Type | Field Name | Field Length | Field Data | Total Size |
|---|---|---|---|---|---|
| name_first: "john" | 1 byte | 10 bytes | 4 bytes | 4 bytes | 19 bytes |
| name_last: "smith" | 1 byte | 9 bytes | 4 bytes | 5 bytes | 19 bytes |
| name_middle: "oliver" | 1 byte | 11 bytes | 4 bytes | 6 bytes | 22 bytes |
Adding up the table's total sizes, the flat document uses 60 bytes to store the field and value related to the user's name.
To analyze the storage of the structuredUser, let's divide it into two tables. The first table shows the storage used by the nested document that is the value of the field name, and the second shows the storage used by the field-and-value pair name as a whole.
Let’s build the first table for the value/content of the field name:
| Field-and-value | Type | Field Name | Field Length | Field Data | Total Size |
|---|---|---|---|---|---|
| first: "john" | 1 byte | 5 bytes | 4 bytes | 4 bytes | 14 bytes |
| last: "smith" | 1 byte | 4 bytes | 4 bytes | 5 bytes | 14 bytes |
| middle: "oliver" | 1 byte | 6 bytes | 4 bytes | 6 bytes | 17 bytes |
Adding up the previous table's total sizes, the value/Field Data of the field name uses 45 bytes. Building the second table for the field-and-value name, we get:
| Field-and-value | Type | Field Name | Field Length | Field Data | Total Size |
|---|---|---|---|---|---|
| name: { … } | 1 byte | 4 bytes | 4 bytes | 45 bytes | 54 bytes |
The structured document uses 54 bytes to store the values related to the user's name.
Comparing the tables, we see the main difference is the "Field Name" storage size. The flat schema uses 30 bytes to store the names of its fields, while the structured schema uses 19 bytes to store the names of its fields. This is due to the repetition of the sub-string "name_" in the "Field Name" of the flat schema.
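The arithmetic in the tables can be reproduced with a short sketch. It uses the same simplified accounting as this article (1 byte for the type, the field name, 4 bytes for the length, then the data); `item_size` and `name_bytes` are hypothetical helper names, and the result is not meant to match the exact size MongoDB reports.

```python
TYPE_BYTES = 1    # field type
LENGTH_BYTES = 4  # field length


def item_size(name, value):
    """Size of one field-and-value pair under the simplified model."""
    if isinstance(value, dict):
        # The data of a nested document is the sum of its own items.
        data = sum(item_size(k, v) for k, v in value.items())
    else:
        data = len(value.encode("utf-8"))
    return TYPE_BYTES + len(name.encode("utf-8")) + LENGTH_BYTES + data


def name_bytes(doc):
    """Bytes spent on field names only, including nested levels."""
    total = 0
    for key, value in doc.items():
        total += len(key.encode("utf-8"))
        if isinstance(value, dict):
            total += name_bytes(value)
    return total


flat_name = {"name_first": "john", "name_last": "smith", "name_middle": "oliver"}
structured_name = {"name": {"first": "john", "last": "smith", "middle": "oliver"}}

flat_total = sum(item_size(k, v) for k, v in flat_name.items())
structured_total = sum(item_size(k, v) for k, v in structured_name.items())

print(flat_total, structured_total)                           # 60 54
print(name_bytes(flat_name), name_bytes(structured_name))     # 30 19
```

The nested version pays for one extra type byte and one extra length field, but saves more than that by not repeating the "name_" prefix.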
Storing the two documents in a MongoDB instance, we get a size of 403 bytes for the flat schema and 307 bytes for the structured schema. A 24% improvement in storage just by changing the schema is not bad, and a structured document is also easier to read and more pleasant to look at.
Now, let's see how a structured schema is faster to traverse than a flat schema by getting the zip code of the work address.
In the flatUser document, to get to the field address_work_zip starting at the beginning of the document, a cursor would need to perform 12 field-name comparisons before it reaches the desired field.
In the structuredUser document, to get to the field address.work.zip starting at the beginning of the document, a cursor would need to perform 8 field-name comparisons. The smaller number of comparisons is due to some values of a field-and-value pair being documents. When the cursor checks the field name, it can skip three fields and their comparisons (first, middle, and last) because it knows that address.work.zip won't be inside name.<field>. When the cursor checks the field address.home, it can likewise skip five fields and their comparisons (street, number, zip, state, and country).
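The comparison counts can be reproduced with a small model of the traversal. This is only a sketch of the idea, not how the MongoDB engine is implemented: it assumes fields are visited in order, that an `_id` field comes first, and that a nested document whose name does not match is skipped as a whole. The address values and the `count_comparisons` helper are made up for illustration.

```python
def count_comparisons(doc, path):
    """Count field-name comparisons needed to walk doc along path."""
    comparisons = 0
    current = doc
    for segment in path:
        for name, value in current.items():
            comparisons += 1
            if name == segment:
                current = value  # descend (or land on the final value)
                break
        else:
            raise KeyError(segment)
    return comparisons


address = {"street": "s", "number": "n", "zip": "z", "state": "st", "country": "c"}

structured_user = {
    "_id": 1,
    "name": {"first": "john", "last": "smith", "middle": "oliver"},
    "address": {"home": dict(address), "work": dict(address)},
}

flat_user = {"_id": 1, "name_first": "john", "name_last": "smith", "name_middle": "oliver"}
for place in ("home", "work"):
    for field, value in address.items():
        flat_user[f"address_{place}_{field}"] = value

print(count_comparisons(flat_user, ["address_work_zip"]))             # 12
print(count_comparisons(structured_user, ["address", "work", "zip"]))  # 8
```

In the structured case the walk touches _id, name, address, home, work, street, number, and zip; the contents of name and address.home are never inspected.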
To quantify the performance gain on traversing a structured schema instead of a flat schema in MongoDB, a test with the following methodology was used:
  • To isolate the result to be influenced just by the traversing of the documents, the MongoDB instance used was configured with in-memory storage.
  • Documents with 10, 25, 50, and 100 fields were utilized for the flat schema.
  • Documents with 2x5, 5x5, 10x5, and 20x5 fields were used for the structured schema, where 2x5 means two fields of type document with five fields for each document.
  • Each collection had 10,000 documents generated using faker/npm.
  • To force the MongoDB engine to loop through all documents and all fields inside each document, all queries were made searching for a field and value that wasn't present in the documents.
  • Each query was executed 100 times in a row for each document size and schema.
  • No concurrent operation was executed during each test.
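The document shapes used in the test can be sketched as follows. This is not the code from the article's repository: the field names, the random string values, and the `make_flat` and `make_structured` helpers are assumptions for illustration, and Python's standard `random` module stands in for faker.

```python
import random
import string


def random_value(rng, length=8):
    """Return a random lowercase string to use as a field value."""
    return "".join(rng.choices(string.ascii_lowercase, k=length))


def make_flat(n_fields, rng):
    """Flat document: n_fields pairs, all on the top level."""
    return {f"field_{i:03d}": random_value(rng) for i in range(n_fields)}


def make_structured(n_groups, fields_per_group, rng):
    """Structured document: n_groups nested documents of fields_per_group pairs."""
    return {
        f"group_{g:02d}": {
            f"field_{i}": random_value(rng) for i in range(fields_per_group)
        }
        for g in range(n_groups)
    }


rng = random.Random(42)  # fixed seed so the shapes are reproducible

# (flat field count, number of nested documents) for 10/2x5 ... 100/20x5
shapes = [(10, 2), (25, 5), (50, 10), (100, 20)]
for n_flat, n_groups in shapes:
    flat = make_flat(n_flat, rng)
    structured = make_structured(n_groups, 5, rng)
    leaves = sum(len(sub) for sub in structured.values())
    print(n_flat, len(flat), len(structured), leaves)
```

A query filter on a field and value that no document contains, as described in the methodology, would then force every field of every generated document to be checked.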
Now, to the test results:
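| Document fields (flat / structured) | Flat | Structured | Difference | Improvement |
|---|---|---|---|---|
| 10 / 2x5 | 487 ms | 376 ms | 111 ms | 29.5% |
| 25 / 5x5 | 624 ms | 434 ms | 190 ms | 43.8% |
| 50 / 10x5 | 915 ms | 617 ms | 298 ms | 48.3% |
| 100 / 20x5 | 1384 ms | 891 ms | 493 ms | 55.4% |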
As our theory predicted, traversing a structured document is faster than traversing a flat one. The gains shown in this test shouldn't be assumed for every comparison between structured and flat schemas, because the improvement in traversal depends on how the nested fields and documents are organized.
This article showed how to better use your MongoDB deployment by changing the schema of your document for the same data/information. Another option to extract more performance from your MongoDB deployment is to apply the common schema patterns of MongoDB. In this case, you will analyze which data you should put in your document/schema. The article Building with Patterns has the most common patterns and will significantly help.
The code used to get the above results is available in the GitHub repository.
