Panoply Blog: Data Management, Warehousing & Data Analysis

MongoDB Best Practices: Schema Design, Indexes & More

Written by Dawid Ziolkowski | Aug 2, 2022 3:00:00 PM

MongoDB is really popular these days. Developers often use it instead of MySQL, but the two platforms aren’t in direct competition. MySQL is a relational database, while MongoDB is a NoSQL document-oriented database, so the two work quite differently. For that reason, optimizing MongoDB is not the same as optimizing a traditional relational database, although some best practices are similar. Read on to learn what to do and what to avoid when using MongoDB.

Understand Schema Differences Between Relational and Document-based Databases

Let's start with the most important difference: the schema. Designing your database schema is a crucial task, and while making changes is possible and common, it can be expensive from an engineering perspective. When a database schema needs changes, your deployment process becomes much more complicated, so good design is critical. How do you design a good MongoDB database schema? Rule number one: don't design it as you would with relational databases. It sounds logical to split your schema into small table-like pieces, right?

In the case of MongoDB, no. For relational databases, you usually construct a schema based on the data. You need to figure out how to split the data your application will use into tables so it’s logically organized and not duplicated. But when it comes to a MongoDB schema, you should look not at the data itself, but at the application: how it will use the data, what kind of queries it will likely execute, and so on. This means that two different applications using the exact same data might have very different schema designs in MongoDB, whereas for relational databases the schema would probably be the same or very similar across applications.
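To make the "same data, different schema" point concrete, here is a minimal sketch in plain Python. It uses dictionaries as stand-ins for MongoDB documents, and the two applications (an account page and a sales report) as well as every field name are assumptions made up for illustration.

```python
# Same raw data, two hypothetical applications, two document shapes.
# All field names are illustrative assumptions, not taken from the article.
raw_orders = [
    {"user": "ana", "product": "lamp", "price": 30},
    {"user": "ana", "product": "desk", "price": 120},
    {"user": "ben", "product": "lamp", "price": 30},
]


def shape_for_account_page(orders):
    """One document per user with orders embedded: suits 'show my order history'."""
    docs = {}
    for order in orders:
        doc = docs.setdefault(order["user"], {"_id": order["user"], "orders": []})
        doc["orders"].append({"product": order["product"], "price": order["price"]})
    return list(docs.values())


def shape_for_sales_report(orders):
    """One document per product with running totals: suits 'revenue per product'."""
    docs = {}
    for order in orders:
        doc = docs.setdefault(
            order["product"], {"_id": order["product"], "units": 0, "revenue": 0}
        )
        doc["units"] += 1
        doc["revenue"] += order["price"]
    return list(docs.values())


account_docs = shape_for_account_page(raw_orders)
report_docs = shape_for_sales_report(raw_orders)
print(account_docs)
print(report_docs)
```

Each shape answers its own application's main query with a single document read. Neither is the "correct" schema for the data on its own.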

Another thing you need to know is that MongoDB enforces almost no rules on how you structure your data, because it stores JSON-like documents. This gives you the ability to embed data into arrays and objects within one document. If you want to learn more about modelling data, take a look at this free course from MongoDB.

Embed Your Data Instead of Relying on Joins

One of the best practices when using MongoDB is to embed your data within one document instead of performing lookups or creating in-application joins. It may be a bit counterintuitive, but MongoDB performs better when you stuff all the data you need into one document. For example, instead of putting user details in one document and user order history in another, chuck them into the same one. Reading documents is extremely fast in MongoDB. Performing lookups or joins within the application is slower in most cases.

Keep in mind that this is only a general rule and you should always start by understanding your application's query pattern. Including the data in the document is preferred over lookup operations. But, of course, there is no point in dumping all possible data into one document: a single MongoDB document is capped at 16 MB, and data you rarely read together only makes every read heavier.
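The difference between the two layouts can be sketched without a database at all. The snippet below is a toy model, not MongoDB client code: the in-memory dictionaries stand in for collections, the names are hypothetical, and the `reads` counter simply counts how many separate fetches each approach needs.

```python
# Toy in-memory stand-ins for collections; this is not a real MongoDB client.
# Referenced layout: users and orders live in separate "collections".
users_referenced = {1: {"_id": 1, "name": "Ana"}}
orders_referenced = [
    {"_id": 10, "user_id": 1, "item": "lamp"},
    {"_id": 11, "user_id": 1, "item": "desk"},
]

# Embedded layout: the order history sits inside the user document.
users_embedded = {
    1: {"_id": 1, "name": "Ana", "orders": [{"item": "lamp"}, {"item": "desk"}]}
}


def profile_with_join(user_id):
    """Referenced layout: two separate reads plus an application-side join."""
    reads = 0
    user = dict(users_referenced[user_id])
    reads += 1
    user["orders"] = [
        {"item": order["item"]}
        for order in orders_referenced
        if order["user_id"] == user_id
    ]
    reads += 1
    return user, reads


def profile_embedded(user_id):
    """Embedded layout: a single document read returns everything."""
    return users_embedded[user_id], 1


joined, joined_reads = profile_with_join(1)
embedded, embedded_reads = profile_embedded(1)
print(joined == embedded, joined_reads, embedded_reads)
```

Both functions return the same profile, but the embedded version gets there with one fetch. Against a real server, each extra fetch is typically another network round trip.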

Use Indexes For Frequent Operations

Let's talk about indexes. This next MongoDB best practice is similar to what you'd do with relational databases. In the previous best practice we mentioned that MongoDB prefers to embed data (instead of splitting it into smaller logical pieces). Therefore it's normal for MongoDB documents to become quite big. This will naturally impact performance, but indexes can help. Indexes in MongoDB work pretty much the same way as in relational databases. These special data structures store the values of selected fields, in sorted order, so MongoDB can match frequently used queries without reading every document.

For example, imagine that you have your users' data together with their order history in a single document and you want to find all users who ordered something in the last month. Normally (without indexes) MongoDB would have to scan the whole user collection, going through the user documents one by one and checking the last order date for each user. It's not horrible; that's how the database performs a lot of operations. But if you frequently ask the database for this kind of matching, then indexing will help you a lot. Coming back to our example: with an index, MongoDB keeps a separate, small, sorted structure of the indexed field values (for example user ID, email address, or last order date), each pointing to the matching document.
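The "last month" example can be modelled in a few lines of plain Python. This is a conceptual sketch only: a sorted list searched with `bisect` stands in for MongoDB's index structure, and the documents, field names, and dates are invented for illustration.

```python
import bisect
from datetime import date

# Hypothetical user documents; only the fields needed for the example.
users = [
    {"_id": 1, "email": "ana@example.com", "last_order_date": date(2022, 7, 20)},
    {"_id": 2, "email": "ben@example.com", "last_order_date": date(2022, 3, 2)},
    {"_id": 3, "email": "cy@example.com", "last_order_date": date(2022, 7, 29)},
    {"_id": 4, "email": "di@example.com", "last_order_date": date(2021, 12, 11)},
]


def collection_scan(docs, since):
    """No index: every document is examined, whether it matches or not."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc["last_order_date"] >= since:
            matches.append(doc["_id"])
    return sorted(matches), examined


def build_index(docs, field):
    """Sorted (value, _id) pairs: a simplified stand-in for a single-field index."""
    return sorted((doc[field], doc["_id"]) for doc in docs)


def index_scan(index, since):
    """Binary-search to the first qualifying entry, then read only the matches."""
    start = bisect.bisect_left(index, (since,))
    entries = index[start:]
    return sorted(doc_id for _, doc_id in entries), len(entries)


since = date(2022, 7, 2)
scan_ids, scan_examined = collection_scan(users, since)
index = build_index(users, "last_order_date")
index_ids, index_examined = index_scan(index, since)
print(scan_ids, scan_examined)
print(index_ids, index_examined)
```

Both approaches find the same users, but the scan examines every document while the index lookup touches only the matching entries. In MongoDB itself you would create such an index from the shell with something like `db.users.createIndex({ last_order_date: 1 })`, where the collection and field names are assumptions carried over from this sketch.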

Properly Size Your Servers

It may sound obvious, but server RAM sizing in MongoDB is crucial. There are two things to keep in mind. First, more memory won't automatically increase the performance of your database, so it's not just a matter of getting the server with the most RAM you can afford. Second, MongoDB performs best when its working set fits in the server's RAM.

Sizing your MongoDB machine is not dependent on the size of the database itself. It doesn't matter if you have 100MB or 2TB of data in your MongoDB instance. What matters is the size of the indexes and the frequently accessed data, which together make up the working set. To size your MongoDB instance, you need to perform some tests to find out how much data your application normally uses. Then, make sure to use a server with slightly more memory than that. If your working set won't fit in RAM, MongoDB will read the data from disk. And even if you use fast SSDs, that will be much slower than reading from RAM.

So, how do you know if your MongoDB working set fits in your RAM? The simplest way is to execute MongoDB's serverStatus command. From there, take a look at the "pages read into cache" and "unmodified pages evicted" metrics in the wiredTiger.cache section of the output. Both are counters that accumulate since the server started, so watch how fast they grow. If you see high, fast-growing numbers in these two, it most likely means that your working set does not fit in your RAM.
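As a rough sketch of how you might read those numbers, the function below summarizes the relevant part of a serverStatus result. The `sample_status` values are made up for illustration, and turning the cumulative counters into per-second averages using `uptime` is just one assumed way to make them comparable. This is not an official formula or threshold.

```python
def cache_summary(status):
    """Summarize WiredTiger cache pressure from a serverStatus-style dict."""
    cache = status["wiredTiger"]["cache"]
    uptime_seconds = max(status["uptime"], 1)  # avoid dividing by zero
    return {
        # How full the cache is relative to its configured maximum.
        "cache_fill_ratio": cache["bytes currently in the cache"]
        / cache["maximum bytes configured"],
        # Counters are cumulative since startup, so average them over uptime.
        "pages_read_per_sec": cache["pages read into cache"] / uptime_seconds,
        "unmodified_evicted_per_sec": cache["unmodified pages evicted"]
        / uptime_seconds,
    }


# Made-up sample shaped like the relevant part of serverStatus output.
sample_status = {
    "uptime": 1000,
    "wiredTiger": {
        "cache": {
            "bytes currently in the cache": 800,
            "maximum bytes configured": 1000,
            "pages read into cache": 50000,
            "unmodified pages evicted": 42000,
        }
    },
}

summary = cache_summary(sample_status)
print(summary)
```

With PyMongo, a real status document could be obtained with something like `MongoClient().admin.command("serverStatus")` and passed to the same function. Comparing two samples taken a few minutes apart gives a better picture of current pressure than the since-startup average.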

Use Replication or Sharding

As with relational databases, another MongoDB best practice is to use replication and/or sharding when your database becomes slow. MongoDB implements replication through replica sets, which work similarly to other database systems that use primary and secondary nodes. You can instruct your application to run some queries on secondary servers by setting a read preference, relieving some pressure on your primary server. Because secondaries replicate asynchronously, such reads can return slightly stale data.
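One common way to route reads to secondaries is through the connection string. The helper below is a small sketch that builds such a URI. The host names and the replica set name `rs0` are placeholders, while `replicaSet` and `readPreference` are standard MongoDB connection string options.

```python
from urllib.parse import urlencode

# Read preference modes defined by MongoDB.
READ_PREFERENCES = {
    "primary",
    "primaryPreferred",
    "secondary",
    "secondaryPreferred",
    "nearest",
}


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a connection string that lists all members and sets a read preference."""
    if read_preference not in READ_PREFERENCES:
        raise ValueError(f"unknown read preference: {read_preference}")
    options = urlencode(
        {"replicaSet": replica_set, "readPreference": read_preference}
    )
    return "mongodb://" + ",".join(hosts) + "/?" + options


# Placeholder hosts; replace with the members of your own replica set.
uri = replica_set_uri(
    ["db1.example.net:27017", "db2.example.net:27017", "db3.example.net:27017"],
    "rs0",
)
print(uri)
```

With `secondaryPreferred`, reads go to a secondary when one is available and fall back to the primary otherwise. Writes always go to the primary regardless of this setting.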

What's good about MongoDB replication is that it also serves as a great redundancy mechanism. Since the secondary nodes continuously copy changes from the primary, any of them can take over if your original primary server fails. Failover is automatic: the remaining members of the replica set elect a new primary on their own, so you don't have to promote a node by hand. Therefore, replicating your MongoDB is good not only for better performance, but for redundancy.

Replication helps the most for small and medium databases, so once your dataset gets really big, consider sharding. While replication just copies all the data across multiple servers, sharding actually splits the data into smaller pieces and distributes them across servers. This brings great performance improvement for large data sets and allows you to horizontally scale both reads and writes. You can read more about how it works here.
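The idea of splitting data by a shard key can be sketched as follows. This is a deliberate simplification: real MongoDB hashed sharding assigns ranges of hashed key values (chunks) to shards rather than taking a modulo, and the `user_id` shard key, the shard count, and the documents are all assumptions for illustration.

```python
import hashlib
from collections import defaultdict


def shard_for(shard_key_value, shard_count):
    """Map a shard key value to a shard number using a stable hash (simplified)."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def distribute(docs, shard_key, shard_count):
    """Place every document on exactly one shard, based on its shard key."""
    shards = defaultdict(list)
    for doc in docs:
        shards[shard_for(doc[shard_key], shard_count)].append(doc)
    return dict(shards)


# Hypothetical documents keyed by user_id, spread over three shards.
docs = [{"user_id": i, "name": f"user-{i}"} for i in range(1000)]
placement = distribute(docs, "user_id", 3)
for shard_number in sorted(placement):
    print(shard_number, len(placement[shard_number]))
```

Unlike replication, each shard here holds only its own slice of the documents. A query that includes the shard key can go straight to one shard, which is why the choice of shard key should follow your application's query pattern.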

Summary

As you can see, MongoDB best practices are a mix of typical database best practices and some specific to MongoDB. The nice thing about MongoDB is that you don’t need to start worrying about performance until you have a relatively big database, because it’s fast and optimized by design. This doesn't mean you should ignore best practices when working with smaller databases. Some of the best practices we mentioned aren’t just for boosting performance, but also help ensure good database design. They should always be top of mind no matter the size of the database.

If you want to learn more about the differences between SQL and NoSQL databases, take a look at our blog post here.