Last week I was in a chat with a couple of guys who had just started experimenting with large scale databases. At one point I said: “Unless you can afford enough RAM to fit all your important data along with the indexes, don’t even consider using MongoDB!”. To my surprise these guys were shocked and expressed great disappointment… It seems the general idea newcomers get when considering NoSQL databases is that they can provide a huge performance boost for free. After working with MongoDB for 18 months, and having been disappointed myself when discovering how much reality differed from the hype, I beg to differ. Everything comes at a cost, and that cost in this case is greater memory and disk space requirements.
Why all the RAM?
MongoDB gets most of its performance gains from keeping as much data in RAM as possible. As we saw in last week’s article, memory consumption in MongoDB can become a pitfall if not well planned out. It’s true that MongoDB offers other powerful ways to improve performance, but no amount of code optimization matters if you simply cannot load the data fast enough from wherever it is stored. So if you are continuously generating IO requests to load your working data from a hard drive, most of the performance gains will be lost. But why rule out MongoDB when there is not enough RAM? The reason is that once the available RAM is small compared with the data you’re working with, your throughput becomes directly dependent on your IO speed (i.e. on how fast you can load data from the hard drive into memory). At that point, you want your data stored in the smallest space possible so that it can be read faster from the hard drive(s). This is why some DBMSs provide optional data compression, and unfortunately this is where MongoDB falls short. At the time of writing, MongoDB does not provide any means of data compression. On top of that, it also tends to consume more disk space than its RDBMS counterparts.
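The rule of thumb above can be sketched as a quick back-of-the-envelope check. This is a rough heuristic, not an official MongoDB formula, and the headroom figure for the OS is an assumption:

```python
def working_set_fits(data_mb, index_mb, ram_mb, os_overhead_mb=1024):
    """Rough check: MongoDB performs best when data + indexes fit in RAM,
    leaving some headroom (assumed 1 GB here) for the OS and other processes."""
    working_set_mb = data_mb + index_mb
    return working_set_mb <= ram_mb - os_overhead_mb

# Example: 306 MB of data + 83 MB of indexes fits comfortably in 4 GB of RAM,
# but not in a 1 GB box once OS overhead is accounted for.
print(working_set_fits(306, 83, 4096))
print(working_set_fits(306, 83, 1024))
```

If the check fails, the database falls back to IO-bound behaviour, which is exactly the scenario where on-disk size starts to dominate performance.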
MySQL vs MongoDB – Disk Usage Benchmark
For the purpose of this post I created a small case study with some benchmarks to provide you with tangible comparison results. I built a sample data set which was stored in both MySQL and MongoDB so that the two could be compared for disk space usage. Since I have been working on software usage analytics for a while, I used a very basic sample data set of the kind that could be used to store usage data in its most simplistic form. Now, before you go on to post comments such as “…but there are better ways to store that data…”, keep in mind that this is not an optimization exercise. This test is simply intended to compare like with like between MySQL and MongoDB when it comes to disk space usage. For this test, assume we want to store daily data about every user. We will be storing the date, some form of user ID, the product version number, the build number and the language of the product installed on the user’s machine. We will also store some basic platform information, such as the Operating System version and OS language, as well as product usage statistics such as how many times the product was run and how long the user spent interacting with the monitored product (represented in runtime minutes and sessions). These fields are shown in the table below:
| ID Field | Date | User ID | Product Version | Product Build | OS Version | OS Language | Product Language | Runtime Minutes | Sessions |
|---|---|---|---|---|---|---|---|---|---|
You may notice I have also added a field named “_id”. That is because MongoDB always stores such a field, which is also indexed. This field can be filled with custom data, or else MongoDB will create its own ID. For this sample, I’m filling all the fields with integers; in real life these might then be mapped to friendly names. The field names I used were all just 2 characters long. In MySQL this doesn’t make much difference, but MongoDB stores every field name inside every single document, so the longer the names, the more space each document consumes. Here we try to keep the data as small as possible.
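To see why short field names matter, here is a rough estimate of the bytes spent on field names alone. The short and long name sets below are illustrative (the post only states that 2-character names were used), and the calculation ignores BSON type tags and other per-document overhead:

```python
def field_name_overhead_bytes(field_names, num_docs):
    """Approximate bytes spent on field names across a collection.
    BSON stores each field name as a C string (name + NUL byte) in every document."""
    per_doc = sum(len(name) + 1 for name in field_names)
    return per_doc * num_docs

# Hypothetical 2-character names vs. descriptive equivalents
short = ["dt", "ui", "pv", "pb", "ov", "ol", "pl", "rm", "sn"]
long_ = ["date", "user_id", "product_version", "product_build",
         "os_version", "os_language", "product_language",
         "runtime_minutes", "sessions"]

docs = 3_650_000
saved = field_name_overhead_bytes(long_, docs) - field_name_overhead_bytes(short, docs)
print(f"~{saved / 1024 / 1024:.0f} MB saved by short field names")  # ≈ 282 MB
```

Against a data set of a few hundred megabytes, name overhead on that scale is clearly worth trimming.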
The test sample simulates data collected from an application used by 10,000 users daily for 365 days, so I inserted 3.65 million records. In MySQL I also added an index on the “_id” field to emulate the index that MongoDB forces on that field. Since MySQL offers the option of compressing the data and the indexes, I also measured the data size with compression turned on; the results in the third column were obtained after enabling compression with a key block size of 4KB. The resulting data sizes were as follows:
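A data set like this can be generated with a simple nested loop over days and users. The sketch below is one possible way to do it; the value ranges and the integer date encoding are assumptions for illustration, since the post does not specify them:

```python
import random
from datetime import date, timedelta

def generate_records(users=10_000, days=365, start=date(2013, 1, 1)):
    """Yield one usage record per user per day: 10,000 x 365 = 3.65M records."""
    for day in range(days):
        d = start + timedelta(days=day)
        for user_id in range(users):
            yield {
                "_id": day * users + user_id,    # custom integer _id, as in the test
                "dt": d.toordinal(),             # date encoded as an integer
                "ui": user_id,
                "pv": random.randint(1, 5),      # product version
                "pb": random.randint(100, 999),  # build number
                "ov": random.randint(1, 10),     # OS version code
                "ol": random.randint(1, 30),     # OS language code
                "pl": random.randint(1, 30),     # product language code
                "rm": random.randint(0, 480),    # runtime minutes
                "sn": random.randint(1, 20),     # sessions
            }

sample = next(generate_records())
print(sorted(sample.keys()))
```

Each record would then be inserted into a MongoDB collection and an equivalent MySQL table before measuring the on-disk sizes.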
|        | MongoDB | MySQL | MySQL (4KB block size compression) |
|--------|---------|-------|------------------------------------|
| Data   | 306MB   | 251MB | 113MB                              |
| Index  | 83MB    | 59MB  | 22MB                               |
| Total  | 439MB   | 310MB | 135MB                              |
As the results above show, MySQL consumed less storage space both in terms of the data and the index. The difference became much more significant with compression enabled, where MySQL used just 31% of the space MongoDB needed to store the same data. Of course, compression comes at the cost of much slower writes, since compressing the data consumes a large number of CPU cycles, so this also has to be taken into consideration. It should also be pointed out that once the data is in RAM, MongoDB can, in some scenarios, beat MySQL hands down when it comes to performance. However, if your IT budget is low and you cannot afford a server with enough RAM to host MongoDB, or if you are expecting a free performance boost, it might make more sense to stick with a traditional RDBMS, because switching to a technology like MongoDB might actually make your system run slower!
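The 31% figure follows directly from the totals in the table:

```python
mongodb_total_mb = 439
mysql_compressed_total_mb = 135

# Compressed MySQL footprint as a fraction of MongoDB's footprint
ratio = mysql_compressed_total_mb / mongodb_total_mb
print(f"{ratio:.0%}")  # 135 / 439 ≈ 31%
```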