Why aggregation?

The aggregation framework was introduced in MongoDB version 2.2 (2.1 in the development branch). It serves as an alternative both to the MapReduce framework and to querying the database directly.

Using the aggregation framework, we can perform group by operations on the server and project only the fields that are needed in the result set. By using the $match and $project operators early in the pipeline, we can reduce the amount of data passed through the later stages, resulting in faster data processing.
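As a minimal sketch of such a pipeline, assume a books collection whose documents have price, name, and author fields. Only price and name appear in this section; the author field and the choice of grouping are assumptions made for illustration.

```javascript
// Hypothetical pipeline for a `books` collection; the `author` field is assumed.
const pipeline = [
  // Keep only books priced at 50, so later stages see fewer documents.
  { $match: { price: 50 } },
  // Pass along only the fields the grouping stage needs.
  { $project: { _id: 0, author: 1, price: 1 } },
  // Group by author on the server, counting books and summing their prices.
  { $group: { _id: "$author", count: { $sum: 1 }, total: { $sum: "$price" } } }
];

// In the mongo shell, the pipeline would be run as:
// db.books.aggregate(pipeline)
```

Placing $match first means the $project and $group stages only process the documents that survive the filter.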

Self-joins, that is, joining data within the same collection, can also be performed using the aggregation framework as we will see in our use case.

When comparing the aggregation framework to the queries available via the shell or various other drivers, there is a use case for both.

For selection and projection queries, it's almost always better to use simple queries, as the complexity of developing, testing, and deploying an aggregation framework operation is rarely justified when the built-in commands already do the job. Finding documents, whether projecting only some of the fields ( db.books.find({price: 50}, {price: 1, name: 1}) ) or returning them whole ( db.books.find({price: 50}) ), is simple and fast enough not to warrant using the aggregation framework.
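To make the comparison concrete, here is a sketch of the same selection and projection written both ways. The variable names are illustrative only; the filter and projection are the ones used in the find() examples above.

```javascript
// Simple query: a filter document and a projection document passed to find().
const filter = { price: 50 };
const projection = { price: 1, name: 1 };
// db.books.find(filter, projection)

// The same selection and projection expressed as an aggregation pipeline.
const equivalentPipeline = [
  { $match: filter },
  { $project: projection }
];
// db.books.aggregate(equivalentPipeline)
```

Both forms return the _id, price, and name fields of the matching documents; the pipeline adds structure without adding capability for a query this simple.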

On the other hand, if we want to perform group by and self-join operations using MongoDB, there might be a case for the aggregation framework. The most important limitation of the group() command in the MongoDB shell is that the result set has to fit in a single document, meaning that it can't be more than 16 MB in size. In addition, the result of any group() command can't contain more than 20,000 unique groupings. Finally, group() doesn't work with sharded input collections, which means that when our data size grows we have to rewrite our queries anyway.
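For illustration, the following sketch puts a group() specification next to a comparable $group stage. The author field is an assumption, not something taken from this section, and group() itself is a legacy helper that was removed in MongoDB 4.2, so the first call only applies to older servers.

```javascript
// Arguments for the legacy group() helper; the `author` field is assumed.
const groupSpec = {
  key: { author: 1 },      // field(s) to group by
  initial: { count: 0 },   // starting value of each group's result document
  // Called once per document; mutates the running result for its group.
  reduce: function (doc, result) {
    result.count += 1;
  }
};
// Legacy shell call (not available from MongoDB 4.2 onwards):
// db.books.group(groupSpec)

// A comparable grouping as an aggregation pipeline, which is not subject to
// the sharding restriction of group():
const groupPipeline = [
  { $group: { _id: "$author", count: { $sum: 1 } } }
];
// db.books.aggregate(groupPipeline)
```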

In comparison to MapReduce, the aggregation framework is more limited in functionality and flexibility, since we are restricted to the operators it provides. On the plus side, the API for the aggregation framework is simpler to grasp and use than MapReduce. In terms of performance, the aggregation framework was significantly faster than MapReduce in earlier versions of MongoDB, but the two seem to be on a par in the most recent versions, following performance improvements to MapReduce.

Finally, there is always the option of using the database purely as data storage and performing complex operations in the application. This can sometimes be quick to develop, but it should be avoided, as it will most likely incur memory, networking, and ultimately performance costs down the road.
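A rough sketch of what application-side processing looks like follows. The sample documents and the countByAuthor helper are hypothetical; the array stands in for whatever a call such as db.books.find().toArray() would return.

```javascript
// Hypothetical documents standing in for a full collection fetch; in a real
// application every one of them would cross the network and sit in memory.
const books = [
  { name: "Book A", author: "x", price: 50 },
  { name: "Book B", author: "x", price: 30 },
  { name: "Book C", author: "y", price: 50 }
];

// Group by author in application memory instead of on the server.
function countByAuthor(docs) {
  const counts = {};
  for (const doc of docs) {
    counts[doc.author] = (counts[doc.author] || 0) + 1;
  }
  return counts;
}

const booksPerAuthor = countByAuthor(books);
```

The grouping logic itself is trivial; the cost lies in transferring and holding every document just to produce one small summary object.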

In the next section, we will explain the available operators before using them in a real case.
