MapReduce concurrency

MapReduce operations will place several short-lived locks that should not affect operations. However, at the end of the reduce phase, if we are outputting data to an existing collection, then output actions such as merge, reduce, and replace will take an exclusive global write lock for the whole server, blocking all other writes in the db instance. If we want to avoid that we should invoke MapReduce in the following way:

> db.collection.mapReduce(
                        mapper,
                        reducer,
                        {
                          out: { merge/reduce: bookOrders, nonAtomic: true  }
                        })

We can apply nonAtomic only to merge or reduce actions. replace will just replace the contents of documents in bookOrders, which would not take much time anyway.

With the merge action, the new result is merged with the existing result if the output collection already exists. If an existing document has the same key as the new result, then it will overwrite that existing document.

With the reduce action, the new result is processed together with the existing result if the output collection already exists. If an existing document has the same key as the new result, it will apply the reduce function to both the new and the existing documents and overwrite the existing document with the result.

Although MapReduce has been present since the early versions of MongoDB, it hasn't evolved as much as the rest of the database, resulting in its usage being less than that of specialized MapReduce frameworks such as Hadoop, which we will learn more about in Chapter 9, Harnessing Big Data with MongoDB.

Table of Contents for MapReduce concurrency

Create new playlist

Sign In

Sign Up

Table of Contents for
MapReduce concurrency