Query results and cursors

MongoDB's lack of support for transactions means that several semantics that we take for granted in an RDBMS work differently.

As explained before, updates can modify the size of a document. Modifying the size can result in MongoDB moving the document on disk to a new slot towards the end of the storage file.

When we have multiple threads querying and updating a single collection, we can end up with a document appearing multiple times in the result set.

This will happen in the following scenario:

  • Thread A starts querying the collection and matches document A1
  • Thread B updates document A1, increasing its size and forcing MongoDB to move it to a different physical location towards the end of the storage file
  • Thread A is still querying the collection. It reaches the end of the collection and finds document A1 again with its new value
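The scenario above can be reproduced with a toy model. The following is only a sketch: it assumes the storage file behaves like a plain array of slots scanned from front to back, and the function name scanWithConcurrentMove is made up for this illustration. It does not talk to MongoDB at all.

```javascript
// Toy model, not MongoDB internals: the "storage file" is an array of slots
// that a reader (thread A) scans from the first slot to the last.
function scanWithConcurrentMove() {
  const storage = [
    { _id: "A1", title: "short" },
    { _id: "A2", title: "other" },
  ];
  const seen = [];

  for (let slot = 0; slot < storage.length; slot++) {
    const doc = storage[slot];
    if (doc === null) continue; // slot vacated by a moved document
    seen.push(doc._id);

    if (slot === 0) {
      // Thread B steps in right after A1 was matched: the update grows A1,
      // so it no longer fits its slot and is rewritten at the end of the file.
      storage[0] = null;
      storage.push({ _id: "A1", title: "a much longer title than before" });
    }
  }
  return seen;
}

console.log(scanWithConcurrentMove()); // A1 is listed twice
```

Because the scan re-checks the length of the storage on every step, it walks into the relocated copy of A1 and reports the same document a second time.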

This is rare but can happen in production. If we can't safeguard against such a case in the application layer, we can use snapshot() to prevent it.
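One possible application-layer safeguard is to drop any document whose _id has already been seen while iterating over the results. The helper below is a hypothetical sketch, not part of any driver. It assumes the results are available as an iterable (for example, an array), that String(doc._id) is distinct for distinct documents, and that the set of seen ids fits in memory.

```javascript
// Hypothetical helper: yields each document once, keyed by its _id.
// The first version encountered wins; later copies are skipped.
function* dedupeById(docs) {
  const seen = new Set();
  for (const doc of docs) {
    const key = String(doc._id); // assumes distinct _ids stringify differently
    if (seen.has(key)) continue;
    seen.add(key);
    yield doc;
  }
}

// Usage with a result set in which A1 was returned twice.
const results = [
  { _id: "A1", title: "short" },
  { _id: "A2", title: "other" },
  { _id: "A1", title: "a much longer title than before" },
];
const unique = [...dedupeById(results)];
console.log(unique.map((doc) => doc._id));
```

Keeping the first copy means the caller may hold the older value of the document. If the newer value matters more, the same idea works with a Map that overwrites earlier entries.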

snapshot() is supported by official drivers and the shell by appending it to an operation that returns a cursor:

> db.books.find().snapshot()

$snapshot cannot be used with sharded collections. $snapshot has to be applied before the query returns the first document. snapshot() cannot be used together with hint() or sort().

We can simulate the snapshot() behavior by querying with hint({_id: 1}), which forces the query engine to use the _id index just as the $snapshot operator does.

If our query runs on a unique index of a field whose values won't be modified while the query runs, we should use that index for the query to get the same behavior. Even then, snapshot() cannot protect us from insertions or deletions happening in the middle of a query. The $snapshot operator traverses the built-in index that every collection has on the _id field, which makes it inherently slow. It should only be used as a last resort.

If we want to update, insert, or delete multiple documents without other threads seeing the results of our operation while it's happening, we can use the $isolated operator:

> db.books.remove( { price: { $gt: 30 }, $isolated: 1 } )

In this example, threads querying the books collection will see either all of the books with a price greater than 30 or none of them. The $isolated operator acquires an exclusive write lock on the collection for the whole duration of the operation, regardless of what the storage engine supports, which adds to contention on this collection.

Isolated operations are still not transactions. They don't provide atomicity ("all-or-nothing"), so if they fail midway, we need to manually roll back the operation to get our database into a consistent state.
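What a manual rollback involves can be sketched with in-memory objects. The function applyAllOrRestore below is hypothetical and only models the idea: it copies every document before touching it and restores the copies if any step throws. It assumes flat documents (the copy is shallow) and is not a substitute for real transactional guarantees.

```javascript
// In-memory sketch of a manual rollback; no database is involved.
function applyAllOrRestore(docs, change) {
  // Shallow copies are enough only because the documents are assumed flat.
  const backups = docs.map((doc) => ({ ...doc }));
  try {
    for (const doc of docs) change(doc);
    return true;
  } catch (err) {
    // A step failed midway: put every document back the way it was.
    docs.forEach((doc, i) => {
      for (const key of Object.keys(doc)) delete doc[key];
      Object.assign(doc, backups[i]);
    });
    return false;
  }
}

// Usage: the second update fails, so the first one is undone as well.
const books = [
  { _id: 1, price: 40 },
  { _id: 2, price: 50 },
];
const succeeded = applyAllOrRestore(books, (book) => {
  if (book._id === 2) throw new Error("simulated failure");
  book.price = 0;
});
console.log(succeeded, books);
```

Against a real collection, the backups would have to be stored somewhere durable before the operation starts, because a crash of the application would otherwise lose them along with the ability to roll back.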

Again, this should be a last resort and only used in cases where it's mission-critical to avoid multiple threads seeing inconsistent information at any time.
