Sharding limitations

Sharding comes with great flexibility, but unfortunately there are a few limitations in the way we can perform some operations.

We will highlight the most important ones here:

  • The group() database command does not work on sharded collections. group() should not be used anyway; use aggregate() and the aggregation framework, or mapReduce(), instead.
  • db.eval() does not work. db.eval() should not be used regardless and should be disabled in most cases for security reasons.
  • The $isolated option in updates does not work.

This functionality is missing in sharded environments. The $isolated option for update() guarantees that when we update multiple documents at once, other readers and writers never see some of the documents with the new value and the others still with the old value.

In unsharded environments, this is implemented by holding a global write lock and/or serializing operations to a single thread, so that the documents affected by update() cannot be accessed by other threads or operations until the update completes.

This seriously affects concurrency and performance, which makes it prohibitive to implement in a sharded environment.
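To make the guarantee concrete, here is a minimal sketch of what a reader can observe when a multi-document update is not isolated. It is a toy in-memory model written in plain JavaScript, not MongoDB code: the multiUpdate() generator, the distinctPlans() reader, and the plan field are all hypothetical names introduced only for this illustration.

```javascript
// Toy model only: an update that touches documents one at a time and
// pauses after each write, giving other operations a chance to run.
function* multiUpdate(docs, newValue) {
  for (const doc of docs) {
    doc.plan = newValue;
    yield; // another operation may be scheduled here
  }
}

// A "reader" that reports which distinct values it currently sees.
function distinctPlans(docs) {
  return [...new Set(docs.map((d) => d.plan))];
}

const docs = [
  { _id: 1, plan: "old" },
  { _id: 2, plan: "old" },
  { _id: 3, plan: "old" },
];

// Without isolation: the reader runs after only the first write.
const op = multiUpdate(docs, "new");
op.next();
const seenMidUpdate = distinctPlans(docs); // mix of new and old values

// With isolation (modelled here by letting the update finish before
// any reader runs), the reader sees only the new value.
for (const _ of op) {
  // drain the remaining writes without interleaving a reader
}
const seenAfterUpdate = distinctPlans(docs);

console.log(seenMidUpdate, seenAfterUpdate);
```

The mixed result in seenMidUpdate is exactly the state that $isolated is meant to hide from other readers and writers.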

  • The $snapshot operator in queries is not supported. $snapshot in the find() cursor prevents a document from being returned more than once when an update moves it to a different location on disk while the query is running. $snapshot is operationally expensive and often not a hard requirement. We can substitute it by running our queries over an index on a field whose values will not change for the duration of the query.
  • Indexes cannot cover our queries if the index does not contain the shard key. In that case, results in sharded environments will come from the documents on disk and not exclusively from the index. The only exception is if we query only on the built-in _id field and return only the _id field, in which case MongoDB can still cover the query using the built-in _id index.
  • The update() and remove() operations work differently. update() and remove() operations that affect a single document in a sharded environment must include either the _id of the document or the shard key. Without the shard key, the mongos router cannot target a specific shard and has to broadcast the operation to every shard that holds the collection, which is operationally very expensive.
  • Unique indexes across shards need to contain the shard key as a prefix of the index. In other words, to achieve uniqueness of documents across shards, we need to follow the data distribution that MongoDB follows for the shards.
  • Shard key limitations: The shard key cannot exceed 512 bytes. The shard key index has to be either an ascending index that starts with the shard key field, optionally followed by other fields, or a hashed index on the shard key field.

The shard key value in a document is also immutable. If our shard key for our User collection is email, then we cannot update the email value for any user after we set it.
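Two of the rules above (update() and remove() filters should name the _id or the shard key, and unique indexes need the shard key as a prefix) are easy to check mechanically. The following is a minimal sketch in plain JavaScript, assuming the email shard key from the User example; includesShardKeyOrId() and hasShardKeyPrefix() are hypothetical helpers written for this illustration and are not part of any MongoDB API.

```javascript
// Hypothetical helpers that mirror the rules in the text.

// True when the filter names _id or every field of the shard key.
function includesShardKeyOrId(filter, shardKey) {
  const fields = Object.keys(filter);
  if (fields.includes("_id")) return true;
  return Object.keys(shardKey).every((f) => fields.includes(f));
}

// True when the shard key fields are the leading fields of the index,
// in the same order.
function hasShardKeyPrefix(indexKey, shardKey) {
  const indexFields = Object.keys(indexKey);
  return Object.keys(shardKey).every((f, i) => indexFields[i] === f);
}

const shardKey = { email: 1 }; // assumed shard key for the User collection

const checks = {
  filterByEmail: includesShardKeyOrId({ email: "a@example.com" }, shardKey),
  filterById: includesShardKeyOrId({ _id: 42 }, shardKey),
  filterByAge: includesShardKeyOrId({ age: { $gt: 30 } }, shardKey),
  uniqueEmailUsername: hasShardKeyPrefix({ email: 1, username: 1 }, shardKey),
  uniqueUsernameEmail: hasShardKeyPrefix({ username: 1, email: 1 }, shardKey),
};

console.log(checks);
```

A check like this could run in a code review or test suite before a filter or index definition reaches a sharded cluster; it only inspects top-level field names, so filters that nest the shard key inside operators such as $and would need extra handling.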
