Chapter 4. Querying

This chapter looks at querying in detail. The main areas covered are as follows:

  • You can query for ranges, set inclusion, inequalities, and more by using $ conditionals.

  • Queries return a database cursor, which lazily returns batches of documents as you need them.

  • There are a lot of metaoperations you can perform on a cursor, including skipping a certain number of results, limiting the number of results returned, and sorting results.

Introduction to find

The find method is used to perform queries in MongoDB. Querying returns a subset of documents in a collection, from no documents at all to the entire collection. Which documents get returned is determined by the first argument to find, which is a document specifying the query criteria.

An empty query document (i.e., {}) matches everything in the collection. If find isn’t given a query document, it defaults to {}. For example, the following:

> db.c.find()

matches every document in the collection c (and returns these documents in batches).

When we start adding key/value pairs to the query document, we begin restricting our search. This works in a straightforward way for most types: numbers match numbers, booleans match booleans, and strings match strings. Querying for a simple type is as easy as specifying the value that you are looking for. For example, to find all documents where the value for "age" is 27, we can add that key/value pair to the query document:

> db.users.find({"age" : 27})

If we have a string we want to match, such as a "username" key with the value "joe", we use that key/value pair instead:

> db.users.find({"username" : "joe"})

Multiple conditions can be strung together by adding more key/value pairs to the query document, which gets interpreted as “condition1 AND condition2 AND … AND conditionN.” For instance, to get all users who are 27-year-olds with the username “joe,” we can query for the following:

> db.users.find({"username" : "joe", "age" : 27})

Specifying Which Keys to Return

Sometimes you do not need all of the key/value pairs in a document returned. If this is the case, you can pass a second argument to find (or findOne) specifying the keys you want. This reduces both the amount of data sent over the wire and the time and memory used to decode documents on the client side.

For example, if you have a user collection and you are interested only in the "username" and "email" keys, you could return just those keys with the following query:

> db.users.find({}, {"username" : 1, "email" : 1})
{
    "_id" : ObjectId("4ba0f0dfd22aa494fd523620"),
    "username" : "joe",
    "email" : "[email protected]"
}

As you can see from the previous output, the "_id" key is returned by default, even if it isn’t specifically requested.

You can also use this second parameter to exclude specific key/value pairs from the results of a query. For instance, you may have documents with a variety of keys, and the only thing you know is that you never want to return the "fatal_weakness" key:

> db.users.find({}, {"fatal_weakness" : 0})

This can also prevent "_id" from being returned:

> db.users.find({}, {"username" : 1, "_id" : 0})
{
    "username" : "joe",
}

Limitations

There are some restrictions on queries. The value of a query document must be a constant as far as the database is concerned. (It can be a normal variable in your own code.) That is, it cannot refer to the value of another key in the document. For example, if we were keeping inventory and we had both "in_stock" and "num_sold" keys, we couldn’t compare their values by querying the following:

> db.stock.find({"in_stock" : "this.num_sold"}) // doesn't work

There are ways to do this (see “$where Queries”), but you will usually get better performance by restructuring your document slightly, such that a “normal” query will suffice. In this example, we could instead use the keys "initial_stock" and "in_stock". Then, every time someone buys an item, we decrement the value of the "in_stock" key by one. Finally, we can do a simple query to check which items are out of stock:

> db.stock.find({"in_stock" : 0})

Query Criteria

Queries can go beyond the exact matching described in the previous section; they can match more complex criteria, such as ranges, OR-clauses, and negation.

Query Conditionals

"$lt", "$lte", "$gt", and "$gte" are all comparison operators, corresponding to <, <=, >, and >=, respectively. They can be combined to look for a range of values. For example, to look for users who are between the ages of 18 and 30, we can do this:

> db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})

This would find all documents where the "age" field was greater than or equal to 18 AND less than or equal to 30.

These types of range queries are often useful for dates. For example, to find people who registered before January 1, 2007, we can do this:

> start = new Date("01/01/2007")
> db.users.find({"registered" : {"$lt" : start}})

Depending on how you create and store dates, an exact match might be less useful, since dates are stored with millisecond precision. Often you want a whole day, week, or month, making a range query necessary.

To query for documents where a key’s value is not equal to a certain value, you must use another conditional operator, "$ne", which stands for “not equal.” If you want to find all users who do not have the username “joe,” you can query for them using this:

> db.users.find({"username" : {"$ne" : "joe"}})

"$ne" can be used with any type.

OR Queries

There are two ways to do an OR query in MongoDB. "$in" can be used to query for a variety of values for a single key. "$or" is more general; it can be used to query for any of the given values across multiple keys.

If you have more than one possible value to match for a single key, use an array of criteria with "$in". For instance, suppose we’re running a raffle and the winning ticket numbers are 725, 542, and 390. To find all three of these documents, we can construct the following query:

> db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}})

"$in" is very flexible and allows you to specify criteria of different types as well as values. For example, if we are gradually migrating our schema to use usernames instead of user ID numbers, we can query for either by using this:

> db.users.find({"user_id" : {"$in" : [12345, "joe"]}})

This matches documents with a "user_id" equal to 12345 and documents with a "user_id" equal to "joe".

If "$in" is given an array with a single value, it behaves the same as directly matching the value. For instance, {ticket_no : {$in : [725]}} matches the same documents as {ticket_no : 725}.

The opposite of "$in" is "$nin", which returns documents that don’t match any of the criteria in the array. If we want to return all of the people who didn’t win anything in the raffle, we can query for them with this:

> db.raffle.find({"ticket_no" : {"$nin" : [725, 542, 390]}})

This query returns everyone who did not have tickets with those numbers.

"$in" gives you an OR query for a single key, but what if we need to find documents where "ticket_no" is 725 or "winner" is true? For this type of query, we’ll need to use the "$or" conditional. "$or" takes an array of possible criteria. In the raffle case, using "$or" would look like this:

> db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})

"$or" can contain other conditionals. If, for example, we want to match any of the three "ticket_no" values or the "winner" key, we can use this:

> db.raffle.find({"$or" : [{"ticket_no" : {"$in" : [725, 542, 390]}},
...                        {"winner" : true}]})

With a normal AND-type query, you want to narrow down your results as far as possible in as few arguments as possible. OR-type queries are the opposite: they are most efficient if the first arguments match as many documents as possible.

While "$or" will always work, use "$in" whenever possible as the query optimizer handles it more efficiently.

$not

"$not" is a metaconditional: it can be applied on top of any other criteria. As an example, let’s consider the modulus operator, "$mod". "$mod" queries for keys whose values, when divided by the first value given, have a remainder of the second value:

> db.users.find({"id_num" : {"$mod" : [5, 1]}})

The previous query returns users with "id_num"s of 1, 6, 11, 16, and so on. If we want, instead, to return users with "id_num"s of 2, 3, 4, 5, 7, 8, 9, 10, 12, etc., we can use "$not":

> db.users.find({"id_num" : {"$not" : {"$mod" : [5, 1]}}})

"$not" can be particularly useful in conjunction with regular expressions to find all documents that don’t match a given pattern (regular expression usage is described in the section “Regular Expressions”).

Type-Specific Queries

As covered in Chapter 2, MongoDB has a wide variety of types that can be used in a document. Some of these types have special behavior when querying.

null

null behaves a bit strangely. It does match itself, so if we have a collection with the following documents:

> db.c.find()
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }

we can query for documents whose "y" key is null in the expected way:

> db.c.find({"y" : null})
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }

However, null also matches “does not exist.” Thus, querying for a key with the value null will return all documents lacking that key:

> db.c.find({"z" : null})
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523621"), "y" : null }
{ "_id" : ObjectId("4ba0f0dfd22aa494fd523622"), "y" : 1 }
{ "_id" : ObjectId("4ba0f148d22aa494fd523623"), "y" : 2 }

If we only want to find keys whose value is null, we can check that the key is null and exists using the "$exists" conditional:

> db.c.find({"z" : {"$eq" : null, "$exists" : true}})

Regular Expressions

"$regex" provides regular expression capabilities for pattern matching strings in queries. Regular expressions are useful for flexible string matching. For example, if we want to find all users with the name “Joe” or “joe,” we can use a regular expression to do case-insensitive matching:

> db.users.find( {"name" : {"$regex" : /joe/i } })

Regular expression flags (e.g., i) are allowed but not required. If we want to match not only various capitalizations of “joe,” but also “joey,” we can continue to improve our regular expression:

> db.users.find({"name" : /joey?/i})

MongoDB uses the Perl Compatible Regular Expression (PCRE) library to match regular expressions; any regular expression syntax allowed by PCRE is allowed in MongoDB. It is a good idea to check your syntax with the JavaScript shell before using it in a query to make sure it matches what you think it matches.

Note

MongoDB can leverage an index for queries on prefix regular expressions (e.g., /^joey/). Indexes cannot be used for case-insensitive searches (/^joey/i). A regular expression is a “prefix expression” when it starts with either a caret (^) or a left anchor (A). If the regular expression uses a case-sensitive query, then if an index exists for the field, the matches can be conducted against values in the index. If it also is a prefix expression, then the search can be limited to the values within the range created by that prefix from the index.

Regular expressions can also match themselves. Very few people insert regular expressions into the database, but if you insert one, you can match it with itself:

> db.foo.insertOne({"bar" : /baz/})
> db.foo.find({"bar" : /baz/})
{
    "_id" : ObjectId("4b23c3ca7525f35f94b60a2d"),
    "bar" : /baz/
}

Querying Arrays

Querying for elements of an array is designed to behave the way querying for scalars does. For example, if the array is a list of fruits, like this:

> db.food.insertOne({"fruit" : ["apple", "banana", "peach"]})

the following query will successfully match the document:

> db.food.find({"fruit" : "banana"})

We can query for it in much the same way as we would if we had a document that looked like the (illegal) document {"fruit" : "apple", "fruit" : "banana", "fruit" : "peach"}.

“$all”

If you need to match arrays by more than one element, you can use "$all". This allows you to match a list of elements. For example, suppose we create a collection with three elements:

> db.food.insertOne({"_id" : 1, "fruit" : ["apple", "banana", "peach"]})
> db.food.insertOne({"_id" : 2, "fruit" : ["apple", "kumquat", "orange"]})
> db.food.insertOne({"_id" : 3, "fruit" : ["cherry", "banana", "apple"]})

Then we can find all documents with both "apple" and "banana" elements by querying with "$all":

> db.food.find({fruit : {$all : ["apple", "banana"]}})
{"_id" : 1, "fruit" : ["apple", "banana", "peach"]}
{"_id" : 3, "fruit" : ["cherry", "banana", "apple"]}

Order does not matter. Notice "banana" comes before "apple" in the second result. Using a one-element array with "$all" is equivalent to not using "$all". For instance, {fruit : {$all : ['apple']} will match the same documents as {fruit : 'apple'}.

You can also query by exact match using the entire array. However, exact match will not match a document if any elements are missing or superfluous. For example, this will match the first of our three documents:

> db.food.find({"fruit" : ["apple", "banana", "peach"]})

But this will not:

> db.food.find({"fruit" : ["apple", "banana"]})

and neither will this:

> db.food.find({"fruit" : ["banana", "apple", "peach"]})

If you want to query for a specific element of an array, you can specify an index using the syntax key.index:

> db.food.find({"fruit.2" : "peach"})

Arrays are always 0-indexed, so this would match the third array element against the string "peach".

“$size”

A useful conditional for querying arrays is "$size", which allows you to query for arrays of a given size. Here’s an example:

> db.food.find({"fruit" : {"$size" : 3}})

One common query is to get a range of sizes. "$size" cannot be combined with another $ conditional (in this example, "$gt"), but this query can be accomplished by adding a "size" key to the document. Then, every time you add an element to the array, increment the value of "size". If the original update looked like this:

> db.food.update(criteria, {"$push" : {"fruit" : "strawberry"}})

it can simply be changed to this:

> db.food.update(criteria,
... {"$push" : {"fruit" : "strawberry"}, "$inc" : {"size" : 1}})

Incrementing is extremely fast, so any performance penalty is negligible. Storing documents like this allows you to do queries such as this:

> db.food.find({"size" : {"$gt" : 3}})

Unfortunately, this technique doesn’t work as well with the "$addToSet" operator.

“$slice”

As mentioned earlier in this chapter, the optional second argument to find specifies the keys to be returned. The special "$slice" operator can be used to return a subset of elements for an array key.

For example, suppose we had a blog post document and we wanted to return the first 10 comments:

> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : 10}})

Alternatively, if we wanted the last 10 comments, we could use −10:

> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : -10}})

"$slice" can also return pages in the middle of the results by taking an offset and the number of elements to return:

> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : [23, 10]}})

This would skip the first 23 elements and return the 24th through 33rd. If there were fewer than 33 elements in the array, it would return as many as possible.

Unless otherwise specified, all keys in a document are returned when "$slice" is used. This is unlike the other key specifiers, which suppress unmentioned keys from being returned. For instance, if we had a blog post document that looked like this:

{
    "_id" : ObjectId("4b2d75476cc613d5ee930164"),
    "title" : "A blog post",
    "content" : "...",
    "comments" : [
        {
            "name" : "joe",
            "email" : "[email protected]",
            "content" : "nice post."
        },
        {
            "name" : "bob",
            "email" : "[email protected]",
            "content" : "good post."
        }
    ]
}

and we did a "$slice" to get the last comment, we’d get this:

> db.blog.posts.findOne(criteria, {"comments" : {"$slice" : -1}})
{
    "_id" : ObjectId("4b2d75476cc613d5ee930164"),
    "title" : "A blog post",
    "content" : "...",
    "comments" : [
        {
            "name" : "bob",
            "email" : "[email protected]",
            "content" : "good post."
        }
    ]
}

Both "title" and "content" are still returned, even though they weren’t explicitly included in the key specifier.

Returning a matching array element

"$slice" is helpful when you know the index of the element, but sometimes you want whichever array element matched your criteria. You can return the matching element with the $ operator. Given the previous blog example, you could get Bob’s comment back with:

> db.blog.posts.find({"comments.name" : "bob"}, {"comments.$" : 1})
{
    "_id" : ObjectId("4b2d75476cc613d5ee930164"),
    "comments" : [
        {
            "name" : "bob",
            "email" : "[email protected]",
            "content" : "good post."
        }
    ]
}

Note that this only returns the first match for each document: if Bob had left multiple comments on this post, only the first one in the "comments" array would be returned.

Array and range query interactions

Scalars (nonarray elements) in documents must match each clause of a query’s criteria. For example, if you queried for {"x" : {"$gt" : 10, "$lt" : 20}}, "x" would have to be both greater than 10 and less than 20. However, if a document’s "x" field is an array, the document matches if there is an element of "x" that matches each part of the criteria but each query clause can match a different array element.

The best way to understand this behavior is to see an example. Suppose we have the following documents:

{"x" : 5}
{"x" : 15}
{"x" : 25}
{"x" : [5, 25]}

If we wanted to find all documents where "x" is between 10 and 20, we might naively structure a query as db.test.find({"x" : {"$gt" : 10, "$lt" : 20}}) and expect to get back one document: {"x" : 15}. However, running this, we get two:

> db.test.find({"x" : {"$gt" : 10, "$lt" : 20}})
{"x" : 15}
{"x" : [5, 25]}

Neither 5 nor 25 is between 10 and 20, but the document is returned because 25 matches the first clause (it is greater than 10) and 5 matches the second clause (it is less than 20).

This makes range queries against arrays essentially useless: a range will match any multielement array. There are a couple of ways to get the expected behavior.

First, you can use "$elemMatch" to force MongoDB to compare both clauses with a single array element. However, the catch is that "$elemMatch" won’t match nonarray elements:

> db.test.find({"x" : {"$elemMatch" : {"$gt" : 10, "$lt" : 20}}})
> // no results

The document {"x" : 15} no longer matches the query, because the "x" field is not an array. That said, you should have a good reason for mixing array and scalar values in a field. Many uses cases do not require mixing. For those, "$elemMatch" provides a good solution for range queries on array elements.

If you have an index over the field that you’re querying on (see Chapter 5), you can use min and max to limit the index range traversed by the query to your "$gt" and "$lt" values:

> db.test.find({"x" : {"$gt" : 10, "$lt" : 20}}).min({"x" : 10}).max({"x" : 20})
{"x" : 15}

Now this will only traverse the index from 10 to 20, missing the 5 and 25 entries. You can only use min and max when you have an index on the field you are querying for, though, and you must pass all fields of the index to min and max.

Using min and max when querying for ranges over documents that may include arrays is generally a good idea. The index bounds for a "$gt"/"$lt" query over an array is inefficient. It basically accepts any value, so it will search every index entry, not just those in the range.

Querying on Embedded Documents

There are two ways of querying for an embedded document: querying for the whole document or querying for its individual key/value pairs.

Querying for an entire embedded document works identically to a normal query. For example, if we have a document that looks like this:

{
    "name" : {
        "first" : "Joe",
        "last" : "Schmoe"
    },
    "age" : 45
}

we can query for someone named Joe Schmoe with the following:

> db.people.find({"name" : {"first" : "Joe", "last" : "Schmoe"}})

However, a query for a full subdocument must exactly match the subdocument. If Joe decides to add a middle name field, suddenly this query won’t work anymore; it doesn’t match the entire embedded document! This type of query is also order-sensitive: {"last" : "Schmoe", "first" : "Joe"} would not be a match.

If possible, it’s usually a good idea to query for just a specific key or keys of an embedded document. Then, if your schema changes, all of your queries won’t suddenly break because they’re no longer exact matches. You can query for embedded keys using dot notation:

> db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})

Now, if Joe adds more keys, this query will still match his first and last names.

This dot notation is the main difference between query documents and other document types. Query documents can contain dots, which mean “reach into an embedded document.” Dot notation is also the reason that documents to be inserted cannot contain the . character. Oftentimes people run into this limitation when trying to save URLs as keys. One way to get around it is to always perform a global replace before inserting or after retrieving, substituting a character that isn’t legal in URLs for the dot character.

Embedded document matches can get a little tricky as the document structure gets more complicated. For example, suppose we are storing blog posts and we want to find comments by Joe that were scored at least a 5. We could model the post as follows:

> db.blog.find()
{
    "content" : "...",
    "comments" : [
        {
            "author" : "joe",
            "score" : 3,
            "comment" : "nice post"
        },
        {
            "author" : "mary",
            "score" : 6,
            "comment" : "terrible post"
        }
    ]
}

Now, we can’t query using db.blog.find({"comments" : {"author" : "joe", "score" : {"$gte" : 5}}}). Embedded document matches have to match the whole document, and this doesn’t match the "comment" key. It also wouldn’t work to do db.blog.find({"comments.author" : "joe", "comments.score" : {"$gte" : 5}}), because the author criterion could match a different comment than the score criterion. That is, it would return the document shown above: it would match "author" : "joe" in the first comment and "score" : 6 in the second comment.

To correctly group criteria without needing to specify every key, use "$elemMatch". This vaguely named conditional allows you to partially specify criteria to match a single embedded document in an array. The correct query looks like this:

> db.blog.find({"comments" : {"$elemMatch" : 
... {"author" : "joe", "score" : {"$gte" : 5}}}})

"$elemMatch" allows you to “group” your criteria. As such, it’s only needed when you have more than one key you want to match on in an embedded document.

$where Queries

Key/value pairs are a fairly expressive way to query, but there are some queries that they cannot represent. For queries that cannot be done any other way, there are "$where" clauses, which allow you to execute arbitrary JavaScript as part of your query. This allows you to do (almost) anything within a query. For security, use of "$where" clauses should be highly restricted or eliminated. End users should never be allowed to execute arbitrary "$where" clauses.

The most common case for using "$where" is to compare the values for two keys in a document. For instance, suppose we have documents that look like this:

> db.foo.insertOne({"apple" : 1, "banana" : 6, "peach" : 3})
> db.foo.insertOne({"apple" : 8, "spinach" : 4, "watermelon" : 4})

We’d like to return documents where any two of the fields are equal. For example, in the second document, "spinach" and "watermelon" have the same value, so we’d like that document returned. It’s unlikely MongoDB will ever have a $ conditional for this, so we can use a "$where" clause to do it with JavaScript:

> db.foo.find({"$where" : function () {
... for (var current in this) {
...     for (var other in this) {
...         if (current != other && this[current] == this[other]) {
...             return true;
...         }
...     }
... }
... return false;
... }});

If the function returns true, the document will be part of the result set; if it returns false, it won’t be.

"$where" queries should not be used unless strictly necessary: they are much slower than regular queries. Each document has to be converted from BSON to a JavaScript object and then run through the "$where" expression. Indexes cannot be used to satisfy a "$where" either. Hence, you should use "$where" only when there is no other way of doing the query. You can cut down on the penalty by using other query filters in combination with "$where". If possible, an index will be used to filter based on the non-$where clauses; the "$where" expression will be used only to fine-tune the results. MongoDB 3.6 added the $expr operator which allows the use of aggregation expressions with the MongoDB query language. It is faster than $where as it does not execute JavaScript and is recommended as a replacement to this operator where possible.

Another way of doing complex queries is to use one of the aggregation tools, which are covered in Chapter 7.

Cursors

The database returns results from find using a cursor. The client-side implementations of cursors generally allow you to control a great deal about the eventual output of a query. You can limit the number of results, skip over some number of results, sort results by any combination of keys in any direction, and perform a number of other powerful operations.

To create a cursor with the shell, put some documents into a collection, do a query on them, and assign the results to a local variable (variables defined with "var" are local). Here, we create a very simple collection and query it, storing the results in the cursor variable:

> for(i=0; i<100; i++) {
...     db.collection.insertOne({x : i});
... }
> var cursor = db.collection.find();

The advantage of doing this is that you can look at one result at a time. If you store the results in a global variable or no variable at all, the MongoDB shell will automatically iterate through and display the first couple of documents. This is what we’ve been seeing up until this point, and it is often the behavior you want for seeing what’s in a collection but not doing actual programming with the shell.

To iterate through the results, you can use the next method on the cursor. You can use hasNext to check whether there is another result. A typical loop through result looks like the following:

> while (cursor.hasNext()) {
...     obj = cursor.next();
...     // do stuff
... }

cursor.hasNext() checks that the next result exists, and cursor.next() fetches it.

The cursor class also implements JavaScript’s iterator interface, so you can use it in a forEach loop:

> var cursor = db.people.find();
> cursor.forEach(function(x) {
...     print(x.name);
... });
adam
matt
zak

When you call find, the shell does not query the database immediately. It waits until you start requesting results to send the query, which allows you to chain additional options onto a query before it is performed. Almost every method on a cursor object returns the cursor itself, so that you can chain options in any order. For instance, all of the following are equivalent:

> var cursor = db.foo.find().sort({"x" : 1}).limit(1).skip(10);
> var cursor = db.foo.find().limit(1).sort({"x" : 1}).skip(10);
> var cursor = db.foo.find().skip(10).limit(1).sort({"x" : 1});

At this point, the query has not been executed yet. All of these functions merely build the query. Now, suppose we call the following:

> cursor.hasNext()

At this point, the query will be sent to the server. The shell fetches the first 100 results or first 4 MB of results (whichever is smaller) at once so that the next calls to next or hasNext will not have to make trips to the server. After the client has run through the first set of results, the shell will again contact the database and ask for more results with a getMore request. getMore requests basically contain an identifier for the cursor and ask the database if there are any more results, returning the next batch if there are. This process continues until the cursor is exhausted and all results have been returned.

Limits, Skips, and Sorts

The most common query options are limiting the number of results returned, skipping a number of results, and sorting. All these options must be added before a query is sent to the database.

To set a limit, chain the limit function onto your call to find. For example, to only return three results, use this:

> db.c.find().limit(3)

If there are fewer than three documents matching your query in the collection, only the number of matching documents will be returned; limit sets an upper limit, not a lower limit.

skip works similarly to limit:

> db.c.find().skip(3)

This will skip the first three matching documents and return the rest of the matches. If there are fewer than three documents in your collection, it will not return any documents.

sort takes an object: a set of key/value pairs where the keys are key names and the values are the sort directions. The sort direction can be 1 (ascending) or −1 (descending). If multiple keys are given, the results will be sorted in that order. For instance, to sort the results by "username" ascending and "age" descending, we do the following:

> db.c.find().sort({username : 1, age : -1})

These three methods can be combined. This is often handy for pagination. For example, suppose that you are running an online store and someone searches for mp3. If you want 50 results per page sorted by price from high to low, you can do the following:

> db.stock.find({"desc" : "mp3"}).limit(50).sort({"price" : -1})

If that person clicks Next Page to see more results, you can simply add a skip to the query, which will skip over the first 50 matches (which the user already saw on page 1):

> db.stock.find({"desc" : "mp3"}).limit(50).skip(50).sort({"price" : -1})

However, large skips are not very performant; there are suggestions for how to avoid them in the next section.

Comparison order

MongoDB has a hierarchy as to how types compare. Sometimes you will have a single key with multiple types: for instance, integers and booleans, or strings and nulls. If you do a sort on a key with a mix of types, there is a predefined order that they will be sorted in. From least to greatest value, this ordering is as follows:

  1. Minimum value

  2. Null

  3. Numbers (integers, longs, doubles, decimals)

  4. Strings

  5. Object/document

  6. Array

  7. Binary data

  8. Object ID

  9. Boolean

  10. Date

  11. Timestamp

  12. Regular expression

  13. Maximum value

Avoiding Large Skips

Using skip for a small number of documents is fine. But for a large number of results, skip can be slow, since it has to find and then discard all the skipped results. Most databases keep more metadata in the index to help with skips, but MongoDB does not yet support this, so large skips should be avoided. Often you can calculate the results of the next query based on the previous one.

Paginating results without skip

The easiest way to do pagination is to return the first page of results using limit and then return each subsequent page as an offset from the beginning:

> // do not use: slow for large skips
> var page1 = db.foo.find(criteria).limit(100)
> var page2 = db.foo.find(criteria).skip(100).limit(100)
> var page3 = db.foo.find(criteria).skip(200).limit(100)
...

However, depending on your query, you can usually find a way to paginate without skips. For example, suppose we want to display documents in descending order based on "date". We can get the first page of results with the following:

> var page1 = db.foo.find().sort({"date" : -1}).limit(100)

Then, assuming the date is unique, we can use the "date" value of the last document as the criterion for fetching the next page:

var latest = null;

// display first page
while (page1.hasNext()) {
   latest = page1.next();
   display(latest);
}

// get next page
var page2 = db.foo.find({"date" : {"$lt" : latest.date}});
page2.sort({"date" : -1}).limit(100);

Now the query does not need to include a skip.

Finding a random document

One fairly common problem is how to get a random document from a collection. The naive (and slow) solution is to count the number of documents and then do a find, skipping a random number of documents between zero and the size of the collection:

> // do not use
> var total = db.foo.count()
> var random = Math.floor(Math.random()*total)
> db.foo.find().skip(random).limit(1)

It is actually highly inefficient to get a random element this way: you have to do a count (which can be expensive if you are using criteria), and skipping large numbers of elements can be time-consuming.

It takes a little forethought, but if you know you’ll be looking up a random element in a collection, there’s a much more efficient way to do so. The trick is to add an extra random key to each document when it is inserted. For instance, if we’re using the shell, we could use the Math.random() function (which creates a random number between 0 and 1):

> db.people.insertOne({"name" : "joe", "random" : Math.random()})
> db.people.insertOne({"name" : "john", "random" : Math.random()})
> db.people.insertOne({"name" : "jim", "random" : Math.random()})

Now, when we want to find a random document from the collection, we can calculate a random number and use that as a query criterion, instead of using skip:

> var random = Math.random()
> result = db.people.findOne({"random" : {"$gt" : random}})

There is a slight chance that random will be greater than any of the "random" values in the collection, and no results will be returned. We can guard against this by simply returning a document in the other direction:

> if (result == null) {
...     result = db.people.findOne({"random" : {"$lte" : random}})
... }

If there aren’t any documents in the collection, this technique will end up returning null, which makes sense.

This technique can be used with arbitrarily complex queries; just make sure to have an index that includes the random key. For example, if we want to find a random plumber in California, we can create an index on "profession", "state", and "random":

> db.people.ensureIndex({"profession" : 1, "state" : 1, "random" : 1})

This allows us to quickly find a random result (see Chapter 5 for more information on indexing).

Immortal Cursors

There are two sides to a cursor: the client-facing cursor and the database cursor that the client-side one represents. We have been talking about the client-side one up until now, but we are going to take a brief look at what’s happening on the server.

On the server side, a cursor takes up memory and resources. Once a cursor runs out of results or the client sends a message telling it to die, the database can free the resources it was using. Freeing these resources lets the database use them for other things, which is good, so we want to make sure that cursors can be freed quickly (within reason).

There are a couple of conditions that can cause the death (and subsequent cleanup) of a cursor. First, when a cursor finishes iterating through the matching results, it will clean itself up. Another way is that, when a cursor goes out of scope on the client side, the drivers send the database a special message to let it know that it can kill that cursor. Finally, even if the user hasn’t iterated through all the results and the cursor is still in scope, after 10 minutes of inactivity, a database cursor will automatically “die.” This way, if a client crashes or is buggy, MongoDB will not be left with thousands of open cursors.

This “death by timeout” is usually the desired behavior: very few applications expect their users to sit around for minutes at a time waiting for results. However, sometimes you might know that you need a cursor to last for a long time. In that case, many drivers have implemented a function called immortal, or a similar mechanism, which tells the database not to time out the cursor. If you turn off a cursor’s timeout, you must iterate through all of its results or kill it to make sure it gets closed. Otherwise, it will sit around in the database hogging resources until the server is restarted.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.129.247.196