In Chapter 14, “Manipulating MongoDB Documents from Node.js,” you learned how to create and manipulate documents and got a brief glimpse into finding them using the find()
method. This chapter takes a deeper look at accessing documents in a MongoDB collection using the MongoDB Node.js driver module.
There is much more to accessing documents than just returning everything in a collection. This chapter covers using the query
object to limit which documents are returned as well as methods to limit the fields and number of documents in the query results. You also see how to count the number of documents that match query criteria without actually retrieving them from the server. This chapter also covers some advanced aggregation techniques to group the results and even generate a new fully aggregated set of documents.
For the purpose of introducing you to the various methods of accessing data, all the data used for examples in this chapter comes from the same data set. The data set is a collection that contains various information about 5,000 words. This provides a large enough data set to implement the necessary examples.
The structure of objects in this data set is as follows and should be fairly intuitive (which is why it was selected). This document structure gives us fields that are strings, integers, arrays, subdocuments, and arrays of subdocuments:
{ word: <word>, first: <first_letter>, last: <last_letter>, size: <character_count>, letters: [<array_of_characters_in_word_no_repeats>], stats: { vowels:<vowel_count>, consonants:<consonant_count>}, charsets: [ { "type": <consonants_vowels_other>, "chars": [<array_of_characters_of_type_in_word>]}, . . . ], }
Throughout this chapter the various methods of data all use a query
object of some sort to define which documents to retrieve from a MongoDB collection. The query
object is just a standard JavaScript object with special property names that are understood by the MongoDB Node.js driver. These property names match closely the native queries that you can perform inside the MongoDB client, which makes it nice to be able to transfer back and forth.
The properties of the query
object are called operators because they operate on the data to determine whether a document should be included in the result set. These operators are used to match the values of fields in the document against specific criteria.
For example, to find all documents with a count
value greater than 10
and name
value equal to test
, the query
object would be
{count:{$gt:10}, name:'test'}
The operator $gt
specifies documents with a count
field larger than 10
. The standard colon syntax of name:'test'
is also an operator that specifies that the name
field must equal test
. Notice that the query
object has multiple operators. You can include several different operators in the same query.
When specifying field names in a query
object, you can use dot notation to specify subdocument fields. For example, consider the following object format:
{ name:"test", stats: { height:74, eyes:'blue' } }
You can query users that have blue eyes using the following query
object:
{stats.eyes:'blue'}
Table 15.1 lists the more commonly used operators.
Table 15.1 query
object operators that define the result set returned by MongoDB requests
Operator |
Description |
|
Matches documents with fields that have a value equal to the value specified. |
|
Matches values that are greater than the value specified in the query. For example: |
|
Matches values that are equal to or greater than the value specified in the query. For example: |
|
Matches any of the values that exist in an array specified in the query. For example: |
|
Matches values that are less than the value specified in the query. For example: |
|
Matches values that are less than or equal to the value specified in the query. For example: |
|
Matches all values that are not equal to the value specified in the query. For example: |
|
Matches values that do not exist in an array specified to the query. For example: |
|
Joins query clauses with a logical OR; returns all documents that match the conditions of either clause. For example:
|
|
Joins query clauses with a logical AND; returns all documents that match the conditions of both clauses. For example:
|
|
Inverts the effect of a query expression and returns documents that do not match the query expression. For example: |
|
Joins query clauses with a logical NOR; returns all documents that fail to match both clauses. For example: |
|
Matches documents that have the specified field. For example: |
|
Selects documents if a field is of the specified BSON type number. Table 11.1 lists the different BSON type numbers. For example:
|
|
Performs a modulo operation on the value of a field and selects documents with a specified result. The value for the modulo operation is specified as an array with the first number being the number to divide by and the second being the remainder. For example: |
Selects documents where values match a specified regular expression. For example: |
|
|
Matches arrays that contain all elements specified in the query. For example: |
|
Selects documents if an element in the array of subdocuments has fields that match all the specified |
|
Selects documents if the array field is a specified size. For example: |
In addition to the query
object, most of the methods of retrieving documents using the MongoDB Node.js driver also include an options
object. The options
object allows you to define the behavior of the request when retrieving documents. These options allow you to limit the result set, sort items while creating the result set, and much more.
Table 15.2 lists the options that can be set on methods that retrieve documents from the MongoDB server. Not all of these methods are available on every request. For example, when counting items that match the query it doesn’t make sense to specify a limit on the result.
Table 15.2 Options that can be specified in the options
object when querying documents
Option |
Description |
|
Specifies the maximum number of documents to return. |
|
Specifies the sort order of documents as an array of
|
|
Specifies an object whose fields match fields that should be included or excluded from the returned documents. A value of |
|
Specifies the number of documents from the query results to skip before returning a document. Typically used when paginating result sets. |
|
Forces the query to use specific indexes when building the result set. For example: |
|
Returns an explanation of what will happen when performing the query on the server instead of actually running the query. This is essential when trying to debug/optimize complex queries. |
{Boolean, default:false}; specifies whether to use a snapshot query. |
|
|
A Boolean; when true, the cursor is allowed to timeout. |
|
Specifies the maximum number of documents to scan when performing a query before returning. This is useful if you have a collection that has millions of objects and you don’t want queries to run on forever. |
|
Specifies a string that is printed out in the MongoDB logs. This can help when troubleshooting because you can identify queries more easily. |
|
Specifies whether you want to read from a primary or secondary replica or just the nearest MongoDB server in the replica set to perform the query. |
|
Specifies the number of timeout retries to perform on the query before failing. The default is 5. |
|
A Boolean that, when |
You were introduced to the find()
method of the Collection
object in Chapter 14. This method returns a Cursor
object to the callback function providing access to the documents. If no query is specified, all documents are returned, which is rarely the case. Instead, you need a subset of documents that match a set of criteria.
To limit the resulting documents included in the find()
method, you apply a query
object that limits the documents returned in the Cursor
. Listing 15.1 illustrates using query
objects in a find
statement.
Listing 15.1 performs a bunch of different queries against the word collection described earlier in this chapter. You should already recognize all the connection code as well as the code used in displayWords()
to iterate through the cursor and display only the word names in the documents.
In line 20, the following query is used to find words starting with a, b, or c:
{first:{$in: ['a', 'b', 'c']}}
In line 23, the following query is used to find words longer than 12 letters:
{size:{$gt: 12}}
In line 26, the following query is used to find words with an even number of letters:
{size:{$mod: [2,0]}}
In line 29, the following query is used to find words with exactly 12 letters:
{letters:{$size: 12}}
In lines 32 and 33, the following query is used to find words that both begin and end with a vowel:
{$and: [{first:{$in: ['a', 'e', 'i', 'o', 'u']}}, {last:{$in: ['a', 'e', 'i', 'o', 'u']}}]}
In line 37, the following query is used to find words that contain more than 6 vowels:
{"stats.vowels":{$gt:6}}
In line 40, the following query is used to find words that contain all of the vowels:
{letters:{$all: ['a','e','i','o','u']}}
In line 44, the following query is used to find words with non-alphabet characters:
{otherChars: {$exists:true}}
In line 47, a bit more challenging query is used. The $elemMatch
operator is used to match the charsets
subdocuments. The $and
operator forces the type
field to equal other
and the chars
array field to be exactly 2
:
{charsets:{$elemMatch:{$and:[{type:'other'},{chars:{$size:2}}]}}}
Listing 15.1 doc_query.js
: Finding a specific set of documents in a MongoDB collection
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", findItems); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function displayWords(msg, cursor, pretty){ 10 cursor.toArray(function(err, itemArr){ 11 console.log(" "+msg); 12 var wordList = []; 13 for(var i=0; i<itemArr.length; i++){ 14 wordList.push(itemArr[i].word); 15 } 16 console.log(JSON.stringify(wordList, null, pretty)); 17 }); 18 } 19 function findItems(err, words){ 20 words.find({first:{$in: ['a', 'b', 'c']}}, function(err, cursor){ 21 displayWords("Words starting with a, b or c: ", cursor); 22 }); 23 words.find({size:{$gt: 12}}, function(err, cursor){ 24 displayWords("Words longer than 12 characters: ", cursor); 25 }); 26 words.find({size:{$mod: [2,0]}}, function(err, cursor){ 27 displayWords("Words with even Lengths: ", cursor); 28 }); 29 words.find({letters:{$size: 12}}, function(err, cursor){ 30 displayWords("Words with 12 Distinct characters: ", cursor); 31 }); 32 words.find({$and: [{first:{$in: ['a', 'e', 'i', 'o', 'u']}}, 33 {last:{$in: ['a', 'e', 'i', 'o', 'u']}}]}, 34 function(err, cursor){ 35 displayWords("Words that start and end with a vowel: ", cursor); 36 }); 37 words.find({"stats.vowels":{$gt:6}}, function(err, cursor){ 38 displayWords("Words containing 7 or more vowels: ", cursor); 39 }); 40 words.find({letters:{$all: ['a','e','i','o','u']}}, 41 function(err, cursor){ 42 displayWords("Words with all 5 vowels: ", cursor); 43 }); 44 words.find({otherChars: {$exists:true}}, function(err, cursor){ 45 displayWords("Words with non-alphabet characters: ", cursor); 46 }); 47 words.find({charsets:{$elemMatch:{$and:[{type:'other'}, 48 {chars:{$size:2}}]}}}, 49 function(err, cursor){ 50 displayWords("Words with 2 non-alphabet characters: ", cursor); 51 }); 52 }
Listing 15.1 Output doc_query.js
: Finding a specific set of documents in a MongoDB collection
Words longer than 12 characters: ["international","administration","environmental","responsibility","investigation", "communication","understanding","significantly","representative"…] Words with 12 Distinct characters: ["uncomfortable","accomplishment","considerably"] Words with non-alphabet characters: ["don't","won't","can't","shouldn't","e-mail","long-term","so-called","mm-hmm", "t-shirt","and/or","health-care","full-time","o'clock","self-esteem"…] Words starting with a, b or c: ["and","a","at","as","all","about","also","any","after","ask","another","american", "against","again","always","area","around","away","among"…] Words containing 7 or more vowels: ["identification","questionnaire","organizational","rehabilitation"] Words that start and end with a vowel: ["a","also","area","ago","able","age","agree","available","anyone","article","argue", "arrive","above","audience","assume","alone","achieve","attitude"…] Words with all 5 vowels: ["education","educational","regulation","evaluation","reputation","communicate", "dialogue","questionnaire","simultaneously","equation","automobile"…] Words with 2 non-alphabet characters: ["two-third's","middle-class'"] Words with even Lengths: ["be","of","in","to","have","it","that","he","with","on","do","this","they","at","we", "from","by","or","as","what","go","if","my","make","know","will","up"…]
When accessing document sets in MongoDB, you may want to only get a count first before deciding to retrieve a set of documents. There are several reasons to count specific document sets. Performing a count is much less intensive on the MongoDB side because when retrieving documents using the find()
and other methods, temporary objects such as cursors must be created and maintained by the server.
Also, when performing operations on the resulting set of documents from a find()
, you should be aware of how many documents you are going to be dealing with, especially in larger environments. Sometimes all you want is a count. For example, if you need to know how many users are configured in your application, you could just count the number of documents in the users
collection.
The count()
method on the Collection
object allows you to get a simple count of documents that match the query
object criteria. The count()
method is formatted exactly the same way as the find()
method, as shown below, and performs the query and options parameters in exactly the same manner.
count([query], [options], callback)
If no query
is specified, the count()
returns a count of all the documents in the database. The callback
function should accept an error
as the first argument and the count
as an integer as the second.
Listing 15.2 illustrates using the count()
method with the same queries performed with find()
in Listing 15.1. The output in Listing 15.2 Output shows that instead of a Cursor
a simple integer is returned and displayed.
Listing 15.2 doc_count.js
: Counting specific sets of documents in a MongoDB collection
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", countItems); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function countItems(err, words){ 10 words.count({first:{$in: ['a', 'b', 'c']}}, function(err, count){ 11 console.log("Words starting with a, b or c: " + count); 12 }); 13 words.count({size:{$gt: 12}}, function(err, count){ 14 console.log("Words longer than 12 characters: " + count); 15 }); 16 words.count({size:{$mod: [2,0]}}, function(err, count){ 17 console.log("Words with even Lengths: " + count); 18 }); 19 words.count({letters:{$size: 12}}, function(err, count){ 20 console.log("Words with 12 Distinct characters: " + count); 21 }); 22 words.count({$and: [{first:{$in: ['a', 'e', 'i', 'o', 'u']}}, 23 {last:{$in: ['a', 'e', 'i', 'o', 'u']}}]}, 24 function(err, count){ 25 console.log("Words that start and end with a vowel: " + count); 26 }); 27 words.count({"stats.vowels":{$gt:6}}, function(err, count){ 28 console.log("Words containing 7 or more vowels: " + count); 29 }); 30 words.count({letters:{$all: ['a','e','i','o','u']}}, 31 function(err, count){ 32 console.log("Words with all 5 vowels: " + count); 33 }); 34 words.count({otherChars: {$exists:true}}, function(err, count){ 35 console.log("Words with non-alphabet characters: " + count); 36 }); 37 words.count({charsets:{$elemMatch:{$and:[{type:'other'}, 38 {chars:{$size:2}}]}}}, 39 function(err, count){ 40 console.log("Words with 2 non-alphabet characters: " + count); 41 }); 42 }
Listing 15.2 Output doc_count.js
: Counting specific sets of documents in a MongoDB collection
Words starting with a, b or c: 964 Words longer than 12 characters: 64 Words that start and end with a vowel: 227 Words with even Lengths: 2233 Words with 12 Distinct characters: 3 Words containing 7 or more vowels: 4 Words with non-alphabet characters: 24 Words with all 5 vowels: 16 Words with 2 non-alphabet characters: 2
When finding complex documents on larger systems, you often want to limit what is being returned to reduce the impact on the network, including the memory on both the server and client. You can limit the result sets that match a specific query in three ways: You can simply only accept a limited number of documents, you can limit the fields returned, or you can page the results and get them in chunks.
The simplest method to limit the amount of data returned in a find()
or other query request is to use the limit
option in the options
parameter when performing the request. The limit
parameter, shown below, allows only a fixed number of items to be returned with the Cursor
object. This can save you from accidentally retrieving more objects than your application can handle:
limit:<maximum_documents_to_return>
Listing 15.3 illustrates limiting the results of a find()
request by using the limit:5
option in the options
object. The output in Listing 15.3 Output shows that when limit
is used only five words are retrieved.
Listing 15.3 doc_limit.js
: Limiting the specific set of documents in a MongoDB collection
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", limitFind); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function displayWords(msg, cursor, pretty){ 10 cursor.toArray(function(err, itemArr){ 11 console.log(" "+msg); 12 var wordList = []; 13 for(var i=0; i<itemArr.length; i++){ 14 wordList.push(itemArr[i].word); 15 } 16 console.log(JSON.stringify(wordList, null, pretty)); 17 }); 18 } 19 function limitFind(err, words){ 20 words.count({first:'p'}, function(err, count){ 21 console.log("Count of words starting with p : " + count); 22 }); 23 words.find({first:'p'}, function(err, cursor){ 24 displayWords("Words starting with p : ", cursor); 25 }); 26 words.find({first:'p'}, {limit:5}, function(err, cursor){ 27 displayWords("Limiting words starting with p : ", cursor); 28 }); 29 }
Listing 15.3 Output doc_limit.js
: Limiting the specific set of documents in a MongoDB collection
Count of words starting with p : 353 Limiting words starting with p : ["people","put","problem","part","place"] Words starting with p : ["people","put","problem","part","place","program","play","point","provide","power", "political","pay"…]
Another effective method of limiting the resulting data when retrieving documents is to limit which fields are returned. Documents may have many different fields that are useful in some circumstance but not in others. You should consider which fields should be included when retrieving documents from the MongoDB server and only request the ones necessary.
To limit the fields returned from the server, use the fields
option of the options
object. The fields
option enables you to either include or exclude fields by setting the value of the document field to 0
for exclude or 1
for include. You cannot mix includes and excludes in the same expression.
For example, to exclude the fields stats
, value
, and comments
when returning a document, you would use the following fields
option:
{fields:{stats:0, value:0, comments:0}}
Often it is easier to just include a few fields; for example, if you want to include only the name
and value
fields of documents, you would use
{fields:{name:1, value:1}}
Listing 15.4 illustrates using the fields
option to reduce the amount of data returned from the server by excluding fields or specifying fields to include.
Listing 15.4 doc_fields.js
: Limiting the fields returned with a set of documents
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", limitFields); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function limitFields(err, words){ 10 words.findOne({word:'the'}, {fields:{charsets:0}}, 11 function(err, item){ 12 console.log("Excluding fields object: "); 13 console.log(JSON.stringify(item, null, 2)); 14 }); 15 words.findOne({word:'the'}, {fields:{word:1,size:1,stats:1}}, 16 function(err, item){ 17 console.log("Including fields object: "); 18 console.log(JSON.stringify(item, null, 2)); 19 }); 20 }
Listing 15.4 Output doc_fields.js
: Limiting the fields returned with a set of documents
Excluding fields object: { "_id": "58f04c8c6ec5050becd012c5", "word": "the", "first": "t", "last": "e", "size": 3, "letters": [ "t", "h", "e" ], "stats": { "vowels": 1, "consonants": 2 } } Including fields object: { "_id": "58f04c8c6ec5050becd012c5", "word": "the", "size": 3, "stats": { "vowels": 1, "consonants": 2 } }
Paging is a common method of reducing the number of documents returned. Paging involves specifying a number of documents to skip in the matching set as well as a limit on the documents returned. Then the skip value is incremented each time by the number returned the previous time.
To implement paging on a set of documents, you need to implement the limit
and skip
options on the options
object. The skip
option specifies a number of documents to skip before returning documents. By moving the skip
value each time, you get another set of documents you can effectively page through in the data set. Also, always include a sort
option when paging data to ensure the order is always the same. For example, the following statements find documents 1–10, then 11–20, and 21–30:
collection.find({},{sort:[['_id':1]], skip:0, limit:10}, function(err, cursor){}); collection.find({},{sort:[['_id':1]], skip:10, limit:10}, function(err, cursor){}); collection.find({},{sort:[['_id':1]], skip:20, limit:10}, function(err, cursor){});
Listing 15.5 illustrates using limit
and skip
to page through a specific set of documents. Each time a new find()
request is implemented, which more closely mimics what would happen when handling paging requests from a webpage. Listing 15.5 Output shows the output of Listing 15.5. Notice that words are retrieved 10 at a time.
Warning
If the data on the system changes in such a way that it affects the results of the query, the skip may miss some items or include items again in a subsequent paged request.
Listing 15.5 doc_paging.js
: Paging results from a specific set of documents in a MongoDB collection
01 var util = require('util'); 02 var MongoClient = require('mongodb').MongoClient; 03 MongoClient.connect("mongodb://localhost/", function(err, db) { 04 var myDB = db.db("words"); 05 myDB.collection("word_stats", function(err, collection){ 06 pagedResults(err, collection, 0, 10); 07 }); 08 }); 09 function displayWords(msg, cursor, pretty){ 10 cursor.toArray(function(err, itemArr){ 11 console.log(" "+msg); 12 var wordList = []; 13 for(var i=0; i<itemArr.length; i++){ 14 wordList.push(itemArr[i].word); 15 } 16 console.log(JSON.stringify(wordList, null, pretty)); 17 }); 18 } 19 function pagedResults(err, words, startIndex, pageSize){ 20 words.find({first:'v'}, 21 {limit:pageSize, skip:startIndex, sort:[['word',1]]}, 22 function(err, cursor){ 23 cursor.count(true, function(err, cursorCount){ 24 displayWords("Page Starting at " + startIndex, cursor); 25 if (cursorCount === pageSize){ 26 pagedResults(err, words, startIndex+pageSize, pageSize); 27 } else { 28 cursor.db.close(); 29 } 30 }); 31 }); 32 }
Listing 15.5 Output doc_paging.js
: Paging results from a specific set of documents in a MongoDB collection
Page Starting at 0 ["vacation","vaccine","vacuum","valid","validity","valley","valuable","value", "van","vanish"] Page Starting at 10 ["variable","variation","variety","various","vary","vast","vegetable","vehicle", "vendor","venture"] Page Starting at 20 ["verbal","verdict","version","versus","vertical","very","vessel","veteran","via", "victim"] Page Starting at 30 ["victory","video","view","viewer","village","violate","violation","violence", "violent","virtual"] Page Starting at 40 ["virtually","virtue","virus","visible","vision","visit","visitor","visual", "vital","vitamin"] Page Starting at 50 ["vocal","voice","volume","voluntary","volunteer","vote","voter","voting","vs", "vulnerable"] Page Starting at 60 []
An important aspect of retrieving documents from a MongoDB database is the ability to get it in a sorted format. This is especially helpful if you are only retrieving a certain number, such as the top 10, or if you are paging the requests. The options
object provides the sort
option that allows you to specify the sort order and direction of one or more fields in the document.
The sort
option is specified using an array of [field,<sort_order>]
pairs, where sort_order
is 1
for ascending and -1
for descending. For example, to sort on the name
field descending first and then the value
field ascending you would use the following:
sort:[['value':-1]['name':1]]
Listing 15.6 illustrates using the sort
option to find and sort lists of words in different ways. Notice that in line 29 the words are sorted by size first and then by last letter, whereas in line 33 they are sorted by last letter first and then by size. The two different sort orders result in different lists of words returned.
Listing 15.6 doc_sort.js
: Sorting results of a find()
request for a set of documents in a MongoDB collection
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", sortItems); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function displayWords(msg, cursor, pretty){ 10 cursor.toArray(function(err, itemArr){ 11 console.log(" "+msg); 12 var wordList = []; 13 for(var i=0; i<itemArr.length; i++){ 14 wordList.push(itemArr[i].word); 15 } 16 console.log(JSON.stringify(wordList, null, pretty)); 17 }); 18 } 19 function sortItems(err, words){ 20 words.find({last:'w'}, function(err, cursor){ 21 displayWords("Words ending in w: ", cursor); 22 }); 23 words.find({last:'w'}, {sort:{word:1}}, function(err, cursor){ 24 displayWords("Words ending in w sorted ascending: ", cursor); 25 }); 26 words.find({last:'w'}, {sort:{word:-1}}, function(err, cursor){ 27 displayWords("Words ending in w sorted, descending: ", cursor); 28 }); 29 words.find({first:'b'}, {sort:[['size',-1],['last',1]]}, 30 function(err, cursor){ 31 displayWords("B words sorted by size then by last letter: ", cursor); 32 }); 33 words.find({first:'b'}, {sort:[['last',1],['size',-1]]}, 34 function(err, cursor){ 35 displayWords("B words sorted by last letter then by size: ", cursor); 36 }); 37 }
Listing 15.6 Output doc_sort.js
: Sorting results of a find()
request for a set of documents in a MongoDB collection
Words ending in w: ["know","now","how","new","show","few","law","follow","allow","grow","low","view", "draw","window","throw","interview","tomorrow"… Words ending in w sorted ascending: ["allow","arrow","below","blow","borrow","bow","chew","cow","crew","draw","elbow", "eyebrow","fellow","few","flow"… Words ending in w sorted, descending: ["yellow","wow","withdraw","window","widow","view","tomorrow","throw","swallow", "straw","somehow","snow"… B words sorted by size then by last letter: ["businessman","background","basketball","biological","behavioral","boyfriend", "beginning"… B words sorted by last letter then by size: ["bacteria","banana","bomb","bulb","basic","background","boyfriend","backyard", "balanced","behind","beyond"…
A useful query against a MongoDB collection is to get a list of the distinct values for a single field in a set of documents. Distinct means that even though there are thousands of documents, you only want to know the unique values that exist.
The distinct()
method on Collection
objects allows you to find a list of distinct values for a specific field. The syntax for the distinct()
method is shown below:
distinct(key,[query],[options],callback)
The key
parameter is the string value of the field name you want to get values for. You can specify subdocuments using the dot syntax such as stats.count
. The query
parameter is an object with standard query options defined in Table 15.1. The options
parameter is an options
object that allows you to define the readPreference
option defined in Table 15.2. The callback
function should accept an error
as the first parameter, and the results
parameter, which is an array of distinct values for the field specified in the key
parameter, as the second.
Listing 15.7 illustrates finding the distinct values in the words collection. Notice that a query in line 14 limits the words to those starting with u
. In line 18, to access the stats.vowels
field, the dot syntax is used.
Listing 15.7 doc_distinct.js
: Finding distinct field values in a specific set of documents in a MongoDB collection
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", distinctValues); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function distinctValues(err, words){ 10 words.distinct('size', function(err, values){ 11 console.log(" Sizes of words: "); 12 console.log(values); 13 }); 14 words.distinct('first', {last:'u'}, function(err, values){ 15 console.log(" First letters of words ending in u: "); 16 console.log(values); 17 }); 18 words.distinct('stats.vowels', function(err, values){ 19 console.log(" Numbers of vowels contained in words: "); 20 console.log(values); 21 }); 22 }
Listing 15.7 Output doc_distinct.js
: Finding distinct field values in a specific set of documents in a MongoDB collection
Sizes of words: [ 3, 2, 1, 4, 5, 9, 6, 7, 8, 10, 11, 12, 13, 14 ] First letters of words ending in u: [ 'y', 'm', 'b' ] Numbers of vowels contained in words: [ 1, 2, 0, 3, 4, 5, 6, 7 ]
When performing operations on large data sets, it is often useful to group the results based on the distinct values of one or more fields in a document. This could be done in code after retrieving the documents; however, it is much more efficient to have MongoDB do it for you as part of a single request that is already iterating through the documents.
To group the results of a query together, you can use the group()
method on the Collection
object. The group request first collects all the documents that match a query
, and then adds a group object to an array based on distinct values of a set of keys
, performs operations on the group objects, and returns the array of group objects. The syntax for the group()
method is shown below:
group(keys, query, initial, reduce, finalize, command, [options], callback)
The parameters of the group()
method are described in the following list:
keys
: This can be an object, array, or function expressing the keys to group by. The simplest method is to specify the key(s) in an object such as {field1:true, field2:true}
or an array such as
['first', 'last'].
query
: Query object that defines which documents to be included in the initial set. See Table 15.1 for a list of query options.
initial
: Specifies an initial group object to use when aggregating data while grouping. An initial object is created for each distinct set of keys
. The most common use is a counter that tracks a count of items that match the keys. For example:{"count":0}
reduce
: This is a function(obj, prev)
with two parameters, obj
and prev
. This function is executed on each document that matches the query. The obj
parameter is the current document, and the prev
parameter is the object created by the initial
parameter. You can then use the obj
object to update the prev
object with new values such as counts or sums. For example, to increment the count value you would use:function(obj, prev) { prev.count++; }
finalize
: This is a function(obj)
that accepts one parameter, obj
, which is the final object resulting from the initial
parameter and updated as prev
in the reduce
function. This function is called on the resulting object for each distinct key
before returning the array in the response.
command
: A Boolean that, when true
, specifies the command will run using the internal group command instead of eval()
. The default is true
.
options
: This object allows you to define the readPreference
option.
callback
: This accepts an error
as the first parameter and an array of the results
objects as the second.
Listing 15.8 illustrates implementing grouping of words based on various key sets. Lines 10–18 implement a basic grouping of words by first and last letter. The query in line 11 limits the words to those beginning with o and ending with a vowel. The initial object for each has a count property only, which is updated for each matching document in the function on line 13.
Lines 19–28 add the concept of summing the total vowels in the documents while grouping them together by incrementing the prev.totalVowels
with the obj.stats.vowels
value in line 23. Then lines 29–40 show the use of a finalize function that adds a new obj.total
property to the group objects that is a sum of the obj.vowels
and obj.consonants
properties of the object.
Listing 15.8 doc_group.js
: Grouping a set of documents by specific fields in a MongoDB collection
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", groupItems); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function groupItems(err, words){ 10 words.group(['first','last'], 11 {first:'o',last:{$in:['a','e','i','o','u']}}, 12 {"count":0}, 13 function (obj, prev) { prev.count++; }, true, 14 function(err, results){ 15 console.log(" 'O' words grouped by first and last" + 16 " letter that end with a vowel: "); 17 console.log(results); 18 }); 19 words.group(['first'], 20 {size:{$gt:13}}, 21 {"count":0, "totalVowels":0}, 22 function (obj, prev) { 23 prev.count++; prev.totalVowels += obj.stats.vowels; 24 }, {}, true, 25 function(err, results){ 26 console.log(" Words grouped by first letter larger than 13: "); 27 console.log(results); 28 }); 29 words.group(['first'],{}, {"count":0, "vowels":0, "consonants":0}, 30 function (obj, prev) { 31 prev.count++; 32 prev.vowels += obj.stats.vowels; 33 prev.consonants += obj.stats.consonants; 34 },function(obj){ 35 obj.total = obj.vowels + obj.consonants; 36 }, true, 37 function(err, results){ 38 console.log(" Words grouped by first letter with totals: "); 39 console.log(results); 40 }); 41 }
Listing 15.8 Output doc_group.js
: Grouping a set of documents by specific fields in a MongoDB collection
'O' words grouped by first and last letter that end with a vowel: [ { first: 'o', last: 'e', count: 21 }, { first: 'o', last: 'o', count: 1 }, { first: 'o', last: 'a', count: 1 } ] Words grouped by first letter larger than 13: [ { first: 'a', count: 4, totalVowels: 22 }, { first: 'r', count: 5, totalVowels: 30 }, { first: 'c', count: 2, totalVowels: 11 }, { first: 't', count: 2, totalVowels: 10 }, { first: 'i', count: 4, totalVowels: 24 }, { first: 'd', count: 2, totalVowels: 11 }, { first: 's', count: 1, totalVowels: 6 }, { first: 'o', count: 1, totalVowels: 7 } ] Words grouped by first letter with totals: [ { first: 't', count: 250, vowels: 545, consonants: 1017, total: 1562 }, { first: 'b', count: 218, vowels: 417, consonants: 769, total: 1186 }, { first: 'a', count: 295, vowels: 913, consonants: 1194, total: 2107 }, { first: 'o', count: 118, vowels: 356, consonants: 435, total: 791 }, { first: 'i', count: 189, vowels: 655, consonants: 902, total: 1557 }, { first: 'h', count: 139, vowels: 289, consonants: 511, total: 800 }, { first: 'f', count: 203, vowels: 439, consonants: 774, total: 1213 }, { first: 'y', count: 16, vowels: 31, consonants: 50, total: 81 }, { first: 'w', count: 132, vowels: 255, consonants: 480, total: 735 }, { first: 'd', count: 257, vowels: 675, consonants: 1102, total: 1777 }, { first: 'c', count: 451, vowels: 1237, consonants: 2108, total: 3345 }, { first: 's', count: 509, vowels: 1109, consonants: 2129, total: 3238 }, { first: 'n', count: 82, vowels: 205, consonants: 314, total: 519 }, { first: 'g', count: 112, vowels: 236, consonants: 414, total: 650 }, { first: 'm', count: 200, vowels: 488, consonants: 778, total: 1266 }, { first: 'k', count: 21, vowels: 33, consonants: 70, total: 103 }, { first: 'u', count: 58, vowels: 173, consonants: 233, total: 406 }, { first: 'p', count: 353, vowels: 902, consonants: 1575, total: 2477 }, { first: 'j', count: 33, vowels: 72, consonants: 114, total: 186 }, { first: 'l', count: 142, vowels: 307, consonants: 503, total: 810 }, { first: 'v', count: 60, vowels: 163, consonants: 218, total: 381 }, { first: 'e', count: 239, vowels: 788, consonants: 1009, total: 1797 }, { first: 'r', count: 254, vowels: 716, consonants: 1011, total: 1727 }, { first: 'q', count: 16, vowels: 50, consonants: 59, total: 109 }, { first: 'z', count: 1, vowels: 2, consonants: 2, total: 4 } ]
One benefit of MongoDB is the ability to MapReduce the results of database queries into a completely different structure than the original collections. MapReduce is the process of mapping the values on a DB lookup into a completely different form and then reducing it to make it more consumable.
MongoDB has a MapReduce framework but has also added a framework that simplifies the process of piping one MapReduce operation into another in a series to produce some extraordinary results with the data. Aggregation is the concept of applying a series of operations to documents on the MongoDB server as they are being compiled into a result set. This is much more efficient than retrieving them and processing them in your Node.js application because the MongoDB server can operate on chunks of data locally.
aggregate()
MethodThe Collection
object provides the aggregate()
method to perform aggregation operations on data. The syntax for the aggregate()
method is shown below:
aggregate(operators, [options], callback)
The operators
parameter is an array of aggregation operators, shown in Table 15.3, that allow you to define what aggregation operation to perform on the data. The options
parameter is an object that allows you to set the readPreference
property that defines where to read the data from. The callback
parameter should be a function that accepts an error
as the first parameter and a results
array as the second. The results
array is the fully aggregated object set returned by the aggregation.
The aggregation framework provided by MongoDB is powerful in that it allows you to pipe the results of one aggregation operator into another multiple times. To illustrate this, look at the following data set:
{o_id:"A", value
:50, type:"X"}
{o_id:"A", value:75, type:"X"}
{o_id:"B", value:80, type:"X"}
{o_id:"C", value:45, type:"Y"}
The following aggregation operator set would pipeline the results of the $match
into the $group
operator and then return the grouped set in the results parameter of the callback function. Notice that when referencing the values of fields in documents that the field name is prefixed by a dollar sign, for example $o_id
and $value
. This syntax tells the aggregate framework to treat it as a field value instead of a string.
aggregate([{$match:{type:"X"}}, {$group:{set_id:"$o_id", total: {$sum: "$value"}}}, function(err, results){}]);
After the $match
operator completes, the documents that would be applied to $group
would be:
{o_id:"A", value:50, type:"X"} {o_id:"A", value:75, type:"X"} {o_id:"B", value:80, type:"X"}
Then after the $group
operator is applied, a new array of objects is sent to the callback
function with set_id
and total
fields as shown below:
{set_id:"A", total:"125"} {set_id:"B", total:"80"}
Table 15.3 defines the types of aggregation commands that you can include in the operators
parameter to the aggregate()
method:
Table 15.3 Aggregation operators that can be used in the aggregate()
method
Operator |
Description |
|
Reshapes the documents by renaming, adding, or removing fields. You can also recompute values and add subdocuments. For example, the following includes {$project{title:1, name:0}} The following is an example of renaming {$project{title:"$name"}} The following is an example of adding a new field total and computing its value from {$project{total:{$add:["$price", "$tax"]}}} |
|
Filters the document set using the query operators defined in Table 15.1. For example: {$match:{value:{$gt:50}}} |
|
Restricts the number of documents that can be passed to the next pipe in the aggregation. For example: {$limit:5} |
|
Specifies a number of documents to skip before processing the next pipe in the aggregation. For example: {$skip:10} |
|
Specifies an array field that splits, with a separate document created for each value. For example: {$unwind:"$myArr"} |
|
Groups the documents together into a new set of documents for the next level in the pipe. The fields of the new object must be defined in the {$group:{set_id:"$o_id", total: {$sum: "$value"}}} |
|
Sorts the documents before passing them on to the next pipe in the aggregation. The sort specifies an object with {$sort: {name:1, age:-1}} |
|
Returns a collection’s or view’s statistics. |
|
Restricts each document in the stream based on stored values in the documents. Can implement field level redaction. Every one input document outputs one or zero documents. |
Selects a specific number of random documents from its input. |
|
|
Returns an ordered stream of documents based on proximity to a geospatial point. An additional distance field and a location identifier field can be included in the output documents. |
|
Used to perform an equality match between a field in the input documents with field documents in a joined collection. |
|
Writes the resulting documents of the aggregation pipeline to a collection. The |
|
Returns statistics of the use of each index for the collection. |
|
Processes multiple aggregation pipelines in a single stage on the same input documents. Allows the creation of multifaceted aggregations that can characterize data across multiple dimensions or facets in a single stage. |
|
Categorizes incoming documents into groups called buckets, based on certain expressions and bucket boundaries. |
|
Categorizes incoming documents into a specific number of groups called buckets, based on certain expressions. The bucket boundaries are automatically determined to try and evenly distribute documents into the specified number of buckets. |
|
Sorts incoming documents by the value of a specified expression and then determines the number of documents in each specific group. |
|
Adds new fields to documents. |
|
Replaces a document with a specified embedded document, replacing all fields in the input document. |
|
Returns a count of all documents at this point of the aggregation pipeline. |
|
Searches a collection and adds a new |
When you implement the aggregation operators, you are building a new document that will be passed to the next level in the aggregation pipeline. The MongoDB aggregation framework provides a number of expression operators that help when computing values for new fields or for comparison of existing fields in the documents.
When operating on a $group
aggregation pipe, multiple documents match the defined fields in the new documents created. MongoDB provides a set of operators that you can apply to those documents and use to compute values for fields in the new group document based on values of fields in the original set of documents. Table 15.4 lists the $group
expression operators.
Table 15.4 Aggregation $group
expression operators
Operator |
Description |
|
Returns an array of all the unique values for the selected field among all the documents in that group. For example: colors: {$addToSet: "$color"} |
|
Returns the first value for a field in a group of documents. For example: firstValue:{$first: "$value"} |
|
Returns the last value for a field in a group of documents. For example: lastValue:{$last: "$value"} |
|
Returns the highest value for a field in a group of documents. For example: maxValue:{$max: "$value"} |
|
Returns the lowest value for a field in a group of documents. For example: minValue:{$min: "$value"} |
|
Returns an average of all the values for a field in a group of documents. For example: avgValue:{$avg: "$value"} |
|
Returns an array of all values for the selected field among all the documents in that group of documents. For example: username:{$push: "$username"} |
|
Returns the sum of all the values for a field in a group of documents. For example: total:{$sum: "$value"} |
Several string and arithmetic operators also can be applied when computing new field values. Table 15.5 lists some of the more common operators that you can apply when computing new field values in the aggregation operators:
Table 15.5 String and arithmetic operators used in aggregation expressions
Operator |
Description |
|
Computes the sum of an array of numbers. For example: valuePlus5:{$add:["$value", 5]} |
|
Takes two numbers and divides the first number by the second. For example: valueDividedBy5:{$divide:["$value", 5]} |
Takes two numbers and calculates the modulo of the first number divided by the second. For example: valueMod5:{$mod:["$value", 5]} |
|
|
Computes the product of an array of numbers. For example: valueTimes5:{$multiply:["$value", 5]} |
|
Takes two numbers and subtracts the second number from the first. For example: valueMinus5:{$minus:["$value", 5]} |
|
Concatenates two strings. For example: title:{$concat:["$title", " ", "$name"]} |
|
Compares two strings and returns an integer that reflects the comparison. For example: isTest:{$ |
|
Takes a string and returns a portion of that string. For example: hasTest:{$ |
|
Converts a string to lowercase. For example: titleLower:{$ |
|
Converts a string to uppercase. For example: titleUpper:{$ |
Listing 15.9 illustrates three examples of implementing aggregation against the words collection.
The first example, in lines10–20, implements a $match
to get words beginning in vowels and then a $group
to calculate the largest and smallest sizes. The results are then sorted using $sort
and displayed in Listing 15.9 Output.
The second example, in lines 21–27, uses $match
to limit the words to size 4. Then $limit
is used to only process five documents in the $project
operator. The third example, in lines 28–34, uses a $group
to get the average
size of the words and set the _id
value to the word. Then the words are sorted in descending order by average
and displayed in Listing 15.9 Output.
Listing 15.9 doc_aggregate.js
: Grouping a set of documents by specific fields in a MongoDB collection
01 var MongoClient = require('mongodb').MongoClient; 02 MongoClient.connect("mongodb://localhost/", function(err, db) { 03 var myDB = db.db("words"); 04 myDB.collection("word_stats", aggregateItems); 05 setTimeout(function(){ 06 db.close(); 07 }, 3000); 08 }); 09 function aggregateItems(err, words){ 10 words.aggregate([{$match: {first:{$in:['a','e','i','o','u']}}}, 11 {$group: {_id:"$first", 12 largest:{$max:"$size"}, 13 smallest:{$min:"$size"}, 14 total:{$sum:1}}}, 15 {$sort: {_id:1}}], 16 function(err, results){ 17 console.log("Largest and smallest word sizes for " + 18 "words beginning with a vowel: "); 19 console.log(results); 20 }); 21 words.aggregate([{$match: {size:4}}, 22 {$limit: 5}, 23 {$project: {_id:"$word", stats:1}}], 24 function(err, results){ 25 console.log("Stats for 5 four letter words: "); 26 console.log(results); 27 }); 28 words.aggregate([{$group: {_id:"$first", average:{$avg:"$size"}}}, 29 {$sort: {average:-1}}, 30 {$limit: 5}], 31 function(err, results){ 32 console.log("Letters with largest average word size: "); 33 console.log(results); 34 }); 35 }
Listing 15.9 Output doc_aggregate.js
: Grouping a set of documents by specific fields in a MongoDB collection
Stats for 5 four letter words: [ { stats: { vowels: 2, consonants: 2 }, _id: 'have' }, { stats: { vowels: 1, consonants: 3 }, _id: 'that' }, { stats: { vowels: 1, consonants: 3 }, _id: 'with' }, { stats: { vowels: 1, consonants: 3 }, _id: 'this' }, { stats: { vowels: 1, consonants: 3 }, _id: 'they' } ] Largest and smallest word sizes for words beginning with a vowel: [ { _id: 'a', largest: 14, smallest: 1, total: 295 }, { _id: 'e', largest: 13, smallest: 3, total: 239 }, { _id: 'i', largest: 14, smallest: 1, total: 189 }, { _id: 'o', largest: 14, smallest: 2, total: 118 }, { _id: 'u', largest: 13, smallest: 2, total: 58 } ] Letters with largest average word size: [ { _id: 'i', average: 8.238095238095237 }, { _id: 'e', average: 7.523012552301255 }, { _id: 'c', average: 7.419068736141907 }, { _id: 'a', average: 7.145762711864407 }, { _id: 'p', average: 7.01699716713881 } ]
In this chapter you looked at the query
and options
objects used by Collection
methods to access documents in the database. The query
object allows you to limit which documents are considered for operations. The options
object allows you to control the interaction of the requests to limit the number of documents returned, which document to start on, and what fields should be returned.
The distinct()
, group()
, and aggregate()
methods allow you to group documents based on field values. The MongoDB aggregation framework is a powerful feature that allows you to process documents on the server before returning them to the client. The aggregation framework allows you to pipe documents from one aggregation operation to the next, each time mapping and reducing to a more defined set of data.
18.222.196.175