15
Accessing MongoDB from Node.js

In Chapter 14, “Manipulating MongoDB Documents from Node.js,” you learned how to create and manipulate documents and got a brief glimpse into finding them using the find() method. This chapter takes a deeper look at accessing documents in a MongoDB collection using the MongoDB Node.js driver module.

There is much more to accessing documents than just returning everything in a collection. This chapter covers using the query object to limit which documents are returned as well as methods to limit the fields and number of documents in the query results. You also see how to count the number of documents that match query criteria without actually retrieving them from the server. This chapter also covers some advanced aggregation techniques to group the results and even generate a new fully aggregated set of documents.

Introducing the Data Set

For the purpose of introducing you to the various methods of accessing data, all the data used for examples in this chapter comes from the same data set. The data set is a collection that contains various information about 5,000 words. This provides a large enough data set to implement the necessary examples.

The structure of objects in this data set is as follows and should be fairly intuitive (which is why it was selected). This document structure gives us fields that are strings, integers, arrays, subdocuments, and arrays of subdocuments:

{
  word: <word>,
  first: <first_letter>,
  last: <last_letter>,
  size: <character_count>,
  letters: [<array_of_characters_in_word_no_repeats>],
  stats: {
    vowels:<vowel_count>, consonants:<consonant_count>},
  charsets: [
    {
      "type": <consonants_vowels_other>,
      "chars": [<array_of_characters_of_type_in_word>]},
    . . .
  ],
}

Understanding Query Objects

Throughout this chapter the various methods of data all use a query object of some sort to define which documents to retrieve from a MongoDB collection. The query object is just a standard JavaScript object with special property names that are understood by the MongoDB Node.js driver. These property names match closely the native queries that you can perform inside the MongoDB client, which makes it nice to be able to transfer back and forth.

The properties of the query object are called operators because they operate on the data to determine whether a document should be included in the result set. These operators are used to match the values of fields in the document against specific criteria.

For example, to find all documents with a count value greater than 10 and name value equal to test, the query object would be

{count:{$gt:10}, name:'test'}

The operator $gt specifies documents with a count field larger than 10. The standard colon syntax of name:'test' is also an operator that specifies that the name field must equal test. Notice that the query object has multiple operators. You can include several different operators in the same query.

When specifying field names in a query object, you can use dot notation to specify subdocument fields. For example, consider the following object format:

{
  name:"test",
  stats: { height:74, eyes:'blue' }
}

You can query users that have blue eyes using the following query object:

{stats.eyes:'blue'}

Table 15.1 lists the more commonly used operators.

Table 15.1 query object operators that define the result set returned by MongoDB requests

Operator

Description

$eq

Matches documents with fields that have a value equal to the value specified.

$gt

Matches values that are greater than the value specified in the query. For example: {size:{$gt:5}}

$gte

Matches values that are equal to or greater than the value specified in the query. For example: {size:{$gte:5}}

$in

Matches any of the values that exist in an array specified in the query. For example: {name:{$in:['item1', 'item2']}}

$lt

Matches values that are less than the value specified in the query. For example: {size:{$lt:5}}

$lte

Matches values that are less than or equal to the value specified in the query. For example: {size:{$lte:5}}

$ne

Matches all values that are not equal to the value specified in the query. For example: {name:{$ne:"badName"}}

$nin

Matches values that do not exist in an array specified to the query. For example: {name:{$nin:['item1', 'item2']}}

$or

Joins query clauses with a logical OR; returns all documents that match the conditions of either clause. For example:

{$or:[{size:{$lt:5}}, {size:{$gt:10}}]}

$and

Joins query clauses with a logical AND; returns all documents that match the conditions of both clauses. For example:

{$and:[{size:{$gt:5}}, {size:{$lt:10}}]}

$not

Inverts the effect of a query expression and returns documents that do not match the query expression. For example: {$not:{size:{$lt:5}}}

$nor

Joins query clauses with a logical NOR; returns all documents that fail to match both clauses. For example: {$nor:{size:{$lt:5}},{name:"myName"}}

$exists

Matches documents that have the specified field. For example: {specialField:{$exists:true}}

$type

Selects documents if a field is of the specified BSON type number. Table 11.1 lists the different BSON type numbers. For example:

{specialField: {$type:<BSONtype>}}

$mod

Performs a modulo operation on the value of a field and selects documents with a specified result. The value for the modulo operation is specified as an array with the first number being the number to divide by and the second being the remainder. For example: {number:{$mod:[2,0]}}

$regex

Selects documents where values match a specified regular expression. For example: {myString:{$regex:'some.*exp'}}

$all

Matches arrays that contain all elements specified in the query. For example: {myArr:{$all:['one','two','three']}}

$elemMatch    

Selects documents if an element in the array of subdocuments has fields that match all the specified $elemMatch conditions. For example: {myArr:{$elemMatch:{value:{$gt:5},size:{$lt:3}}}}

$size

Selects documents if the array field is a specified size. For example: {myArr:{$size:5}}

Understanding Query Options Objects

In addition to the query object, most of the methods of retrieving documents using the MongoDB Node.js driver also include an options object. The options object allows you to define the behavior of the request when retrieving documents. These options allow you to limit the result set, sort items while creating the result set, and much more.

Table 15.2 lists the options that can be set on methods that retrieve documents from the MongoDB server. Not all of these methods are available on every request. For example, when counting items that match the query it doesn’t make sense to specify a limit on the result.

Table 15.2 Options that can be specified in the options object when querying documents

Option

Description

limit

Specifies the maximum number of documents to return.

sort

Specifies the sort order of documents as an array of [field, <sort_order>] elements where sort order is 1 for ascending and -1 for descending. For example:

sort:[['name':1],['value':-1]]

fields

Specifies an object whose fields match fields that should be included or excluded from the returned documents. A value of 1 means include, a value of 0 means exclude. You can only include or exclude, not both. For example: fields:{name:1,value:1}

skip

Specifies the number of documents from the query results to skip before returning a document. Typically used when paginating result sets.

hint

Forces the query to use specific indexes when building the result set. For example: hint:{'_id':1}

explain

Returns an explanation of what will happen when performing the query on the server instead of actually running the query. This is essential when trying to debug/optimize complex queries.

snapshot

{Boolean, default:false}; specifies whether to use a snapshot query.

timeout

A Boolean; when true, the cursor is allowed to timeout.

maxScan

Specifies the maximum number of documents to scan when performing a query before returning. This is useful if you have a collection that has millions of objects and you don’t want queries to run on forever.

comment

Specifies a string that is printed out in the MongoDB logs. This can help when troubleshooting because you can identify queries more easily.

readPreference

Specifies whether you want to read from a primary or secondary replica or just the nearest MongoDB server in the replica set to perform the query.

numberOfRetries    

Specifies the number of timeout retries to perform on the query before failing. The default is 5.

partial

A Boolean that, when true, indicates the cursor will return partial results when querying against data shared between sharded systems.

Finding Specific Sets of Documents

You were introduced to the find() method of the Collection object in Chapter 14. This method returns a Cursor object to the callback function providing access to the documents. If no query is specified, all documents are returned, which is rarely the case. Instead, you need a subset of documents that match a set of criteria.

To limit the resulting documents included in the find() method, you apply a query object that limits the documents returned in the Cursor. Listing 15.1 illustrates using query objects in a find statement.

Listing 15.1 performs a bunch of different queries against the word collection described earlier in this chapter. You should already recognize all the connection code as well as the code used in displayWords() to iterate through the cursor and display only the word names in the documents.

In line 20, the following query is used to find words starting with a, b, or c:

{first:{$in: ['a', 'b', 'c']}}

In line 23, the following query is used to find words longer than 12 letters:

{size:{$gt: 12}}

In line 26, the following query is used to find words with an even number of letters:

{size:{$mod: [2,0]}}

In line 29, the following query is used to find words with exactly 12 letters:

{letters:{$size: 12}}

In lines 32 and 33, the following query is used to find words that both begin and end with a vowel:

{$and: [{first:{$in: ['a', 'e', 'i', 'o', 'u']}},
        {last:{$in: ['a', 'e', 'i', 'o', 'u']}}]}

In line 37, the following query is used to find words that contain more than 6 vowels:

{"stats.vowels":{$gt:6}}

In line 40, the following query is used to find words that contain all of the vowels:

{letters:{$all: ['a','e','i','o','u']}}

In line 44, the following query is used to find words with non-alphabet characters:

{otherChars: {$exists:true}}

In line 47, a bit more challenging query is used. The $elemMatch operator is used to match the charsets subdocuments. The $and operator forces the type field to equal other and the chars array field to be exactly 2:

{charsets:{$elemMatch:{$and:[{type:'other'},{chars:{$size:2}}]}}}

Listing 15.1 doc_query.js: Finding a specific set of documents in a MongoDB collection

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", findItems);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function displayWords(msg, cursor, pretty){
10   cursor.toArray(function(err, itemArr){
11     console.log("
"+msg);
12     var wordList = [];
13     for(var i=0; i<itemArr.length; i++){
14       wordList.push(itemArr[i].word);
15     }
16     console.log(JSON.stringify(wordList, null, pretty));
17   });
18 }
19 function findItems(err, words){
20   words.find({first:{$in: ['a', 'b', 'c']}}, function(err, cursor){
21     displayWords("Words starting with a, b or c: ", cursor);
22   });
23   words.find({size:{$gt: 12}}, function(err, cursor){
24     displayWords("Words longer than 12 characters: ", cursor);
25   });
26   words.find({size:{$mod: [2,0]}}, function(err, cursor){
27     displayWords("Words with even Lengths: ", cursor);
28   });
29   words.find({letters:{$size: 12}}, function(err, cursor){
30     displayWords("Words with 12 Distinct characters: ", cursor);
31   });
32   words.find({$and: [{first:{$in: ['a', 'e', 'i', 'o', 'u']}},
33                      {last:{$in: ['a', 'e', 'i', 'o', 'u']}}]},
34              function(err, cursor){
35     displayWords("Words that start and end with a vowel: ", cursor);
36   });
37   words.find({"stats.vowels":{$gt:6}}, function(err, cursor){
38     displayWords("Words containing 7 or more vowels: ", cursor);
39   });
40   words.find({letters:{$all: ['a','e','i','o','u']}},
41              function(err, cursor){
42     displayWords("Words with all 5 vowels: ", cursor);
43   });
44   words.find({otherChars: {$exists:true}}, function(err, cursor){
45     displayWords("Words with non-alphabet characters: ", cursor);
46   });
47   words.find({charsets:{$elemMatch:{$and:[{type:'other'},
48                                           {chars:{$size:2}}]}}},
49              function(err, cursor){
50     displayWords("Words with 2 non-alphabet characters: ", cursor);
51   });
52 }

Listing 15.1 Output doc_query.js: Finding a specific set of documents in a MongoDB collection

Words longer than 12 characters:
["international","administration","environmental","responsibility","investigation",
"communication","understanding","significantly","representative"…]
Words with 12 Distinct characters:
["uncomfortable","accomplishment","considerably"]
Words with non-alphabet characters:
["don't","won't","can't","shouldn't","e-mail","long-term","so-called","mm-hmm",
"t-shirt","and/or","health-care","full-time","o'clock","self-esteem"…]
Words starting with a, b or c:
["and","a","at","as","all","about","also","any","after","ask","another","american",
"against","again","always","area","around","away","among"…]
Words containing 7 or more vowels:
["identification","questionnaire","organizational","rehabilitation"]
Words that start and end with a vowel:
["a","also","area","ago","able","age","agree","available","anyone","article","argue",
"arrive","above","audience","assume","alone","achieve","attitude"…]
Words with all 5 vowels:
["education","educational","regulation","evaluation","reputation","communicate",
"dialogue","questionnaire","simultaneously","equation","automobile"…]
Words with 2 non-alphabet characters:
["two-third's","middle-class'"]
Words with even Lengths:
["be","of","in","to","have","it","that","he","with","on","do","this","they","at","we",
"from","by","or","as","what","go","if","my","make","know","will","up"…]

Counting Documents

When accessing document sets in MongoDB, you may want to only get a count first before deciding to retrieve a set of documents. There are several reasons to count specific document sets. Performing a count is much less intensive on the MongoDB side because when retrieving documents using the find() and other methods, temporary objects such as cursors must be created and maintained by the server.

Also, when performing operations on the resulting set of documents from a find(), you should be aware of how many documents you are going to be dealing with, especially in larger environments. Sometimes all you want is a count. For example, if you need to know how many users are configured in your application, you could just count the number of documents in the users collection.

The count() method on the Collection object allows you to get a simple count of documents that match the query object criteria. The count() method is formatted exactly the same way as the find() method, as shown below, and performs the query and options parameters in exactly the same manner.

count([query], [options], callback)

If no query is specified, the count() returns a count of all the documents in the database. The callback function should accept an error as the first argument and the count as an integer as the second.

Listing 15.2 illustrates using the count() method with the same queries performed with find() in Listing 15.1. The output in Listing 15.2 Output shows that instead of a Cursor a simple integer is returned and displayed.

Listing 15.2 doc_count.js: Counting specific sets of documents in a MongoDB collection

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", countItems);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function countItems(err, words){
10   words.count({first:{$in: ['a', 'b', 'c']}}, function(err, count){
11     console.log("Words starting with a, b or c: " + count);
12   });
13   words.count({size:{$gt: 12}}, function(err, count){
14     console.log("Words longer than 12 characters: " + count);
15   });
16   words.count({size:{$mod: [2,0]}}, function(err, count){
17     console.log("Words with even Lengths: " + count);
18   });
19   words.count({letters:{$size: 12}}, function(err, count){
20     console.log("Words with 12 Distinct characters: " + count);
21   });
22   words.count({$and: [{first:{$in: ['a', 'e', 'i', 'o', 'u']}},
23                      {last:{$in: ['a', 'e', 'i', 'o', 'u']}}]},
24              function(err, count){
25     console.log("Words that start and end with a vowel: " + count);
26   });
27   words.count({"stats.vowels":{$gt:6}}, function(err, count){
28     console.log("Words containing 7 or more vowels: " + count);
29   });
30   words.count({letters:{$all: ['a','e','i','o','u']}},
31               function(err, count){
32     console.log("Words with all 5 vowels: " + count);
33   });
34   words.count({otherChars: {$exists:true}}, function(err, count){
35     console.log("Words with non-alphabet characters: " + count);
36   });
37   words.count({charsets:{$elemMatch:{$and:[{type:'other'},
38               {chars:{$size:2}}]}}},
39               function(err, count){
40     console.log("Words with 2 non-alphabet characters: " + count);
41   });
42 }

Listing 15.2 Output doc_count.js: Counting specific sets of documents in a MongoDB collection

Words starting with a, b or c: 964
Words longer than 12 characters: 64
Words that start and end with a vowel: 227
Words with even Lengths: 2233
Words with 12 Distinct characters: 3
Words containing 7 or more vowels: 4
Words with non-alphabet characters: 24
Words with all 5 vowels: 16
Words with 2 non-alphabet characters: 2

Limiting Result Sets

When finding complex documents on larger systems, you often want to limit what is being returned to reduce the impact on the network, including the memory on both the server and client. You can limit the result sets that match a specific query in three ways: You can simply only accept a limited number of documents, you can limit the fields returned, or you can page the results and get them in chunks.

Limiting Results by Size

The simplest method to limit the amount of data returned in a find() or other query request is to use the limit option in the options parameter when performing the request. The limit parameter, shown below, allows only a fixed number of items to be returned with the Cursor object. This can save you from accidentally retrieving more objects than your application can handle:

limit:<maximum_documents_to_return>

Listing 15.3 illustrates limiting the results of a find() request by using the limit:5 option in the options object. The output in Listing 15.3 Output shows that when limit is used only five words are retrieved.

Listing 15.3 doc_limit.js: Limiting the specific set of documents in a MongoDB collection

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", limitFind);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function displayWords(msg, cursor, pretty){
10   cursor.toArray(function(err, itemArr){
11     console.log("
"+msg);
12     var wordList = [];
13     for(var i=0; i<itemArr.length; i++){
14       wordList.push(itemArr[i].word);
15     }
16     console.log(JSON.stringify(wordList, null, pretty));
17   });
18 }
19 function limitFind(err, words){
20   words.count({first:'p'}, function(err, count){
21     console.log("Count of words starting with p : " + count);
22   });
23   words.find({first:'p'}, function(err, cursor){
24     displayWords("Words starting with p : ", cursor);
25   });
26   words.find({first:'p'}, {limit:5}, function(err, cursor){
27     displayWords("Limiting words starting with p : ", cursor);
28   });
29 }

Listing 15.3 Output doc_limit.js: Limiting the specific set of documents in a MongoDB collection

Count of words starting with p : 353
 
Limiting words starting with p :
["people","put","problem","part","place"]
Words starting with p :
["people","put","problem","part","place","program","play","point","provide","power",
"political","pay"…]

Limiting Fields Returned in Objects

Another effective method of limiting the resulting data when retrieving documents is to limit which fields are returned. Documents may have many different fields that are useful in some circumstance but not in others. You should consider which fields should be included when retrieving documents from the MongoDB server and only request the ones necessary.

To limit the fields returned from the server, use the fields option of the options object. The fields option enables you to either include or exclude fields by setting the value of the document field to 0 for exclude or 1 for include. You cannot mix includes and excludes in the same expression.

For example, to exclude the fields stats, value, and comments when returning a document, you would use the following fields option:

{fields:{stats:0, value:0, comments:0}}

Often it is easier to just include a few fields; for example, if you want to include only the name and value fields of documents, you would use

{fields:{name:1, value:1}}

Listing 15.4 illustrates using the fields option to reduce the amount of data returned from the server by excluding fields or specifying fields to include.

Listing 15.4 doc_fields.js: Limiting the fields returned with a set of documents

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", limitFields);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function limitFields(err, words){
10   words.findOne({word:'the'}, {fields:{charsets:0}},
11                 function(err, item){
12     console.log("Excluding fields object: ");
13     console.log(JSON.stringify(item, null, 2));
14   });
15   words.findOne({word:'the'}, {fields:{word:1,size:1,stats:1}},
16                 function(err, item){
17     console.log("Including fields object: ");
18     console.log(JSON.stringify(item, null, 2));
19   });
20 }

Listing 15.4 Output doc_fields.js: Limiting the fields returned with a set of documents

Excluding fields object:
{
  "_id": "58f04c8c6ec5050becd012c5",
  "word": "the",
  "first": "t",
  "last": "e",
  "size": 3,
  "letters": [
    "t",
    "h",
    "e"
  ],
  "stats": {
    "vowels": 1,
    "consonants": 2
  }
}
Including fields object:
{
  "_id": "58f04c8c6ec5050becd012c5",
  "word": "the",
  "size": 3,
  "stats": {
    "vowels": 1,
    "consonants": 2
  }
}

Paging Results

Paging is a common method of reducing the number of documents returned. Paging involves specifying a number of documents to skip in the matching set as well as a limit on the documents returned. Then the skip value is incremented each time by the number returned the previous time.

To implement paging on a set of documents, you need to implement the limit and skip options on the options object. The skip option specifies a number of documents to skip before returning documents. By moving the skip value each time, you get another set of documents you can effectively page through in the data set. Also, always include a sort option when paging data to ensure the order is always the same. For example, the following statements find documents 1–10, then 11–20, and 21–30:

collection.find({},{sort:[['_id':1]], skip:0, limit:10},
                function(err, cursor){});
collection.find({},{sort:[['_id':1]], skip:10, limit:10}, function(err, cursor){});
collection.find({},{sort:[['_id':1]], skip:20, limit:10}, function(err, cursor){});

Listing 15.5 illustrates using limit and skip to page through a specific set of documents. Each time a new find() request is implemented, which more closely mimics what would happen when handling paging requests from a webpage. Listing 15.5 Output shows the output of Listing 15.5. Notice that words are retrieved 10 at a time.

Warning

If the data on the system changes in such a way that it affects the results of the query, the skip may miss some items or include items again in a subsequent paged request.

Listing 15.5 doc_paging.js: Paging results from a specific set of documents in a MongoDB collection

01 var util = require('util');
02 var MongoClient = require('mongodb').MongoClient;
03 MongoClient.connect("mongodb://localhost/", function(err, db) {
04   var myDB = db.db("words");
05   myDB.collection("word_stats", function(err, collection){
06     pagedResults(err, collection, 0, 10);
07   });
08 });
09 function displayWords(msg, cursor, pretty){
10   cursor.toArray(function(err, itemArr){
11     console.log("
"+msg);
12     var wordList = [];
13     for(var i=0; i<itemArr.length; i++){
14       wordList.push(itemArr[i].word);
15     }
16     console.log(JSON.stringify(wordList, null, pretty));
17   });
18 }
19 function pagedResults(err, words, startIndex, pageSize){
20   words.find({first:'v'},
21              {limit:pageSize, skip:startIndex, sort:[['word',1]]},
22              function(err, cursor){
23     cursor.count(true, function(err, cursorCount){
24       displayWords("Page Starting at " + startIndex, cursor);
25       if (cursorCount === pageSize){
26         pagedResults(err, words, startIndex+pageSize, pageSize);
27       } else {
28         cursor.db.close();
29       }
30     });
31   });
32 }

Listing 15.5 Output doc_paging.js: Paging results from a specific set of documents in a MongoDB collection

Page Starting at 0
["vacation","vaccine","vacuum","valid","validity","valley","valuable","value",
"van","vanish"]

Page Starting at 10
["variable","variation","variety","various","vary","vast","vegetable","vehicle",
"vendor","venture"]
 
Page Starting at 20
["verbal","verdict","version","versus","vertical","very","vessel","veteran","via",
"victim"]
 
Page Starting at 30
["victory","video","view","viewer","village","violate","violation","violence",
"violent","virtual"]
 
Page Starting at 40
["virtually","virtue","virus","visible","vision","visit","visitor","visual",
"vital","vitamin"]
 
Page Starting at 50
["vocal","voice","volume","voluntary","volunteer","vote","voter","voting","vs",
"vulnerable"]
 
Page Starting at 60
[]

Sorting Result Sets

An important aspect of retrieving documents from a MongoDB database is the ability to get it in a sorted format. This is especially helpful if you are only retrieving a certain number, such as the top 10, or if you are paging the requests. The options object provides the sort option that allows you to specify the sort order and direction of one or more fields in the document.

The sort option is specified using an array of [field,<sort_order>] pairs, where sort_order is 1 for ascending and -1 for descending. For example, to sort on the name field descending first and then the value field ascending you would use the following:

sort:[['value':-1]['name':1]]

Listing 15.6 illustrates using the sort option to find and sort lists of words in different ways. Notice that in line 29 the words are sorted by size first and then by last letter, whereas in line 33 they are sorted by last letter first and then by size. The two different sort orders result in different lists of words returned.

Listing 15.6 doc_sort.js: Sorting results of a find() request for a set of documents in a MongoDB collection

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", sortItems);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function displayWords(msg, cursor, pretty){
10   cursor.toArray(function(err, itemArr){
11     console.log("
"+msg);
12     var wordList = [];
13     for(var i=0; i<itemArr.length; i++){
14       wordList.push(itemArr[i].word);
15     }
16     console.log(JSON.stringify(wordList, null, pretty));
17   });
18 }
19 function sortItems(err, words){
20   words.find({last:'w'}, function(err, cursor){
21     displayWords("Words ending in w: ", cursor);
22   });
23   words.find({last:'w'}, {sort:{word:1}}, function(err, cursor){
24     displayWords("Words ending in w sorted ascending: ", cursor);
25   });
26   words.find({last:'w'}, {sort:{word:-1}}, function(err, cursor){
27     displayWords("Words ending in w sorted, descending: ", cursor);
28   });
29   words.find({first:'b'}, {sort:[['size',-1],['last',1]]},
30              function(err, cursor){
31     displayWords("B words sorted by size then by last letter: ", cursor);
32   });
33   words.find({first:'b'}, {sort:[['last',1],['size',-1]]},
34              function(err, cursor){
35     displayWords("B words sorted by last letter then by size: ", cursor);
36   });
37 }

Listing 15.6 Output doc_sort.js: Sorting results of a find() request for a set of documents in a MongoDB collection

Words ending in w:
["know","now","how","new","show","few","law","follow","allow","grow","low","view",
"draw","window","throw","interview","tomorrow"…
Words ending in w sorted ascending:
["allow","arrow","below","blow","borrow","bow","chew","cow","crew","draw","elbow",
"eyebrow","fellow","few","flow"…
Words ending in w sorted, descending:
["yellow","wow","withdraw","window","widow","view","tomorrow","throw","swallow",
"straw","somehow","snow"…
B words sorted by size then by last letter:
["businessman","background","basketball","biological","behavioral","boyfriend",
"beginning"…
B words sorted by last letter then by size:
["bacteria","banana","bomb","bulb","basic","background","boyfriend","backyard",
"balanced","behind","beyond"…

Finding Distinct Field Values

A useful query against a MongoDB collection is to get a list of the distinct values for a single field in a set of documents. Distinct means that even though there are thousands of documents, you only want to know the unique values that exist.

The distinct() method on Collection objects allows you to find a list of distinct values for a specific field. The syntax for the distinct() method is shown below:

distinct(key,[query],[options],callback)

The key parameter is the string value of the field name you want to get values for. You can specify subdocuments using the dot syntax such as stats.count. The query parameter is an object with standard query options defined in Table 15.1. The options parameter is an options object that allows you to define the readPreference option defined in Table 15.2. The callback function should accept an error as the first parameter, and the results parameter, which is an array of distinct values for the field specified in the key parameter, as the second.

Listing 15.7 illustrates finding the distinct values in the words collection. Notice that a query in line 14 limits the words to those starting with u. In line 18, to access the stats.vowels field, the dot syntax is used.

Listing 15.7 doc_distinct.js: Finding distinct field values in a specific set of documents in a MongoDB collection

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", distinctValues);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function distinctValues(err, words){
10   words.distinct('size', function(err, values){
11     console.log("
Sizes of words: ");
12     console.log(values);
13   });
14   words.distinct('first', {last:'u'}, function(err, values){
15     console.log("
First letters of words ending in u: ");
16     console.log(values);
17   });
18   words.distinct('stats.vowels', function(err, values){
19     console.log("
Numbers of vowels contained in words: ");
20     console.log(values);
21   });
22 }

Listing 15.7 Output doc_distinct.js: Finding distinct field values in a specific set of documents in a MongoDB collection

Sizes of words:
[ 3, 2, 1, 4, 5, 9, 6, 7, 8, 10, 11, 12, 13, 14 ]
 
First letters of words ending in u:
[ 'y', 'm', 'b' ]
 
Numbers of vowels contained in words:
[ 1, 2, 0, 3, 4, 5, 6, 7 ]

Grouping Results

When performing operations on large data sets, it is often useful to group the results based on the distinct values of one or more fields in a document. This could be done in code after retrieving the documents; however, it is much more efficient to have MongoDB do it for you as part of a single request that is already iterating through the documents.

To group the results of a query together, you can use the group() method on the Collection object. The group request first collects all the documents that match a query, and then adds a group object to an array based on distinct values of a set of keys, performs operations on the group objects, and returns the array of group objects. The syntax for the group() method is shown below:

group(keys, query, initial, reduce, finalize, command, [options], callback)

The parameters of the group() method are described in the following list:

Images keys: This can be an object, array, or function expressing the keys to group by. The simplest method is to specify the key(s) in an object such as {field1:true, field2:true} or an array such as

['first', 'last'].

Images query: Query object that defines which documents to be included in the initial set. See Table 15.1 for a list of query options.

Images initial: Specifies an initial group object to use when aggregating data while grouping. An initial object is created for each distinct set of keys. The most common use is a counter that tracks a count of items that match the keys. For example:
{"count":0}

Images reduce: This is a function(obj, prev) with two parameters, obj and prev. This function is executed on each document that matches the query. The obj parameter is the current document, and the prev parameter is the object created by the initial parameter. You can then use the obj object to update the prev object with new values such as counts or sums. For example, to increment the count value you would use:
function(obj, prev) { prev.count++; }

Images finalize: This is a function(obj) that accepts one parameter, obj, which is the final object resulting from the initial parameter and updated as prev in the reduce function. This function is called on the resulting object for each distinct key before returning the array in the response.

Images command: A Boolean that, when true, specifies the command will run using the internal group command instead of eval(). The default is true.

Images options: This object allows you to define the readPreference option.

Images callback: This accepts an error as the first parameter and an array of the results objects as the second.

Listing 15.8 illustrates implementing grouping of words based on various key sets. Lines 10–18 implement a basic grouping of words by first and last letter. The query in line 11 limits the words to those beginning with o and ending with a vowel. The initial object for each has a count property only, which is updated for each matching document in the function on line 13.

Lines 19–28 add the concept of summing the total vowels in the documents while grouping them together by incrementing the prev.totalVowels with the obj.stats.vowels value in line 23. Then lines 29–40 show the use of a finalize function that adds a new obj.total property to the group objects that is a sum of the obj.vowels and obj.consonants properties of the object.

Listing 15.8 doc_group.js: Grouping a set of documents by specific fields in a MongoDB collection

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", groupItems);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function groupItems(err, words){
10   words.group(['first','last'],
11               {first:'o',last:{$in:['a','e','i','o','u']}},
12               {"count":0},
13               function (obj, prev) { prev.count++; }, true,
14               function(err, results){
15         console.log("
'O' words grouped by first and last" +
16                     " letter that end with a vowel: ");
17         console.log(results);
18   });
19   words.group(['first'],
20               {size:{$gt:13}},
21               {"count":0, "totalVowels":0},
22               function (obj, prev) {
23                 prev.count++; prev.totalVowels += obj.stats.vowels;
24               }, {}, true,
25               function(err, results){
26     console.log("
Words grouped by first letter larger than 13: ");
27     console.log(results);
28   });
29   words.group(['first'],{}, {"count":0, "vowels":0, "consonants":0},
30               function (obj, prev) {
31                 prev.count++;
32                 prev.vowels += obj.stats.vowels;
33                 prev.consonants += obj.stats.consonants;
34               },function(obj){
35                 obj.total = obj.vowels + obj.consonants;
36               }, true,
37               function(err, results){
38         console.log("
Words grouped by first letter with totals: ");
39         console.log(results);
40   });
41 }

Listing 15.8 Output doc_group.js: Grouping a set of documents by specific fields in a MongoDB collection

'O' words grouped by first and last letter that end with a vowel:
[ { first: 'o', last: 'e', count: 21 },
  { first: 'o', last: 'o', count: 1 },
  { first: 'o', last: 'a', count: 1 } ]
 
Words grouped by first letter larger than 13:
[ { first: 'a', count: 4, totalVowels: 22 },
  { first: 'r', count: 5, totalVowels: 30 },
  { first: 'c', count: 2, totalVowels: 11 },
  { first: 't', count: 2, totalVowels: 10 },
  { first: 'i', count: 4, totalVowels: 24 },
  { first: 'd', count: 2, totalVowels: 11 },
  { first: 's', count: 1, totalVowels: 6 },
  { first: 'o', count: 1, totalVowels: 7 } ]
 
Words grouped by first letter with totals:
[ { first: 't',
    count: 250,
    vowels: 545,
    consonants: 1017,
    total: 1562 },
  { first: 'b',
    count: 218,
    vowels: 417,
    consonants: 769,
    total: 1186 },
  { first: 'a',
    count: 295,
    vowels: 913,
    consonants: 1194,
    total: 2107 },
  { first: 'o',
    count: 118,
    vowels: 356,
    consonants: 435,
    total: 791 },
  { first: 'i',
    count: 189,
    vowels: 655,
    consonants: 902,
    total: 1557 },
  { first: 'h',
    count: 139,
    vowels: 289,
    consonants: 511,
    total: 800 },
  { first: 'f',
    count: 203,
    vowels: 439,
    consonants: 774,
    total: 1213 },
  { first: 'y', count: 16, vowels: 31, consonants: 50, total: 81 },
  { first: 'w',
    count: 132,
    vowels: 255,
    consonants: 480,
    total: 735 },
  { first: 'd',
    count: 257,
    vowels: 675,
    consonants: 1102,
    total: 1777 },
  { first: 'c',
    count: 451,
    vowels: 1237,
    consonants: 2108,
    total: 3345 },
  { first: 's',
    count: 509,
    vowels: 1109,
    consonants: 2129,
    total: 3238 },
  { first: 'n', count: 82, vowels: 205, consonants: 314, total: 519 },
  { first: 'g',
    count: 112,
    vowels: 236,
    consonants: 414,
    total: 650 },
  { first: 'm',
    count: 200,
    vowels: 488,
    consonants: 778,
    total: 1266 },
  { first: 'k', count: 21, vowels: 33, consonants: 70, total: 103 },
  { first: 'u', count: 58, vowels: 173, consonants: 233, total: 406 },
  { first: 'p',
    count: 353,
    vowels: 902,
    consonants: 1575,
    total: 2477 },
  { first: 'j', count: 33, vowels: 72, consonants: 114, total: 186 },
  { first: 'l',
    count: 142,
    vowels: 307,
    consonants: 503,
    total: 810 },
  { first: 'v', count: 60, vowels: 163, consonants: 218, total: 381 },
  { first: 'e',
    count: 239,
    vowels: 788,
    consonants: 1009,
    total: 1797 },
  { first: 'r',
    count: 254,
    vowels: 716,
    consonants: 1011,
    total: 1727 },
  { first: 'q', count: 16, vowels: 50, consonants: 59, total: 109 },
  { first: 'z', count: 1, vowels: 2, consonants: 2, total: 4 } ]

Applying MapReduce by Aggregating Results

One benefit of MongoDB is the ability to MapReduce the results of database queries into a completely different structure than the original collections. MapReduce is the process of mapping the values on a DB lookup into a completely different form and then reducing it to make it more consumable.

MongoDB has a MapReduce framework but has also added a framework that simplifies the process of piping one MapReduce operation into another in a series to produce some extraordinary results with the data. Aggregation is the concept of applying a series of operations to documents on the MongoDB server as they are being compiled into a result set. This is much more efficient than retrieving them and processing them in your Node.js application because the MongoDB server can operate on chunks of data locally.

Understanding the aggregate() Method

The Collection object provides the aggregate() method to perform aggregation operations on data. The syntax for the aggregate() method is shown below:

aggregate(operators, [options], callback)

The operators parameter is an array of aggregation operators, shown in Table 15.3, that allow you to define what aggregation operation to perform on the data. The options parameter is an object that allows you to set the readPreference property that defines where to read the data from. The callback parameter should be a function that accepts an error as the first parameter and a results array as the second. The results array is the fully aggregated object set returned by the aggregation.

Using Aggregation Framework Operators

The aggregation framework provided by MongoDB is powerful in that it allows you to pipe the results of one aggregation operator into another multiple times. To illustrate this, look at the following data set:

{o_id:"A", value:50, type:"X"}
{o_id:"A", value:75, type:"X"}
{o_id:"B", value:80, type:"X"}
{o_id:"C", value:45, type:"Y"}

The following aggregation operator set would pipeline the results of the $match into the $group operator and then return the grouped set in the results parameter of the callback function. Notice that when referencing the values of fields in documents that the field name is prefixed by a dollar sign, for example $o_id and $value. This syntax tells the aggregate framework to treat it as a field value instead of a string.

aggregate([{$match:{type:"X"}},
           {$group:{set_id:"$o_id", total: {$sum: "$value"}}},
           function(err, results){}]);

After the $match operator completes, the documents that would be applied to $group would be:

{o_id:"A", value:50, type:"X"}
{o_id:"A", value:75, type:"X"}
{o_id:"B", value:80, type:"X"}

Then after the $group operator is applied, a new array of objects is sent to the callback function with set_id and total fields as shown below:

{set_id:"A", total:"125"}
{set_id:"B", total:"80"}

Table 15.3 defines the types of aggregation commands that you can include in the operators parameter to the aggregate() method:

Table 15.3 Aggregation operators that can be used in the aggregate() method

Operator

Description

$project

Reshapes the documents by renaming, adding, or removing fields. You can also recompute values and add subdocuments. For example, the following includes title and excludes name:

{$project{title:1, name:0}}

The following is an example of renaming name to title:

{$project{title:"$name"}}

The following is an example of adding a new field total and computing its value from price and tax fields:

{$project{total:{$add:["$price", "$tax"]}}}

$match

Filters the document set using the query operators defined in Table 15.1. For example:

{$match:{value:{$gt:50}}}

$limit

Restricts the number of documents that can be passed to the next pipe in the aggregation. For example:

{$limit:5}

$skip

Specifies a number of documents to skip before processing the next pipe in the aggregation. For example:

{$skip:10}

$unwind

Specifies an array field that splits, with a separate document created for each value. For example:

{$unwind:"$myArr"}

$group

Groups the documents together into a new set of documents for the next level in the pipe. The fields of the new object must be defined in the $group object. You can also apply group expression operators, listed in Table 15.4, to the multiple documents in the group. For example, to sum the value field:

{$group:{set_id:"$o_id", total: {$sum: "$value"}}}

$sort

Sorts the documents before passing them on to the next pipe in the aggregation. The sort specifies an object with field:<sort_order> properties where <sort_order> is 1 for ascending and -1 for descending. For example:

{$sort: {name:1, age:-1}}

$collStatus

Returns a collection’s or view’s statistics.

$redact

Restricts each document in the stream based on stored values in the documents. Can implement field level redaction. Every one input document outputs one or zero documents.

$sample

Selects a specific number of random documents from its input.

$geoNear

Returns an ordered stream of documents based on proximity to a geospatial point. An additional distance field and a location identifier field can be included in the output documents.

$lookup

Used to perform an equality match between a field in the input documents with field documents in a joined collection.

$out

Writes the resulting documents of the aggregation pipeline to a collection. The $out stage must be used in the last stage of the pipeline.

$indexStats

Returns statistics of the use of each index for the collection.

$facet

Processes multiple aggregation pipelines in a single stage on the same input documents. Allows the creation of multifaceted aggregations that can characterize data across multiple dimensions or facets in a single stage.

$bucket

Categorizes incoming documents into groups called buckets, based on certain expressions and bucket boundaries.

$bucketAuto

Categorizes incoming documents into a specific number of groups called buckets, based on certain expressions. The bucket boundaries are automatically determined to try and evenly distribute documents into the specified number of buckets.

$sortByCount    

Sorts incoming documents by the value of a specified expression and then determines the number of documents in each specific group.

$addFields

Adds new fields to documents.

$replaceRoot

Replaces a document with a specified embedded document, replacing all fields in the input document.

$count

Returns a count of all documents at this point of the aggregation pipeline.

$graphLookup

Searches a collection and adds a new array field on each output document that contains the traversal results of the search for that document.

Implementing Aggregation Expression Operators

When you implement the aggregation operators, you are building a new document that will be passed to the next level in the aggregation pipeline. The MongoDB aggregation framework provides a number of expression operators that help when computing values for new fields or for comparison of existing fields in the documents.

When operating on a $group aggregation pipe, multiple documents match the defined fields in the new documents created. MongoDB provides a set of operators that you can apply to those documents and use to compute values for fields in the new group document based on values of fields in the original set of documents. Table 15.4 lists the $group expression operators.

Table 15.4 Aggregation $group expression operators

Operator

Description

$addToSet

Returns an array of all the unique values for the selected field among all the documents in that group. For example:

colors: {$addToSet: "$color"}

$first

Returns the first value for a field in a group of documents. For example:

firstValue:{$first: "$value"}

$last

Returns the last value for a field in a group of documents. For example:

lastValue:{$last: "$value"}

$max

Returns the highest value for a field in a group of documents. For example:

maxValue:{$max: "$value"}

$min

Returns the lowest value for a field in a group of documents. For example:

minValue:{$min: "$value"}

$avg

Returns an average of all the values for a field in a group of documents. For example:

avgValue:{$avg: "$value"}

$push

Returns an array of all values for the selected field among all the documents in that group of documents. For example:

username:{$push: "$username"}

$sum

Returns the sum of all the values for a field in a group of documents. For example:

total:{$sum: "$value"}

Several string and arithmetic operators also can be applied when computing new field values. Table 15.5 lists some of the more common operators that you can apply when computing new field values in the aggregation operators:

Table 15.5 String and arithmetic operators used in aggregation expressions

Operator

Description

$add

Computes the sum of an array of numbers. For example:

valuePlus5:{$add:["$value", 5]}

$divide

Takes two numbers and divides the first number by the second. For example:

valueDividedBy5:{$divide:["$value", 5]}

$mod

Takes two numbers and calculates the modulo of the first number divided by the second. For example:

valueMod5:{$mod:["$value", 5]}

$multiply

Computes the product of an array of numbers. For example:

valueTimes5:{$multiply:["$value", 5]}

$subtract

Takes two numbers and subtracts the second number from the first. For example:

valueMinus5:{$minus:["$value", 5]}

$concat

Concatenates two strings. For example:

title:{$concat:["$title", " ", "$name"]}

$strcasecmp

Compares two strings and returns an integer that reflects the comparison. For example:

isTest:{$strcasecmp:["$value", "test"]}

$substr

Takes a string and returns a portion of that string. For example:

hasTest:{$substr:["$value", "test"]}

$toLower

Converts a string to lowercase. For example:

titleLower:{$toLower:"$title"}

$toUpper

Converts a string to uppercase. For example:

titleUpper:{$toUpper:"$title"}

Aggregation Examples

Listing 15.9 illustrates three examples of implementing aggregation against the words collection.

The first example, in lines10–20, implements a $match to get words beginning in vowels and then a $group to calculate the largest and smallest sizes. The results are then sorted using $sort and displayed in Listing 15.9 Output.

The second example, in lines 21–27, uses $match to limit the words to size 4. Then $limit is used to only process five documents in the $project operator. The third example, in lines 28–34, uses a $group to get the average size of the words and set the _id value to the word. Then the words are sorted in descending order by average and displayed in Listing 15.9 Output.

Listing 15.9 doc_aggregate.js: Grouping a set of documents by specific fields in a MongoDB collection

01 var MongoClient = require('mongodb').MongoClient;
02 MongoClient.connect("mongodb://localhost/", function(err, db) {
03   var myDB = db.db("words");
04   myDB.collection("word_stats", aggregateItems);
05   setTimeout(function(){
06     db.close();
07   }, 3000);
08 });
09 function aggregateItems(err, words){
10   words.aggregate([{$match: {first:{$in:['a','e','i','o','u']}}},
11                    {$group: {_id:"$first",
12                              largest:{$max:"$size"},
13                              smallest:{$min:"$size"},
14                              total:{$sum:1}}},
15                    {$sort: {_id:1}}],
16               function(err, results){
17     console.log("Largest and smallest word sizes for " +
18                 "words beginning with a vowel: ");
19     console.log(results);
20   });
21   words.aggregate([{$match: {size:4}},
22                    {$limit: 5},
23                    {$project: {_id:"$word", stats:1}}],
24               function(err, results){
25     console.log("Stats for 5 four letter words: ");
26     console.log(results);
27   });
28   words.aggregate([{$group: {_id:"$first", average:{$avg:"$size"}}},
29                     {$sort: {average:-1}},
30                     {$limit: 5}],
31               function(err, results){
32     console.log("Letters with largest average word size: ");
33     console.log(results);
34   });
35 }

Listing 15.9 Output doc_aggregate.js: Grouping a set of documents by specific fields in a MongoDB collection

Stats for 5 four letter words:
[ { stats: { vowels: 2, consonants: 2 }, _id: 'have' },
  { stats: { vowels: 1, consonants: 3 }, _id: 'that' },
  { stats: { vowels: 1, consonants: 3 }, _id: 'with' },
  { stats: { vowels: 1, consonants: 3 }, _id: 'this' },
  { stats: { vowels: 1, consonants: 3 }, _id: 'they' } ]
Largest and smallest word sizes for words beginning with a vowel:
[ { _id: 'a', largest: 14, smallest: 1, total: 295 },
  { _id: 'e', largest: 13, smallest: 3, total: 239 },
  { _id: 'i', largest: 14, smallest: 1, total: 189 },
  { _id: 'o', largest: 14, smallest: 2, total: 118 },
  { _id: 'u', largest: 13, smallest: 2, total: 58 } ]
Letters with largest average word size:
[ { _id: 'i', average: 8.238095238095237 },
  { _id: 'e', average: 7.523012552301255 },
  { _id: 'c', average: 7.419068736141907 },
  { _id: 'a', average: 7.145762711864407 },
  { _id: 'p', average: 7.01699716713881 } ]

Summary

In this chapter you looked at the query and options objects used by Collection methods to access documents in the database. The query object allows you to limit which documents are considered for operations. The options object allows you to control the interaction of the requests to limit the number of documents returned, which document to start on, and what fields should be returned.

The distinct(), group(), and aggregate() methods allow you to group documents based on field values. The MongoDB aggregation framework is a powerful feature that allows you to process documents on the server before returning them to the client. The aggregation framework allows you to pipe documents from one aggregation operation to the next, each time mapping and reducing to a more defined set of data.

Next

In the next chapter, you use the mongoose module to implement an Object Document Model (ODM), which provides a more structured approach to data modeling from Node.js.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
18.222.196.175