What You’ll Learn in This Hour:
Paging documents from a large dataset in Java
Limiting which fields are returned from documents in Java
Using methods to generate lists of distinct field values in documents from a collection in Java
Implementing grouping from Java to group documents and build a return dataset
Applying an aggregation pipeline to build a dataset from documents in a collection from Java
This hour continues the previous hour’s coverage of the Java MongoDB driver and how to use it to retrieve data in your Java applications. It walks you through limiting the results returned to a Java application by capping the number of documents, restricting the fields returned, and paging through large result sets.
This hour also covers how to implement distinct, grouping, and aggregation operations from a Java application. These operations enable you to process data on the server before returning it to the Java application, reducing the amount of data sent and the work required in the application.
When finding documents in larger databases with more complex documents, you often want to limit what a request returns to reduce the load on the network and the memory used on both the server and the client. You have three ways to limit the result sets that match a specific query: you can accept only a limited number of documents, you can limit the fields that are returned, or you can page the results and retrieve them in chunks.
The simplest method for limiting the amount of data returned in a find() or other query request is to use the limit() method on the DBCursor object returned by the find() operation. The limit() method limits the cursor so that it returns only a fixed number of items. This can save you from accidentally retrieving more objects than your application can handle.
For example, the following code displays only the first 10 documents in a collection even though there could be thousands:
DBCursor cursor = wordsColl.find();
cursor.limit(10); // return at most 10 documents from this cursor
while (cursor.hasNext()) {
    DBObject word = cursor.next();
    System.out.println(word);
}
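To make the effect of limit() concrete without a running server, here is a small in-memory sketch in plain Java. LimitedIterator and firstN() are hypothetical names invented for illustration and are not part of the MongoDB driver. The sketch models only what the caller observes (iteration stops after n items); the real limit() also tells the server not to send the extra documents at all. Following MongoDB's convention, a limit of 0 is treated as no limit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical LimitedIterator: an in-memory sketch of what cursor.limit(n) looks like
// to the caller. It is NOT part of the MongoDB driver.
public class LimitedIterator<T> implements Iterator<T> {
    private final Iterator<T> source; // the underlying, possibly very large, result set
    private final int limit;          // maximum items to return; 0 means no limit, as in MongoDB
    private int returned = 0;         // items handed out so far

    public LimitedIterator(Iterator<T> source, int limit) {
        this.source = source;
        this.limit = limit;
    }

    @Override
    public boolean hasNext() {
        // Stop as soon as the limit is reached, even if the source has more items.
        return (limit == 0 || returned < limit) && source.hasNext();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        returned++;
        return source.next();
    }

    // Convenience helper: collect at most n items from a list.
    public static <T> List<T> firstN(List<T> items, int n) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = new LimitedIterator<>(items.iterator(), n);
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "be", "and", "of", "a");
        System.out.println(firstN(words, 3)); // prints [the, be, and]
    }
}
```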
Another extremely effective method of limiting the resulting data when retrieving documents is to limit which fields are returned. Documents might have a lot of fields that are useful in some circumstances but not in others. Consider which fields should be included when retrieving documents from the MongoDB server, and request only the ones necessary.
To limit the fields returned from the server in a find() operation on a DBCollection object, you can use the fields parameter. It is a BasicDBObject that contains field names with a value of true to include the field or false to exclude it.
For example, to exclude the fields stats, value, and comments when returning documents, you would use null for the query object because you want to match all documents. You would use the following fields object:
BasicDBObject fields = new BasicDBObject("stats", false);
fields.append("value", false);
fields.append("comments", false);
DBCursor cursor = myColl.find(null, fields);
Often including just a few fields is easier. For example, if you want to include only the word and size fields of documents where the first field equals t, you would use:
BasicDBObject query = new BasicDBObject("first", "t");
BasicDBObject fields = new BasicDBObject("word", true);
fields.append("size", true);
DBCursor cursor = myColl.find(query, fields);
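The following in-memory sketch models how the server applies a fields object to a single document, using plain Java maps in place of BasicDBObject. Projection, apply(), and sampleWord() are hypothetical helpers, and the sample document is made up. The sketch assumes the fields object either includes fields (true) or excludes them (false), apart from _id, which MongoDB returns unless you explicitly set it to false.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical Projection helper: an in-memory sketch of how a fields object shapes each
// returned document. Assumes fields are all true or all false, apart from _id.
public class Projection {

    public static Map<String, Object> apply(Map<String, Object> doc, Map<String, Boolean> fields) {
        if (fields == null || fields.isEmpty()) {
            return new LinkedHashMap<>(doc); // no fields object: every field comes back
        }
        // Any true value switches to inclusion mode; otherwise listed fields are excluded.
        boolean inclusion = fields.containsValue(Boolean.TRUE);
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Boolean flag = fields.get(e.getKey());
            boolean keep;
            if (e.getKey().equals("_id")) {
                keep = !Boolean.FALSE.equals(flag); // _id is returned unless explicitly false
            } else if (inclusion) {
                keep = Boolean.TRUE.equals(flag);   // keep only fields marked true
            } else {
                keep = !Boolean.FALSE.equals(flag); // drop only fields marked false
            }
            if (keep) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // Made-up sample word document for illustration only.
    public static Map<String, Object> sampleWord() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", 1);
        doc.put("word", "the");
        doc.put("first", "t");
        doc.put("size", 3);
        doc.put("stats", Map.of("vowels", 1, "consonants", 2));
        doc.put("value", 42);
        doc.put("comments", Arrays.asList("common"));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, Boolean> exclude = new LinkedHashMap<>();
        exclude.put("stats", false);
        exclude.put("value", false);
        exclude.put("comments", false);
        System.out.println(apply(sampleWord(), exclude)); // prints {_id=1, word=the, first=t, size=3}

        Map<String, Boolean> include = new LinkedHashMap<>();
        include.put("word", true);
        include.put("size", true);
        System.out.println(apply(sampleWord(), include)); // prints {_id=1, word=the, size=3}
    }
}
```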
A common method of reducing the number of documents returned is to use paging. Paging involves specifying a number of documents to skip in the matching set, as well as a limit on the documents returned. Then the skip value is incremented each time by the amount returned the previous time.
To implement paging on a set of documents, you use the limit() and skip() methods on the DBCursor object. The skip() method enables you to specify a number of documents to skip before returning documents. By incrementing the value used in the skip() method by the size used in limit() each time you get another set of documents, you can effectively page through the dataset.
For example, the following statements find documents 11–20:
DBCursor cursor = collection.find();
cursor.limit(10);
cursor.skip(10);
Always include a sort option when paging data to ensure that the documents come back in the same order on every request; otherwise, pages can overlap or miss documents.
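Putting the paging pieces together, the following sketch computes each page in memory: it sorts the items, skips (page - 1) * pageSize of them, and keeps at most pageSize. The Pager class and its methods are hypothetical illustrations; with the driver, the equivalent request would chain sort(), skip(), and limit() on the DBCursor. Page 2 with a page size of 10 skips 10 items, which matches the documents 11–20 example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical Pager: an in-memory sketch of sort + skip + limit paging.
public class Pager {

    // Documents to skip to reach a 1-based page: page 1 skips 0, page 2 skips pageSize, ...
    public static int skipFor(int page, int pageSize) {
        return (page - 1) * pageSize;
    }

    public static <T> List<T> page(List<T> items, Comparator<? super T> order,
                                   int page, int pageSize) {
        List<T> sorted = new ArrayList<>(items);
        sorted.sort(order); // fixed sort order so every request sees the same sequence
        int from = Math.min(skipFor(page, pageSize), sorted.size()); // like skip()
        int to = Math.min(from + pageSize, sorted.size());            // like limit()
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<Integer> docs = new ArrayList<>();
        for (int i = 25; i >= 1; i--) {
            docs.add(i); // deliberately unsorted input
        }
        for (int p = 1; p <= 3; p++) {
            System.out.println("page " + p + ": "
                    + page(docs, Comparator.<Integer>naturalOrder(), p, 10));
        }
        // the last line printed is: page 3: [21, 22, 23, 24, 25]
    }
}
```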
A useful query against a MongoDB collection is to get a list of the distinct values for a single field in a set of documents. Distinct means that even though thousands of documents exist, you want to know only the unique values.
The distinct() method on DBCollection objects enables you to find a list of distinct values for a specific field. The syntax for the distinct() method follows:
distinct(key, [query])
The key parameter is the string value of the field name you want to get values for. You can specify subdocuments using dot syntax, such as stats.count. The optional query parameter is an object with standard query options that limits which documents are evaluated for distinct field values.
For example, to find the distinct last names of users over 65 in a collection that has documents with first, last, and age fields, you would use the following operation:
BasicDBObject query = new BasicDBObject("age",
        new BasicDBObject("$gt", 65));
List lastNames = myCollection.distinct("last", query);
The distinct() method returns a List containing the distinct values for the specified field. For example:
["Smith", "Jones", ...]
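The following in-memory sketch mirrors what distinct("last", query) computes, using a Java Predicate in place of the query object and plain maps in place of documents. The Distinct class, its methods, and the sample users are hypothetical. It simplifies the server's behavior: it does not expand array fields into their individual elements as the server does, and the first-seen order of the values here is not something to rely on from the server.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical in-memory sketch of distinct(key, query); not the driver's API.
public class Distinct {

    public static List<Object> distinct(List<Map<String, Object>> docs, String key,
                                        Predicate<Map<String, Object>> query) {
        Set<Object> values = new LinkedHashSet<>(); // drops duplicates, keeps first-seen order
        for (Map<String, Object> doc : docs) {
            boolean matches = query == null || query.test(doc); // null query matches everything
            if (matches && doc.containsKey(key)) {
                values.add(doc.get(key));
            }
        }
        return new ArrayList<>(values);
    }

    // Builds a made-up user document with first, last, and age fields.
    private static Map<String, Object> user(String first, String last, int age) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("first", first);
        doc.put("last", last);
        doc.put("age", age);
        return doc;
    }

    public static List<Map<String, Object>> sampleUsers() {
        List<Map<String, Object>> users = new ArrayList<>();
        users.add(user("Ann", "Smith", 70));
        users.add(user("Bob", "Jones", 66));
        users.add(user("Cal", "Smith", 80));
        users.add(user("Dee", "Brown", 40));
        return users;
    }

    public static void main(String[] args) {
        // Stands in for the query {age: {$gt: 65}}.
        Predicate<Map<String, Object>> over65 = doc -> ((Integer) doc.get("age")) > 65;
        System.out.println(distinct(sampleUsers(), "last", over65)); // prints [Smith, Jones]
    }
}
```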
When performing operations on large datasets in Java, grouping the results based on the distinct values of one or more fields in a document is often useful. This can be done in code after retrieving the documents, but it is much more efficient to have the MongoDB server do it for you as part of a single request that is already iterating through the documents.
In Java, to group the results of a query, you can use the group() method on the DBCollection object. The group request collects all the documents that match a query, adds a group object to an array based on the distinct values of a set of keys, performs operations on the group objects, and returns the array of group objects.
The syntax for the group() method follows:
group(key, cond, initial, reduce, [finalize])
The key, cond, and initial parameters are BasicDBObjects that define, respectively, the fields to group by, the query that limits which documents are included, and the initial values of each group object. The reduce and finalize parameters are String objects that contain the source of a JavaScript function, which runs on the server to reduce and finalize the request. See Hour 9, “Utilizing the Power of Grouping, Aggregation, and Map Reduce,” for more information on these parameters.
To illustrate, the following code implements a basic grouping by generating the key, cond, and initial objects and passing in a reduce function as a string:
BasicDBObject key = new BasicDBObject("first", true);
BasicDBObject cond = new BasicDBObject("last", "a");
cond.append("size", 5);
BasicDBObject initial = new BasicDBObject("count", 0);
String reduce = "function (obj, prev) { prev.count++; }";
DBObject result = collection.group(key, cond, initial, reduce);
The result from the group() method is a DBObject that contains the groupings. To illustrate the results, the following code displays the items in the group one at a time:
for (Object name : result.toMap().values()) {
    System.out.println(name);
}
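To show what the server does with key, cond, initial, and reduce, here is an in-memory sketch of the same steps in plain Java: filter with cond, create one group object per distinct key value starting from a copy of initial, and call reduce for each matching document. GroupSketch and its methods are hypothetical, and a Java BiConsumer stands in for the JavaScript reduce string. The countByFirstLetter() helper reproduces the shape of the earlier example: count five-letter words ending in a, grouped by first letter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Predicate;

// Hypothetical in-memory sketch of group(key, cond, initial, reduce); not the driver's API.
public class GroupSketch {

    public static List<Map<String, Object>> group(List<Map<String, Object>> docs,
            String keyField,
            Predicate<Map<String, Object>> cond,
            Map<String, Object> initial,
            BiConsumer<Map<String, Object>, Map<String, Object>> reduce) {
        // One group object per distinct key value, kept in first-seen order.
        Map<Object, Map<String, Object>> groups = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            if (!cond.test(doc)) {
                continue; // cond: skip documents that do not match
            }
            Object keyValue = doc.get(keyField);
            Map<String, Object> prev = groups.get(keyValue);
            if (prev == null) {
                // New group: the key field plus a copy of the initial values.
                prev = new LinkedHashMap<>();
                prev.put(keyField, keyValue);
                prev.putAll(initial);
                groups.put(keyValue, prev);
            }
            reduce.accept(doc, prev); // reduce(obj, prev) updates the group object
        }
        return new ArrayList<>(groups.values());
    }

    // Builds a word document with word, first, last, and size fields.
    public static Map<String, Object> word(String w) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("word", w);
        doc.put("first", w.substring(0, 1));
        doc.put("last", w.substring(w.length() - 1));
        doc.put("size", w.length());
        return doc;
    }

    // Key {first}, cond {last: "a", size: 5}, initial {count: 0}, reduce: prev.count++.
    public static List<Map<String, Object>> countByFirstLetter(List<String> words) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String w : words) {
            docs.add(word(w));
        }
        Map<String, Object> initial = new LinkedHashMap<>();
        initial.put("count", 0);
        return group(docs, "first",
                d -> "a".equals(d.get("last")) && Integer.valueOf(5).equals(d.get("size")),
                initial,
                (obj, prev) -> prev.put("count", (Integer) prev.get("count") + 1));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("extra", "drama", "delta", "data", "apple");
        System.out.println(countByFirstLetter(words));
        // prints [{first=e, count=1}, {first=d, count=2}]
    }
}
```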
Another valuable tool when working with MongoDB in Java applications is the aggregation framework. The DBCollection object provides the aggregate() method to perform aggregation operations on data. The syntax for the aggregate() method follows:
aggregate(operator, [operator, ...])
The operator parameter is one or more operator objects that together form the pipeline for aggregating the results. Each operator is a DBObject built from one of the aggregation operators. Hour 9 described the aggregation operators, so you should already be familiar with them.
As an example, the following code creates $group and $limit operators. The $group operator uses the value of the word field as the _id of each group and adds an average field that uses the $avg operator to average a field named size. Notice that field names used as values must be prefixed with $ in aggregation operations:
BasicDBObject groupOps = new BasicDBObject("_id", "$word");
groupOps.append("average", new BasicDBObject("$avg", "$size"));
BasicDBObject group = new BasicDBObject("$group", groupOps);
BasicDBObject limit = new BasicDBObject("$limit", 10);
AggregationOutput result = collection.aggregate(group, limit);
The result from the aggregate() method is an AggregationOutput object that contains the aggregation results. The results() method on the AggregationOutput object returns an Iterable that you can use to access the results. To illustrate, the following code accesses and displays the items in the aggregated results one at a time:
for (Iterator<DBObject> items = result.results().iterator(); items.hasNext();){
System.out.println(items.next());
}
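As an illustration of what the $group and $limit stages compute, the following in-memory sketch applies the same two stages to plain Java maps. AggregateSketch and its methods are hypothetical. To make the averages meaningful, the demo groups by the first field rather than word (each word is unique, so grouping by word would simply echo each size). The sketch assumes every document has a numeric size; the server's $avg also ignores non-numeric values, and the server does not guarantee the order of $group output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory sketch of a two-stage pipeline:
// {$group: {_id: "$<groupField>", average: {$avg: "$<avgField>"}}} followed by {$limit: n}.
public class AggregateSketch {

    // $group stage: bucket documents by groupField and average avgField per bucket.
    public static List<Map<String, Object>> groupAvg(List<Map<String, Object>> docs,
            String groupField, String avgField) {
        Map<Object, double[]> totals = new LinkedHashMap<>(); // key -> {sum, count}
        for (Map<String, Object> doc : docs) {
            double[] acc = totals.computeIfAbsent(doc.get(groupField), k -> new double[2]);
            acc[0] += ((Number) doc.get(avgField)).doubleValue(); // assumes a numeric value
            acc[1] += 1;
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map.Entry<Object, double[]> e : totals.entrySet()) {
            Map<String, Object> row = new LinkedHashMap<>();
            row.put("_id", e.getKey());
            row.put("average", e.getValue()[0] / e.getValue()[1]);
            out.add(row);
        }
        return out;
    }

    // $limit stage: pass through at most n documents (n assumed non-negative).
    public static List<Map<String, Object>> limit(List<Map<String, Object>> docs, int n) {
        return new ArrayList<>(docs.subList(0, Math.min(n, docs.size())));
    }

    // Runs $group (by first letter, averaging size) and then $limit over a word list.
    public static List<Map<String, Object>> run(List<String> words, int n) {
        List<Map<String, Object>> docs = new ArrayList<>();
        for (String w : words) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("word", w);
            doc.put("first", w.substring(0, 1));
            doc.put("size", w.length());
            docs.add(doc);
        }
        return limit(groupAvg(docs, "first", "size"), n);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "to", "and", "a", "that");
        System.out.println(run(words, 10));
        // prints [{_id=t, average=3.0}, {_id=a, average=2.0}]
    }
}
```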
In this hour, you learned how to use additional methods on the DBCollection and DBCursor objects. You learned that the limit() method can reduce the number of documents the cursor returns and that using limit() and skip() together enables you to page through a large dataset. Using a fields parameter on the find() method enables you to reduce the number of fields returned from the database.
This hour also covered applying the distinct(), group(), and aggregate() methods on the DBCollection object to perform data gathering operations from a Java application. These operations enable you to process data on the server before returning it to the Java application, reducing the amount of data sent and the work required in the application.
Q. Do any Java frameworks support MongoDB?
A. Yes. For example, the Spring framework supports MongoDB.
The workshop consists of a set of questions and answers designed to solidify your understanding of the material covered in this hour. Try answering the questions before looking at the answers.
1. How do you get documents 21–30 represented by a DBCursor object in Java?
2. How do you find the different values for a specific field on documents in a collection in a Java application?
3. How do you return the first 10 documents from a collection in Java?
4. How do you prevent a specific field from being returned from a database query in Java?
1. Call limit(10) and skip(20) on the DBCursor object.
2. Use the distinct() method on the DBCollection object.
3. Use the limit(10) method on the DBCursor object.
4. Set the field to false in the fields object passed to the find() method.
1. Write a new Java application that finds words in the example dataset that start with n, sorts them by length in descending order, and then displays the top five.
2. Extend the JavaAggregate.java file to include a function that performs an aggregation that matches words with a length of 4, limits the results to five documents, and finally projects the word as the _id field and displays the stats. The matching MongoDB shell aggregation would look similar to the following:
{$match: {size:4}},
{$limit: 5},
{$project: {_id:"$word", stats:1}}