MongoDB on Windows

To follow the samples in this chapter, we need MongoDB installed on our local machine. You can download it from the official site (https://www.mongodb.com), where you'll find installers for the most popular operating systems (Windows, Mac, Linux, and Solaris).

You'll also find different editions of the product, including an Enterprise version. For the purposes of this topic, the popular Community Edition Server is enough: on Windows, download its .msi file and run it to install the product.

As the documentation indicates, the installer includes all other software dependencies and will automatically upgrade any older version of MongoDB that's previously been installed. The current version (at the time of writing this) is 3.2.6, and it changes periodically. The process only takes a few seconds:

[Screenshot: the MongoDB installer running on Windows]

File structure and default configuration

As a result of the installation, a set of files will appear in the Program Files/MongoDB directory, containing a number of utilities and tools, plus the server itself. The main files to keep track of are mongod.exe, which is the server executable, and the command-line utility (mongo.exe), which provides a set of interactive options and allows data operations as well.

If you launch the server, a command window will show up, presenting some default configuration parameters:

  • It uses c:\data\db as its default data directory, which is the default physical location of its internal data as well as the user's (create the directory beforehand if it doesn't exist). Within this directory, a journal data file is created by default. The location can be changed by launching the server with mongod --dbpath <path>.
  • Another storage location is initialized in c:\data\db\diagnostic.data, dedicated to activity monitoring.
  • Port 27017 is assigned to listen for TCP connections. You can change it in the configuration or by calling mongod.exe with the --port [number] argument.

At this point, you can start interacting with the database. To do this, in a command-line fashion, you should use mongo.exe. Once launched, you can ask for help, and an initial list of commands will be presented.

A simple show dbs command will output, in my case, two databases that are present (previous databases of prior installations are not deleted, since they are located at another directory):

[Screenshot: output of the show dbs command in the mongo shell]

In order to connect to a given database, we can type use <db_name> as the capture shows. This command also allows the creation of a new database. Hence, if the database exists, MongoDB switches to it; otherwise, it creates a new one.

A more useful feature allows you to ask for help on a specific collection. For example, if our Personal database contains a People collection, we can ask for specific help with commands such as the following:

use Personal
db.People.help()

Another helpful utility is mongoimport.exe, which allows you to import data from a file on disk. We'll use this tool to import a flat JSON file obtained from the Union Cycliste Internationale (http://www.uci.ch/road/ranking/) with the stats for 2016. Once we move the file to the c:\data\db directory (the import also works from any other location), we can use the following command to import this data into a new database:

mongoimport --jsonArray --db Cyclists --collection Ranking16 < c:\data\db\Ranking15.json
2016-05-06T13:57:49.755+0200    connected to: localhost
2016-05-06T13:57:49.759+0200    imported 40 documents

After this, we can start querying the database once we switch into it and find the first document in our collection:

[Screenshot: mongo shell session counting the imported documents and retrieving the first one]

As you can see, the first command tells us the number of documents inserted, and the next one retrieves the first document. There's something to point out here, and that is the _id element in the document. It is automatically inserted by the importing process in order to uniquely identify each document in the collection.
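As a side note on the _id element: the generated value is an ObjectId, and its first four bytes encode the creation time in seconds since the Unix epoch. The following is a minimal sketch in plain JavaScript, not part of the original session; objectIdSeconds is a hypothetical helper name, and the hexadecimal string is one of the _id values that appear in the query results later in this chapter.

```javascript
// Plain JavaScript sketch; `objectIdSeconds` is a hypothetical helper.
// The first 4 bytes (8 hex digits) of an ObjectId hold its creation
// time in seconds since the Unix epoch.
function objectIdSeconds(hex) {
  return parseInt(hex.slice(0, 8), 16);
}

var seconds = objectIdSeconds("572c8b77e8200fb42f000019");
var created = new Date(seconds * 1000); // Date expects milliseconds

console.log(created.toISOString());
```

The decoded date falls on May 6, 2016, the same day shown in the import log. Inside the mongo shell there is no need for such a helper, since ObjectId values expose a getTimestamp() method.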

Some useful commands

Usually, we can use the big collection of commands provided by Mongo to query the database in different ways. For example, if I want to list all cyclists from Great Britain, I can write the following:

> db.Ranking16.find( {"Nation": "Great Britain"} )
{ "_id" : ObjectId("572c8b77e8200fb42f000019"), "Rank" : "25 (24)", "Name" : "Geraint THOMAS", "Nation" : "Great Britain", "Team" : "SKY", "Age*" : 30, "Points" : 743 }
{ "_id" : ObjectId("572c8b77e8200fb42f000022"), "Rank" : "34 (32)", "Name" : "Ian STANNARD", "Nation" : "Great Britain", "Team" : "SKY", "Age*" : 29, "Points" : 601 }
{ "_id" : ObjectId("572c8b77e8200fb42f000025"), "Rank" : "37 (35)", "Name" : "Ben SWIFT", "Nation" : "Great Britain", "Team" : "SKY", "Age*" : 29, "Points" : 556 }

So, in order to filter information, the find() method expects the criteria to be written as a document, using the object notation syntax that is typical of JavaScript. We can also pick a single result out of the whole set by indicating its position with array syntax:

> db.Ranking16.find( {"Nation": "Great Britain"} )[0]
{
  "_id" : ObjectId("572c8b77e8200fb42f000019"),
  "Rank" : "25 (24)",
  "Name" : "Geraint THOMAS",
  "Nation" : "Great Britain",
  "Team" : "SKY",
  "Age*" : 30,
  "Points" : 743
}

As you can imagine, other options allow the projection of the required elements in a document instead of retrieving the whole one. For instance, we can ask for the names and ages of all the cyclists from Spain in this list using the following:

> db.Ranking16.find( {"Nation": "Spain"}, {"Name":1, "Age*":1} )
{ "_id" : ObjectId("572c8b77e8200fb42f000006"), "Name" : "Alberto CONTADOR VELASCO", "Age*" : 34 }
{ "_id" : ObjectId("572c8b77e8200fb42f00000a"), "Name" : "Alejandro VALVERDE BELMONTE", "Age*" : 36 }
{ "_id" : ObjectId("572c8b77e8200fb42f00000e"), "Name" : "Jon IZAGUIRRE INSAUSTI", "Age*" : 27 }
{ "_id" : ObjectId("572c8b77e8200fb42f00001c"), "Name" : "Samuel SANCHEZ GONZALEZ", "Age*" : 38 }

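The numbers associated with the fields only indicate presence or absence: a value of 1 (or any other nonzero value) means that we want the field in the output, and 0 means that we want it left out. The _id field is the exception, since it is always returned unless we explicitly exclude it.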
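To make these rules concrete, here is a minimal sketch in plain JavaScript that mimics them on an in-memory object. It is not MongoDB code and says nothing about how the server implements projections: project is a hypothetical helper, it handles top-level fields only, and _id is stored as a plain string for simplicity.

```javascript
// Plain JavaScript sketch (not MongoDB code): mimics how a projection
// document selects top-level fields. `project` is a hypothetical helper.
function project(doc, spec) {
  var keys = Object.keys(spec).filter(function (k) { return k !== "_id"; });
  // Inclusion mode: some field other than _id is marked with a nonzero
  // value, or the projection names nothing but a nonzero _id.
  var including = keys.some(function (k) { return Boolean(spec[k]); }) ||
    (keys.length === 0 && Boolean(spec._id));
  var result = {};
  Object.keys(doc).forEach(function (field) {
    if (field === "_id") {
      // _id is returned unless it is explicitly set to 0.
      if (!("_id" in spec) || spec._id) { result._id = doc._id; }
    } else if (including ? Boolean(spec[field]) : !(field in spec)) {
      result[field] = doc[field];
    }
  });
  return result;
}

// Sample document taken from the query results shown earlier.
var rider = {
  "_id": "572c8b77e8200fb42f000019", "Rank": "25 (24)",
  "Name": "Geraint THOMAS", "Nation": "Great Britain",
  "Team": "SKY", "Age*": 30, "Points": 743
};

console.log(project(rider, { "Name": 1, "Age*": 1 }));          // _id, Name, Age*
console.log(project(rider, { "Name": 1, "Team": 1, "_id": 0 })); // Name, Team
```

Listing only zeros switches the sketch to exclusion mode, in which every field is returned except the ones named in the projection.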

Let's say we need the list of Italian cyclists with their names and teams and no other field. We can type the following:

> db.Ranking16.find( {"Nation": "Italy"}, {"Name":1, "Team":1, "_id": 0 } )
{ "Name" : "Sonny COLBRELLI", "Team" : "BAR" }
{ "Name" : "Enrico GASPAROTTO", "Team" : "WGG" }
{ "Name" : "Diego ULISSI", "Team" : "LAM" }
{ "Name" : "Giovanni VISCONTI", "Team" : "MOV" }

Other combinations allow you to use JavaScript declarations to retrieve partial information that can be used later to get another result set. Here, we load the query into a variable and call it directly:

> var fellows = db.Ranking16.find({"Nation":"Australia"} , { "Name":1 , "Nation":1, "_id":0 });
> fellows
{ "Name" : "Richie PORTE", "Nation" : "Australia" }
{ "Name" : "Simon GERRANS", "Nation" : "Australia" }
{ "Name" : "Michael MATTHEWS", "Nation" : "Australia" }

Operators

The list of available operators in MongoDB is quite large. The official documentation groups them by purpose into three main categories:

  • Query and projection
  • Update
  • Aggregation pipeline

Each of these categories contains a large number of options, so you can refer to the official documentation for more details (https://docs.mongodb.com/manual/reference/operator/). For the purpose of this chapter, we'll use a few of the most common operators that appear in everyday work with MongoDB. The following table lists the most used operators:

Operator    Description
$eq         Matches values that are equal to a specified value
$gt         Matches values that are greater than a specified value
$gte        Matches values that are greater than or equal to a specified value
$lt         Matches values that are less than a specified value
$lte        Matches values that are less than or equal to a specified value
$ne         Matches all values that are not equal to a specified value
$in         Matches any of the values specified in an array
$nin        Matches none of the values specified in an array

Note that you can find some of these operators in different contexts or domain queries: for instance, most of the operators in the preceding table are also present in the set of operators linked to the Aggregation pipeline.

Another important point is that these areas provide mechanisms to deal with information in many ways depending on the context. Actually, many of the operators available in the SQL Server or Oracle RDBMS have an equivalent here, always preceded by the $ sign. For example, you can use the arithmetic operators in the Aggregation pipeline to create calculated fields, or you can use some mathematical operators that resemble, even syntactically, those found in the Math static class in C# or JavaScript: $abs, $ceil, $log, $sqrt, and so on.

This happens with other typical RDBMS operators, such as the aggregation operators commonly used in statistical queries: $sum, $avg, $first, and so on. Other common families of operators that facilitate management operations are Date operators, String operators, Array operators, and Set operators.

The way to use them always depends on the context of the operation to be performed. In queries, we can embed them in the expressions that serve as the filtering criteria. Keep in mind that each operator and its operand form a small document of their own, and that several of these expressions can be listed for the same field, separated by commas.

Let's imagine that we want the list of cyclists with more than 1,000 points and no more than 1,300 points. We could express it as follows:

> db.Ranking16.find( {"Points": {$gt:1000, $lte: 1300}}, {"Name":1, "_id": 0 } )
{ "Name" : "Alexander KRISTOFF" }
{ "Name" : "Sep VANMARCKE" }
{ "Name" : "Ilnur ZAKARIN" }
{ "Name" : "Alejandro VALVERDE BELMONTE" }
{ "Name" : "Sergio Luis HENAO MONTOYA" }
{ "Name" : "Richie PORTE" }
{ "Name" : "Wouter POELS" }

Observe that there's an implicit AND operator in the way we express the points limits (the minimum and maximum) separated by commas.
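The way these comma-separated conditions combine can be mimicked in plain JavaScript. The sketch below is only an illustration under stated assumptions, not MongoDB code: matches and tests are hypothetical names, only flat values and the comparison operators listed in the preceding table are handled, and the point limits in the sample filter are arbitrary. The three documents come from the Great Britain query shown earlier.

```javascript
// Plain JavaScript sketch (not MongoDB code) of how comparison operators
// in a filter document combine. `tests` and `matches` are hypothetical.
var tests = {
  $eq:  function (v, arg) { return v === arg; },
  $ne:  function (v, arg) { return v !== arg; },
  $gt:  function (v, arg) { return v > arg; },
  $gte: function (v, arg) { return v >= arg; },
  $lt:  function (v, arg) { return v < arg; },
  $lte: function (v, arg) { return v <= arg; },
  $in:  function (v, arg) { return arg.indexOf(v) !== -1; },
  $nin: function (v, arg) { return arg.indexOf(v) === -1; }
};

function matches(doc, filter) {
  // Every field listed in the filter must pass: the implicit AND.
  return Object.keys(filter).every(function (field) {
    var condition = filter[field];
    var value = doc[field];
    if (condition === null || typeof condition !== "object") {
      return value === condition; // shorthand for { $eq: condition }
    }
    // Every operator attached to the field must pass as well.
    return Object.keys(condition).every(function (op) {
      return tests[op](value, condition[op]);
    });
  });
}

var riders = [
  { "Name": "Geraint THOMAS", "Nation": "Great Britain", "Points": 743 },
  { "Name": "Ian STANNARD", "Nation": "Great Britain", "Points": 601 },
  { "Name": "Ben SWIFT", "Nation": "Great Britain", "Points": 556 }
];

var filter = { "Nation": "Great Britain", "Points": { $gt: 600, $lte: 743 } };
var selected = riders.filter(function (r) { return matches(r, filter); });
console.log(selected.map(function (r) { return r.Name; }));
```

Both levels use every(): all the fields in the filter must match, and so must all the operators attached to a single field.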

The OR operator is also available ($or), but its syntax is different: it takes an array in which each alternative is a complete condition document. Let's imagine a case where we need to find the cyclists from any of several English-speaking nations, for example. We need an $or operator to express this condition according to this syntax (we're omitting other nations not present on the list for brevity):

{ $or: [ {"Nation" : "Great Britain"}, { "Nation": "Ireland" }, {"Nation" : "Australia"} ] }

Indeed, the results of such a query would be as follows:

> db.Ranking16.find( { $or : [ {"Nation": "Great Britain"}, { "Nation" : "Ireland"}, { "Nation": "Australia" } ] } , {"Name":1, "_id": 0 } )
{ "Name" : "Richie PORTE" }
{ "Name" : "Simon GERRANS" }
{ "Name" : "Geraint THOMAS" }
{ "Name" : "Michael MATTHEWS" }
{ "Name" : "Daniel MARTIN" }
{ "Name" : "Ian STANNARD" }
{ "Name" : "Ben SWIFT" }
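Since a filter is just a JavaScript object, it can also be built programmatically. The following sketch, which is an illustration rather than part of the original session, builds the same $or filter from an array of nations, together with an equivalent filter based on the $in operator from the preceding table. The shorter $in form is possible here because every alternative tests the same field for equality. The variable names are arbitrary.

```javascript
// Runs in the mongo shell or as plain JavaScript: it only builds
// filter documents and does not touch the database.
var nations = ["Great Britain", "Ireland", "Australia"];

// One { "Nation": value } condition per nation, combined with $or.
var orFilter = {
  $or: nations.map(function (n) { return { "Nation": n }; })
};

// Equivalent filter with $in, usable because all the alternatives
// compare the same field for equality.
var inFilter = { "Nation": { $in: nations } };

console.log(JSON.stringify(orFilter));
console.log(JSON.stringify(inFilter));

// In the shell, either object could then be passed to find(), for example:
// db.Ranking16.find(inFilter, { "Name": 1, "_id": 0 })
```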

Altering data – the rest of CRUD operations

The operations that modify the contents of our database are represented by three methods:

  • Add: insert()
  • Delete: remove()
  • Modify: update()

For example, in the first case, we can store the new document in a JavaScript variable and pass that variable to the insert() method:

> var newCyclist = {
... "Rank" : 139,
... "Name": "Lawson CRADDOCK",
... "Nation": "United States",
... "Team" : "CPT",
... "Age*": 24,
... "Points": 208
... }
> db.Ranking16.insert(newCyclist)
WriteResult({ "nInserted" : 1 })

We can see that there's an extra line from Mongo, indicating that a new document has been inserted (also, an array can be passed for a multiple insertion).

Besides, there's another important factor we already mentioned, which has to do with flexibility. Let's say we want to include another important rider from the US, such as Tejay Van Garderen, but in this case, we have some extra information about his place of birth: the State (Washington) and the City (Tacoma). We want to include this information in the collection.

We will proceed in exactly the same way as earlier, only assigning to Nation a complex value made of three fields: Name, State, and City.

After the process, a look at the content will show the information structure inserted, along with its new values:

> newCyclist
{
  "Rank" : 139,
  "Name" : "Lawson CRADDOCK",
  "Nation" : {
    "Name" : "United States",
    "State" : "Washington",
    "City" : "Tacoma"
  },
  "Team" : "CPT",
  "Age*" : 24,
  "Points" : 208
}

The insertion went fine, but I made a (copy/paste) mistake and didn't change the name of the rider properly (the rest of the data is fine, but the name has to be modified). So, we can use the update() command in order to achieve this goal.

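It's simple; we just have to locate the target document with the first parameter and indicate the change in the second one. The $set operator restricts the change to the Name field; passing a plain document such as { "Name" : "Tejay VAN GARDEREN" } instead would replace the whole document and discard its remaining fields:

> db.Ranking16.update({ "Name": "Lawson CRADDOCK" }, { $set: { "Name" : "Tejay VAN GARDEREN" } })
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })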

The results: one document found and one modified.
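The form of the second parameter matters more than the output suggests. When it contains an update operator such as $set, only the listed fields change; when it is a plain document, update() treats it as a replacement and keeps nothing from the stored document except its _id. The following plain JavaScript sketch mimics both behaviors on an in-memory object. It is not MongoDB code: applyUpdate is a hypothetical helper, only $set is handled, and _id is stored as a plain string.

```javascript
// Plain JavaScript sketch (not MongoDB code). `applyUpdate` is a
// hypothetical helper that mimics the two forms of the update document.
function applyUpdate(doc, update) {
  if ("$set" in update) {
    // Operator form: copy the document and overwrite only the listed fields.
    return Object.assign({}, doc, update.$set);
  }
  // Replacement form: only _id survives; everything else comes from `update`.
  return Object.assign({ "_id": doc._id }, update);
}

var stored = {
  "_id": "572cdcc103caae1d2e97b8f2", "Rank": 139,
  "Name": "Lawson CRADDOCK", "Nation": "United States",
  "Team": "CPT", "Age*": 24, "Points": 208
};

var patched = applyUpdate(stored, { $set: { "Name": "Tejay VAN GARDEREN" } });
var replaced = applyUpdate(stored, { "Name": "Tejay VAN GARDEREN" });

console.log(Object.keys(patched));  // all seven fields are still present
console.log(Object.keys(replaced)); // only _id and Name remain
```

Both calls would report one matched and one modified document, so the WriteResult alone does not reveal which of the two happened.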

Text indexes

Now, we want to list all the cyclists from the United States in our collection. MongoDB provides an interesting possibility: create a text index to be used later in text searches. At creation time, we indicate which fields need to be included in the index, marking each one with the "text" index type; for example, take a look at the following:

> db.Ranking16.createIndex( { Name: "text", Nation: "text"} )
{
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 2,
  "ok" : 1
}

With the previous code, we have indexed two fields, and the total number of indexes now is two (remember that the _id index is created automatically). This is perfect for practical usage, since we now can write the following:

> db.Ranking16.find( { $text: { $search: "Tejay Lawson" } }).pretty()
{
  "_id" : ObjectId("572cdb8c03caae1d2e97b8f1"),
  "Rank" : 52,
  "Name" : "Tejay VAN GARDEREN",
  "Nation" : {
    "Name" : "United States",
    "State" : "Washington",
    "City" : "Tacoma"
  },
  "Team" : "BMC",
  "Age*" : 28,
  "Points" : 437
}
{
  "_id" : ObjectId("572cdcc103caae1d2e97b8f2"),
  "Rank" : 139,
  "Name" : "Lawson CRADDOCK",
  "Nation" : "United States",
  "Team" : "CPT",
  "Age*" : 24,
  "Points" : 308
}

Note that the search was made without indicating the position of the string in the field, and that a document matches when it contains any of the search terms. The output shows both documents with their different data structures for the Nation field.

If we don't have any indexes, it is also possible to use other operators for search, such as $in, which uses the following syntax prototype:

{ field: { $in: [<value1>, <value2>, ... <valueN> ] } }

So, we can write a similar query that returns all cyclists from France and Spain as follows:

> db.Ranking16.find( {"Nation": { $in: ["France", "Spain"] }}, {"_id":0, "Rank":0, "Points":0, "Age*":0, "Team":0})
{ "Name" : "Thibaut PINOT", "Nation" : "France" }
{ "Name" : "Alberto CONTADOR VELASCO", "Nation" : "Spain" }
{ "Name" : "Alejandro VALVERDE BELMONTE", "Nation" : "Spain" }
{ "Name" : "Jon IZAGUIRRE INSAUSTI", "Nation" : "Spain" }
{ "Name" : "Arnaud DEMARE", "Nation" : "France" }
{ "Name" : "Bryan COQUARD", "Nation" : "France" }
{ "Name" : "Nacer BOUHANNI", "Nation" : "France" }
{ "Name" : "Samuel SANCHEZ GONZALEZ", "Nation" : "Spain" }
{ "Name" : "Romain BARDET", "Nation" : "France" }
{ "Name" : "Julian ALAPHILIPPE", "Nation" : "France" }

For deletion, the procedure is pretty straightforward. Just remember that deletions affect one or more documents depending on the criteria defined for the operation, and that there is no equivalent to the cascade behavior we might configure in the relational model.
