Introducing the write operations

In MongoDB, we have three kinds of write operations: insert, update, and remove. To run these operations, MongoDB provides three interfaces: db.collection.insert, db.collection.update, and db.collection.remove. Each write operation targets a specific collection and is atomic at the level of a single document.

Write operations are as important as read operations when we are modeling documents in MongoDB. Atomicity at the level of a single document can determine whether we embed documents or not. We will go into this in a little more detail in Chapter 7, Scaling, but the choice of a shard key is also decisive for a write operation's performance because, depending on the key, a write will go to one or many shards.

Another determining factor in a write operation's performance is the MongoDB physical model. There are many recommendations given by 10gen, but let's focus on those that have the greatest impact on our development. Because MongoDB's update model is based on random I/O operations, it is recommended that you use solid state disks (SSDs). A solid state disk has superior performance compared to a spinning disk in terms of random I/O operations. Even though spinning disks are cheaper, and the cost to scale an infrastructure based on this kind of hardware is not that expensive either, using SSDs or increasing the RAM is still more effective. Studies on this subject show that SSDs can outperform spinning disks by around 100 times for random I/O operations.

Another important thing to understand about write operations is how the documents are actually written on disk by MongoDB. MongoDB uses a journaling mechanism for write operations: each change is first written to a journal and only then applied to the data files. This is very useful, especially after a dirty shutdown, because MongoDB will use the journal files to recover the database to a consistent state when the mongod process is restarted.
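The journaling idea described above can be sketched with a toy in-memory model. This is only an illustration of the write-ahead principle, not MongoDB's actual journal format; the names createStore, write, and recover are hypothetical helpers invented for this sketch:

```javascript
// Toy write-ahead journal (an illustration only, not MongoDB internals).
// createStore, write, and recover are hypothetical helper names.
function createStore() {
  return { journal: [], dataFiles: {} };
}

function write(store, key, value, crashBeforeDataFiles) {
  // Step 1: record the change in the journal first.
  store.journal.push({ key: key, value: value });
  if (crashBeforeDataFiles) {
    return; // simulate a dirty shutdown before the data files are updated
  }
  // Step 2: apply the change to the data files.
  store.dataFiles[key] = value;
}

function recover(store) {
  // On restart, replay every journal entry to reach a consistent state.
  for (const entry of store.journal) {
    store.dataFiles[entry.key] = entry.value;
  }
  return store.dataFiles;
}

const store = createStore();
write(store, "customer1", { email: "first@example.com" }, false);
write(store, "customer2", { email: "second@example.com" }, true);
console.log(Object.keys(store.dataFiles)); // only customer1 reached the data files
console.log(Object.keys(recover(store))); // both keys are present after replay
```

Because the change reaches the journal before the data files, replaying the journal after a crash is enough to restore the writes that never made it to the data files.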

As stated in Chapter 2, Data Modeling with MongoDB, the BSON specification allows us to have a document with a maximum size of 16 MB. Since its 2.6 version, MongoDB uses a space allocation strategy for a record, or document, named "power of two sized allocation." As its name suggests, MongoDB will allocate to each document a record whose size in bytes is a power of two (for example, 32, 64, 128, 256, 512, …), rounding the document's size up to the next such value, with a minimum record size of 32 bytes. This strategy allocates more space than the document really needs, giving it room to grow.
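To make the rounding concrete, here is a minimal sketch of the calculation. It is a simplified model under the assumptions stated in the text (powers of two, 32-byte minimum) and ignores any special handling the server may apply to very large documents; powerOfTwoAllocation is a hypothetical helper name, not a MongoDB function:

```javascript
// Simplified model of "power of two sized allocation" (an illustration,
// not the server's actual allocator).
const MIN_RECORD_SIZE = 32; // minimum record size in bytes, as stated in the text

function powerOfTwoAllocation(documentSizeInBytes) {
  let allocated = MIN_RECORD_SIZE;
  // Keep doubling the record size until the document fits.
  while (allocated < documentSizeInBytes) {
    allocated *= 2;
  }
  return allocated;
}

for (const size of [20, 33, 100, 513]) {
  console.log(size + " bytes -> " + powerOfTwoAllocation(size) + " bytes allocated");
}
```

In this model a 100-byte document occupies a 128-byte record, so it can grow by up to 28 bytes before it needs to be moved.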

Inserts

The insert interface is one of the possible ways of creating a new document in MongoDB. The insert interface has the following syntax:

db.collection.insert(
   <document or array of documents>,    
   {      
      writeConcern: <document>,      
      ordered: <boolean>    
   }
)

Here:

  • document or array of documents is either a document or an array with one or many documents that should be created in the targeted collection.
  • writeConcern is a document expressing the write concern.
  • ordered should be a Boolean value. If true, MongoDB inserts the documents of the array in order and, if an error occurs in one document, stops without processing the remaining documents. If false, MongoDB carries out an unordered insert and continues with the remaining documents when an error occurs. By default, the value is true.

In the following example, we can see how an insert operation can be used:

db.customers.insert({
   username: "customer1", 
   email: "[email protected]", 
   password: hex_md5("customer1paswd")
})

As we did not specify a value for the _id field, it will be automatically generated with a unique ObjectId value. The document created by this insert operation is:

{ 
   "_id" : ObjectId("5487ada1db4ff374fd6ae6f5"), 
   "username" : "customer1", 
   "email" : "[email protected]", 
   "password" : "b1c5098d0c6074db325b0b9dddb068e1" 
}
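The effect of the ordered flag can be illustrated without a server. The following sketch simulates an insert of several documents against an in-memory set of _id values; simulateInsert is a hypothetical helper, not a MongoDB API, and it processes the unordered case sequentially for simplicity:

```javascript
// In-memory illustration of the ordered flag; simulateInsert is a
// hypothetical helper, not a MongoDB API.
function simulateInsert(existingIds, documents, ordered) {
  const ids = new Set(existingIds);
  const inserted = [];
  const errors = [];
  for (const doc of documents) {
    if (ids.has(doc._id)) {
      errors.push("duplicate key: " + doc._id);
      if (ordered) {
        break; // ordered: stop at the first error
      }
      continue; // unordered: skip the failing document and carry on
    }
    ids.add(doc._id);
    inserted.push(doc._id);
  }
  return { inserted: inserted, errors: errors };
}

// The third document repeats an _id, so it fails in both modes.
const batch = [{ _id: 1 }, { _id: 2 }, { _id: 2 }, { _id: 3 }];
const orderedResult = simulateInsert([], batch, true);
const unorderedResult = simulateInsert([], batch, false);
console.log(orderedResult.inserted); // [ 1, 2 ]
console.log(unorderedResult.inserted); // [ 1, 2, 3 ]
```

With ordered set to true, the document with _id 3 is never attempted; with false, it is inserted even though an earlier document failed.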

As you observed in the first paragraph of this section, the insert interface is not the only way to create new documents in MongoDB. By using the upsert option on updates, we could also create new documents. Let's go into more detail regarding this now.

Updates

The update interface is used to modify previously existing documents in MongoDB, or even to create new ones. To select which document we would like to change, we will use a criterion. An update can modify the field values of a document or replace an entire document.

By default, an update operation modifies only one document, even if the criterion matches more than one. To update every matching document, it is necessary to pass an options document with the multi parameter set to true to the update interface. If the criterion matches no document and the upsert parameter is true, a new document will be created; if there is a match, the matching document is updated as usual.

The update interface is represented as:

db.collection.update(
   <query>,
   <update>,
   {      
      upsert: <boolean>,      
      multi: <boolean>,      
      writeConcern: <document>    
   }
)

Here:

  • query is the criteria
  • update is the document containing the modification to be applied
  • upsert is a Boolean value that, if true, creates a new document if the criteria does not match any document in the collection
  • multi is a Boolean value that, if true, updates every document that meets the criteria
  • writeConcern is a document expressing the write concern
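The interaction between multi and upsert can be sketched with an in-memory array standing in for a collection. This is a simplified model that only supports equality matching and field assignment; simulateUpdate and matches are hypothetical helpers, not MongoDB functions:

```javascript
// Simplified in-memory model of the multi and upsert options.
// simulateUpdate and matches are hypothetical helpers, not MongoDB APIs.
function matches(doc, query) {
  return Object.keys(query).every((key) => doc[key] === query[key]);
}

function simulateUpdate(collection, query, setFields, options) {
  const opts = Object.assign({ upsert: false, multi: false }, options);
  const matched = collection.filter((doc) => matches(doc, query));
  if (matched.length === 0) {
    if (opts.upsert) {
      // No match and upsert is true: create a document from the query
      // fields plus the requested changes.
      collection.push(Object.assign({}, query, setFields));
      return { matched: 0, upserted: true };
    }
    return { matched: 0, upserted: false };
  }
  // Without multi, only the first matching document is changed.
  const targets = opts.multi ? matched : matched.slice(0, 1);
  for (const doc of targets) {
    Object.assign(doc, setFields);
  }
  return { matched: targets.length, upserted: false };
}

const customers = [
  { username: "customer1", active: false },
  { username: "customer2", active: false }
];
console.log(simulateUpdate(customers, { active: false }, { active: true }));
console.log(simulateUpdate(customers, { username: "customer9" }, { active: true }, { upsert: true }));
console.log(customers.length); // 3, because the second call created a document
```

The first call changes only customer1 because multi defaults to false; the second call finds no match and, because upsert is true, creates a new document.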

Using the document created in the previous section, a sample update would be:

db.customers.update(
   {username: "customer1"}, 
   {$set: {email: "[email protected]"}}
)

The modified document is:

{ 
   "_id" : ObjectId("5487ada1db4ff374fd6ae6f5"), 
   "username" : "customer1", 
   "email" : "[email protected]", 
   "password" : "b1c5098d0c6074db325b0b9dddb068e1"
}

The $set operator allows us to update only the email field of the matched documents.

Alternatively, we could run the same update without the $set operator:

db.customers.update(
   {username: "customer1"}, 
   {email: "[email protected]"}
)

In this case, the modified document would be:

{ 
   "_id" : ObjectId("5487ada1db4ff374fd6ae6f5"), 
   "email" : "[email protected]" 
}

That is, without the $set operator, we replace the old document with the one passed as a parameter on the update, keeping only the _id field. Besides the $set operator, we also have other important update operators:

  • $inc increments the value of a field with the specified value:
    db.customers.update(
       {username: "johnclay"}, 
       {$inc: {"details.age": 1}}
    )
    

    This update will increment the field details.age by 1 in the matched documents.

  • $rename will rename the specified field:
    db.customers.update(
       {email: "[email protected]"}, 
       {$rename: {username: "login"}}
    )
    

    This update will rename the field username to login in the matched documents.

  • $unset will remove the field from the matched document:
    db.customers.update(
       {email: "[email protected]"}, 
       {$unset: {login: ""}}
    )
    

    This update will remove the login field from the matched documents.

As write operations are atomic at the level of a single document, each of the preceding operators is applied to a document as a whole or not at all. All of them can be safely used without extra precautions against partial updates.
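To compare the operators covered in this section side by side, the following sketch applies them to a plain JavaScript object. It is a simplified model, not MongoDB's implementation: applyUpdate and resolveParent are hypothetical helpers, $rename handles only top-level fields, and the sample values are made up for illustration:

```javascript
// Simplified model of replacement, $set, $inc, $rename, and $unset.
// applyUpdate and resolveParent are hypothetical helpers, not MongoDB APIs.
function resolveParent(doc, path) {
  // Walk a dotted path such as "details.age" and return the object that
  // holds the last key, creating intermediate objects when missing.
  const parts = path.split(".");
  let parent = doc;
  for (const part of parts.slice(0, -1)) {
    if (typeof parent[part] !== "object" || parent[part] === null) {
      parent[part] = {};
    }
    parent = parent[part];
  }
  return { parent: parent, key: parts[parts.length - 1] };
}

function applyUpdate(doc, update) {
  const hasOperators = Object.keys(update).some((key) => key.startsWith("$"));
  if (!hasOperators) {
    // No operators: the whole document is replaced and only _id survives.
    return Object.assign({ _id: doc._id }, update);
  }
  const result = JSON.parse(JSON.stringify(doc)); // work on a deep copy
  for (const [path, value] of Object.entries(update.$set || {})) {
    const target = resolveParent(result, path);
    target.parent[target.key] = value;
  }
  for (const [path, amount] of Object.entries(update.$inc || {})) {
    const target = resolveParent(result, path);
    target.parent[target.key] = (target.parent[target.key] || 0) + amount;
  }
  for (const [oldName, newName] of Object.entries(update.$rename || {})) {
    if (oldName in result) { // top-level fields only in this sketch
      result[newName] = result[oldName];
      delete result[oldName];
    }
  }
  for (const path of Object.keys(update.$unset || {})) {
    const target = resolveParent(result, path);
    delete target.parent[target.key];
  }
  return result;
}

const customer = { _id: "c1", username: "johnclay", email: "old@example.com", details: { age: 30 } };
console.log(applyUpdate(customer, { $set: { email: "new@example.com" } }));
console.log(applyUpdate(customer, { email: "new@example.com" })); // replacement
console.log(applyUpdate(customer, { $inc: { "details.age": 1 } }));
```

The first call keeps every other field intact, whereas the second returns a document containing only _id and email, which mirrors the difference shown earlier in this section.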

Write concerns

Many of the discussions surrounding non-relational databases are related to the ACID concept. We, as database professionals, software engineers, architects, and developers, are fairly accustomed to the relational universe, and we spend a lot of time developing without caring about ACID matters.

Nevertheless, we should understand by now why we really have to take this matter into consideration, and how these four simple letters are essential in the non-relational world. In this section, we will discuss the letter D, which means durability, in MongoDB.

Durability in database systems is the property that guarantees that, once a write operation succeeds and its transaction is committed, the data has been written to non-volatile storage on a durable medium, such as a hard disk.

Unlike relational database systems, the response to a write operation in NoSQL databases is determined by the client. Once again, we have the possibility to make a choice on our data modeling, addressing the specific needs of a client.

In MongoDB, the response of a successful write operation can have many levels of guarantee. This is what we call a write concern. The levels vary from weak to strong, and the client determines the strength of guarantee. It is possible for us to have, in the same collection, both a client that needs a strong write concern and another that needs a weak one.

The write concern levels that MongoDB offers us are:

  • Unacknowledged
  • Acknowledged
  • Journaled
  • Replica acknowledged

Unacknowledged

As its name suggests, with an unacknowledged write concern, the client will not wait for a response to a write operation. Where possible, only network errors will be captured. The following diagram shows that drivers do not wait for MongoDB to acknowledge the receipt of write operations:

[Diagram: unacknowledged write concern]

In the following example, we have an insert operation in the customers collection with an unacknowledged write concern:

db.customers.insert(
{username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")}, 
{writeConcern: {w: 0}}
)

Acknowledged

With this write concern, the client receives an acknowledgement that the write operation was applied to MongoDB's in-memory view of the data. In this mode, the client can catch, among other things, network errors and duplicate key errors. Since the 2.6 version of MongoDB, this is the default write concern.

As you saw earlier, we can't guarantee that a write on the in-memory view of MongoDB will be persisted on the disk. In the event of a failure of MongoDB, the data in the in-memory view will be lost. The following diagram shows that drivers wait for MongoDB to acknowledge the receipt of the write operation and to apply the change to the in-memory view of data:

[Diagram: acknowledged write concern]

In the following example, we have an insert operation in the customers collection with an acknowledged write concern:

db.customers.insert(
{username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")}, 
{writeConcern: {w: 1}}
)

Journaled

With a journaled write concern, the client will receive confirmation that the write operation was committed in the journal. Thus, the client will have a guarantee that the data will be persisted on the disk, even if something happens to MongoDB.

To reduce the latency when we use a journaled write concern, MongoDB commits operations to the journal more frequently, shortening the interval between commits from the default value of 100 milliseconds to 30 milliseconds. The following diagram shows that drivers receive the acknowledgement of write operations from MongoDB only after the data has been committed to the journal:

[Diagram: journaled write concern]

In the following example, we have an insert in the customers collection with a journaled write concern:

db.customers.insert(
{username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")}, 
{writeConcern: {w: 1, j: true}}
)

Replica acknowledged

When we are working with replica sets, it is important to be sure that a write operation was successful not only in the primary node, but also that it was propagated to members of the replica set. For this purpose, we use a replica acknowledged write concern.

By changing the default write concern to replica acknowledged, we can determine the number of members of the replica set from which we want the write operation confirmation. The following diagram shows that drivers wait for MongoDB to acknowledge the receipt of write operations on a specified number of the replica set members:

[Diagram: replica acknowledged write concern]

In the following example, we will wait until the write operation propagates to the primary and at least two secondary nodes:

db.customers.insert(
{username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")}, 
{writeConcern: {w: 3}}
)

We should include a timeout property, in milliseconds, so that a write operation does not remain blocked indefinitely if a node fails.

In the following example, we will wait until the write operation propagates to the primary and at least two secondary nodes, with a timeout of three seconds. If one of the two secondary nodes from which we are expecting a response fails, then the method times out after three seconds:

db.customers.insert(
{username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")}, 
{writeConcern: {w: 3, wtimeout: 3000}}
)
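As a quick reference, the four levels covered in this section can be mapped to the write concern documents used in the examples. The helper below is a sketch of our own; writeConcernFor and its level names are hypothetical and are not part of MongoDB or its drivers:

```javascript
// Maps the four levels discussed in this section to write concern
// documents. writeConcernFor is a hypothetical helper, not a MongoDB API.
function writeConcernFor(level, options) {
  const opts = options || {};
  switch (level) {
    case "unacknowledged":
      return { w: 0 };
    case "acknowledged":
      return { w: 1 };
    case "journaled":
      return { w: 1, j: true };
    case "replicaAcknowledged": {
      // members is the number of replica set members that must confirm;
      // this sketch falls back to "majority" when it is not given.
      const concern = { w: opts.members || "majority" };
      if (opts.wtimeout) {
        concern.wtimeout = opts.wtimeout; // timeout in milliseconds
      }
      return concern;
    }
    default:
      throw new Error("unknown write concern level: " + level);
  }
}

console.log(writeConcernFor("journaled")); // { w: 1, j: true }
console.log(writeConcernFor("replicaAcknowledged", { members: 3, wtimeout: 3000 }));
```

The returned object could then be passed as the writeConcern option of an insert or update, for example {writeConcern: writeConcernFor("journaled")}.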

Bulk writing documents

Sometimes it is quite useful to insert, update, or delete more than one record of a collection at once. MongoDB provides us with the capability to perform bulk write operations. A bulk operation works in a single collection, and can be either ordered or unordered.

As with the insert method, the behavior of an ordered bulk operation is to process records serially, and if an error occurs, MongoDB will return without processing any of the remaining operations.

The behavior of an unordered operation is to process in parallel, so if an error occurs, MongoDB will still process the remaining operations.

We can also determine the level of acknowledgement required for bulk write operations. In its 2.6 version, MongoDB introduced new bulk methods with which we can insert, update, or delete documents. However, we can also make a bulk insert simply by passing an array of documents to the insert method.

In the following example, we make a bulk insert using the insert method:

db.customers.insert(
[
{username: "customer3", email: "[email protected]", password: hex_md5("customer3paswd")}, 
{username: "customer2", email: "[email protected]", password: hex_md5("customer2paswd")}, 
{username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")}
]
)

In the following example, we make an unordered bulk insert using the new bulk methods:

var bulk = db.customers.initializeUnorderedBulkOp();
bulk.insert({username: "customer1", email: "[email protected]", password: hex_md5("customer1paswd")});
bulk.insert({username: "customer2", email: "[email protected]", password: hex_md5("customer2paswd")});
bulk.insert({username: "customer3", email: "[email protected]", password: hex_md5("customer3paswd")});
bulk.execute({w: "majority", wtimeout: 3000});

We should use all the powerful tools MongoDB provides us with, but we should also pay close attention to their limits. MongoDB executes a maximum of 1,000 bulk operations at a time. So, if this limit is exceeded, MongoDB will divide the operations into groups of at most 1,000 operations each.
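The grouping described above can be sketched in plain JavaScript. This only illustrates how a list of operations splits into groups of at most 1,000; it is not MongoDB's internal logic, and splitIntoGroups is a hypothetical helper:

```javascript
// Illustrates splitting a list of bulk operations into groups of at most
// 1,000. splitIntoGroups is a hypothetical helper, not a MongoDB API.
const MAX_OPERATIONS_PER_GROUP = 1000; // limit stated in the text

function splitIntoGroups(operations, groupSize) {
  const size = groupSize || MAX_OPERATIONS_PER_GROUP;
  const groups = [];
  for (let start = 0; start < operations.length; start += size) {
    groups.push(operations.slice(start, start + size));
  }
  return groups;
}

// 2,500 queued inserts end up in three groups.
const operations = Array.from({ length: 2500 }, (_, i) => ({ insert: { n: i } }));
const groups = splitIntoGroups(operations);
console.log(groups.map((group) => group.length)); // [ 1000, 1000, 500 ]
```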
