15

Fault Tolerance and High Availability

In this chapter, we will try to fit in the information that we didn’t manage to discuss in the previous chapters, and we will place emphasis on some other topics. Throughout the previous 14 chapters, we have gone all the way from covering the basic concepts of effective querying, to administration and data management, to scaling and high-availability (HA) concepts.

We will discuss how our application design should be accommodating and proactive with regard to our database needs. We will go over patterns and anti-patterns for schema design.

Day-to-day operations are another area that we will discuss, including tips and best practices that can help us to avoid nasty surprises down the line.

In light of the continued attempts by ransomware to infect and hold MongoDB servers hostage, we will offer more tips on security.

Finally, we will try to sum up the advice that’s been given in a series of checklists that should be followed to ensure that the best practices are properly set up and followed.

This chapter covers the following topics:

  • Application design
  • Elevating operations
  • Boosting security

Application design

In this section, we will describe some useful tips for application design that we did not cover or emphasize enough in the previous chapters.

Schema-less doesn’t mean schema design-less

A big part of MongoDB’s success can be attributed to the increased popularity of object-relational mappers (ORMs) and object-document mappers (ODMs). Especially with languages such as JavaScript and the MongoDB, Express, Angular, and Node (MEAN) stack, developers can use JavaScript from the frontend (Angular) through the backend (Node.js/Express) to the database (MongoDB). This is frequently coupled with an ODM that abstracts away the internals of the database, mapping collections to Node.js models.

The major advantage is that developers don’t need to fiddle with the database schema design, as this is automatically provided by the ODM. The downside is that database collections and schema designs are left up to the ODM, which does not have the business domain knowledge of different fields and access patterns.

In the case of MongoDB and other NoSQL-based databases, this boils down to making architectural decisions based not only on immediate needs but also on what needs to be done down the line. On an architectural level, this may mean that instead of a monolithic approach, we can combine different database technologies for our diverse and evolving needs by using a graph database for graph-related querying, a relational database for hierarchical, unbounded data, and MongoDB for JavaScript Object Notation (JSON) retrieval, processing, and storage.

In fact, many of MongoDB’s successful use cases come from the fact that it’s not being used as a one-size-fits-all solution, but only for the use cases that make sense.

Design patterns

Relational database management system (RDBMS) schema design has evolved and matured over the decades. The third normal form and Boyce–Codd normal form (BCNF) are primarily used to model data in a relational database.

MongoDB schema design has similarly evolved in the past decade, and there are several patterns and anti-patterns that can guide us when designing new databases or migrating existing workloads. We are going to examine both of these in the following sections.

Attribute pattern

The attribute pattern is commonly used when we have sparse attribute values across several logically grouped fields. We restructure each field/value pair into an element of an array of subdocuments of the form { key: <field>, value: <value> }.

For example, have a look at the following document with price information for a fan across the United States (US), the United Kingdom (UK), and Germany:

{
   product: 'fan',
   price_uk: 30,
   price_us: 35,
   price_de: 40
}

This would be restructured as follows:

{
   product: 'fan',
   prices: [
      { country: 'uk', price: 30 },
      { country: 'us', price: 35 },
      { country: 'de', price: 40 }
   ]
}

This way, we can now create indexes for prices and countries and query them across all of our products.
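
For example, a minimal sketch (the collection name is illustrative) of such an index and a query that uses it is shown here:

db.products.createIndex({ 'prices.country': 1, 'prices.price': 1 })
// All products sold in the UK for less than 35
db.products.find({ prices: { $elemMatch: { country: 'uk', price: { $lt: 35 } } } })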

Tree traversal pattern

Storing hierarchical, tree-like data is a common use case for any database. E-commerce classification of products by category and subcategory is probably the earliest application of hierarchical data storage in a database.

Data that follows a hierarchical structure can be stored in a traditional relational database using well-established techniques based on set theory, as explained in the bibliography and in articles such as Mike Hillyer’s blog post: http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/.

Storing a tree structure in a document-oriented database such as MongoDB is not much different. In this section, we will explain the fundamental building blocks that we can use to store and query our hierarchical data.

We have four different ways to traverse through a tree structure.

First of all, we can store a reference to the child document or documents. This can be as simple as storing the ObjectId values of the documents that are children of the current node.

Second, we can store a reference to the parent document. This can be as simple as storing the ObjectId value of the document that is the parent of the current node.

These two methods are the simplest way to store parent-child references. Traversing the tree top down or bottom up can then be achieved by following the child or parent references, respectively, until we reach a node that has none (a leaf node or the root).

Third, we can store an array of ancestors for each document. Arrays preserve insertion order in MongoDB, so we need to insert the elements in order, from the root down to the leaf node. The last element of the array is the immediate parent of the current document, and every element to its left is an ancestor further up the tree.

Finally, we can store the full list of ancestors in a string field, separated by the delimiter of our choice. This is similar to the third option, with the advantage that we can index and query using a regular expression (regex) for any category, from root to leaf.

For example, we may have a fan document under the following categorization:

home and kitchen -> kitchen and home appliances -> heating, cooling and air quality -> fans -> Desk fans

Then, using the four methods here, we would have the following:

  • { children: ['Desk fans'] }
  • { parent: 'heating, cooling and air quality' }
  • { ancestors: ['home and kitchen', 'kitchen and home appliances', 'heating, cooling and air quality'] }
  • { ancestors: 'homeandkitchen.kitchenandhomeappliances.heatingcoolingandairquality' }

In all of these methods, we would be referencing the ObjectId values of the documents that belong to the given category.
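
For the string-based (materialized path) variant, a minimal sketch (the collection name is illustrative) of an indexed, anchored regex query that returns every category under 'home and kitchen' is shown here:

db.categories.createIndex({ ancestors: 1 })
// Anchored regexes can use the index; this matches any path that starts
// with the 'homeandkitchen.' prefix
db.categories.find({ ancestors: /^homeandkitchen\./ })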

Polymorphic pattern

Polymorphism in object-oriented programming (OOP) refers to the ability of a language to reuse the same interface for multiple heterogeneous underlying data types.

In MongoDB, polymorphism is omnipresent because of the flexibility it provides. Each document in a collection can have different fields and there is no restriction imposed by the database on the values that these fields may have.

This level of flexibility can result in increased maintenance costs for the application developers. We can restrict this flexibility by using JSON schemas, as we learned in previous chapters.

The polymorphic pattern attempts to keep the flexibility and deal with the complexity at the application level. We can use a field’s value as a guide to indicate the structure of the document in our application.

For example, if we have a 'vehicles' collection, we can have a field named 'type' that can have a set of values such as 'car', 'bike', or 'motorcycle'.

The application code can then be configured to expect different fields from a car (for example, mileage) than what it would expect from a bike. We can use enumerators (enums) or similar programming language structs to constrain the possible values of the type field.
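
A minimal sketch of such a collection (field values are illustrative) is shown here; the application inspects the type field to decide which fields to expect:

db.vehicles.insertMany([
  { type: 'car',        brand: 'ACME', mileage: 42000, doors: 5 },
  { type: 'bike',       brand: 'ACME', gears: 21 },
  { type: 'motorcycle', brand: 'ACME', mileage: 9000, engine_cc: 650 }
])
// Only documents of the matching type carry the type-specific fields
db.vehicles.find({ type: 'car', mileage: { $gt: 10000 } })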

Schema versioning pattern

MongoDB is schema-less, but as we have pointed out a few times already, we need to design our data storage in databases, collections, and documents. The schema versioning pattern addresses the need to change our schema over time.

It is one of the simplest patterns. All we need to do is add a field to declare the version of our schema. It can be, for example, version or schema_version. We start with {version: 1} and incrementally increase the version every time that we change the document schema by adding, modifying, or removing one or more fields. Our application code can adapt to the different document versions and use different business logic to process our data.

This pattern is useful when we need to change our schema without any downtime to adapt the structure of our documents to the new version.
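
A minimal sketch of how application code might branch on the version field (collection and field names are illustrative) is shown here:

const doc = db.customers.findOne({ _id: customerId });
if (doc.schema_version === 2) {
  // Version 2: phone numbers are stored as an array of subdocuments
  printjson(doc.phones);
} else {
  // Version 1 (or no version field): a single phone field
  print(doc.phone);
}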

Computed pattern

The computed pattern is used when we have many reads performing repeated calculations over a set of fields in a document. For example, if we have a user dashboard for a banking application that integrates with external services, we may have a counter of the number of active integrations. Instead of querying every time that we load the user profile for the subdocuments corresponding to the active integrations, we can calculate it at the time that we add or remove an integration and store it in a separate field.
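
A minimal sketch (collection and field names are illustrative) of maintaining the counter at write time instead of recomputing it on every read could look like this:

db.users.updateOne(
  { _id: userId },
  {
    $push: { integrations: { name: 'payments-provider', added_at: new Date() } },
    $inc: { active_integrations_count: 1 }
  }
)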

Subset pattern

The subset pattern aims to strike a balance between the core MongoDB design direction of keeping data that gets queried together in the same document and the need to avoid keeping unnecessary data in a single document.

We have already learned in previous chapters that we should not embed an unbounded number of fields or subdocuments in a single document because we will end up reaching the global document size limit of 16 megabytes (MB).

We should be keeping data that gets queried together in the same document to avoid having to perform multiple reads across different collections to get our data.

What the subset pattern instructs us is that we should be trying to keep only the data that we need to query together in the same document. For example, if we have a document to store movie ratings for each user, we should only store the necessary metadata that we would display, along with each movie rating on the page, and store the extended metadata for each rating in a separate collection. This way, we can fetch all movie ratings in a single read and only fetch the extended metadata for each rating when the user clicks through to a different page to view the details of each rating.

This way, we can reduce the working set for the user collection so that it fits into memory more easily, avoiding page swaps that would slow down the database.
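
A minimal sketch of this split (collection, field, and variable names are illustrative) keeps a small, display-ready subset in the user document and the extended metadata in its own collection:

// One read returns the user together with the ratings we actually display
db.users.findOne({ _id: userId }, { name: 1, recent_ratings: 1 })

// The extended metadata is fetched only when the user drills into a single rating
db.movie_ratings.findOne({ user_id: userId, movie_id: movieId })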

Extended reference pattern

The extended reference pattern is similar and sometimes complementary to the subset pattern. Instead of separating our data fields based on where we need to access them, we are duplicating the data fields that we most need in order to avoid multiple queries across different collections.

Using the same example of users and movie ratings, we could duplicate the fields that we most frequently access from a movie rating into the users’ collection to avoid querying the movie rating collection.

Similar to the subset pattern, we don’t embed every single field but only the fields that we need to access at the same time as the user.

The downside of duplicating data is that we need to make sure that we update it in multiple places when we need to. Depending on the use case, this can be an advantage if we need to keep historical data as a snapshot taken at the time the duplicated data was created. In that case, we would be updating the primary reference field and keeping the copies of the data intact for archiving and reporting purposes.

Bucket pattern

The bucket pattern is most commonly used in Internet of Things (IoT) applications. For example, if we have a sensor reading values every minute, the most straightforward design is to store every reading in one document. This will quickly result in millions of tiny documents and, quite possibly, lots of data duplication.

We can instead use the bucket pattern and store one document for each extended time period, say, one day, reducing the number of documents by a factor of 1,440 (the number of minutes in a day). We can combine this with an index on the sensor_id field and archive older data as it becomes less likely to be queried in real time.
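
A minimal sketch of a one-day bucket per sensor (collection and field names are illustrative) could look like this; each new reading is appended to the current day's document:

db.sensor_readings.updateOne(
  { sensor_id: 42, day: ISODate('2023-05-01') },
  {
    $push: { readings: { ts: new Date(), value: 21.7 } },
    $inc: { count: 1 }
  },
  { upsert: true }
)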

Outlier pattern

As with many of the other patterns discussed already, the outlier pattern is not unique to MongoDB. This pattern aims to address the commonly observed Pareto principle of 80/20.

Essentially, we should code for edge cases but not drive our design by outliers at the expense of a more efficient solution for the overwhelming majority.

For example, if we store movie ratings per user in an array, we may end up with very few users that have rated thousands of movies each. The vast majority of users are only rating a few movies each, and this is what should guide our design toward using an array to store the movie ratings.

We can code for the edge case of the few 'super-reviewers' by adding an extra field to the document with a descriptive name such as 'super_reviewer' and storing the overflowing ratings in a separate collection.
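
A minimal sketch of this approach (collection names, the 1,000-rating threshold, and the userId/newRating variables are illustrative) could look like this:

const user = db.users.findOne({ _id: userId });
if (user.movie_ratings && user.movie_ratings.length >= 1000) {
  // Outlier: mark the user and overflow further ratings into a separate collection
  db.users.updateOne({ _id: userId }, { $set: { super_reviewer: true } });
  db.movie_ratings_overflow.insertOne({ user_id: userId, rating: newRating });
} else {
  // The common case: keep the rating embedded in the user document
  db.users.updateOne({ _id: userId }, { $push: { movie_ratings: newRating } });
}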

Approximation pattern

More of a statistical trick than a pattern in itself, the approximation pattern relies on the law of large numbers: a randomly triggered update converges to the actual value over time.

An example can be updating page views for a page. We can update page views by incrementing the page_views field by 1 each time a user visits the page. This will result in an accurate calculation of page views at any given time at the expense of having to update the document 1,000 times for every 1,000 page views.

We can instead use rand(0,999) to generate a value between 0 and 999 inclusive and only update the counter, by 1,000 this time, when the generated value equals 0 (or any other single value of our preference). Statistically, this updates the counter roughly once every 1,000 page views and, over a long period, converges to the actual count.

This trick offloads a lot of the database workload at the expense of not having an exact page view count at any given moment.
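
A minimal sketch of this idea in application code (the collection name and the 1-in-1,000 sampling rate are illustrative) could look like this:

function recordPageView(pageId) {
  // Roughly 1 in every 1,000 calls actually hits the database,
  // incrementing the counter by 1,000 to compensate
  if (Math.floor(Math.random() * 1000) === 0) {
    db.pages.updateOne({ _id: pageId }, { $inc: { page_views: 1000 } });
  }
}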

MongoDB published this pattern along with the rest of the patterns that we discussed in this section. More information is available in the link at the end of this chapter.

Design anti-patterns

MongoDB recommends avoiding the following design anti-patterns. Similar to the design patterns, these are dependent on the use case, and while we should generally avoid the practices outlined next, we should use our judgment according to our specific use case.

Massive arrays anti-pattern

Arrays are commonly used in MongoDB to store multiple values related to one field. Storing more than a few hundred values in one array can start having an impact in terms of performance, especially if we are using an index over that field. Storing an unbounded number of elements in an array should be avoided.

The recommended solution is to use the extended reference pattern and break the array’s elements out into a separate collection, embedding only the fields that we frequently access.

Unnecessary indexes anti-pattern

Creating indexes is straightforward, and since it happens in the background, it will not affect our operations. Indexes speed up reads and by design slow down updates and inserts, as every update and insert needs to be reflected in the index as well. Indexes also take up space, which may become an issue when we have too many indexes.

The generic recommendation from MongoDB is to have fewer than 50 indexes per collection and to avoid wildcard indexes unless we have a clear need for them and have planned ahead for their cost.

We should design the indexes that we will use in our collections as much as we design the schema and data structures. The recommendations in Chapter 8, Indexing, should serve as guidance to design and implement the correct indexes over our data.

Bloated documents anti-pattern

One of MongoDB’s core principles is to store in the same place data that is accessed together—that is, embed in the same document data that we will access together to avoid having to perform multiple queries.

Following this advice to a T, we can end up with bloated documents that are several MB in size. This can be a problem because of the global document size limit of 16 MB.

Further to that, this slows down the database as every operation needs to fetch, update, and delete several MB instead of a few kilobytes (KB) at a time. The solution to this anti-pattern is to design ahead and only store in the same document data that is accessed together, resisting the temptation to store all our data in a single document and a single collection.

Case-insensitive queries without matching indexes anti-pattern

A common issue with indexes and queries is trying to query our data in a case-insensitive way when our indexes are case-sensitive. The index cannot be used and MongoDB resorts to a full collection scan, slowing down the query.

The solution to this problem is to create a case-insensitive index and/or to use a collation strength of 1 or 2 (the default is 3, which is case-sensitive) at the collection level.
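
For example, a minimal sketch (collection and field names are illustrative) of a case-insensitive index and a query that can use it is shown here; note that the query must specify the same collation as the index:

db.users.createIndex(
  { email: 1 },
  { collation: { locale: 'en', strength: 2 } }
)
db.users.find({ email: 'Alex@Example.com' }).collation({ locale: 'en', strength: 2 })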

Read performance optimization

In this section, we will discuss some tips for optimizing read performance. Read performance is directly correlated to the number of queries and their complexity. Performing fewer queries in a schema without complex nested data structures and arrays will generally result in better read performance. However, many times, optimizing for read performance can mean that the write performance will degrade. This is something to keep in mind and continuously measure when we are making performance optimizations in MongoDB.

Consolidating read querying

We should aim to have as few queries as possible. This can be achieved by embedding information into subdocuments instead of having separate entities. This can lead to an increased write load, as we have to keep the same data points in multiple documents and maintain their values everywhere when they change in one place.

The design considerations here are noted as follows:

  • The read performance benefits from data duplication/denormalization
  • The data integrity benefits from data references (DBRef or in-application code, using an attribute as a foreign key (FK))

We should denormalize especially if our read/write ratio is too high (our data rarely changes values, but it gets accessed several times in between), if our data can afford to be inconsistent for brief periods of time, and—most importantly—if we absolutely need our reads to be as fast as possible and are willing to pay the price in consistency/write performance.

The most obvious candidates for fields that we should denormalize (embed) are dependent fields. If we have an attribute or a document structure that we don’t plan to query on its own, but only as part of a contained attribute/document, then it makes sense to embed it, rather than have it in a separate document/collection.

Using our MongoDB books example, a book can have a related data structure that refers to a review from a reader of the book. If our most common use case is showing a book along with its associated reviews, then we can embed reviews into the book document.

The downside to this design is that when we want to find all of the book reviews by a user, this will be costly, as we will have to iterate all of the books for the associated reviews. Denormalizing users and embedding their reviews can be a solution to this problem.

A counterexample is data that can grow unbounded. In our example, embedding reviews along with heavy metadata can lead to an issue if we hit the 16 MB document size limit. A solution is to distinguish between data structures that we expect to grow rapidly and those that we don’t and to keep an eye on their sizes through monitoring processes that query our live dataset at off-peak times and report on attributes that may pose a risk down the line.

Note

Don’t embed data that can grow unbounded.

When we embed attributes, we have to decide whether we will use a subdocument or an enclosing array.

When we have a unique identifier (UID) to access the subdocument, we should embed it as a subdocument. If we don’t know exactly how to access it or we need the flexibility to be able to query for an attribute’s values, then we should embed it in an array.

For example, with our books collection, if we decide to embed reviews into each book document, we have the following two design options:

  • Here’s the code for a book document with an array: 

    {
       Isbn: '1001',
       Title: 'Mastering MongoDB',
       Reviews: [
          { 'user_id': 1, text: 'great book', rating: 5 },
          { 'user_id': 2, text: 'not so bad book', rating: 3 }
       ]
    }

  • Here’s the code for a book with an embedded document, keyed by user ID:

    {
       Isbn: '1001',
       Title: 'Mastering MongoDB',
       Reviews: {
          '1': { text: 'great book', rating: 5 },
          '2': { text: 'not so bad book', rating: 3 }
       }
    }

The array structure has the advantage that we can directly query MongoDB for all of the reviews with a rating value greater than 4 through the embedded array reviews.

Using the embedded document structure, on the other hand, we can retrieve all of the reviews the same way that we would when using the array, but if we want to filter them, it has to be done on the application side, rather than on the database side.
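
A minimal sketch of such a query against the array-based design (the collection name 'books' is illustrative) is shown here; note that the $elemMatch projection returns only the first matching review per book:

db.books.find(
  { 'Reviews.rating': { $gt: 4 } },
  { Isbn: 1, Title: 1, Reviews: { $elemMatch: { rating: { $gt: 4 } } } }
)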

Defensive coding

More of a generic principle, defensive coding refers to a set of practices and software designs that ensures the continuing functionality of a piece of software under unforeseen circumstances.

It prioritizes code quality, readability, and predictability. Readability was best explained by John F. Woods in his comp.lang.c++ post, on September 24, 1991:

Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability.

Our code should be readable and understandable by humans, as well as by machines. With code quality metrics derived from static analysis tools, code reviews, and bugs reported/resolved, we can estimate the quality of our code base and aim for a certain threshold at each sprint, or when we are ready to release. Code predictability, on the other hand, means that our code should behave in a predictable way, even when faced with unexpected input and program states.

These principles apply to every software system. In the context of system programming using MongoDB, there are some extra steps that we must take to ensure code predictability and, subsequently, quality, as measured by the number of resulting bugs.

MongoDB limitations that will result in a loss of database functionality should be monitored and evaluated on a periodic basis, as follows:

  • Document size limit: We should keep an eye on the collections in which we expect documents to grow the most, running a background script to examine document sizes and alert us if we have documents approaching the 16 MB limit, or if the average size has grown significantly since the last check (a minimal sketch of such a check follows this list).
  • Data integrity checks: If we are using denormalization for read optimization, then it’s a good practice to check for data integrity. Through a software bug or a database error, we may end up with inconsistent duplicate data among collections.
  • Schema checks: If we don’t want to use the document validation feature of MongoDB, but rather we want a lax document schema, it’s still a good idea to periodically run scripts to identify fields that are present in our documents and their frequencies. Then, along with relative access patterns, we can identify whether these fields can be identified and consolidated. This is mostly useful if we are ingesting data from another system wherein data input changes over time, which may result in a wildly varying document structure at our end.
  • Data storage checks: This mostly applies when using Memory Mapped Storage Engine version 1 (MMAPv1), where document padding optimization can help performance. By keeping an eye on document size relative to its padding, we can make sure that our size-modifying updates won’t incur a move of the document in physical storage.
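
As an example of the first check, the following minimal sketch (the collection name is illustrative, and the $bsonSize aggregation operator requires MongoDB 4.4 or later) reports the largest documents in a collection so that we can alert on anything approaching the 16 MB limit:

db.books.aggregate([
  { $project: { sizeBytes: { $bsonSize: '$$ROOT' } } },
  { $match: { sizeBytes: { $gt: 12 * 1024 * 1024 } } },   // flag documents over 12 MB
  { $sort: { sizeBytes: -1 } },
  { $limit: 20 }
])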

These are the basic checks that we should implement when defensively coding for our MongoDB application. On top of this, we need to defensively code our application-level code to make sure that when failures occur in MongoDB, our application will continue operating—perhaps with degraded performance, but still operational.

An example of this is replica set failover and failback. When our replica set primary fails, there is a brief period to detect this failure and the new primary is elected, promoted, and operational. During this brief period, we should make sure that our application continues to operate in read-only mode, instead of throwing a 500 error code. In most cases, electing a new primary is done in seconds, but in some cases, we may end up in the minority end of a network partition and unable to contact a primary for a long period of time. Similarly, some secondaries may end up in a recovering state (for example, if they fall way behind the primary in replication); our application should be able to pick a different secondary in this case.

Designing for secondary access is one of the most useful examples of defensive coding. Our application should weigh between fields that can only be accessed by the primary to ensure data consistency and fields that are okay to be updated in near real time, instead of in real time, in which case we can read these from secondary servers. By keeping track of replication lag for our secondaries by using automated scripts, we can have a view of our cluster’s load and how safe it is to enable this functionality.

Another defensive coding practice is to always perform writes with Journaling on. Journaling helps to recover from server crashes and power failures.

Finally, we should aim to use replica sets as early as possible. Other than the performance and workload improvements, they help us to recover from server failures.

Monitoring integrations

All of this adds up to greater adoption of monitoring tools and services. As much as we can script some of them, integrating with cloud and on-premises monitoring tools can help us to achieve more in a smaller amount of time.

The metrics that we keep track of should do one of the following:

  • Detect failures: Failure detection is a reactive process, where we should have clear protocols in place for what happens when each of the failure-detection flags goes off. For example, what should the recovery steps be if we lose a server, a replica set, or a shard?
  • Prevent failures: Failure prevention, on the other hand, is a proactive process, designed to help us catch problems before they become a potential source of failure in the future. For example, central processing unit (CPU)/storage/memory usage should be actively monitored with yellow and red thresholds, and clear processes should be put in place as to what we should do in the event that we reach either threshold.

Application design and best practices can help us get ahead of the game. Both are a continuous journey that we need to embark on from the greenfield stage all the way to maintenance and eventual decommissioning of the project. In the next section, we will focus on a few operational tips that we can use to be proactive and foresee any potential performance issues.

Elevating operations

When connecting to our production MongoDB servers, we want to make sure that our operations are as lightweight as possible (and are certainly non-destructive) and do not alter the database state in any sense.

Two useful utilities that we can chain to our queries are shown here:

> db.collection.find(query).maxTimeMS(999)

Our query will run for at most 999 milliseconds (ms); beyond that, it is aborted and an exceeded time limit error is returned. The second utility is shown here:

> db.collection.find(query).maxScan(1000)

Our query will examine at most 1,000 documents while searching for results and will then return, without raising an error. Note that maxScan was deprecated and has been removed in MongoDB 4.2 and later, so maxTimeMS is the preferred way to bound queries on modern deployments.

Whenever we can, we should bind our queries by time or document result size to avoid running unexpectedly long queries that may affect our production database. A common reason for accessing our production database is troubleshooting degraded cluster performance. This can be investigated via cloud monitoring tools, as we described in previous chapters.

The db.currentOp() command, through the MongoDB shell, will give us a list of all current operations. We can then isolate the ones that have large .secs_running values and identify them through the .query field.

If we want to kill an in-progress operation that takes a long time, we need to note the value of the .opid field and pass it to db.killOp(<opid>).
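
As a hedged illustration of these two commands (the 30-second threshold and the opid value are placeholders), a session in the MongoDB shell could look like this:

// List operations that have been active for more than 30 seconds
db.currentOp({ active: true, secs_running: { $gt: 30 } }).inprog.forEach(op => {
  print(`${op.opid} ${op.secs_running}s ${op.ns}`);
});

// Terminate one of them by its opid (value taken from the output above)
db.killOp(12345);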

Finally, it’s important to recognize (from an operational standpoint) that everything may go wrong. We must have a backup strategy in place that is implemented consistently. Most importantly, we should practice restoring from backup to make sure that it works as intended.

We should be extremely cautious when issuing any command on a live cluster and dry run all operations in a testing/staging environment that mirrors the production environment as closely as possible. The last part of this chapter will provide a few tips to boost security from a holistic perspective.

Boosting security

After the recent waves of ransomware that were locking down unsecured MongoDB servers, asking for ransom payments in cryptocurrency from the administrators to unlock the MongoDB servers, many developers have become more security-conscious. Security is one of the items on a checklist that we, as developers, may not prioritize highly enough, due to the optimistic belief that it won’t happen to us. The truth is, in the modern internet landscape, everyone can be a target of automated or directed attacks, so security should always be taken into account, from the early stages of the design to after production deployment.

Enabling security by default

Every database (other than local development servers, perhaps) should be set up with the following in the mongod.conf file:

security:
    authorization: enabled
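
With authorization enabled, we need at least one administrative user to manage the deployment. A minimal sketch of creating one from the shell (the username and role are placeholders) is shown here:

use admin
db.createUser({
  user: 'clusterAdmin',
  pwd: passwordPrompt(),   // prompts for the password instead of hard-coding it
  roles: [ { role: 'userAdminAnyDatabase', db: 'admin' } ]
})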

Note

Secure Sockets Layer (SSL) should always be enabled, as we described in the relevant chapter: Chapter 9, Monitoring, Backup, and Security.

Access should be restricted to only allow communication between application servers and MongoDB servers, and only in the interfaces that are required. Using bind_ip, we can force MongoDB to listen to specific interfaces, instead of the default binding to localhost behavior, as follows:

net:
   bindIp: 127.0.0.1,10.10.0.10,10.10.0.20

Isolating our servers

We should secure our infrastructure perimeter with Amazon Web Services (AWS) Virtual Private Cloud (VPC) or the equivalent from the cloud provider of our choice. As an extra layer of security, we should isolate our servers in a cloud of their own, only allowing external connections to reach our application servers and never allowing them to directly connect to our MongoDB servers, as illustrated in the following diagram:

Figure 15.1: Cluster defense in depth (DiD)

We should invest in role-based authorization. Security lies not only in protecting against data leaks caused by external actors but also in making sure that internal actors have the appropriate levels of access to our data. Using role-based authorization at the MongoDB level, we can make sure that our users have the appropriate levels of access.

Consider Enterprise Edition for large deployments. Enterprise Edition offers some convenient features concerning security and more integrations with well-known tools and should be evaluated for large deployments, with an eye on changing needs as we transition from a single replica set to an enterprise-complex architecture.

Checklists

Operations involve completing many tasks, often with considerable complexity. A good practice is to keep a set of checklists with all of the tasks that need to be performed and their order of significance. This will ensure that we don’t let anything slip through. A deployment and security checklist, for example, could look like this:

  • Hardware:
    • Storage: How much disk space is needed per node? What is the growth rate?
    • Storage technology: Do we need a solid-state drive (SSD) versus a hard disk drive (HDD)? What is the throughput of our storage?
    • RAM: What is the expected working set? Can we fit it in the random-access memory (RAM)? If not, are we going to be okay with an SSD instead of an HDD? What is the growth rate?
    • CPU: This usually isn’t a concern for MongoDB, but it could be if we planned to run CPU-intensive jobs in our cluster (for example, aggregation or MapReduce).
    • Network: What are the network links between servers? This is usually trivial if we are using a single data center, but it can get complicated if we have multiple data centers and/or offsite servers for disaster recovery (DR).
  • Security:
    • Enable auth.
    • Enable SSL.
    • Disable REpresentational State Transfer (REST)/Hypertext Transfer Protocol (HTTP) interfaces.
    • Isolate our servers (for example, VPC).
    • Review authorization: With great power comes great responsibility. Make sure that the powerful users are the ones that you trust, and don’t grant potentially destructive privileges to inexperienced users.

A monitoring and operations checklist could look like this:

  • Monitoring:
    • Usage of hardware (CPU, memory, storage, and network).
    • Health checks, using Pingdom or an equivalent service to make sure that we get a notification when one of our servers fails.
    • Client performance monitoring: Integrating periodic mystery shopper tests using the service as a customer in a manual or automated way, from an end-to-end (E2E) perspective, in order to find out whether it behaves as expected. We don’t want to learn about application performance issues from our customers.
    • Use MongoDB Cloud Manager monitoring; it has a free tier, it can provide useful metrics, and it is the tool that MongoDB engineers can take a look at if we run into issues and need their help, especially as a part of support contracts.
  • DR:
    • Evaluate the risk: What is the risk, from a business perspective, of losing MongoDB data? Can we recreate this dataset? If yes, how costly is it in terms of time and effort?
    • Devise a plan: Have a plan for each failure scenario, with the exact steps that we need to take if something happens.
    • Test the plan: Performing a dry run of every recovery strategy is as important as devising one. Many things can go wrong in DR, and having an incomplete plan (or one that fails to serve its purpose) is something that we shouldn’t allow to happen under any circumstances.
    • Have an alternative to the plan: No matter how well we devise a plan and test it, anything can go wrong during planning, testing, or execution. We need to have a backup plan for our plan, in case we can’t recover our data using plan A. This is also called plan B, or the last-resort plan. It doesn’t have to be efficient, but it should alleviate any business reputation risks.
    • Load test: We should make sure that we load test our application E2E before deployment, with a realistic workload. This is the only way to ensure that our application will behave as expected.

Summary

In this chapter, we covered some topics that were not detailed in previous chapters. It is important to apply the best practices according to our workload requirements. We started by covering some patterns and anti-patterns that MongoDB has identified over the years. Read performance is usually what we want to optimize for; that is why we discussed consolidating queries and the denormalization of our data.

Operations are also important when we go from deployment to ensuring the continuous performance and availability of our cluster. Security is something that we often don’t think about until it affects us. That’s why we should invest the time beforehand to plan and make sure that we have the measures in place to be sufficiently secure.

Finally, we introduced the concept of checklists to keep track of our tasks and to make sure that we complete all of them before major operational events (deployment, cluster upgrades, moving to sharding from replica sets, and so on).

Further reading

You can refer to the following links for further information:

Closing remarks

Congratulations for making it this far and thank you for embarking on this journey with me for the past 15 chapters! It’s been 11 years since I first visited MongoDB’s first office ever in New York to attend a multi-day training. As a young engineer, I was inspired not only by their technical merits but equally as much by their behavioral traits, their friendly and hardworking attitude, and overall professional ethics.

Being an outsider and still very close to the team for a few days, I could clearly see that this team would succeed. In these 11 years, a lot has changed in my life. My beloved mother, Evi, and my dearest father-in-law, Christos, passed away and a new member of our family is now on the way.

On the other hand, MongoDB has been a constant source of satisfaction and joy to work with. I was lucky enough to witness firsthand its whole evolution from a small, niche database to a fully fledged ecosystem that can serve almost everyone, from solo founders to the major Fortune 500 corporations.

If there is one thing I learned watching MongoDB’s story unfold, it is that small gains add up over time. We can’t become the best version of ourselves in a day, but hard work, ethics, and persistence always pay off over time. Getting just a little bit better every day as a professional and as a member of society is at least one of the ways to ensure a better future for everyone.

In the words of Dale Carnegie: “Most of the important things in the world have been accomplished by people who have kept on trying when there seemed to be no hope at all.”

Live long and prosper!
