Checklists

Operations involve many tasks of varying duration and complexity, and it is easy to lose track of which ones are done. A good practice is to keep a set of checklists listing all the tasks that need to be performed, in order of significance. This helps ensure that we don't let something slip through.

A deployment and security checklist, for example, could be:

Hardware:
- Storage: How much disk space is needed per node? What is the growth rate?
- Storage technology: Do we need SSD versus HDD? What is the throughput of our storage?
- RAM: What is the expected working set? Can we fit it in RAM? If not, will SSD rather than HDD storage give us acceptable performance? What is the growth rate?
- CPU: Usually not a concern for MongoDB, but could be if we plan to run CPU-intensive jobs in our cluster (for example, aggregation, MapReduce).
- Network: What are the network links between servers? This is usually trivial if we are using a single data center but can get complicated if we have multiple data centers and/or offsite servers for disaster recovery.
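
The storage and RAM questions above come down to simple arithmetic once we have measurements. The following is a minimal sketch, not a sizing tool: the function names, the compound monthly growth model, and the `cache_fraction` default are all assumptions made for illustration, and the real share of RAM available for caching depends on the storage engine and its configuration.

```python
import math


def months_until_full(used_gb, capacity_gb, monthly_growth_rate):
    """Estimate months until disk usage reaches capacity.

    Assumes compound growth: usage is multiplied by
    (1 + monthly_growth_rate) every month. This model is an assumption.
    """
    if used_gb <= 0 or monthly_growth_rate <= 0:
        raise ValueError("used_gb and monthly_growth_rate must be positive")
    if used_gb >= capacity_gb:
        return 0.0
    return math.log(capacity_gb / used_gb) / math.log(1 + monthly_growth_rate)


def working_set_fits(working_set_gb, ram_gb, cache_fraction=0.5):
    """Check whether the working set fits in the RAM assumed usable for caching.

    cache_fraction is a hypothetical placeholder, not a measured value.
    """
    return working_set_gb <= ram_gb * cache_fraction


if __name__ == "__main__":
    # Example figures are made up: 100 GB used, 200 GB disk, 10% growth per month.
    print(round(months_until_full(100, 200, 0.10), 1), "months until the disk is full")
    print("working set fits:", working_set_fits(20, 64))
```

Running the same calculation per node, with measured figures, gives a rough date by which each node needs more disk or RAM.
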
Security:
- Enable auth.
- Enable SSL.
- Disable the REST/HTTP interface (relevant to MongoDB versions before 3.6, which removed it).
- Isolate our servers (for example, VPC).
- Enable authorization. With great power comes great responsibility: make sure power users are the ones that we trust, and don't give potentially destructive powers to inexperienced users.
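
Part of the security checklist above can be checked mechanically. The sketch below audits an already parsed `mongod` configuration, passed in as a plain nested dictionary (loading the YAML file is left out to keep the example free of dependencies). The `audit_security` function and its messages are hypothetical. The option names it reads (`security.authorization`, `net.tls.mode`, `net.ssl.mode`, `net.http.enabled`, `net.bindIp`, `net.bindIpAll`) are `mongod` configuration options, although which of them apply depends on the server version. It cannot verify network isolation or whether the right users hold the right roles.

```python
def audit_security(config):
    """Return a list of findings for a parsed mongod configuration dict.

    An empty list means none of the checks below failed. It does not
    mean that the deployment is secure.
    """
    findings = []
    security = config.get("security", {})
    net = config.get("net", {})

    # Authorization (access control) must be switched on explicitly.
    if security.get("authorization") != "enabled":
        findings.append("authorization is not enabled")

    # Newer versions use net.tls.mode, older ones net.ssl.mode.
    tls_mode = net.get("tls", {}).get("mode") or net.get("ssl", {}).get("mode")
    if tls_mode not in ("requireTLS", "requireSSL"):
        findings.append("TLS/SSL is not required for connections")

    # The HTTP interface only exists on old server versions.
    if net.get("http", {}).get("enabled"):
        findings.append("legacy HTTP interface is enabled")

    # Listening on every interface works against server isolation.
    bind_ips = [ip.strip() for ip in str(net.get("bindIp", "")).split(",")]
    if net.get("bindIpAll") or "0.0.0.0" in bind_ips:
        findings.append("server listens on all interfaces")

    return findings


if __name__ == "__main__":
    example = {"net": {"bindIp": "0.0.0.0"}}  # deliberately weak example config
    for finding in audit_security(example):
        print("FAIL:", finding)
```

A check like this could run as part of deployment, so that a server with findings never reaches production.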

A monitoring and operations checklist:

Monitoring:
- Usage of hardware mentioned above (CPU, memory, storage, network).
- Health checks using Pingdom or an equivalent service to make sure that we get a notification when one of our servers fails.
- Client performance monitoring: Integrate periodic mystery shopper tests, in which we use the service as a customer would, manually or in an automated way and from an end-to-end perspective, to find out whether it behaves as expected. We don't want to learn about application performance issues from our customers.
- Use MongoDB Cloud Manager monitoring: it has a free tier, can provide useful metrics, and is the tool that MongoDB engineers can take a look at if we run into issues and need their help, especially as part of support contracts.
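
To illustrate the health check item above, here is a minimal sketch of what an external checker does at its simplest: try to open a TCP connection to each server and report the ones that fail. The function names and the server list are hypothetical. A successful connection only shows that the port accepts connections, not that the database is healthy, so this is a complement to a hosted monitoring service and not a replacement for one.

```python
import socket


def is_reachable(host, port=27017, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def failed_servers(servers, timeout=2.0):
    """Return the (host, port) pairs that could not be reached."""
    return [(host, port) for host, port in servers
            if not is_reachable(host, port, timeout)]


if __name__ == "__main__":
    # Hypothetical server list; replace with the real cluster members.
    cluster = [("127.0.0.1", 27017)]
    for host, port in failed_servers(cluster):
        print(f"ALERT: {host}:{port} is not reachable")
```
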
Disaster recovery:
- Evaluate risk: What is the risk from the business perspective of losing MongoDB data? Can we recreate this dataset and, if yes, how costly is it in terms of time and effort needed?
- Devise a plan: Have a plan for each failure scenario, with the exact steps that we need to take in case it happens.
- Test the plan: A dry run of every recovery strategy is as important as having the strategy in the first place. Many things can go wrong in disaster recovery, and an incomplete plan, or one that fails in its purpose, is something that we shouldn't allow under any circumstance.
- Have an alternative to the plan: No matter how well we devise a plan and test it, anything can go wrong during planning, testing, or execution. We need to have a backup plan for our plan in case we can't recover our data using plan A. This is also called plan B, or the last resort plan. It doesn't have to be efficient, but it should alleviate any business reputation risk.
- Load test: We should make sure we load test our application end to end before deployment with a realistic workload. This is the only way to ensure that our application will behave as expected.
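
As a rough illustration of the load test item, the sketch below calls an operation many times from a pool of threads and reports the error count and the 95th percentile latency. Everything here is an assumption made for illustration: `run_load_test` is a hypothetical helper, and `operation` stands for one end-to-end request against our application. A realistic load test would replay a representative workload, usually with a dedicated tool.

```python
import math
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(operation, total_requests=100, concurrency=10):
    """Call operation() total_requests times using `concurrency` threads.

    Returns the number of requests, the number that raised an exception,
    and the 95th percentile latency in seconds (nearest-rank method).
    """
    if total_requests < 1:
        raise ValueError("total_requests must be at least 1")

    def timed_call(_):
        start = time.perf_counter()
        try:
            operation()
            ok = True
        except Exception:
            ok = False  # count the failure, keep the test running
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for ok, _ in results if not ok)
    p95_index = max(0, math.ceil(0.95 * len(latencies)) - 1)
    return {
        "requests": total_requests,
        "errors": errors,
        "p95_seconds": latencies[p95_index],
    }


if __name__ == "__main__":
    # Stand-in operation: sleeps briefly in place of a real request.
    print(run_load_test(lambda: time.sleep(0.01), total_requests=50, concurrency=5))
```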
