Defensive coding

More a generic principle than a specific technique, defensive coding refers to a set of practices and software design choices that ensure a piece of software keeps functioning under unforeseen circumstances.

It prioritizes code quality, readability, and predictability.

Readability is best explained by John F. Woods in his comp.lang.c++ post on September 24, 1991:

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live. Code for readability."

Our code should be readable and understandable by humans as well as machines. Using code quality metrics derived from static analysis tools, code reviews, and the number of bugs reported and resolved, we can estimate the quality of our code base and aim for a certain threshold at each sprint or before each release.

Code predictability, on the other hand, means that our code should produce expected results even when faced with unexpected inputs or program state.

These principles apply to every software system. In the context of systems built on MongoDB, there are some extra steps we must take to ensure code predictability and, with it, quality, as measured by the number of resulting bugs.

MongoDB limitations that could result in the loss of database functionality should be monitored and evaluated periodically:

  • Document size limit: We should keep an eye on the collections whose documents we expect to grow the most, running a background script that examines document sizes and alerts us if any document approaches the 16 MB limit, or if the average size has grown significantly since the last check (a sketch of such a check follows this list).
  • Data integrity checks: If we are using denormalization for read optimization, it is good practice to check for data integrity. A software bug or a database error can leave us with inconsistent duplicate data across collections (see the second sketch below).
  • Schema checks: If we don't want to use MongoDB's document validation feature but rather keep a lax document schema, it is still a good idea to periodically run scripts that identify which fields are present in our documents and how frequently. Combined with the relevant access patterns, this tells us whether fields can be consolidated or retired (see the third sketch below). This is mostly useful when we ingest data from another system whose input changes over time, which may result in a wildly varying document structure on our end.
  • Data storage checks: This mostly applies when using MMAPv1, where document padding optimization can help performance. By keeping an eye on document size relative to its padding, we can make sure that our size-modifying updates won't move the document in physical storage.
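
A minimal sketch of the document size check using pymongo might look like the following. The database and collection names (mydb, orders), the 75% alert threshold, and the 1,000-document sample size are all assumptions for illustration:

    import sys

    from bson import BSON
    from pymongo import MongoClient

    LIMIT_BYTES = 16 * 1024 * 1024        # MongoDB's hard document size limit
    WARN_BYTES = int(LIMIT_BYTES * 0.75)  # alert when a document crosses 75%

    client = MongoClient("mongodb://localhost:27017")
    coll = client.mydb.orders  # hypothetical collection we expect to grow

    sizes = []
    # $sample (MongoDB 3.2+) keeps the check cheap on large collections
    for doc in coll.aggregate([{"$sample": {"size": 1000}}]):
        size = len(BSON.encode(doc))
        sizes.append(size)
        if size >= WARN_BYTES:
            print(f"WARNING: document {doc['_id']} is {size} bytes", file=sys.stderr)

    if sizes:
        print(f"average document size: {sum(sizes) / len(sizes):.0f} bytes")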
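
The integrity check depends entirely on what we have denormalized. Here is one way it could look, assuming a hypothetical comments collection that copies the author's name from users into a user_name field:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")
    db = client.mydb

    # Assumed schema: each comment denormalizes the author's name from
    # `users` into `user_name` so reads avoid a lookup; verify the copies.
    for comment in db.comments.find({}, {"user_id": 1, "user_name": 1}):
        user = db.users.find_one({"_id": comment["user_id"]}, {"name": 1})
        if user is None:
            print(f"orphaned comment {comment['_id']}: no user {comment['user_id']}")
        elif user.get("name") != comment.get("user_name"):
            print(f"comment {comment['_id']} has a stale denormalized user_name")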
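
The schema-frequency check can be sketched as a single aggregation over a hypothetical events collection; note that the $objectToArray operator used here requires MongoDB 3.4.4 or later:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")

    # Turn each document into an array of {k, v} pairs, then count how
    # often each top-level field name appears across the collection.
    pipeline = [
        {"$project": {"fields": {"$objectToArray": "$$ROOT"}}},
        {"$unwind": "$fields"},
        {"$group": {"_id": "$fields.k", "count": {"$sum": 1}}},
        {"$sort": {"count": -1}},
    ]
    for row in client.mydb.events.aggregate(pipeline):
        print(f"{row['_id']}: present in {row['count']} documents")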

These are the basic checks that we should implement when defensively coding for our MongoDB application. On top of this, we need to write our application-level code defensively, to make sure that when failures occur in MongoDB our application keeps operating, perhaps with degraded performance, but operating nonetheless.

An example of this is replica set failover and failback. When the replica set primary fails, there is a brief period before the other members detect the failure and a new primary is elected, promoted, and becomes operational. During this window, we should make sure that our application continues operating in read-only mode instead of throwing 500 errors. In most cases, electing a new primary takes only seconds, but if we end up on the minority side of a network partition, we may be unable to reach a primary for a long time. Similarly, some secondaries may end up in the RECOVERING state (for example, if they fall far behind the primary in replication); our application should be able to pick a different secondary in that case.
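
One way to code this defensively with pymongo is to catch AutoReconnect on writes and fall back to read-only behavior, while serving reads with a secondaryPreferred read preference. The host names and the events collection below are assumptions:

    from pymongo import MongoClient, ReadPreference
    from pymongo.errors import AutoReconnect

    client = MongoClient(
        "mongodb://host1,host2,host3/?replicaSet=rs0",  # hypothetical hosts
        serverSelectionTimeoutMS=5000,
    )
    db = client.mydb

    def save_event(event):
        """Attempt a write; report read-only mode instead of crashing."""
        try:
            db.events.insert_one(event)
            return True
        except AutoReconnect:
            # No primary is reachable (election in progress, or we are on
            # the minority side of a partition): stay up in read-only mode.
            return False

    def load_events(query):
        # Reads can still be served by a secondary while there is no primary.
        secondary_ok = db.events.with_options(
            read_preference=ReadPreference.SECONDARY_PREFERRED)
        return list(secondary_ok.find(query))

Keeping serverSelectionTimeoutMS low bounds how long a request blocks while an election is still in progress, so the application can degrade quickly rather than hang.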

Designing for secondary access is one of the most useful examples of defensive coding. Our application should distinguish between fields that must be read from the primary to guarantee data consistency and fields for which near-real-time, rather than real-time, values are acceptable; the latter can be read from secondary servers. By tracking the replication lag of our secondaries with automated scripts, we get a view of our cluster's load and of how safe it is to enable this functionality.
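
A lag-monitoring sketch using the replSetGetStatus admin command might look like the following; the connection string is an assumption:

    from pymongo import MongoClient

    client = MongoClient("mongodb://host1,host2,host3/?replicaSet=rs0")
    status = client.admin.command("replSetGetStatus")

    # optimeDate is each member's last applied operation; a secondary's lag
    # is the difference between the primary's optime and its own.
    primaries = [m for m in status["members"] if m["stateStr"] == "PRIMARY"]
    if not primaries:
        print("no primary is currently reachable")
    else:
        primary_optime = primaries[0]["optimeDate"]
        for member in status["members"]:
            if member["stateStr"] == "SECONDARY":
                lag = (primary_optime - member["optimeDate"]).total_seconds()
                print(f"{member['name']}: {lag:.0f}s behind the primary")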

Another defensive coding practice is to always perform writes with journaling enabled. Journaling helps us recover from server crashes and power failures.
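
With pymongo, this amounts to setting a write concern with j=True; the mydb.orders collection below is hypothetical:

    from pymongo import MongoClient, WriteConcern

    client = MongoClient("mongodb://localhost:27017")

    # j=True makes the server acknowledge the write only after it has been
    # committed to the on-disk journal, so it survives crashes and power loss.
    journaled = client.mydb.get_collection(
        "orders", write_concern=WriteConcern(w=1, j=True))
    journaled.insert_one({"item": "notebook", "qty": 1})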

Finally, we should aim to use replica sets as early as possible. Beyond the performance and workload-distribution improvements, they help us recover from server failures.
