Downtime

Downtime is the counterpart of uptime: it measures the time during which a given system, application, network, or other logical or physical object is not available to the end user. Downtime is subject to some interpretation, because it is defined as a period in which the system is not performing its primary function as originally intended. The most ubiquitous example of downtime is the infamous 404 page, which you have probably seen before.

To quantify the availability of your system using the preceding concepts, we can calculate it from measured uptime and downtime figures:

Availability Percentage = (Uptime / (Uptime + Downtime)) x 100
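As a quick illustration, the availability formula can be sketched in Python. This is a minimal sketch, assuming uptime and downtime are both measured in hours over the same period; the function name `availability_percentage` and the sample figures are hypothetical.

```python
def availability_percentage(uptime_hours: float, downtime_hours: float) -> float:
    """Availability Percentage = (Uptime / (Uptime + Downtime)) x 100."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("uptime + downtime must be positive")
    return uptime_hours / total * 100


# Hypothetical figures: a 365-day year (8,760 hours) with 8.76 hours of downtime.
print(round(availability_percentage(8760 - 8.76, 8.76), 3))  # 99.9
```

In other words, roughly 8.76 hours of downtime per year corresponds to "three nines" (99.9%) of availability.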

There is a more complex calculation for systems that have redundant components that increase the overall stability of the system, but let's stick with this simple formula for now. We'll investigate the redundant components of Kubernetes later on in this chapter.

You can use this equation on your own to measure the availability of your Kubernetes cluster. Before we look at a few examples, let's examine some of the math behind these concepts.

To get started, uptime availability is the Mean Time Between Failures (MTBF) divided by the sum of MTBF and the Mean Time to Repair (MTTR).

We can calculate MTBF as follows:

MTBF = 'Total hours in a year' / 'Number of yearly failures'
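The MTBF formula translates directly into code. The following is a minimal sketch, assuming a 365-day year (leap years ignored); the names `HOURS_PER_YEAR` and `mtbf_hours` and the failure count are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored in this sketch


def mtbf_hours(yearly_failures: int) -> float:
    """MTBF = total hours in a year / number of yearly failures."""
    if yearly_failures <= 0:
        raise ValueError("need at least one failure to compute MTBF")
    return HOURS_PER_YEAR / yearly_failures


# Hypothetical example: a cluster that failed 4 times in one year.
print(mtbf_hours(4))  # 2190.0
```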

And MTTR is represented as follows:

MTTR = 'Total time spent repairing the system' / 'Total number of failures'
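A small sketch can make MTTR concrete. It assumes that the repair time of each failure is recorded in hours, so MTTR is simply the total repair time divided by the number of failures. The function `mttr_hours` and the repair log below are hypothetical.

```python
def mttr_hours(repair_times_hours: list[float]) -> float:
    """MTTR = total time spent repairing / total number of failures.

    Each entry in repair_times_hours is the time taken to repair one failure.
    """
    if not repair_times_hours:
        raise ValueError("need at least one recorded failure to compute MTTR")
    return sum(repair_times_hours) / len(repair_times_hours)


# Hypothetical repair log: four outages lasting 1.5, 2, 0.5, and 4 hours.
print(mttr_hours([1.5, 2.0, 0.5, 4.0]))  # 2.0
```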

Putting these together, uptime availability and the resulting yearly downtime are calculated with the following formulas:

Uptime Availability = MTBF / (MTTR + MTBF)
Downtime per Year (Hours) = (1 - Uptime Availability) x 365 x 24
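To tie the formulas together, here is a minimal Python sketch that computes uptime availability from MTBF and MTTR and then converts it into expected downtime per year. The function names and the input values (4 failures per year, giving an MTBF of 2,190 hours, and a 2-hour MTTR) are hypothetical.

```python
def uptime_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Uptime Availability = MTBF / (MTTR + MTBF), as a ratio between 0 and 1."""
    return mtbf_hours / (mttr_hours + mtbf_hours)


def downtime_per_year_hours(availability: float) -> float:
    """Downtime per Year (Hours) = (1 - availability) x 365 x 24."""
    return (1 - availability) * 365 * 24


# Hypothetical inputs: MTBF = 8760 / 4 = 2190 hours, MTTR = 2 hours.
a = uptime_availability(2190.0, 2.0)
print(f"availability: {a:.5f}")                                 # availability: 0.99909
print(f"downtime per year: {downtime_per_year_hours(a):.2f} h")  # downtime per year: 7.99 h
```

The result (about 7.99 hours) is close to, but not exactly, the 8 hours of total repair time in this example, because the MTBF here is computed over the full 8,760 hours of the year, including the time spent down.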
