Chapter 6. Likelihood Versus Severity

It is important to understand the relationship between severity and likelihood. Managing risk involves knowing when you need to be concerned about severity and not likelihood, or vice versa. Understanding the difference is essential in analyzing the seriousness of risks to your system.

We treat all risks as being composed of two components:

Severity

The cost if the risk happens (for example, what is the impact if customers don’t have power?).

Likelihood

The chance of the risk happening (for example, how likely is a big windstorm?).

Managing risk means managing these two values. You can reduce the severity of a risk if it occurs, or you can reduce the likelihood that it occurs. For any given risk, you don’t need to do both, but you should consider both to find the best path forward.

Tip

The significance of a risk is the combination of the severity of the risk happening with the likelihood of it happening. To successfully manage risk, you must consider both of these values and how they relate to each other. To reduce risk, you need to reduce at least one of these two values for any given risk.

The best way to understand the difference is by looking at examples of various risks and how their likelihood and severity differ. We’ll use the following example through the remainder of this chapter to help explain the differences:

Let’s assume that we are managing an online T-shirt store. This store is a typical online retailer. It provides a listing of the T-shirts available, individual pages that show the details of each T-shirt (including pictures of what it looks like), and an order processing system that customers use to purchase and pay for the T-shirts they want shipped to them.

Now let’s look at some example risks for this store.

The Top 10 List: Low Likelihood, Low Severity Risk

Using our T-shirt store example, let’s assume that the site has a feature that appears on the upper-right side that shows the top 10 best-selling T-shirts. Visitors on the site can see these best sellers and then click to go to and purchase one of them quickly and easily.

Now, what happens if the top 10 list can’t be generated for some reason (perhaps due to a service failure)? If it can’t be displayed, let’s instead assume a static list of T-shirts is displayed, but those shirts displayed aren’t necessarily the current top 10 best sellers. This service failure doesn’t happen often, because the top 10 list is easy to generate and doesn’t tend to have any problems.

What is the risk to our store of displaying a top 10 list?

Let’s look at this risk:

  • The likelihood of the risk is low because the service that displays the list is apparently quite reliable (I stated the list is easy to generate).

  • But if the list does not appear, how severe is the problem? I stated that if the top 10 list doesn’t appear, an alternate list is shown. Although not ideal, the impact on our customers is probably quite low, and the impact on our business would likely not be very large, either. As such, the severity of this risk is also low.

  • This risk is a Low/Low Risk. This means it has a low likelihood and a low severity.

Risks like this are easy to ignore and typically do not need further attention, because they are rare events and the events themselves have very little negative impact.
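The low severity in this example comes from the static list that stands in when the live list can’t be generated. A minimal sketch of that fallback follows, assuming a hypothetical `fetch_top_ten` callable for the ranking service and a made-up `STATIC_FALLBACK` list; neither name comes from a real system.

```python
from typing import Callable, List

# Hypothetical static list shown when the live ranking is unavailable.
STATIC_FALLBACK: List[str] = [
    "Classic Logo Tee",
    "Plain White Tee",
    "Vintage Stripe Tee",
]


def top_sellers_for_display(fetch_top_ten: Callable[[], List[str]]) -> List[str]:
    """Return the live top 10 list, or the static list if generation fails."""
    try:
        return fetch_top_ten()
    except Exception:
        # Any failure in the (hypothetical) ranking service degrades to the
        # static list instead of breaking the page.
        return STATIC_FALLBACK


def broken_ranking_service() -> List[str]:
    # Stand-in for a service failure.
    raise ConnectionError("ranking service unavailable")


shown = top_sellers_for_display(broken_ranking_service)
```

Here `shown` ends up holding the static list, so the page still renders; the only cost is that the shirts displayed may not be the current best sellers.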

The Order Database: Low Likelihood, High Severity Risk

Using our T-shirt store example, let’s assume that your list of orders is stored in a typical database. Whenever a customer places an order, an entry is created in the database. As you process, collect payment for, and ship these orders, you update the data in the database. Later, the data is used to generate financial reports that show how much business you are doing, for purposes such as business planning and tax calculations.

Because the database is important, you run it on high-quality hardware with replicated system components (such as a RAID disk array). You also do regular backups of the data.

However, the database is still a single point of failure. The database contains significant amounts of business-critical data, and your website can’t function (you can’t even take any orders) if the database is not available. Losing the database would be a big loss.

What is the risk to our store associated with the order processing system’s database?

Let’s look at this risk:

  • The likelihood of the failure is quite low, because you are using high-quality, replicated hardware for the database. The database is quite reliable.

  • However, the severity of a failure in the database would be quite high. This is because if the database does fail, your entire order-processing pipeline will be down, and you risk losing business-critical data.

  • This risk is a Low/High Risk. This means it has a low likelihood and a high severity.

Risks like this are easy to miss because they do not happen very often (likelihood is low). However, they can be very expensive risks if they are ignored because the cost of failure is very high.

Given the high severity, this is a risk whose severity you might want to mitigate. For example, you might keep a hot database replica standing by, so that you can quickly flip from the broken database to the replica. This lets you continue working without significant loss of time or data. Alternatively, you might switch to a database technology that distributes data across multiple servers, so that you can continue to function even if one of your database servers fails.

Using one of these techniques might very well turn this risk from a Low/High Risk into a Low/Medium Risk (low likelihood, medium severity) or even a Low/Low Risk (low likelihood, low severity).

Mitigations such as this, which can dramatically reduce the severity of a problem, are discussed further in Chapter 8.
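The hot replica idea can be sketched in a few lines. This is only an illustration under stated assumptions: `FailoverOrderStore`, `broken_primary`, and `replica_rows` are hypothetical names, the two databases are modeled as plain callables, and the sketch assumes some separate mechanism already keeps the replica’s data current. It shows only the flip itself.

```python
from typing import Callable, Dict, List


class FailoverOrderStore:
    """Writes orders to the primary and flips to a hot replica on failure."""

    def __init__(
        self,
        primary: Callable[[Dict], None],
        replica: Callable[[Dict], None],
    ) -> None:
        self._active = primary
        self._standby = replica
        self.failed_over = False

    def save_order(self, order: Dict) -> None:
        try:
            self._active(order)
        except ConnectionError:
            if self.failed_over:
                raise  # The replica is down too; nothing is left to flip to.
            # Flip to the hot replica and retry the write once.
            self._active, self._standby = self._standby, self._active
            self.failed_over = True
            self._active(order)


# Hypothetical stand-ins for the two databases.
replica_rows: List[Dict] = []


def broken_primary(order: Dict) -> None:
    raise ConnectionError("primary database is down")


store = FailoverOrderStore(broken_primary, replica_rows.append)
store.save_order({"id": 1, "item": "T-shirt"})
```

After the call, the order sits in `replica_rows` and `store.failed_over` is true. The database is no more reliable than before, so the likelihood is unchanged; what shrinks is the cost of a failure.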

Custom Fonts: High Likelihood, Low Severity Risk

Using our T-shirt store example, suppose that you decide to spruce up your site a bit by using custom fonts in all of your text and descriptions. You’ve found the perfect font to use, and it is provided (and hosted) by a third-party font service provider. To use the font, your customer’s web browser downloads it directly from the third-party service provider. If the custom font is not available, a standard system font is used and the page looks like it did previously.

However, you’ve noticed this font service provider has problems on occasion, much more often than you’d want. When this service provider has a problem, your customers can’t use the beautiful custom font.

This happens a lot, unfortunately.

What is the risk to your store of using the beautiful custom font?

Let’s look at this risk:

  • The likelihood of the font not appearing is high, because the service provider is inconsistent and has problems often.

  • However, when the problem does occur, your site continues to work—it just doesn’t look quite as spruced up as you’d like. Hence, the severity of the problem is low.

  • Your site might be missing some of its glitz, but it is fully functional without significant problems.

  • This risk is a High/Low Risk. This means it has a high likelihood of occurring but has a low severity.

Mitigations for this risk involve reducing the likelihood of the problem occurring. You can work with the third-party provider to improve the availability of the service. Or, you can compile a list of backup providers that offer the same or similar fonts, and switch to one of them if the first provider doesn’t work.

There is not much you can do to reduce the severity, given that it is already quite low.
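The backup-provider idea might look like the following sketch. It assumes a hypothetical `is_available` health check and made-up provider URLs; in practice the browser downloads the font, so a real site would need some way to act on this choice, which is not shown here.

```python
from typing import Callable, List, Optional


def choose_font_url(
    provider_urls: List[str],
    is_available: Callable[[str], bool],
) -> Optional[str]:
    """Return the first provider URL that is currently available.

    Returns None when every provider is down, in which case the page
    falls back to a standard system font.
    """
    for url in provider_urls:
        if is_available(url):
            return url
    return None


# Hypothetical providers, listed in order of preference.
PROVIDERS = [
    "https://fonts.primary.example.com/perfect-font.woff2",
    "https://fonts.backup.example.com/similar-font.woff2",
]

# Pretend the primary provider is having one of its frequent problems.
down = {PROVIDERS[0]}
chosen = choose_font_url(PROVIDERS, lambda url: url not in down)
```

With the primary down, `chosen` is the backup URL. The custom font goes missing only when every provider on the list is unavailable at the same time, which is how the list lowers the likelihood.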

T-Shirt Photos: High Likelihood, High Severity Risk

Using our T-shirt store example, let’s look at the T-shirt images (pictures) that appear on your site. These are an incredibly important part of your store because people are typically not going to buy T-shirts if they can’t see what they look like. If your T-shirt images do not appear, your customers will leave your site and you’ll lose orders.

However, the server on which you are hosting your images is flaky. It goes in and out of service and seems to be having problems reading images from its disk. The server is old and needs to be replaced. It fails often and needs to be rebooted regularly. It goes out of service for parts replacement constantly. Yet, this is the server used to host your images.

What is the risk of your site becoming unusable because the images are not available?

Let’s look at this risk:

  • The likelihood of the images not displaying is high because the server is flaky and fails often.

  • The severity of this risk is also high, because if the images aren’t available, your customers will go away and not place orders.

  • This risk is a High/High Risk. This means it has a high likelihood of occurring (the hardware fails often) and it has a high severity when it does occur (customers won’t buy from you).

These types of risks are the scariest: they are highly likely to happen, and the problems they introduce are serious for your business.

These are the risks to which you should pay the most attention.

This example might seem obvious, but there likely are many such High/High Risks in your applications. Often, though, these risks might not be obvious until you look closely at your system. This is why risk management is so important.
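One way to keep the two components visible is to record both for every risk. The sketch below is an illustration only: the `Level` and `Risk` types and the risk names are assumptions made for this example, not a standard format. It lists the four example risks from this chapter and picks out the High/High ones.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    LOW = "Low"
    MEDIUM = "Medium"
    HIGH = "High"


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: Level
    severity: Level

    @property
    def label(self) -> str:
        # Likelihood first, then severity, as in "Low/High Risk".
        return f"{self.likelihood.value}/{self.severity.value} Risk"

    @property
    def needs_most_attention(self) -> bool:
        # High/High Risks are the ones to watch most closely.
        return self.likelihood is Level.HIGH and self.severity is Level.HIGH


# The four example risks from this chapter.
RISKS = [
    Risk("Top 10 list unavailable", Level.LOW, Level.LOW),
    Risk("Order database fails", Level.LOW, Level.HIGH),
    Risk("Custom font unavailable", Level.HIGH, Level.LOW),
    Risk("T-shirt photos unavailable", Level.HIGH, Level.HIGH),
]

urgent = [risk.name for risk in RISKS if risk.needs_most_attention]
```

Only the T-shirt photos risk lands in `urgent`. How the remaining risks should be ordered against one another is a judgment call that this sketch deliberately leaves out.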
