Availability monitoring

Availability monitoring becomes relevant once a solution is working as expected from both the technical and functional perspectives. From that point on, all components throughout the solution must stay up and running in order to meet the business requirements. Often, the availability of a solution is measured by Key Performance Indicators (KPIs). Here are some examples of KPIs:

  • Overall availability: This can be measured, quite simply, by taking the planned hours of uptime and comparing them to the actual hours of uptime. Overall availability can be expressed as a percentage. Note that planned maintenance should also be considered when calculating overall availability. Overall availability should be as high as possible.
  • Planned unavailability: This refers to the amount of time devoted to planned maintenance. Planned maintenance is needed, for example, to deploy changes to a live solution. This can be calculated by taking the actual hours of planned maintenance and comparing them to the overall availability. This can also be expressed as a percentage. Planned unavailability should be kept as low as possible.
  • Unplanned unavailability: Unplanned unavailability arises when a solution becomes unavailable due to an incident. These incidents can have many causes, whether on the Azure platform or in your own IT environment. Unplanned unavailability can be measured by taking the hours of unplanned unavailability and comparing them with overall availability. Unplanned unavailability should be as low as possible.
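
The three KPIs above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the class name AvailabilityReport, its field names, and the sample figures are all hypothetical. It also assumes that every percentage is taken relative to the planned hours of uptime, and that planned maintenance counts against actual uptime. Your own SLA may define these terms differently.

```python
from dataclasses import dataclass


@dataclass
class AvailabilityReport:
    """Hypothetical container for one reporting period (for example, a month)."""
    planned_uptime_hours: float       # hours the solution was supposed to be available
    planned_maintenance_hours: float  # hours spent on planned maintenance
    unplanned_downtime_hours: float   # hours lost to incidents

    def __post_init__(self) -> None:
        if self.planned_uptime_hours <= 0:
            raise ValueError("planned_uptime_hours must be greater than zero")

    @property
    def actual_uptime_hours(self) -> float:
        # Assumption: both kinds of downtime reduce the actual uptime.
        return (self.planned_uptime_hours
                - self.planned_maintenance_hours
                - self.unplanned_downtime_hours)

    def overall_availability(self) -> float:
        """Actual uptime as a percentage of planned uptime."""
        return 100.0 * self.actual_uptime_hours / self.planned_uptime_hours

    def planned_unavailability(self) -> float:
        """Planned maintenance as a percentage of planned uptime."""
        return 100.0 * self.planned_maintenance_hours / self.planned_uptime_hours

    def unplanned_unavailability(self) -> float:
        """Incident downtime as a percentage of planned uptime."""
        return 100.0 * self.unplanned_downtime_hours / self.planned_uptime_hours


# Sample figures (made up): a 30-day month with 4 hours of maintenance
# and 2 hours of incident-related downtime.
report = AvailabilityReport(
    planned_uptime_hours=720.0,
    planned_maintenance_hours=4.0,
    unplanned_downtime_hours=2.0,
)
print(f"Overall availability:     {report.overall_availability():.2f}%")
print(f"Planned unavailability:   {report.planned_unavailability():.2f}%")
print(f"Unplanned unavailability: {report.unplanned_unavailability():.2f}%")
```

Because all three values share the same denominator in this sketch, they add up to 100 percent, which makes it easy to see how much of the lost availability was planned and how much was not.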

These are a few basic KPIs you can consider. You will find many more at KPI Library: http://kpilibrary.com/.

To have a clear understanding of what a customer expects of a service provider, contracts can be set up between both parties. Such contracts are called Service Level Agreements (SLAs). Besides providing an understanding of the expectations of the customer, they can also determine what information has to be provided by the service provider to the customer. When setting up such contracts, think of the following:

  • Solution availability: This can be based upon the KPIs we discussed earlier
  • Performance metrics: Measure, for example, application performance and error rates
  • Response times: Measure the average response times of the solution
  • Planned maintenance: Measure the amount of planned maintenance
  • Usage statistics: Measure, for example, page views, concurrent use, and demographic use

By adding telemetry to your solution, you can collect a lot of this kind of data.
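
As a rough illustration of how collected telemetry could feed the response time and error rate measurements listed above, the following sketch aggregates a handful of request records. The RequestRecord type, the summarize function, and the sample values are hypothetical and only stand in for whatever your telemetry store actually returns.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class RequestRecord:
    """Hypothetical telemetry record for a single request."""
    duration_ms: float  # how long the request took, in milliseconds
    succeeded: bool     # False if the request ended in an error


def summarize(records: List[RequestRecord]) -> Dict[str, float]:
    """Compute basic SLA-style metrics from a list of request records."""
    if not records:
        raise ValueError("at least one record is required")
    failures = sum(1 for record in records if not record.succeeded)
    return {
        "request_count": float(len(records)),
        "average_response_ms": mean([record.duration_ms for record in records]),
        "error_rate_percent": 100.0 * failures / len(records),
    }


# Sample data (made up): three successful requests and one slow failure.
sample = [
    RequestRecord(duration_ms=120.0, succeeded=True),
    RequestRecord(duration_ms=80.0, succeeded=True),
    RequestRecord(duration_ms=100.0, succeeded=True),
    RequestRecord(duration_ms=500.0, succeeded=False),
]
summary = summarize(sample)
print(summary)
```

An average hides outliers such as the slow request in the sample, so an SLA may also want to state a percentile-based response time; that choice is left to the parties setting up the contract.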

SLAs are not just about writing down expectations and deliverables; they are also about determining and applying penalties when the conditions of the contract are not met. In such cases, and depending on the importance of the solution, serious damage can be done to the organization, so reasonable penalties and consequences must be agreed upon to protect the organization against such damage.

After deciding on the KPIs and SLAs for your organization, it is equally important to follow up on them. You can set up monitoring products designed for that purpose to provide transparent insight into whether SLAs are being met. The following are a couple of examples of such products:
