Chapter 4. Performance Monitoring

In this chapter, we will dive into performance, a well-known term that has taken on a completely new meaning in SDDC. In fact, if you use the definition from HDDC, you will have difficulty reaching an agreement with the application team (your customers) on where the performance issue lies.

After setting the scene with a familiar story, we will cover the following topics:

  • What exactly is performance?
  • How performance, capacity, and availability are related.
  • Performance SLA, with actual sample values you can use as a guide.

A day in the life of a VMware Admin

To understand what performance actually is, it is always good to begin with the customer. As shared in the previous chapter, the SDDC is providing a service, not a system. We have seen this with almost all customers. Whether the application team or VM owner pays for the service or not, it is a service. The existence of a chargeback model is practically optional. VM owners no longer own, and hence no longer care about, the underlying infrastructure.

Here is a common story often told in the virtualization community, which will resonate with you as an IaaS provider:

A VM owner complains to you that her VM is slow. It was not slow yesterday. Her application architect and lead developer have verified that:

  • The VM CPU and RAM utilization did not increase. They are also within a healthy range. The application team has verified that the CPU run queue is also in the healthy range.
  • The disk latency is good. It is below 10 milliseconds.
  • The network isn't dropping any packets.
  • There is no change in the application settings. In fact, the application has not had any changes made in the past month.
  • There hasn't been any recent patch to Windows.
  • There was no reboot.

She says that your VMware environment is a shared environment, and perhaps an increase in the number of Virtual Machines and in the workload of other Virtual Machines is straining your IaaS.

She also says that her other VM, which was recently converted from physical to virtual (P2V), performed much faster when it ran on physical hardware.

You are right—she is saying it's your fault.

What do you do?

It is certainly a difficult situation to be in. You are in charge of more than 1,000 Virtual Machines. You have successfully consolidated them onto 50 ESXi 6.0 hosts, saving the company 950 servers, not to mention a lot of money. You built your reputation in the process, so more is at stake here than one underperforming VM.

You also recall that your team has been adding new VMs regularly over the past several months, so she could be right. But why did it happen today, and not, say, 3 days ago?
