Chapter 5. Capacity Monitoring

Capacity management changes drastically with virtualization. In this chapter, we will learn why it is one of the areas that are greatly impacted by virtualization. We will cover the following topics:

  • The changes in capacity management
  • How you should perform capacity management
  • How many resources are consumed by the SDDC itself
  • Why peak utilization is not what it seems
  • VM rightsizing

Some well-meaning but harmful advice

Can you figure out why the following statements are wrong? They are all well-meaning pieces of advice on the topic of capacity management. I'm sure you have heard them, or even given them.

Regarding cluster RAM:

  • We recommend a 1:2 overcommit ratio between physical RAM and virtual RAM. Going above this is risky.
  • Memory usage on most of your clusters is high, around 90 percent. You should aim for 60 percent as you need to consider HA.
  • Active memory should not exceed 50-60 percent. You need a buffer between active memory and consumed memory.
  • Memory should be running at a high state on each host.

Regarding cluster CPU:

  • The CPU ratio in cluster X is high at 1:5, because it is an important cluster.
  • The rest of your clusters' overcommit ratios look good as they are around 1:3. This gives you some buffer for spikes and HA.
  • Keep the overcommitment ratio at 1:4 for tier 3 workload.
  • CPU usage is around 70 percent on cluster Y. Since they are User Acceptance Testing (UAT) servers, don't worry. You should be worried only when they reach 85 percent.
  • The rest of your clusters' CPU utilization is around 25 percent. This is good! You have plenty of capacity left.

The scope of these statements is obviously a VMware vSphere Cluster. From a capacity-monitoring point of view, a cluster is the smallest logical building block, due to HA and DRS. So, it is correct that we perform capacity planning at the cluster level and not at the host or data center level.
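Because the cluster is the smallest logical building block, any capacity calculation has to account for the HA buffer at the cluster level. The following sketch illustrates the idea with made-up host sizes; the function name and numbers are hypothetical, not a vSphere API:

```python
# Illustrative sketch (not a vSphere API call): usable cluster CPU capacity
# once an N+1 HA buffer is set aside. Host sizes in GHz are hypothetical.

def usable_cluster_capacity(hosts_ghz, ha_host_failures=1):
    """Total CPU capacity (GHz) a cluster can safely commit,
    after reserving headroom to tolerate host failures."""
    total = sum(hosts_ghz)
    # Reserve the largest hosts first, since HA must absorb the worst case.
    reserved = sum(sorted(hosts_ghz, reverse=True)[:ha_host_failures])
    return total - reserved

# An 8-host cluster of 40 GHz hosts with N+1 HA:
print(usable_cluster_capacity([40.0] * 8))  # 280.0 GHz usable, not 320.0
```

The same logic applies to RAM: the raw sum of host resources overstates what the cluster can actually promise to its VMs.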

Can you figure out where the mistakes are?

You should notice a trend by now. The statements have something in common. Here is another hint: review this great blog post by Mark Achtemichuk, a performance expert at VMware: https://blogs.vmware.com/vsphere/2015/11/vcpu-to-pcpu-ratios-are-they-still-relevant.html. In the post, he explains why static counters such as vCPU:pCPU are no longer sufficient. You need something that reflects the actual live situation in the data center.
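A short sketch makes the point concrete. The two clusters below run at an identical 3:1 vCPU:pCPU overcommit, yet only a live counter such as CPU ready reveals which one is actually contended. All figures here are invented for illustration, and the 5 percent CPU ready threshold is a common rule of thumb, not an official limit:

```python
# Why a static vCPU:pCPU ratio is insufficient: two clusters with the
# same overcommit ratio can behave very differently. Numbers are made up.

def overcommit_ratio(total_vcpus, total_pcpus):
    """Static provisioned ratio; says nothing about live contention."""
    return total_vcpus / total_pcpus

cluster_a = {"vcpus": 96, "pcpus": 32, "worst_vm_cpu_ready_pct": 0.4}
cluster_b = {"vcpus": 96, "pcpus": 32, "worst_vm_cpu_ready_pct": 9.8}

for name, c in (("A", cluster_a), ("B", cluster_b)):
    ratio = overcommit_ratio(c["vcpus"], c["pcpus"])
    contended = c["worst_vm_cpu_ready_pct"] > 5.0  # rule-of-thumb threshold
    print(f"Cluster {name}: {ratio:.0f}:1 overcommit, contended={contended}")
```

Both clusters print the same 3:1 ratio, but only cluster B's VMs are queuing for CPU. The static counter cannot tell them apart.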

The earlier statements are wrong as they focus on the wrong item. They are looking at the cluster, when they should be looking at the VM.

Remember the restaurant analogy we covered in Chapter 3, SDDC Management? Those well-meant pieces of advice were looking at the supplier (provider), when they should have been focusing on the consumer (customer). What's important is your VM.

Note

The way you perform capacity monitoring changes drastically once you take into account performance and availability.
