When is a peak not a true peak?

One common requirement I get from customers is the need to size for peaks. I've seen many mistakes in defining what a peak actually is.

So, let's elaborate on peaks.

How do you define peak utilization or contention without being overly conservative or aggressive?

There are two dimensions of peaks: you can measure them across time or across members of the group.

Let's take a cluster with eight ESXi hosts as an example. The following chart shows the utilization of each of the eight hosts.

What's the cluster peak utilization on that day?

[Figure: The two dimensions of peaks]

As you can see from the graphs, it is not so simple. Let's elaborate:

  • Approach 1: You measure across time. You take the average utilization of the cluster, roll the samples up to a longer time period, and take the peak of that longer period. For example, the average cluster utilization peaks at 65 percent at 9:05 am. You roll up the data for one day, so the peak utilization for that day is 65 percent. This is the most common approach. The problem with this approach is that the result is still an average. For the cluster to average 65 percent utilization, some hosts have to run above 65 percent, so you cannot rule out the possibility that one host is near 100 percent. The same logic applies to a VM: if a VM with 16 vCPUs hits 80 percent utilization, some of its vCPUs have probably hit 100 percent. Because it is an average, this method under-reports the peak.
  • Approach 2: You measure across members of the group. At each sample time, you take the utilization of the busiest host. In our cluster example, at 9:05 am, host 1 has the highest utilization of all the hosts: 80 percent. We then infer that the peak cluster utilization at 9:05 am is also 80 percent. You repeat this for every sample period. Different hosts may supply the value at different times, so you will not know which host provides the peak; it varies from sample to sample. Because it reports the peak of a single member, this method over-reports, although you can technically argue that this is the true peak.

The second approach is useful if you want detailed information: you retain the 5-minute granularity. With the first approach, you lose that granularity; each sample becomes 1 day (or 1 month, depending on your timeline), so you no longer know at what time of day the peak occurred. The second approach also reports a higher value than the first, because in most cases your cluster is not perfectly balanced (the hosts do not have identical utilization), as the sketch below illustrates.
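To make the two calculations concrete, here is a minimal sketch in Python. The 5-minute samples are hypothetical, chosen so that the 9:05 am sample averages 65 percent with one host at 80 percent, matching the example above:

```python
# Hypothetical 5-minute samples: per-host CPU utilization (%) for an
# 8-host cluster. Real data would come from your monitoring tool.
samples = {
    "09:00": [55, 60, 48, 62, 58, 50, 57, 61],
    "09:05": [80, 63, 59, 66, 62, 55, 64, 71],  # host 1 spikes to 80%
    "09:10": [52, 58, 50, 60, 57, 49, 55, 59],
}

# Approach 1: average across hosts first, then take the peak over time.
# The result is still an average, so it under-reports the busiest host.
peak_of_average = max(sum(hosts) / len(hosts) for hosts in samples.values())

# Approach 2: take the busiest host at each sample, then the peak over time.
# This over-reports for the cluster, but it is the true peak of a member.
peak_of_busiest = max(max(hosts) for hosts in samples.values())

print(f"Approach 1 (peak of cluster average): {peak_of_average:.1f}%")  # 65.0%
print(f"Approach 2 (peak of busiest host):    {peak_of_busiest:.1f}%")  # 80.0%
```

Approach 2 can never report less than approach 1, and the gap between the two numbers is a rough indicator of how unbalanced the group is.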

In a tier 1 cluster, where you do not oversubscribe, the second approach is better as it will capture the host with the highest peak. The second approach can be achieved using super metrics in vRealize Operations. The first approach requires the View widget with data transformation. As shown in the following screenshot, choose Maximum from the Transformation drop-down field:

[Figure: The data transformation feature of vRealize Operations]

Does this mean you always use the second approach? The answer is no. This approach can be too conservative when the number of members is high. If your data center has 500 hosts and you use the second approach, then your overall data center peak utilization will always be high. All it takes is one host to hit a peak at any given time.
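A quick simulation (hypothetical numbers, not from the book) shows why the member count matters. Even when every host behaves identically, hovering around 50 percent utilization with some spread, the busiest of 500 hosts sits much further above the average than the busiest of 8:

```python
import random
import statistics

random.seed(1)

def busiest_member(n_hosts):
    """Approach 2 value for one sample: utilization (%) of the busiest
    of n_hosts members. Each host hovers around 50% with a 10-point
    spread (a hypothetical distribution, for illustration only)."""
    return max(min(100.0, random.gauss(50, 10)) for _ in range(n_hosts))

for n in (8, 500):
    peaks = [busiest_member(n) for _ in range(1000)]
    print(f"{n:3d} hosts -> typical reported peak ~{statistics.mean(peaks):.0f}%")
# The per-host behavior is identical in both runs; only the member
# count changes, yet the reported "peak" climbs with the group size.
```

This is why the second approach suits small, tightly coupled groups better than an entire data center.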

The second approach fits use cases where automatic load balancing should keep the members evenly utilized, so you expect an overall balanced distribution. A DRS cluster is a good example.
