One common requirement I get from customers is the need to size for peaks. I've seen many mistakes in defining what a peak actually is.
So, let's elaborate on peaks.
How do you define peak utilization or contention without being overly conservative or aggressive?
There are two dimensions of peaks: you can measure them across time or across members of the group.
Let's take a cluster with eight ESXi hosts as an example. The following chart shows the ESXi Hosts Utilization for the eight hosts.
What's the cluster peak utilization on that day?
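As you can see from the graphs, the answer is not simple. There are two approaches, matching the two dimensions above. The first approach measures the peak across time: the 5-minute samples are rolled up into a longer period, such as one day, and you take the peak of that period. The second approach measures the peak across members: at each 5-minute interval, you take the utilization of the busiest host.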
The second approach is useful if you want detailed information, because it retains the 5-minute granularity. With the first approach, you lose that granularity: each sample becomes 1 day (or 1 month, depending on your timeline), so you do not know what time of day the peak occurred. The first approach will also report a lower peak than the second one, because in most cases your cluster is not perfectly balanced (the hosts do not have identical utilization).
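To make the two approaches concrete, here is a minimal Python sketch on made-up data for three hosts and four 5-minute samples. It assumes the first approach takes each host's peak over the period and then averages those peaks, and the second takes the busiest host at each interval. The function names are hypothetical, and this is not how vRealize Operations computes either value internally.

```python
from statistics import mean

# Made-up CPU utilization (%) for three hosts, four 5-minute samples each.
utilization = {
    "host-a": [40, 85, 50, 45],
    "host-b": [30, 35, 70, 40],
    "host-c": [20, 25, 30, 90],
}

def peak_across_time(data):
    """Assumed first approach: each host's peak over the whole period,
    averaged across hosts. One number; the time of the peak is lost."""
    return mean(max(samples) for samples in data.values())

def peak_across_members(data):
    """Assumed second approach: the busiest host at each 5-minute interval.
    Returns a time series, so the 5-minute granularity is kept."""
    return [max(interval) for interval in zip(*data.values())]

first = peak_across_time(utilization)
second = peak_across_members(utilization)

print(round(first, 1))  # 81.7
print(second)           # [40, 85, 70, 90]
print(max(second))      # 90
```

The across-time value (81.7%) sits below the busiest single sample (90%), and only the across-members series shows when that peak happened (the last interval).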
In a tier 1 cluster, where you do not oversubscribe, the second approach is better because it captures the host with the highest peak. The second approach can be achieved using super metrics in vRealize Operations. The first approach requires the View widget with data transformation: as shown in the following screenshot, choose Maximum from the Transformation drop-down field.
Does this mean you should always use the second approach? No. It can be too conservative when the group has many members. If your data center has 500 hosts and you use the second approach, your overall data center peak utilization will always be high, because all it takes is one host hitting a peak at any given time.
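The effect of group size on the second approach can be illustrated with a small simulation. The workload model (a 20-60% baseline with a 1% chance per interval of an 85-100% spike) and the helper names are arbitrary assumptions chosen only to show the trend, not measured data.

```python
import random

INTERVALS_PER_DAY = 288  # 24 hours x 12 five-minute samples

def simulate_hosts(num_hosts, seed=42):
    """Hypothetical workload: mostly 20-60% busy, with a 1% chance
    per interval of a short spike to 85-100%."""
    rng = random.Random(seed)
    hosts = []
    for _ in range(num_hosts):
        samples = []
        for _ in range(INTERVALS_PER_DAY):
            if rng.random() < 0.01:
                samples.append(rng.uniform(85, 100))
            else:
                samples.append(rng.uniform(20, 60))
        hosts.append(samples)
    return hosts

def busiest_host_series(hosts):
    """Second approach: utilization of the busiest member at each interval."""
    return [max(interval) for interval in zip(*hosts)]

def average_host_series(hosts):
    """For comparison: average utilization across members at each interval."""
    return [sum(interval) / len(interval) for interval in zip(*hosts)]

pool = simulate_hosts(500)
results = {}
for n in (8, 50, 500):
    subset = pool[:n]  # first n hosts of one pool, so larger groups are supersets
    busiest = busiest_host_series(subset)
    average = average_host_series(subset)
    results[n] = (busiest, average)
    share_hot = sum(value >= 85 for value in busiest) / len(busiest)
    print(f"{n:>3} hosts: busiest-host avg {sum(busiest) / len(busiest):5.1f}%, "
          f"cluster avg {sum(average) / len(average):5.1f}%, "
          f"intervals >= 85%: {share_hot:.0%}")
```

Under this model, the chance that at least one of 500 hosts is spiking in a given interval is 1 - 0.99^500, roughly 99%, versus about 8% for 8 hosts, while the average utilization barely moves.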
The first approach fits a use case where automatic load balancing should happen, because there you expect the load to be distributed evenly across members. A DRS cluster is a good example.