Chapter 13

Monitoring VMware vSphere Performance

The monitoring of VMware vSphere should be a combination of proactive benchmarking and reactive alarm-based actions. vCenter Server provides both methods to help the administrator keep tabs on each of the VMs and hosts as well as the hierarchical objects in the inventory. Using both methods ensures that the administrator is not caught unaware of performance issues or lack of capacity.

vCenter Server provides some exciting new features for monitoring your VMs and hosts, such as expanded performance views and charts, and it greatly expands the number and types of alarms available by default. Together, these features make it much easier to manage and monitor VMware vSphere performance.

In this chapter, you will learn to

  • Use alarms for proactive monitoring
  • Work with performance charts
  • Integrate vCenter Operations Manager
  • Gather performance information using command-line tools
  • Monitor CPU, memory, network, and disk usage by ESXi hosts and VMs

Overview of Performance Monitoring

Monitoring performance is a key part of every vSphere administrator's job. Fortunately, vCenter Server provides a number of ways to get insight into the behavior of the vSphere environment and the VMs running within that environment.

The first tool vCenter Server provides is its alarms mechanism. Alarm definitions can be attached to just about any object within vCenter Server and provide an ideal way to proactively alert the vSphere administrator or datacenter staff about potential performance concerns or resource usage. We'll discuss alarms in detail in the section “Using Alarms.”

Another tool that vCenter Server provides is the content area on the Summary tab of ESXi hosts and VMs. The content area provides quick “at-a-glance” information on resource usage. This information can be useful as a quick barometer of performance, but for more detailed performance information, you will have to dive deeper into the vCenter tools that we'll discuss later in this chapter.

A third tool that provides an at-a-glance performance summary is the Related Objects tab, found on vCenter Server objects, datacenter objects, cluster objects, and ESXi hosts. Figure 13.1 shows the Related Objects → Virtual Machines tab of a cluster object. This tab provides an overview of general performance and resource usage. This information includes CPU utilization, host and guest memory usage, and storage space utilized. As with the content area on the Summary tab, this information can be useful, but it is quite limited. However, keep in mind that a quick trip here might help you isolate the one VM that could be causing performance issues for the ESXi host on which it is running.

FIGURE 13.1 The Related Objects → Virtual Machines tab of a cluster object offers a quick look at VM CPU and memory usage.

For ESXi clusters, resource pools, and VMs, another tool you can use is the Resource Allocation tab. The Resource Allocation tab provides a picture of how CPU, memory, and storage resources are being used for the entire pool. This high-level method of looking at resource usage is useful for analyzing overall infrastructure utilization. This tab also provides an easy way of adjusting individual VMs or resource pool reservations, limits, and/or shares without editing each object independently.

vCenter Server offers a powerful, in-depth tool on the Performance tab that lets you create charts that depict the actual resource consumption over time for a given ESXi host or VM. The charts provide historical information and can be used for trend analysis. vCenter Server provides many objects and counters to analyze the performance of a single VM or host for a selected interval. The Performance tab provides powerful tools for isolating performance considerations, and we discuss them in greater detail in the section “Working with Performance Charts.”

VMware also allows its enterprise monitoring solution, vCenter Operations Manager Foundation, to be installed alongside vSphere at no charge with the Foundation license. While the default features are limited with the free license, the capabilities are a step up from the standard Performance tab. We'll explain how to install this extra component and what benefits it brings in the section “Understanding vCenter Operations Manager.”

VMware also provides resxtop for an in-depth view of all the counters available in vSphere to help isolate and identify problems in the hypervisor. resxtop runs only inside the vSphere Management Assistant (vMA). We'll take a look at resxtop later in this chapter in the section “Working with resxtop.”

Finally, we'll take the various tools that we've discussed and show how to use them to monitor the four major resources in a vSphere environment: CPU, memory, network, and storage.

Let's get started with a discussion of alarms.

Using Alarms

In addition to using the charts and high-level information tabs, an administrator can create alarms for VMs, hosts, networks, and datastores based on predefined triggers provided with vCenter Server. Depending on the object, these alarms can monitor resource consumption or the state of the object and alert you when certain conditions have been met, such as high resource usage or even low resource usage. These alarms can then provide an action that informs you of the condition by email or Simple Network Management Protocol (SNMP) trap. An action can also automatically run a script or provide other means to correct the problem the VM or host might be experiencing.

With each revision of vSphere, VMware continues to add to the number of built-in default alarms. As you can see in Figure 13.2, the alarms that come with vCenter Server are defined at the topmost object, the vCenter Server object.

FIGURE 13.2 The default alarms for objects in vCenter Server are defined on the vCenter Server object itself.

These default alarms are usually generic in nature. Some of the predefined alarms alert the administrator if any of the following situations occur:

  • A host's storage status, CPU status, voltage, temperature, or power status changes.
  • A cluster experiences a vSphere High Availability (HA) error.
  • A datastore runs low on free disk space.
  • A VM's CPU usage, memory usage, disk latency, or even fault tolerance status changes.

In addition to the small sampling of predefined alarms we've just described, there are many more, and VMware has enabled users to create alarms on just about any object within vCenter Server. This greatly increases the ability of vCenter Server to proactively alert administrators to changes within the virtual environment before a problem develops.

Because the default alarms are likely too generic for your administrative needs, creating your own alarms is often necessary. Before showing you how to create an alarm, though, we need to discuss the concept of alarm scopes. Once we've discussed alarm scopes, we'll walk you through creating a few alarms.

Understanding Alarm Scopes

When you create alarms, one thing to keep in mind is the scope of the alarm. In Figure 13.2, you saw the default set of alarms available in vCenter Server. These alarms are defined at the vCenter Server object and thus have the greatest scope—they apply to all objects managed by that vCenter Server instance. It's also possible to create alarms at the datacenter level, the cluster level, the host level, or even the VM level. This allows you, the vSphere administrator, to create specific alarms that are limited in scope and are intended to meet specific monitoring needs.

When you define an alarm on an object, that alarm applies to all objects beneath that object in the vCenter Server hierarchy. The default set of alarms that VMware provides with vCenter Server is defined at the vCenter Server object and therefore applies to all objects—datacenters, hosts, clusters, datastores, networks, and VMs—managed by that instance of vCenter Server. If you were to create an alarm on a resource pool, then the alarm would apply only to VMs found in that resource pool. Similarly, if you were to create an alarm on a specific VM, that alarm would apply only to that specific VM.

Alarms are also associated with specific types of objects. For example, some alarms apply only to VMs, while other alarms apply only to ESXi hosts. You'll want to use this filtering mechanism to your advantage when creating alarms. If you needed to monitor a particular condition on all ESXi hosts, for instance, you could define a host alarm on the datacenter or vCenter Server object and it would apply to all ESXi hosts but not to any VMs.

It's important that you keep these scoping effects in mind when defining alarms so that your new alarms work as expected. You don't want to inadvertently exclude some portion of your vSphere environment by creating an alarm at the wrong point in your hierarchy or by creating the wrong type of alarm.
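
The scoping rules described above can be pictured with a small model. The following is a minimal sketch, not the vCenter Server API: the `InventoryObject` and `AlarmDefinition` classes, the `covered_objects` helper, and every object name are hypothetical, and the inventory tree is deliberately simplified. It shows only that an alarm reaches objects at or below the point where it is defined, and only objects of the type it monitors.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class InventoryObject:
    """One node in a simplified inventory tree (hypothetical, not a vCenter type)."""
    name: str
    kind: str  # "vcenter", "datacenter", "cluster", "host", "resource_pool", or "vm"
    children: List["InventoryObject"] = field(default_factory=list, repr=False)

    def add(self, child: "InventoryObject") -> "InventoryObject":
        """Attach a child object and return it so calls can be chained."""
        self.children.append(child)
        return child


@dataclass
class AlarmDefinition:
    """An alarm defined on one object that monitors one type of object."""
    name: str
    defined_on: InventoryObject
    monitors: str


def covered_objects(alarm: AlarmDefinition) -> List[str]:
    """Names of objects at or below the definition point whose type matches."""
    covered = []
    pending = [alarm.defined_on]
    while pending:
        obj = pending.pop()
        if obj.kind == alarm.monitors:
            covered.append(obj.name)
        pending.extend(obj.children)
    return sorted(covered)


# Hypothetical inventory: one VM inside a resource pool, one VM outside it.
vcenter = InventoryObject("vcenter-01", "vcenter")
datacenter = vcenter.add(InventoryObject("DC-East", "datacenter"))
cluster = datacenter.add(InventoryObject("Cluster-A", "cluster"))
host = cluster.add(InventoryObject("esxi-01", "host"))
pool = cluster.add(InventoryObject("Prod-Pool", "resource_pool"))
web_vm = pool.add(InventoryObject("web-01", "vm"))
db_vm = cluster.add(InventoryObject("db-01", "vm"))

host_alarm = AlarmDefinition("Host check", defined_on=datacenter, monitors="host")
pool_vm_alarm = AlarmDefinition("Pool VM check", defined_on=pool, monitors="vm")
all_vm_alarm = AlarmDefinition("All VM check", defined_on=vcenter, monitors="vm")

print(covered_objects(host_alarm))
print(covered_objects(pool_vm_alarm))
print(covered_objects(all_vm_alarm))
```

In this model, the host alarm defined on the datacenter reaches the ESXi host but no VMs, while the VM alarm defined on the resource pool reaches only the VM inside that pool.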

Now you're ready to look at actually creating alarms.

Creating Alarms

As you've already learned, there are many different types of alarms that administrators might want to create. These could be alarms that monitor resource consumption—such as how much CPU time a VM is consuming or how much RAM an ESXi host has allocated—or these alarms could monitor for specific events, such as whenever a specific distributed virtual port group is modified. In addition, you've already learned that alarms can be created on a variety of different objects within vCenter Server. Regardless of the type of alarm or the type of object to which that alarm is attached, the basic steps for creating an alarm are the same. In the following sections, we'll walk you through creating a few different alarms so that you have the opportunity to see the options available to you.

CREATING A RESOURCE CONSUMPTION ALARM

First, let's create an alarm that monitors resource consumption. As discussed in Chapter 9, “Creating and Managing Virtual Machines,” vCenter Server supports VM snapshots. These snapshots capture a VM at a specific point in time, allowing you to roll back (or revert) to that state later. However, snapshots require additional space on disk, and monitoring disk space usage by snapshots is an important task. In vSphere, vCenter Server lets you create an alarm that monitors VM snapshot space.

Before you create a custom alarm, though, you should ask yourself a few questions. First, is there an existing alarm that already handles this task for you? Browsing the list of predefined alarms available in vCenter Server shows that although some storage-related alarms are present, there is no alarm that monitors snapshot disk usage. Second, if you're going to create a new alarm, where is the appropriate place within vCenter Server to create that alarm? This refers to the earlier discussion of scope: On what object should you create this alarm so that it is properly scoped and will alert you only under the desired conditions? In this particular case, you'd want to be alerted to any snapshot space usage that exceeds your desired threshold, so a higher-level object such as the datacenter object or even the vCenter Server object would be the best place to create the alarm.

YOU MUST USE VCENTER SERVER FOR ALARMS

You can't create alarms by connecting directly to an ESXi host; vCenter Server provides the alarm functionality. You must connect to a vCenter Server instance in order to work with alarms.

Perform the following steps to create an alarm that monitors VM snapshot disk space usage for all VMs in a datacenter:

  1. Launch the vSphere Web Client if it is not already running, and connect to a vCenter Server instance.
  2. Navigate to an inventory view, such as Hosts And Clusters or VMs And Templates.

    You can use the Navigator or the icon on the home screen.

  3. Right-click the datacenter object and select Alarm → New Alarm Definition.
  4. On the General tab in the Alarm Settings dialog box, enter an alarm name and alarm description.
  5. Select Virtual Machines from the Monitor drop-down list.
  6. Be sure that the radio button marked Monitor For Specific Conditions Or State, For Example, CPU Usage is selected. Click Next to move on to the Triggers section.
  7. On the Triggers tab, click the add/plus button to add a new trigger.
  8. Add the trigger type, VM Snapshot Size (GB). For this alarm, you're interested in snapshot size only, but the full set of available triggers is as follows:
    • VM CPU Demand To Entitlement Ratio
    • VM CPU Ready Time
    • VM CPU Usage
    • VM Disk Aborts
    • VM Disk Resets
    • VM Disk Usage
    • VM Fault Tolerance Latency
    • VM Heartbeat
    • VM Max Total Disk Latency
    • VM Memory Usage
    • VM Network Usage
    • VM Snapshot Size
    • VM State
    • VM Total Size on Disk
  9. Ensure that the Operator column is set to Is Above.
  10. Change the warning and critical conditions to be 1 GB and 2 GB, respectively. Click Next to move to the Actions screen.

    Figure 13.3 shows the Triggers tab after changing the Warning and Critical values.

    FIGURE 13.3 On the Triggers tab, define the conditions that cause the alarm to activate.

    CAUTION: COUNTER VALUES WILL VARY!

    The Is Above condition is selected most often for identifying a VM, host, or datastore that exceeds a certain threshold. The administrator decides what that threshold should be and what is considered abnormal behavior (or at least interesting enough behavior to be monitored). For the most part, monitoring across ESXi hosts and datastores will be consistent. For example, administrators will define a threshold that is worthy of notification—such as CPU, memory, or network utilization—and configure an alarm across all hosts for monitoring the corresponding counter. Similarly, administrators may define a threshold for datastores, such as the amount of free space available, and configure an alarm across all datastores to monitor that metric.

    However, when looking at VM monitoring, it might be more difficult to come up with a single baseline that works for all VMs. Specifically, think about enterprise applications that must perform well for extended periods of time. For these types of scenarios, administrators will want custom alarms for earlier notifications of performance problems. This way, instead of reacting to a problem, administrators can proactively try to prevent problems from occurring.

    For VMs with similar functions like domain controllers and DNS servers, it might be possible to establish baselines and thresholds covering all such infrastructure servers. In the end, the beauty of vCenter Server's alarms is in the flexibility to be as customized and as granular as each individual organization needs.

  11. On the Actions tab, specify any additional actions that should be taken when the alarm is triggered.

    The following actions are available:

    • Send a notification email.
    • Send a notification trap via SNMP.
    • Change the power state on a VM.
    • Migrate a VM.

    If you leave the Actions tab empty, the alarm will alert administrators only within the vSphere Web Client. For now, leave the Actions tab empty.

    CONFIGURING VCENTER SERVER FOR EMAIL AND SNMP NOTIFICATIONS

    To have vCenter Server send an email for a triggered alarm, you must configure vCenter Server with an SMTP server. To configure the SMTP server, from the vSphere Web Client choose the vCenter Server from within the Navigator, and then select the Manage → Settings tab. Click the Edit button on the right, then select Mail in the list on the left, and then supply the SMTP server and the sender account. We recommend using a recognizable sender account so that when you receive an email, you know it came from the vCenter Server computer. You might use something like [email protected].

    Similarly, to have vCenter Server send an SNMP trap, you must configure the SNMP receivers in the same vCenter Server Settings dialog box under SNMP receivers. You may specify from one to four management receivers to monitor for traps.

  12. Click Finish to create the alarm.

The alarm is now created. To view the alarm you just created, select the datacenter object from the Navigator on the left, and then click the Manage → Alarm Definitions tab. You'll see your new alarm listed, as shown in Figure 13.4.
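
To make the trigger settings from steps 9 and 10 concrete, here is a minimal sketch of how a measured value maps to an alarm state. It is an illustration only, not how vCenter Server evaluates triggers internally. The `classify` function is hypothetical, and the "Is Below" operator name is an assumption included to mirror the low-usage case mentioned earlier.

```python
def classify(value: float, warning: float, critical: float,
             operator: str = "Is Above") -> str:
    """Map a measured value to "Normal", "Warning", or "Critical".

    With "Is Above", the value must strictly exceed a threshold to leave
    the lower state. "Is Below" (an assumed operator name) is the mirror
    image and expects the critical threshold to be the smaller one.
    """
    if operator == "Is Above":
        if value > critical:
            return "Critical"
        if value > warning:
            return "Warning"
    elif operator == "Is Below":
        if value < critical:
            return "Critical"
        if value < warning:
            return "Warning"
    else:
        raise ValueError(f"unknown operator: {operator}")
    return "Normal"


# Snapshot sizes in GB checked against the 1 GB and 2 GB thresholds.
for size_gb in (0.4, 1.0, 1.5, 2.5):
    print(size_gb, classify(size_gb, warning=1.0, critical=2.0))
```

Note that a snapshot of exactly 1 GB stays Normal in this sketch because Is Above is modeled as a strict comparison.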

USING DURATION AND ACTION FREQUENCY WITH ALARMS

Let's create another alarm. This time you'll create an alarm that takes advantage of the parameters in the Triggers and Actions area. With the VM snapshot alarm, these parameters didn't really make any sense; all you really needed was just to be alerted when the snapshot exceeded a certain size. With other types of alarms, it may make sense to take advantage of these parameters.

Some triggers are simple state checks, like the VM State trigger, while with others you are able to specify a size, like VM Snapshot Size. There is also a third type, which is a combination of size and time (or duration). Triggers such as VM Network Usage will activate only if the size is over (or under) the set threshold for a set period of time.

As you may have noticed when creating the previous example alarm, alarms have two configurable states: Warning and Critical. When configuring alarm triggers, you can set the level for both warning and critical conditions; anything below these conditions is considered “Normal.” The transition between these conditions then “triggers” a set of “actions” that are configured on the Actions screen. You can set actions for both transition directions at both criticality levels:

Normal → Warning

Warning → Critical

Critical → Warning

Warning → Normal

FIGURE 13.4 The Defined In column shows where an alarm was defined.

The Repeat Actions Every parameter controls the period of time during which a triggered alarm is not reported again. Using the built-in VM CPU usage alarm as our example, this parameter is set, by default, to 5 minutes. This means that a VM whose CPU usage triggers the alarm won't be reported again for 5 minutes, assuming the condition or state is still true.

With all this information in mind, let's walk through another example of creating an alarm. This time we'll use a trigger to take advantage of duration and action frequency.

Perform the following steps to create an alarm that is triggered based on VM network usage:

  1. Launch the vSphere Web Client if it is not already running, and connect to a vCenter Server instance.
  2. Navigate to an inventory view, such as Hosts And Clusters or VMs And Templates.
  3. Select the datacenter object from the Navigator on the left.
  4. Select the Manage tab from the content area in the middle.
  5. Select the Alarm Definitions button just below the tab bar to show alarm definitions.
  6. Click the add/plus icon to create a new alarm.
  7. Supply an alarm name and description.
  8. Set the Monitor drop-down list to Virtual Machines.
  9. Select the radio button marked “Monitor For Specific Conditions Or State, For Example, CPU Usage,” and click Next.
  10. On the Triggers screen of the Alarm Definition dialog box, click the plus/add icon to add a new trigger.
  11. Add a Trigger of VM Network Usage (kbps) type.
  12. Set Condition to Is Above.
  13. Set the value of the Warning column to 500, and leave the Condition Length setting at 5 minutes.
  14. Set the value of the Critical column to 1000, and leave the Condition Length setting at 5 minutes.
  15. On the Actions tab, click the plus/add icon and add a “Send a notification email” action.
  16. For this newly created action, ensure that Normal → Warning is set to Once and Warning → Critical is set to Repeat.
  17. Finally, set Repeat Actions Every to 15 minutes.
  18. Click Finish to create the alarm.

This alarm will now send email alerts if the VM network usage goes above 500 kbps for more than 5 minutes, but only once. If the VM network usage goes above 1,000 kbps for more than 5 minutes, an email will be sent again and then every 15 minutes advising of this critical state until the alarm is reset to green manually or the usage drops below 1,000 kbps.
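
The interaction between condition length, the Once and Repeat settings, and Repeat Actions Every is easier to follow with a small simulation. The sketch below is a simplified model rather than vCenter Server's actual evaluation logic. It assumes one usage sample per minute, treats "more than 5 minutes" as five consecutive samples over the threshold, and treats any entry into the Critical state as the Warning → Critical transition. The `simulate_emails` function and the sample data are hypothetical.

```python
from typing import List, Tuple


def simulate_emails(samples_kbps: List[int],
                    warning_kbps: int = 500,
                    critical_kbps: int = 1000,
                    condition_minutes: int = 5,
                    repeat_minutes: int = 15) -> List[Tuple[int, str]]:
    """Return (minute, state) pairs for every email the modeled alarm sends."""
    emails = []
    state = "Normal"
    last_critical_email = None
    for minute in range(1, len(samples_kbps) + 1):
        # Look at the most recent condition_minutes samples, this one included.
        window = samples_kbps[max(0, minute - condition_minutes):minute]
        sustained = len(window) == condition_minutes
        if sustained and all(v > critical_kbps for v in window):
            new_state = "Critical"
        elif sustained and all(v > warning_kbps for v in window):
            new_state = "Warning"
        else:
            new_state = "Normal"

        if new_state == "Warning" and state == "Normal":
            # Normal -> Warning is configured as Once.
            emails.append((minute, "Warning"))
        elif new_state == "Critical" and state != "Critical":
            # Entering Critical sends the first Critical email.
            emails.append((minute, "Critical"))
            last_critical_email = minute
        elif new_state == "Critical" and minute - last_critical_email >= repeat_minutes:
            # Still Critical: repeat after every repeat_minutes.
            emails.append((minute, "Critical"))
            last_critical_email = minute
        state = new_state
    return emails


# 3 quiet minutes, 10 minutes at 600 kbps, 40 minutes at 1,200 kbps, then quiet.
samples = [100] * 3 + [600] * 10 + [1200] * 40 + [100] * 5
emails = simulate_emails(samples)
print(emails)
```

With this sample data, the model sends one Warning email at minute 8, a Critical email at minute 18, and repeat Critical emails at minutes 33 and 48; no further email is sent once usage drops.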

ALARMS ON OTHER VCENTER SERVER OBJECTS

Although the two alarms you've created so far have been specific to VMs, the process is similar for other types of objects within vCenter Server.

Alarms can have more than just one trigger condition. The alarms you've created so far had only a single trigger condition. For an example of an alarm that has more than one trigger condition, look at the built-in alarm for monitoring host connection and power state. Figure 13.5 shows the two trigger conditions for this alarm. Note that ALL is selected in the “Trigger if” drop-down menu; it ensures that only powered-on hosts that are not responding will trigger the alarm.
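
The effect of the “Trigger if” setting can be sketched in a few lines. This is an illustrative model only: the `alarm_fires` function, the two trigger functions, and the dictionary keys and values are hypothetical, and ANY is assumed to be the alternative to ALL in the drop-down menu.

```python
from typing import Callable, Dict, List

Host = Dict[str, str]
Trigger = Callable[[Host], bool]


def not_responding(host: Host) -> bool:
    """Trigger condition: the host is not responding."""
    return host["connection_state"] == "not_responding"


def powered_on(host: Host) -> bool:
    """Trigger condition: the host is powered on."""
    return host["power_state"] == "powered_on"


def alarm_fires(host: Host, triggers: List[Trigger], trigger_if: str = "ALL") -> bool:
    """Combine the trigger results the way the 'Trigger if' setting would."""
    results = [trigger(host) for trigger in triggers]
    if trigger_if == "ALL":
        return all(results)
    if trigger_if == "ANY":
        return any(results)
    raise ValueError(f"unknown setting: {trigger_if}")


triggers = [not_responding, powered_on]
crashed_host = {"connection_state": "not_responding", "power_state": "powered_on"}
standby_host = {"connection_state": "not_responding", "power_state": "standby"}

print(alarm_fires(crashed_host, triggers, "ALL"))
print(alarm_fires(standby_host, triggers, "ALL"))
print(alarm_fires(standby_host, triggers, "ANY"))
```

With ALL, the host in standby does not raise the alarm even though it is not responding; with ANY, it would.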

FIGURE 13.5 You can combine multiple triggers to create more complex alarms.

It might seem obvious, but it's important to note that you can have more than one alarm for an object.

As with any new alarm, testing its functionality is crucial to make sure you get the desired results. You might find that the thresholds you configured are not optimized for your environment and either are not activating the alarm when they should or are activating the alarm when they shouldn't. In these cases, edit the alarm to set the thresholds and conditions appropriately. Or, if the alarm is no longer needed, right-click the alarm, and choose Remove to delete it.

You'll be able to edit or delete alarms only if two conditions are met. First, the user account with which you've connected to vCenter Server must have the appropriate permissions granted for you to edit or delete alarms. Second, you must be attempting to edit or delete the alarm from the object on which it was defined. Think back to the discussion on alarm scope and this makes sense. You can't delete an alarm from the datacenter object when that alarm was defined on the vCenter Server object. You must go to the object where the alarm was defined to edit or delete the alarm.

Now that you've seen some examples of creating alarms—and keep in mind that creating alarms for other objects within vCenter Server follows the same basic steps—let's take a look at managing alarms.

Managing Alarms

Several times so far in this chapter we've directed you to the Alarm Definitions tab within the vSphere Web Client. Until now, you've been working with the definitions, looking at defined alarms. There is, however, another view of the alarms, and that's the Triggered Alarms view. Figure 13.6 shows the Triggered Alarms view, which you access by selecting an object within the vSphere Web Client and then clicking Monitor → Issues → Triggered Alarms.

FIGURE 13.6 The Triggered Alarms view shows the alarms that vCenter Server has activated.

The Monitor → Issues → Triggered Alarms area shows all the activated alarms for the selected object and all child objects, while the Global Alarm area in the right-hand pane of the vSphere Web Client shows all alarms within vCenter. In Figure 13.6, a Virtual Machine object is selected, so the Triggered Alarms view shows all activated alarms for this VM.

GETTING TO THE TRIGGERED ALARMS VIEW QUICKLY

The vSphere Web Client provides a handy view in the bottom-right corner that displays all the currently triggered alarms. Clicking on these alarms takes you to the Triggered Alarms view of the object on which the alarm is triggered. You can also acknowledge or reset the alarm to green from this panel.

Because only the VM is selected, the Triggered Alarms view shows only the alarms activated for that particular VM. This makes it easy to isolate the specific alarms you need to address.

After you are in Triggered Alarms view for a particular object, a few actions are available to you for each of the activated alarms. For alarms that monitor resource consumption (that is, alarms that have the “Monitor For Specific Conditions Or State, For Example, CPU Usage, Power State” option selected under Alarm Type on the General tab), you have the option to acknowledge the alarm. To acknowledge the alarm, right-click the alarm and select Acknowledge.

When an alarm is acknowledged, vCenter Server records the time the alarm was acknowledged and the user account that acknowledged the alarm. As long as the alarm condition persists, the alarm will remain in the Triggered Alarms view but is grayed out. When the alarm condition is resolved, the activated alarm disappears.

For an alarm that monitors events (this would be an alarm that has the “Monitor For Specific Events Occurring On This Object, For Example, VM Powered On” option selected under Alarm Type on the General tab), you can either acknowledge the alarm, as described previously, or reset the alarm status to green. Figure 13.7 illustrates this option.

Resetting an alarm to green removes the activated alarm from the Triggered Alarms view, even if the underlying event that activated the alarm hasn't actually been resolved. This behavior makes sense if you think about it. Alarms that monitor events are merely responding to an event being logged by vCenter Server; whether the underlying condition has been resolved is unknown. So, resetting the alarm to green just tells vCenter Server to act as if the condition has been resolved. Of course, if the event occurs again, the alarm will be triggered again.
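
The difference between acknowledging an alarm and resetting it to green can be summarized in a short model. This is a sketch of the behavior described above, not vCenter Server code; the `TriggeredAlarm` and `TriggeredAlarmsView` classes, their method names, and the sample alarm and user names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TriggeredAlarm:
    name: str
    alarm_type: str  # "condition" or "event"
    acknowledged_by: Optional[str] = None
    acknowledged_at: Optional[datetime] = None


class TriggeredAlarmsView:
    """Hypothetical stand-in for the Triggered Alarms view of one object."""

    def __init__(self, alarms: List[TriggeredAlarm]) -> None:
        self.alarms = list(alarms)

    def acknowledge(self, alarm: TriggeredAlarm, user: str) -> None:
        """Record who acknowledged the alarm and when; it stays in the view."""
        alarm.acknowledged_by = user
        alarm.acknowledged_at = datetime.now(timezone.utc)

    def reset_to_green(self, alarm: TriggeredAlarm) -> None:
        """Remove an event-based alarm even if the cause is unresolved."""
        if alarm.alarm_type != "event":
            raise ValueError("only event-based alarms are reset in this model")
        self.alarms.remove(alarm)

    def condition_resolved(self, alarm: TriggeredAlarm) -> None:
        """A condition-based alarm disappears once its condition clears."""
        self.alarms.remove(alarm)


cpu_alarm = TriggeredAlarm("VM CPU usage", "condition")
port_group_alarm = TriggeredAlarm("Port group modified", "event")
view = TriggeredAlarmsView([cpu_alarm, port_group_alarm])

view.acknowledge(cpu_alarm, user="admin")
view.reset_to_green(port_group_alarm)
print([alarm.name for alarm in view.alarms])
```

After these two calls, the acknowledged CPU alarm is still listed (it would appear grayed out), while the event-based alarm is gone.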

Now that we've looked at alarms for proactive performance monitoring, let's move on to using vCenter Server's performance charts to view even more information about the behavior of VMs and ESXi hosts in your vSphere environment.

FIGURE 13.7 For event-based alarms, you also have the option to reset the alarm status to green.

Working with Performance Charts

Alarms are a great tool for alerting administrators of specific conditions or events, but they don't provide the detailed information that administrators sometimes need, such as how heavily a resource is being used while it is still below a warning or critical threshold. This is where vCenter Server's performance charts come in. vCenter Server has many features for creating and analyzing charts. Without these charts, analyzing the performance of a VM would be nearly impossible. Installing agents inside a VM will not provide accurate details about the server's behavior or resource consumption because a VM is configured with virtual devices. Only the VMkernel knows the exact amount of resource consumption for any of those devices because it acts as the arbitrator between the virtual hardware and the physical hardware. In most virtual environments, the VM's virtual devices can outnumber the actual physical hardware devices, necessitating the complex sharing and scheduling abilities in the VMkernel.

By clicking the Monitor → Performance tab for a datacenter, cluster, host, or VM, you can learn a wealth of information. Before you use these charts to help analyze resource consumption, we need to talk about performance charts and legends. We'll start by covering the two different layouts available in performance charts: the Overview layout and the Advanced layout.

Overview Layout

The Overview layout is the default view when you access the Monitor → Performance tab. Figure 13.8 shows you the Overview layout of the Performance tab for an ESXi host. Note the scroll bars; there's a lot more information here than the vSphere Web Client can fit in a single screen.

FIGURE 13.8 The Overview layout provides information on a range of performance counters.

At the top of the Overview layout are options to change the view and the time range. The contents of the View drop-down list change depending on the object you select in the vSphere Web Client. Table 13.1 lists the different options available for each object.

TABLE 13.1: View options in the Overview area of the Performance tab

SELECTED OBJECT    VIEW OPTIONS
Datacenter         Clusters, Storage
Cluster            Home, Resource Pools & Virtual Machines, Hosts
Resource pool      Home, Resource Pools & Virtual Machines
Host               Home, Virtual Machines
Virtual machine    Home, Storage

Next to the View drop-down list is an option to change the time range for the data currently displayed in the various performance charts. This allows you to set the time range to real time, a day, a week, a month, a year, or a custom value. The Realtime time range setting displays the last hour of data and automatically refreshes every 20 seconds, while the other time range settings do not automatically refresh.

Below these controls are the actual performance charts. The layout and the charts that are included vary based on the object selected and the option chosen in the View drop-down list. While we don't have the room to list all of them, we show you two examples, in Figure 13.9 and Figure 13.10. We encourage you to explore and find the layouts that work best for your environment and, more important, layouts that clearly show you the performance information you require.

FIGURE 13.9 The Virtual Machines view of the Performance tab for an ESXi host in Overview layout offers both per-VM and summary information.

The Overview layout works well if you need a broad overview of the performance data for a datacenter, cluster, resource pool, host, or VM. But what if you need more specific data in a more customizable format? The Advanced layout is the answer, as you'll see in the next section.

Advanced Layout

Although it's called the Advanced layout, to begin with it looks somewhat simpler than the Overview layout. There is only a single chart within this view, but don't let this fool you because there are a significant number of configuration options for this lone performance chart.

Figure 13.11 shows the Advanced layout of the Performance tab for a cluster of ESXi hosts. Here, in the Advanced layout, is where the real power of vCenter Server's performance charts is made available to you.

FIGURE 13.10 The Storage view of the Performance tab for a VM in Overview layout displays a break-down of storage utilization.

FIGURE 13.11 The Advanced layout of the Performance tab provides extensive controls for viewing performance data.

At the right of the Advanced layout, you'll find a View drop-down list to quickly switch chart settings, followed by buttons to refresh the chart or to export the chart. The Refresh button refreshes the data, while the Export button allows you to export the chart as a JPEG, PNG graphic, or CSV document. We'll discuss this functionality in the section “Exporting Performance Graphs.” On each side of the chart are units of measurement. In Figure 13.11, the counters selected are measured in percentages and megahertz. Depending on the counters chosen, there may be only one unit of measurement, but there will be no more than two. Next, on the horizontal axis, is the time interval. Below that, the performance chart legend provides color-coded keys to help the user find a specific object or item of interest. This area also breaks down the chart into the object being measured; the measurement being used; the units of measure; and the Latest, Maximum, Minimum, and Average measurements recorded for that object.
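
The four summary values in the chart legend are simple statistics over the samples plotted for each counter. The sketch below shows how such a legend row could be derived from a series of samples; the `legend_row` function, the object name, and the sample values are hypothetical and are not taken from a real chart.

```python
from statistics import mean
from typing import Dict, List, Union


def legend_row(obj: str, measurement: str, units: str,
               samples: List[float]) -> Dict[str, Union[str, float]]:
    """Summarize one counter's samples the way a chart legend row does."""
    return {
        "Object": obj,
        "Measurement": measurement,
        "Units": units,
        "Latest": samples[-1],      # most recent sample in the time range
        "Maximum": max(samples),
        "Minimum": min(samples),
        "Average": round(mean(samples), 2),
    }


# Hypothetical CPU usage samples (percent) for one host, oldest first.
row = legend_row("esxi-01", "Usage", "%", [12.0, 30.0, 18.0])
print(row)
```

Exporting a chart as CSV and running a summary like this over the exported values is one way to cross-check what the legend reports.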

Hovering the mouse pointer over the chart at a particular recorded interval of interest displays the data points at that specific moment in time. Another nice feature of the charts is the ability to emphasize a specific object so that you can more easily distinguish it from other objects. Clicking the specific item in the chart legend at the bottom will emphasize the object and its representative color.

Now that you have a feel for the Advanced layout, let's take a closer look at the Chart Options link. This link exposes vCenter Server's functionality in creating highly customized performance charts and is where all the nuts and bolts are configured for this feature. Figure 13.12 shows the Chart Options dialog box. This dialog box is the central place where you will come to customize vCenter Server's performance charts; you can also just double-click the chart to display this dialog box. From here, you select the counters to view, the time ranges, and the kind of chart (line graph or stacked graph) to display.

FIGURE 13.12 The Chart Options dialog box offers tremendous flexibility to create exactly the performance chart you need.

images

Because there is so much information available in the Chart Options dialog box, we've grouped the various options and types of information into the sections that follow.

CHOOSING CHART METRICS AND COUNTERS

On the left side of the Chart Options dialog box (shown in Figure 13.12), you can choose which metric to monitor or analyze. The full set of chart metrics is listed here, but only a subset is offered for any given object, depending on the type of object selected:

  • CPU
  • Cluster Services
  • Datastore
  • Disk
  • Memory
  • Network
  • Power
  • Storage Adapter
  • Storage Path
  • System
  • vSphere Flash Module
  • Virtual Disk
  • Virtual Machine Operations
  • vSphere Replication

The actual selections available in this area change depending on the type of object that you have selected within the vSphere Web Client. That is, the options available when viewing the Monitor > Performance tab for an ESXi host are different from the options available when viewing the Monitor > Performance tab of a VM, a cluster, or a datacenter.

Within each of these resources, different objects and counters are available. Be aware that other factors affect what objects and counters are available to view; for example, in some cases the real-time interval shows more objects and counters than other intervals. A description field within the counters list explains what each counter represents. If this description does not fit within the Chart Options dialog box, simply mouse over it to view the full text. The next few sections list the various counters that are available for the different resource types in the Chart Options dialog box.

VIEWING CPU PERFORMANCE INFORMATION

If you select the CPU resource type in the Chart Options dialog box, you can choose what specific objects and counters you'd like to see in the performance chart. Note that the CPU resource type is not available when viewing the Performance tab of a datacenter object (DC). It is available for clusters (CL), ESXi hosts (ESXi), resource pools (RP), and individual virtual machines (VM).

Table 13.2 lists the most important objects and counters available for CPU performance information. A complete list of all CPU performance counters can be found at www.sybex.com/go/masteringvsphere.

TABLE 13.2: Available CPU performance counters

images

images

Quite a bit of CPU performance information is available. In the section “Monitoring CPU Usage,” we'll discuss how to use these CPU performance objects and counters to monitor CPU usage.

VIEWING MEMORY PERFORMANCE INFORMATION

If you select the Memory resource type in the Chart Options dialog box, you can display different objects and counters. The Memory resource type is not available when viewing the Performance tab of a datacenter object. It is available for clusters, ESXi hosts, resource pools, and individual VMs.

Table 13.3 lists the most important objects and counters available for memory performance information. A complete list of all memory performance counters can be found at www.sybex.com/go/masteringvsphere.

TABLE 13.3: Available memory performance counters

images

images

In the section “Monitoring Memory Usage,” you'll get the opportunity to use these different objects and counters to monitor how ESXi and VMs are using memory.

VIEWING DISK PERFORMANCE INFORMATION

Disk performance is another key area that vSphere administrators need to monitor. Table 13.4 shows you the most important objects and counters available for disk performance information. A complete list of all disk performance counters can be found at www.sybex.com/go/masteringvsphere.

TABLE 13.4: Available disk performance counters

images

Note that these counters aren't supported for datacenters, clusters, or resource pools, but they are supported for ESXi hosts and VMs. Not all counters are visible in all display intervals.

You'll use these counters in the section “Monitoring Disk Usage,” later in this chapter.

VIEWING NETWORK PERFORMANCE INFORMATION

To monitor network performance, the vCenter Server performance charts cover a wide collection of performance counters. Network performance counters are available only for ESXi hosts and VMs; they are not available for datacenter objects, clusters, or resource pools.

Table 13.5 shows the most important objects and counters available for network performance information. A complete list of all network performance counters can be found at www.sybex.com/go/masteringvsphere.

TABLE 13.5: Available network performance counters

images

You'll use these network performance counters in the section “Monitoring Network Usage” later in this chapter.

VIEWING SYSTEM PERFORMANCE INFORMATION

ESXi hosts and VMs also offer some performance counters in the System resource type. Datacenters, clusters, and resource pools do not support any system performance counters.

Table 13.6 lists the most important objects and counters available for system performance information. A complete list of all system performance counters can be found at www.sybex.com/go/masteringvsphere.

TABLE 13.6: Available system performance counters

images

The majority of these counters are valid only for ESXi hosts, and they all center on how resources are allocated or how the ESXi host itself is consuming CPU resources or memory.

VIEWING DATASTORE PERFORMANCE INFORMATION

Monitoring datastore performance allows you to see the performance of the whole datastore instead of using disk counters per VM. Datastore performance counters are available only for ESXi hosts and VMs; they are not available for datacenter objects, clusters, or resource pools.

Table 13.7 shows the most important objects and counters available for datastore performance information. A complete list of all datastore performance counters can be found at www.sybex.com/go/masteringvsphere.

TABLE 13.7: Available datastore performance counters

images

VIEWING STORAGE PATH PERFORMANCE INFORMATION

Storage Path is one of the new categories of performance counters. As the name suggests, these counters can help you troubleshoot storage path problems. Storage path counters are available only for ESXi hosts; they are not available for datacenter objects, clusters, VMs, or resource pools.

Table 13.8 shows the objects and counters available for storage path performance information.

TABLE 13.8: Available storage path performance counters

images

VIEWING OTHER PERFORMANCE COUNTERS

The following performance counter types are also available:

  • ESXi hosts participating in a cluster also have a resource type of Cluster Services, with two performance counters: CPU Fairness and Memory Fairness. Both of these counters show the distribution of resources within a cluster.
  • The datacenter object contains a resource type marked as Virtual Machine Operations. This resource type contains performance counters that simply monitor the number of times a particular VM operation has occurred. These include VM Power-On Events, VM Power-Off Events, VM Resets, vMotion Operations, and Storage vMotion Operations.

SETTING A CUSTOM INTERVAL

Just as with the Overview layout, within each of the resource types, you have a choice of intervals to view. Some objects offer a Real-Time option; this option shows what is happening with that resource right now, with a historical view over the past hour, and the charts automatically refresh every 20 seconds. The others are self-explanatory in their time span, but note that they do not refresh automatically. The Custom option allows you to specify exactly what you'd like to see on the performance chart. For example, you could specify that you'd like to see performance data for the last 8 hours. Having all of these interval options allows you to choose exactly the right interval necessary to view the precise data you're seeking.
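To get a feel for how much data a given interval puts on a chart, you can work out the number of data points it implies. The following is a minimal sketch, assuming the Real-Time view holds one sample per 20-second refresh over the past hour; the function name and the sample spacing used for the custom interval are illustrative assumptions, not values taken from vCenter Server.

```python
def samples_in_window(window_seconds, sample_seconds):
    """Number of whole samples that fit in a time window."""
    if sample_seconds <= 0:
        raise ValueError("sample_seconds must be positive")
    return window_seconds // sample_seconds

# Real-Time view: the past hour, assuming one sample every 20 seconds.
realtime_points = samples_in_window(60 * 60, 20)

# Custom view: the last 8 hours, assuming (hypothetically) one sample every 5 minutes.
custom_points = samples_in_window(8 * 60 * 60, 5 * 60)

print(realtime_points, custom_points)
```

The same arithmetic applies to any capture where you choose an interval and a duration.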

MANAGING CHART SETTINGS

There's one more area of the Chart Options dialog box that we'll discuss, and that's the Chart Options drop-down and Save Options As button along the top.

After you've gone through and selected the resource type, display interval, objects, and performance counters that you'd like to see in the performance chart, you can save that collection of chart settings using the Save Options As button. The vSphere Web Client prompts you to enter a name for the saved chart settings. After a chart setting is saved, you can easily access it again from the View drop-down list at the top of the performance chart Advanced layout. Figure 13.13 shows the View drop-down list, where two custom chart settings (CPU-8hr View and MEM - Overhead) are shown. By selecting either of these from the View drop-down list, you can quickly switch to those settings. This allows you to define the performance charts that you need to see and then quickly switch between them.

FIGURE 13.13 You can access saved chart settings from the View drop-down list.

images

The Chart Options dialog box also allows you to delete saved chart settings that you no longer need.

In addition to offering you the option of saving the chart settings, vCenter Server also allows you to save the chart.

EXPORTING PERFORMANCE CHARTS

When we first introduced you to the Advanced layout view of the Performance tab, we briefly mentioned the Export button. This button, found in the upper-right corner of the Advanced layout, allows you to save the results of the performance chart to an external file for long-term archiving, analysis, or reporting.

When you click the Export button, a standard Save dialog box appears. You can choose where to save the resulting file and whether to save the chart as a graphic file or as a comma-separated values (CSV) file. If you are going to perform any additional analysis, the CSV option is quite useful because the data can be opened in Excel or another spreadsheet tool. The graphics options are useful when you need to put the performance data into a report of some sort.
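If you export chart data as CSV, a few lines of Python can summarize it in much the same way the chart legend does. The sketch below is illustrative only: it assumes the exported file has a header row and one column per counter, and the column names used here (Time, Usage in MHz) and the sample values are hypothetical, so check them against an actual export before relying on it.

```python
import csv
import io

def summarize_counter(csv_text, column):
    """Return latest, maximum, minimum, and average for one counter column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    values = []
    for row in reader:
        cell = (row.get(column) or "").strip()
        if cell:  # skip blank samples
            values.append(float(cell))
    if not values:
        raise ValueError(f"no numeric samples found in column {column!r}")
    return {
        "latest": values[-1],
        "maximum": max(values),
        "minimum": min(values),
        "average": sum(values) / len(values),
    }

# Hypothetical sample data standing in for an exported chart.
sample = """Time,Usage in MHz
10:00:00,400
10:00:20,800
10:00:40,600
"""
stats = summarize_counter(sample, "Usage in MHz")
print(stats)
```

To use this with a real export, read the file contents into a string and pass the header text of the counter you care about.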

There's a lot of information exposed via vCenter Server's performance charts. We'll revisit the performance charts again in the sections on monitoring specific types of resources later in this chapter. First we'll show you how vCenter Operations Manager can help you monitor and troubleshoot your vSphere environment.

Understanding vCenter Operations Manager

vCenter Operations Manager (vC Ops) is the primary component of the VMware vCenter Operations Management Suite. This suite is designed to go hand in hand with not just vSphere but VMware's whole vCloud suite of infrastructure products, providing additional management capabilities on the operational side of things. The full enterprise version of vCenter Operations Management Suite includes the following components:

  • vCenter Operations Manager
  • vCenter Configuration Manager
  • vFabric Hyperic
  • vCenter Infrastructure Navigator
  • vCenter Chargeback Manager

While we could write a whole book on the vC Ops suite, this book is about vSphere, so we will concentrate on just what's available to you for free without the need to purchase anything else.

Installing vC Ops

The vC Ops Manager is a two-VM vApp that comes as a bundled Open Virtual Appliance (OVA) package and is a 1.4 GB download from VMware. All customers that have vSphere licensing are able to download and install this product in Foundation mode without the need for a license key. When operating in Foundation mode, vC Ops does not provide access to advanced features such as custom dashboards, root cause analysis, automated workflow triggers, or compliance views, but it does extend the functionality of the vSphere Web Client somewhat. Next we'll show you how to install vC Ops, and then later we will explain the features that are available with the Foundation edition.

DEPLOYING THE VC OPS VAPP

This procedure is fairly straightforward and mirrors most appliance-based installs (like the one performed for the vCenter Virtual Appliance in Chapter 3, “Installing and Configuring vCenter Server”), but we'll step you through the process in case you skipped to this section. We will assume that you've already logged on to my.vmware.com and downloaded the 1.4 GB OVA file. Here are the steps:

  1. Connect to a vCenter Server instance with the vSphere Web Client.
  2. Navigate to the Hosts And Clusters or VMs And Templates view.
  3. Right-click the cluster object that will host the vC Ops vApp.
  4. Select Deploy OVF Template.
  5. Browse and select the downloaded file and then click Next.
  6. Review the deployment details, noticing the large disk requirements if thickly provisioned (344 GB). Click Next to continue through the wizard.
  7. Accept the license agreement and then click Next.
  8. Give your vC Ops installation vApp a name and select the datacenter and then the destination folder.
  9. Select the size of the configuration that best suits your environment. Click Next to continue.

    VCENTER OPERATIONS MANAGER SIZING

    When installing vC Ops, you have three settings to choose from for the environment size: small, medium, or large. We would normally recommend sticking to VMware's predetermined values, but even the smallest installation requires 4 vCPUs and 16 GB of RAM. Although you cannot change these values in the installer, you can modify the resources allocated to the VMs after they are deployed. For pilots, testing, or lab work, there is nothing stopping you from lowering the allocation; however, VMware may not support your installation if you deviate from its standard sizings.

  10. Choose the datastore and provisioning type, keeping in mind the larger requirement if the disk format is set to Thick. Click Next to continue.
  11. Ensure that all the relevant networking details are filled in and the vApp is connected to the correct network port group. Once this is complete, click Next to go to the final configuration page.
  12. Set the correct time zone for this server and then assign IP addresses to both the Analytics and UI VMs. Click Next to go to the review page.
  13. Ensure that all the details are correct before clicking Finish to deploy the vC Ops vApp. The vC Ops appliance VMs will now be deployed as a vApp and then can finally be powered on.

REGISTERING WITH VCENTER SERVER

Once the vC Ops appliance is deployed and powered on, you need to register it with the vCenter Server to allow for metric retrieval and also for vSphere Web Client integration. The following steps outline how to register:

  1. With the vApp already powered on and finished booting, open a web browser and point it to the IP address of the UI VM.
  2. Log in with the username admin and the password admin.
  3. Enter the details of your vCenter Server instance and click Next. You may be prompted to accept the SSL certificate; if so, click Yes.
  4. The vC Ops admin page will now prompt you to change both the default root and admin passwords. For reference, the default password is admin for the admin account and the default password is vmware for the root account. When they are both changed, click Next to continue.
  5. You will now be prompted to enter the details for the vCenter Server instance you wish to register with. Once all your details are entered, click Next to check the connection and then again to continue to the final screen.
  6. On the final screen, the vC Ops admin page checks for linked mode vCenter Servers. Click Finish to complete the vC Ops to vCenter registration process; this can take some time depending on the number of objects managed by vCenter.

Now that vC Ops is registered with vCenter, it's time to reload the vSphere Web Client and take a look at the changes that have taken place.

vC Ops Foundation Features

As stated earlier, vC Ops has a large number of features, but most of them are disabled in the Foundation edition. Four key features and capabilities are turned on: Proactive Smart Alerts, Intelligent Operations Groups, vSphere Health Monitoring, and Self-Learning Performance Analytics.

The alerts within Proactive Smart Alerts are different from the standard alarms discussed earlier in this chapter. This feature ties into the Self-Learning Performance Analytics feature and will begin to alert you when things are outside their normal operating levels. The Self-Learning Performance Analytics feature looks at all the metrics captured over time and works out what is considered “normal.” For example, your environment might have relatively steady workloads during the day, but at 11 p.m., when the backups start, the storage and network utilization peak considerably higher than during the day. With a traditional alert, your threshold would need to take into consideration the peak for your backup window, but with self-learning analysis, vC Ops knows that high storage and network I/O is “normal” after 11 p.m. and will not alert you. However, if the I/O peaked during the day, an alert would still be generated.
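The idea of a learned, time-aware threshold can be illustrated with a small sketch. This is not how vC Ops calculates its dynamic thresholds (the product's analytics are not described here); it is a deliberately simple stand-in that learns a band of the mean plus or minus k standard deviations for each hour of the day. The function names, the value of k, and the sample history are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev

def build_baseline(samples, k=3.0):
    """samples: iterable of (hour_of_day, value). Returns {hour: (low, high)}."""
    by_hour = defaultdict(list)
    for hour, value in samples:
        by_hour[hour].append(value)
    baseline = {}
    for hour, values in by_hour.items():
        center = mean(values)
        spread = pstdev(values)
        baseline[hour] = (center - k * spread, center + k * spread)
    return baseline

def is_anomalous(baseline, hour, value):
    """True when the value falls outside the band learned for that hour."""
    low, high = baseline[hour]
    return value < low or value > high

# Hypothetical history: storage I/O is low at 14:00 and high at 23:00 (backups).
history = [(14, v) for v in (100, 110, 90, 105, 95)]
history += [(23, v) for v in (900, 950, 880, 920, 910)]
baseline = build_baseline(history)

backup_window = is_anomalous(baseline, 23, 930)  # heavy load during backups
afternoon = is_anomalous(baseline, 14, 930)      # the same load in the afternoon
print(backup_window, afternoon)
```

A single static threshold would have to sit above the backup peak and would therefore miss the afternoon spike; the per-hour band flags it while staying quiet during the backup window.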

Intelligent Operations Groups is a feature that allows you to create a group for monitoring based on a ruleset. vC Ops can apply the ruleset automatically so that every object matching the rules is added to the group. Once objects are in a group, you can monitor them as a whole instead of just individually. Another handy feature is that you can nest groups within groups, allowing you to be as granular as necessary when categorizing objects for reporting.

vSphere Health Monitoring is the most visible feature that you will encounter with vC Ops Foundation, adding a colorful badge to the Summary tab of some objects within the vSphere Web Client. This badge will change color depending on the current health number displayed on the badge, with 100 being the healthiest and 0 being very unhealthy, as shown in Figure 13.14. But what is “health,” you ask? Good question!

FIGURE 13.14 vCenter Operations Manager integrates with the vSphere Web Client to show health information about the currently selected object, in this case the vCenter Server itself.

images

Health is a top-level indicator made up of the subcomponents faults, workload, and anomalies. Faults have the most weight when calculating the overall health badge because they are problems that are happening right now. Workload and anomalies are combined to assist with understanding current performance characteristics. As you can probably imagine, all of these components have their own subcomponents too, but how everything is calculated behind the scenes is not relevant for this discussion. The health badge gives an overall view of how an object within vCenter is performing and is calculated every 5 minutes. If the health monitor spots a problem, you can dive in to find out the cause. We'll now explain the last tool in the toolbox, resxtop, and then we'll show you how to combine all of the tools to keep your environment in top condition.

Working with resxtop

In addition to alarms and performance charts, VMware provides resxtop to help monitor performance and resource usage. In early ESX versions, several tools were available on the Service Console command line. Later, VMware released ESXi and limited the number of commands available directly on the host, but it also developed the vSphere Management Assistant (vMA), a special virtual appliance that provides a command-line interface for managing ESX and ESXi hosts. You can use the vMA to run commands against the ESXi host as if they were run on the console. In ESXi 3.x and ESXi 4.0, access to the console was unsupported. Since ESXi 4.1, VMware has supported the console, but it is locked and therefore inaccessible by default. More commands are available on the console than with previous ESXi versions; however, VMware still advises using the vMA for running commands against ESXi hosts for a few reasons, one of which is to provide yet another means of centralized host management.

Using resxtop

You can monitor VM performance using a command-line tool named resxtop. A great reason to use resxtop is the immediate feedback it gives you. Using resxtop, you can monitor all four major resource types (CPU, disk, memory, and network) on a particular ESXi host. Figure 13.15 shows some sample output from resxtop.

FIGURE 13.15 resxtop shows real-time information on CPU, disk, memory, and network utilization.

images

The resxtop command is included with the vMA, which is deployed like any other OVF-packaged virtual appliance. Simply download the vMA from the my.vmware.com website and import it into your vSphere environment. For more detailed instructions on deploying OVFs, see the section “Deploying the vC Ops vApp” earlier in this chapter. Before you can actually view real-time performance data, though, you first have to tell resxtop which remote server you want to use. To launch resxtop and connect to a remote server, first connect to the vMA and then enter this command:

resxtop --server esxi-03.lab.local

You'll want to replace esxi-03.lab.local with the appropriate hostname or IP address of the ESXi host to which you want to connect. When prompted, supply a username and password, and then resxtop will launch. Once resxtop is running, you can use single-letter commands to switch among the various views.

esxtop IS ONLY FOR VMWARE ESXI SHELL

It is still possible to run esxtop, which you might know from earlier ESX versions, in the VMware ESXi Shell, but we highly recommend running resxtop from the vMA instead.

Upon launch, resxtop defaults to showing CPU utilization, as illustrated in Figure 13.15. At the top of the screen are summary statistics; below that are statistics for specific VMs and VMkernel processes. To show only VMs, press V. Be aware that resxtop, like many Linux commands, is case sensitive, so you'll need to be sure to use an uppercase V in order to toggle the display of VMs only.

Monitoring CPU Usage with c Two CPU counters of interest to view with resxtop are the CPU Used (%USED) and Ready Time (%RDY) counters. You can also see these counters in the VM charts, but with resxtop they are calculated as percentages. The %RDY counter is also helpful in determining whether you have overallocated CPU resources to the VM. This might be the case if, for example, you've allocated two vCPUs to a VM that really needs only a single vCPU. While in CPU mode, you can also press lowercase e to expand a VM's CPU statistics so that you can see the different components that are using CPU time on behalf of a VM. This is quite useful in determining what components of a VM may be taking up CPU capacity.

If you switch away to another resource, press C (uppercase or lowercase) to come back to the CPU counters display. At any time when you are finished with resxtop, you can simply press q (lowercase only) to exit the utility and return to the vMA command prompt.

resxtop SHOWS SINGLE HOSTS ONLY

Remember, resxtop shows only a single ESXi host. In an environment where vMotion, vSphere Distributed Resource Scheduler (DRS), and vSphere High Availability (HA) have been deployed, VMs may move around often. It is possible that while you are monitoring a VM, it is suddenly moved off the host by a vMotion action. Also be aware of this when capturing performance in batch mode.
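Batch mode writes the counters as CSV rather than drawing an interactive screen; a capture is typically started with something along the lines of resxtop --server esxi-03.lab.local -b -d 10 -n 18 and redirected to a file. The sketch below shows one way to pull the peak value of a counter out of such a file. Treat it as an assumption-laden example: the header layout is modeled on the perfmon-style column names batch mode produces, and the host name, VM names, and values are made up.

```python
import csv
import io

def peak_by_column(csv_text, counter_suffix):
    """Return {column_name: max value} for columns whose name ends with counter_suffix."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if name.endswith(counter_suffix)]
    peaks = {header[i]: float("-inf") for i in wanted}
    for row in reader:
        for i in wanted:
            peaks[header[i]] = max(peaks[header[i]], float(row[i]))
    return peaks

# Hypothetical capture: one timestamp column plus a "% Ready" column per VM.
sample_lines = [
    r'"Timestamp","\\esxi-03\Group Cpu(101:web01)\% Ready","\\esxi-03\Group Cpu(102:db01)\% Ready"',
    r'"10:00:00","1.5","12.0"',
    r'"10:00:10","2.5","18.5"',
]
sample = "\n".join(sample_lines)

peaks = peak_by_column(sample, "% Ready")
for name, value in peaks.items():
    print(f"{value:6.1f}  {name}")
```

Because a capture covers only one host, a VM that is moved by vMotion partway through will simply stop reporting in that host's file, so check the timestamps before drawing conclusions.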

Monitoring Memory Usage with m Memory is one of the most important components of your ESXi host because this resource is usually one of the first to get exhausted.

To monitor memory usage with resxtop, press m (lowercase only). This gives you real-time statistics about the ESXi host's memory usage in the top portion and the VMs' memory usage in the lower section. As with CPU statistics, you can press V (uppercase only) to show only VMs. This helps you weed out VMkernel resources when you are trying to isolate a problem with a VM. The %ACTV counter, which shows current active guest physical memory, is a useful counter, as are the %ACTVS (slow-moving average for long-term estimates), %ACTVF (fast-moving average for short-term estimates), %ACTVN (prediction of %ACTV at next sampling), and SWCUR (current swap usage) counters.

Monitoring Network Statistics with n Networking in a vSphere environment is often taken for granted, but while your environment grows, you'll learn that keeping an eye on network performance is essential.

To monitor network statistics about the physical network interface cards (vmnics), individual VMs, or VMkernel ports used for iSCSI, vMotion, and NFS, press n (lowercase only). The columns showing network usage include packets transmitted and received and megabits transmitted and received per second for each vmnic or port. Also shown in the DNAME column are the vSwitches or dvSwitches and, to the left, what is plugged into them, including VMs, VMkernel, and Service Console ports. If a particular VM is monopolizing the vSwitch, you can look at the amount of network traffic on a specific switch and the individual ports to see which VM is the culprit. Unlike in other resxtop views, you can't use V (uppercase only) here to show only VMs.

Monitoring Disk I/O Statistics with d Memory and disk I/O are considered the most important components in your vSphere environment. While memory is important because it gets exhausted first, disk I/O is often overlooked even though bad disk performance will directly impact VM performance.

To monitor disk I/O statistics, press d (lowercase only) for disk adapters, u (lowercase only) for disk devices, or v (lowercase only) for VM disks. As with some other views, you can press V (uppercase only) to show only VMs. The columns labeled READS/s, WRITES/s, MBREAD/s, and MBWRTN/s are most often used to determine disk loads. Those columns show loads based on reads and writes per second and megabytes read and written per second.
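The operations-per-second and megabytes-per-second columns can be combined to estimate the average size of each I/O, which helps distinguish many small requests from a few large ones. The following is a minimal sketch of that arithmetic; it assumes 1 MB equals 1,024 KB, and the function name and the readings are hypothetical.

```python
def average_io_size_kb(mb_per_sec, ops_per_sec):
    """Average size of one I/O in KB, or 0.0 when there is no I/O."""
    if ops_per_sec <= 0:
        return 0.0
    return mb_per_sec * 1024.0 / ops_per_sec

# Hypothetical readings: MBREAD/s with READS/s, and MBWRTN/s with WRITES/s.
read_size = average_io_size_kb(mb_per_sec=50.0, ops_per_sec=800.0)
write_size = average_io_size_kb(mb_per_sec=4.0, ops_per_sec=1024.0)

print(f"average read size:  {read_size:.1f} KB")
print(f"average write size: {write_size:.1f} KB")
```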

The resxtop command also lets you view CPU interrupts by pressing i. This command will show you the device(s) using the interrupt and is a great way to identify VMkernel devices, such as a vmnic, that might be sharing an interrupt with the Service Console. This sort of interrupt sharing can impede performance.

Capturing and Playing Back Performance Data with resxtop

Another great feature of resxtop is the ability to capture performance data for a short period of time and then play back that data. Using the command vm-support, you can set an interval and duration for the capture.

Perform the following steps to capture data to be played back on resxtop:

  1. Using PuTTY (Windows) or a terminal window (Mac OS X or Linux), open an SSH session to an ESXi host. Note that this requires enabling the ESXi Shell and SSH, both of which are disabled by default.
  2. Enter the su - command to assume root privileges.
  3. While logged in as root or after switching to the root user, change your working directory to /tmp by issuing the command cd /tmp.
  4. Enter the command vm-support -p -i 10 -d 180. This creates a resxtop snapshot, capturing data every 10 seconds, for the duration of 180 seconds.
  5. The resulting file is a tarball and is compressed with gzip. You must extract it with the command tar -xzf esx*.tgz. This creates a vm-support directory that is called in the next command.
  6. Run resxtop -R /tmp/vm-support* to replay the data for analysis.

Now that we've shown you the various tools (alarms, performance charts, vC Ops, and resxtop) that you will use to monitor performance in a vSphere environment, let's go through the four major resources—CPU, RAM, network, and disk—and see how to monitor the usage of these resources.

Monitoring CPU Usage

When monitoring a VM, it's always a good starting point to keep an eye on CPU consumption. Many VMs started out in life as underperforming physical servers. One of VMware's most successful sales pitches is being able to take all those lackluster physical boxes that are not busy and convert them to VMs. Once they are converted, virtual infrastructure managers tend to think of these VMs as simple, lackluster, and low-utilization servers with nothing to worry over or monitor. The truth, though, is quite the opposite.

When the server was physical, it had an entire box to itself. Now it must share its resources with many other workloads. In aggregate, they represent quite a load, and if some or many of them become somewhat busy, they contend with each other for the finite capabilities of the ESXi host on which they run. Of course, they don't know they are contending for resources because the VMkernel tries to make sure they get the resources they need. Virtual CPUs need to be scheduled, and ESXi does a remarkable job given that there are more VMs than physical processors most of the time. Still, the hypervisor can do only so much with the resources it has, and invariably there comes a time when the applications running in a VM need more CPU time than the host can give.

When this happens, it's usually the application owner who notices first and raises the alarm with the system administrators. Now the vSphere administrators have the task of determining why this VM is underperforming. Fortunately, vCenter Server provides a number of tools that make monitoring and analysis easier. These are the tools you've already seen: alarms, performance charts, vC Ops, and resxtop.

Let's begin with a hypothetical scenario. A help desk ticket has been submitted indicating that an application owner isn't getting the expected level of performance on a particular server, which in this case is a VM. As the vSphere administrator, you need to first delve deeper into the problem and ask as many questions as necessary to discover what the application owner needs to be satisfied with performance. Some performance issues are subjective, meaning some users might complain about the slowness of their applications, but they have no objective benchmark for such a claim. Other times, the complaint is backed by a specific benchmark, such as the number of transactions by a database server or throughput for a web server. In this case, the issue revolves around CPU usage, because the application is CPU intensive when it does its job.

ASSESSMENTS, EXPECTATIONS, AND ADJUSTMENTS

If an assessment was done prior to virtualizing a server, there might be hard numbers to look at to give some details as to what was expected with regard to minimum performance or a service-level agreement (SLA). If not, the vSphere administrator needs to work with the application's owner to make more CPU resources available to the VM when needed.

vCenter Server's charts, which you have explored in great detail, are the best way to analyze usage, both short and long term. In this case, let's assume the help desk ticket describes a slowness issue in the last hour. As you've already seen, you can easily create a custom performance chart to show CPU usage over the last hour for a particular VM or ESXi host.

Perform the following steps to create a CPU chart that shows data for a VM from the last hour:

  1. Connect to a vCenter Server instance with the vSphere Web Client.
  2. Navigate to the Hosts And Clusters or VMs And Templates view.
  3. In the Navigator, select a virtual machine.
  4. Select the Monitor > Performance tab from the contents pane on the right, and then change the view to Advanced.
  5. Click the Chart Options link.
  6. In the Chart Options dialog box, select CPU from the resource type list. Select the Custom interval for the time span.
  7. Change the interval to Last 1 Hour(s).
  8. Set the chart type to Line Graph.
  9. Select the VM itself from the list of objects.
  10. From the list of counters, select CPU Usage In MHz (Average) and CPU Ready.
  11. Click OK to apply the chart settings.

CPU READY

CPU Ready shows how long a VM is waiting to be scheduled on a logical processor. A VM waiting many thousands of milliseconds to be scheduled on a processor might indicate that the ESXi host is overloaded, a resource pool has too tight a limit, or the VM has too few CPU shares (or, if no one is complaining, nothing at all). Be sure to work with the server or application owner to determine an acceptable amount of CPU Ready for any CPU-intensive VM.
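Because the chart reports CPU Ready in milliseconds per sample, it can help to convert the value to a percentage of the sample interval so that it can be compared with the %RDY figure in resxtop. The sketch below shows the arithmetic, assuming the 20-second real-time sample interval; dividing by the number of vCPUs to get a per-vCPU figure is a common convention rather than something defined in this chapter, and the function name and readings are hypothetical.

```python
def cpu_ready_percent(ready_ms, interval_seconds=20, vcpus=1):
    """Convert a CPU Ready value in milliseconds into a percentage of the sample interval.

    With vcpus > 1 the result is the average per vCPU.
    """
    if interval_seconds <= 0 or vcpus <= 0:
        raise ValueError("interval_seconds and vcpus must be positive")
    return ready_ms * 100.0 / (interval_seconds * 1000.0 * vcpus)

# Hypothetical real-time readings.
single_vcpu = cpu_ready_percent(2000)         # 2,000 ms of ready time in a 20-second sample
dual_vcpu = cpu_ready_percent(2000, vcpus=2)  # the same total spread across two vCPUs

print(f"{single_vcpu:.1f}% ready (1 vCPU), {dual_vcpu:.1f}% ready per vCPU (2 vCPUs)")
```

For other display intervals, pass the length of that interval's sample in seconds instead of the default.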

This chart shows CPU utilization for the selected VM, but it won't necessarily help you get to the bottom of why this particular VM isn't performing as well as expected. In this scenario, we would fully expect the CPU Usage In MHz (Average) counter to be high; this simply tells you that the VM is using all the CPU cycles it can get. Unless the CPU Ready counters are also high, indicating that the VM is waiting on the host to schedule it onto a physical processor, you still haven't uncovered the cause of the slowness that triggered the help desk ticket. Instead, you'll need to move to monitoring host CPU usage.

Monitoring a host's overall CPU usage is fairly straightforward. Keep in mind that other factors usually come into play when looking at spare CPU capacity. Add-ons such as vMotion, vSphere DRS, and vSphere HA directly impact whether there is enough spare capacity on a server or a cluster of servers. Compared to previous versions of ESX, the VMkernel will usually not be as competitive for processor 0 because there are fewer processes to consume CPU time.

VMKERNEL STUCK ON 0

In older ESX versions, the Service Console was stuck to processor 0 only. It wouldn't get migrated to other processors even in the face of heavy contention. In ESXi there is no Service Console anymore, but the VMkernel process is still stuck on processor 0.

Perform the following steps to create a real-time chart for a host's CPU usage:

  1. Launch the vSphere Web Client if it is not already running, and connect to a vCenter Server instance.
  2. Navigate to the Hosts And Clusters or VMs And Templates view.
  3. In the Navigator, select a host. This shows you the Summary tab.
  4. Click the Performance tab, and switch to Advanced view.
  5. Click the Chart Options link.
  6. In the Chart Options dialog box, select the CPU resource type and the Real-Time display interval.
  7. Set Chart Type to Stacked Graph (Per VM).
  8. Select all objects.

    You should see a separate object for each VM hosted on the selected ESXi host.

  9. Select the CPU Usage (Average) performance counter.
  10. Click OK to apply the chart settings and return to the Performance tab.

This chart shows the usage of all the VMs on the selected ESXi host in a stacked fashion. From this view, you should be able to determine whether there is a specific VM or group of VMs consuming abnormal amounts of CPU capacity.
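What the stacked chart shows visually can also be sketched numerically. The example below assumes you have per-VM CPU usage samples in MHz (for instance, from an exported chart); the VM names and values are made up for illustration. It ranks VMs by average usage and by share of the combined total, which is roughly what you read from the heights of the stack.

```python
def rank_cpu_consumers(usage_mhz_by_vm):
    """Return (vm, average MHz, share of total) tuples, largest consumer first."""
    # Average each VM's samples, skipping VMs that have no samples at all.
    averages = {vm: sum(s) / len(s) for vm, s in usage_mhz_by_vm.items() if s}
    total = sum(averages.values())
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return [(vm, avg, (avg / total if total else 0.0)) for vm, avg in ranked]


# Hypothetical samples, not taken from a real host.
samples = {
    "web01": [400, 450, 500],
    "db01": [2400, 2600, 2500],
    "file01": [300, 250, 350],
}
for vm, avg, share in rank_cpu_consumers(samples):
    print(f"{vm}: {avg:.0f} MHz ({share:.0%} of VM CPU usage)")
```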

VMKERNEL BALANCING ACT

Always remember that on an oversubscribed ESXi host, the VMkernel will load balance the VMs based on current loads, reservations, and shares represented on individual VMs and/or resource pools.

In this scenario, we identified the application within the VM as CPU bound, so these two performance charts should clearly identify why the VM isn't performing well. In all likelihood, the ESXi host on which the VM is running doesn't have enough CPU capacity to satisfy the requests of all the VMs. Your solution, in this case, would be to use the resource allocation tools described in Chapter 11, “Managing Resource Allocation,” to ensure that this specific application receives the resources it needs to perform at acceptable levels.

Monitoring Memory Usage

Monitoring memory usage, whether on a host or a VM, can be challenging. The monitoring itself is not difficult; it's the availability of the physical resource that can be a challenge. Of the four resources, memory can be oversubscribed without much effort. Depending on the physical form factor chosen to host VMware ESXi, running out of physical RAM is easy to do. Although the blade form factor creates a very dense consolidation effort, the blades are sometimes constrained by the amount of physical memory and network adapters that can be installed. But even with other regular form factors, having enough memory installed comes down to how much the physical server can accommodate and your budget.

If you suspect that memory usage is a performance issue, the first step is to isolate whether this is a memory shortage affecting the host (you've oversubscribed physical memory and need to add more memory) or whether this is a memory limit affecting only that VM (meaning you need to allocate more memory to this VM or change resource allocation policies). Normally, if the ESXi host is suffering from high memory utilization, the predefined vCenter Server alarm will trigger and alert the vSphere administrator. However, the alarm doesn't allow you to delve deeper into the specifics of how the host is using memory. For that, you'll need a performance chart.

Perform the following steps to create a real-time chart for a host's memory usage:

  1. Connect to a vCenter Server instance with the vSphere Web Client.
  2. Navigate to Hosts And Clusters view.
  3. In the Navigator, click an ESXi host. This shows you the Summary tab.
  4. Click the Performance tab, and switch to Advanced view.
  5. Click the Chart Options link.
  6. In the Chart Options dialog box, select the Memory resource type and the Real-Time display interval.
  7. Select Line Graph as the chart type. The host will be selected as the only available object.
  8. In the Counters area, select the Memory Usage (Average), Memory Overhead (Average), Memory Active (Average), Memory Consumed (Average), Memory Used by VMkernel, and Memory Swap Used (Average) counters.

    This should give you a fairly clear picture of how much memory the ESXi host is using.

  9. Click OK to apply the chart options and return to the Performance tab.

COUNTERS, COUNTERS, AND MORE COUNTERS

As with VMs, you can utilize a plethora of counters with a host to monitor memory usage. Which ones you select will depend on what you're looking for. It is common to monitor straight memory usage, but don't forget that there are other counters that could be helpful, such as Ballooning, Unreserved, VMkernel Swap, and Shared, just to name a few. The ability to assemble the appropriate counters for finding the right information comes with experience and depends on what is being monitored.

These counters, in particular the Memory Swap Used (Average) counter, will give you an idea of whether the ESXi host is under memory pressure. If the ESXi host is not suffering from memory pressure and you still suspect a memory problem, then the issue likely lies with the VM.

Perform the following steps to create a real-time chart for a VM's memory usage:

  1. Use the vSphere Web Client to connect to a vCenter Server instance.
  2. Navigate to either the Hosts And Clusters or the VMs And Templates view.
  3. In the Navigator, click a virtual machine. This shows you the Summary tab.
  4. Click the Performance tab, and switch to the Advanced view.
  5. Click the Chart Options link.
  6. In the Chart Options dialog box, select the Memory resource type and the Real-Time display interval.
  7. Select Line Graph as the chart type.
  8. In the list of counters, select to show the Memory Usage (Average), Memory Overhead (Average), Memory Consumed (Average), and Memory Granted (Average) counters. This shows memory usage, including usage relative to the amount of memory configured for the VM.
  9. Click OK to apply the chart options and return to the Performance tab.

From this performance chart, you will be able to tell how much of the memory configured for the VM is actually being used. This might reveal to you that the applications running inside that VM need more memory than the VM has been assigned and that adding more memory to the VM—assuming that there is sufficient memory at the host level—might improve performance.
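The host-then-VM reasoning used in this section can be summarized as a small decision sketch. It assumes you have read the host's Memory Swap Used value and the VM's Memory Usage percentage from the charts. The rule that any host swap usage signals pressure and the 90 percent VM threshold are illustrative assumptions, not VMware guidance.

```python
def memory_bottleneck(host_swap_used_kb, vm_usage_percent, vm_threshold_percent=90.0):
    """Suggest where to look first: 'host', 'vm', or 'neither'.

    Both decision rules are simplifications chosen for illustration:
    any host swap usage is treated as host memory pressure, and a VM at
    or above vm_threshold_percent is treated as short of memory.
    """
    if host_swap_used_kb > 0:
        return "host"  # host is reclaiming memory; more RAM or fewer VMs
    if vm_usage_percent >= vm_threshold_percent:
        return "vm"  # host is fine; the VM may need more memory assigned
    return "neither"


# Hypothetical readings taken from the two charts.
print(memory_bottleneck(host_swap_used_kb=0, vm_usage_percent=96.0))  # vm
```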

Memory, like CPU, is just one of several different factors that can impact VM performance. Network usage is another area that can impact performance, especially perceived performance.

Monitoring Network Usage

vCenter Server's charts provide a wonderful tool for measuring the network usage of a VM or a host.

Monitoring network usage requires a slightly different approach than monitoring CPU or memory. With either CPU or memory, reservations, limits, and shares can dictate how much of these two resources can be consumed by any one VM. Network usage cannot be constrained by these mechanisms. Because VMs plug into a VM port group, which is part of a vSwitch on a single host, how the VM interacts with the vSwitch can be manipulated by the virtual switch's or port group's policy. For instance, if you need to restrict a VM's overall network output, you would configure traffic shaping on the port group to restrict the VM to a specific amount of outbound bandwidth. Unless you are using vSphere Distributed Switches or the Nexus 1000V third-party distributed virtual switch, there is no way to restrict VM inbound bandwidth on ESXi hosts.

VM ISOLATION

Certain VMs may indeed need to be limited to a specific amount of outbound bandwidth. Servers such as FTP, file and print, web and proxy servers, or any server whose main function is to act as a file repository or connection broker may need to be limited or the traffic may need to be shaped to an amount of bandwidth that allows it to meet its service target but not monopolize the host it runs on. Isolating any of these VMs to a vSwitch of its own is more likely a better solution, but it requires the appropriate hardware configuration.

To get an idea of how much network traffic is actually being generated, you can measure outgoing and incoming network traffic from a VM or host using the charts in vCenter Server. The charts can show actual usage accurately and can provide ample evidence that a particular VM is monopolizing a virtual switch, especially when you use the Stacked Graph chart type.

Perform the following steps to create a real-time chart for a stacked graph of transmitted network usage by each VM on an ESXi host:

  1. Launch the vSphere Web Client if it is not already running, and connect to a vCenter Server instance.
  2. Navigate to either the Hosts And Clusters view or the VMs And Templates view.
  3. In the Navigator, click an ESXi host. This shows you the Summary tab.
  4. Click the Performance tab, and switch to Advanced view.
  5. Click the Chart Options link.
  6. From the Chart Options dialog box, select the Network resource type and the Real-Time display interval in the Chart Options area.
  7. Select a chart type of Stacked Graph (Per VM).
  8. In the objects list, be sure all the VMs are selected.
  9. In the list of counters, select the Network Data Transmit Rate counter.

    This gives you an idea of how much network bandwidth each VM is consuming outbound on this ESXi host.

  10. Click OK to apply the changes and return to the Performance tab.

What if you wanted a breakdown of traffic on each of the network interface cards (NICs) in the ESXi host instead of by VM? That's fairly easily accomplished by another trip back to the Chart Options dialog box.

Perform the following steps to create a real-time chart for a host's transmitted network usage by NIC:

  1. Connect to a vCenter Server instance with the vSphere Web Client.
  2. Navigate to the Hosts And Clusters view.
  3. In the Navigator, select an ESXi host. This will show you the Summary tab in the content area to the right.
  4. Select the Monitor > Performance subsection, and switch to Advanced view.
  5. Click the Chart Options link.
  6. Under Chart Options in the Chart Options dialog box, select the Network resource type and the Real-Time display interval.
  7. Set the chart type to Line Graph.
  8. In the objects list, select the ESXi host as well as all the specific NICs.
  9. Select the Network Data Transmit Rate and Network Packets Transmitted counters.
  10. Click OK to apply the changes and return to the Performance tab.

Very much as with the previous per-VM example, these two counters will give you a window into how much network activity is occurring on this particular host in the outbound direction for each physical NIC. This is especially relevant if you want to compare rates of usage across physical network interfaces, which may serve as uplinks for different virtual switches.
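To judge whether a transmit rate is significant, it helps to compare it with the speed of the physical NIC. The sketch below assumes the Network Data Transmit Rate counter is reported in kilobytes per second (confirm the unit in the chart legend) and treats 1 KB as 1,000 bytes; the function name and sample figures are hypothetical.

```python
def link_utilization_percent(rate_kilobytes_per_sec, link_speed_mbps):
    """Express a transmit rate in KBps as a percentage of a NIC's line rate.

    Assumes 1 KB = 1,000 bytes and that link_speed_mbps is the nominal
    speed of the physical NIC in megabits per second.
    """
    if link_speed_mbps <= 0:
        raise ValueError("link_speed_mbps must be positive")
    megabits_per_sec = rate_kilobytes_per_sec * 8 / 1000
    return megabits_per_sec / link_speed_mbps * 100


# Hypothetical reading: 50,000 KBps on a 1 Gbps (1,000 Mbps) uplink.
print(round(link_utilization_percent(50_000, 1000), 1))  # 40.0
```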

Now that you've examined how to monitor CPU, memory, and network usage, there's only one major area left: monitoring disk usage.

Monitoring Disk Usage

Monitoring a host's controller or VM's virtual disk usage is similar in scope to monitoring network usage. This resource, which represents a controller or the storing of a VM's virtual disk on a type of supported storage, isn't restricted by CPU or memory mechanisms like reservations, limits, or shares. The only way to restrict a VM's disk activity is to assign shares on the individual VM, which in turn may have to compete with other VMs running from the same storage volume. vCenter Server's charts come to our aid again in showing actual usage for both ESXi hosts and VMs.

Perform the following steps to create a host chart showing disk controller utilization:

  1. Use the vSphere Web Client to connect to a vCenter Server instance.
  2. Navigate to the Hosts And Clusters view.
  3. In the Navigator, select an ESXi host.

    This shows you the Summary tab in the Details section on the right.

  4. Select the Performance tab, and switch to the Advanced view.
  5. Click the Chart Options link. This opens the Chart Options dialog box.
  6. Under Chart Options, choose the Real-Time display interval for the Disk resource type.
  7. Set the chart type to Line Graph.
  8. Select the objects that represent the ESXi host and one of the disk controllers. Choosing specific objects (in this case a controller) and counters lets you monitor the activity that matters for meeting service levels.
  9. In the counters list, select Disk Read Rate, Disk Write Rate, and Disk Usage (Average/Rate) to get an overall view of the activity for the selected controller.
  10. Click OK to return to the Performance tab.

This performance chart will give you an idea of the activity on the selected disk controller. But what if you want to see disk activity for the entire host by each VM? In this case, a Stacked Graph view can show you what you need.

STACKED VIEWS

A stacked view is helpful in identifying whether one particular VM is monopolizing a volume. Whichever VM has the tallest stack in the comparison may be degrading the performance of other VMs' virtual disks.

Now let's switch to the virtual machine view. Looking at individual VMs for insight into their disk utilization can lead to some useful conclusions. File and print VMs, or any server that provides print queues or database services, will generate some disk-related I/O that needs to be monitored. In some cases, if the VM is generating too much I/O, it may degrade the performance of other VMs running out of the same volume. Let's take a look at a VM's chart.

Perform the following steps to create a VM chart showing real-time disk controller utilization:

  1. Launch the vSphere Web Client if it is not already running, and connect to a vCenter Server instance.
  2. Navigate to either the Hosts And Clusters view or the VMs And Templates view.
  3. In the Navigator, click a virtual machine.

    This shows you the Summary tab in the Details section on the right.

  4. Select the Performance tab, and switch to Advanced view.
  5. Click the Chart Options link to open the Chart Options dialog box.
  6. Under Chart Options, select the Virtual Disk resource type and the Real-Time display interval.
  7. Set the chart type to Line Graph.
  8. Select both objects listed in the list of objects.
  9. In the list of counters, select Read Rate and Write Rate (Average/Rate).
  10. Click OK to apply these changes and return to the Performance tab.

With this chart, you should have an informative picture of this VM's disk I/O behavior. This VM is busy generating reads and writes for its application. Does the chart show enough I/O to meet a service-level agreement, or does this VM need some help? The charts allow administrators to make informed decisions, usually working with the application owners, so that any adjustments to improve I/O will lead to satisfied VM owners.

In addition, by looking at longer intervals of time to gain a historical perspective, you may find that a VM has become busier or fallen off its regular output. If the amount of I/O is just slightly impaired, then adjusting the VM's shares may be a way to prioritize its disk I/O ahead of other VMs sharing the volume. The administrator may be forced to move the VM's virtual disk(s) to another volume or LUN if share adjustments don't achieve the required results. You can use Storage vMotion, described in Chapter 6, “Creating and Configuring Storage Devices,” to perform this sort of LUN-based load balancing without any disruption to the end users.
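To see why a share adjustment only helps under contention and only relative to other VMs, consider a simplified proportional model: each VM's entitlement is its shares divided by the total shares of the VMs competing for the same volume. The sketch below illustrates that arithmetic only. The VM names and share values are hypothetical, and real scheduling also depends on factors this model ignores.

```python
def share_fractions(shares_by_vm):
    """Return each VM's fraction of the total shares competing for a volume."""
    total = sum(shares_by_vm.values())
    if total <= 0:
        raise ValueError("total shares must be positive")
    return {vm: shares / total for vm, shares in shares_by_vm.items()}


# Hypothetical disk shares for three VMs on one volume.
shares = {"db01": 2000, "web01": 1000, "file01": 1000}
for vm, fraction in share_fractions(shares).items():
    print(f"{vm}: {fraction:.0%}")
```

Raising one VM's shares in this model lowers the fraction left to every other VM on the volume, which is one reason moving a busy virtual disk to another volume can be the better fix.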

PERFORMANCE MONITORING FROM THE INSIDE AND THE OUTSIDE

It's important to remember that the very nature of how virtualization operates means that it is impossible to use performance metrics from within a guest OS as an indicator of overall resource utilization. Here's why.

In a virtualized environment, each guest OS “sees” only its slice of the hardware as presented by the VMkernel. A guest OS that reports 100 percent CPU utilization isn't reporting that it's using 100 percent of the physical server's CPU but rather that it's using 100 percent of the CPU capacity given to it by the hypervisor. A guest OS that is reporting 90 percent RAM utilization is really only using 90 percent of the RAM made available to it by the hypervisor.

Does this mean that performance metrics gathered from within a guest OS are useless? No, but these metrics cannot be used to establish overall resource usage—only relative resource usage. You must combine any performance metrics gathered from within a guest OS with matching metrics gathered from outside the guest OS. By combining the metrics from within the guest OS with metrics from outside the guest OS, you can create a more complete view of how a guest OS is using a particular type of resource and therefore get a better idea of what steps to take to resolve any resource constraints.

For example, if a guest OS is reporting high memory utilization but the vCenter Server resource management tools are showing that the physical system has plenty of memory available, this tells you that the guest OS is using everything available to it and might perform better with more memory allocated to it.
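The same inside-versus-outside distinction can be put in numbers for CPU. The sketch below estimates what fraction of a host's CPU capacity a guest-reported utilization can represent, under the simplifying assumption that each vCPU consumes at most one logical CPU's worth of time and that all logical CPUs are equivalent. The function name and the example figures are hypothetical.

```python
def host_cpu_fraction(guest_cpu_percent, vcpus, host_logical_cpus):
    """Estimate the fraction of host CPU capacity a busy guest can represent.

    Simplifying assumption: each vCPU can consume at most one logical
    CPU's worth of time, and all logical CPUs are equivalent.
    """
    if vcpus <= 0 or host_logical_cpus <= 0:
        raise ValueError("vcpus and host_logical_cpus must be positive")
    guest_fraction = guest_cpu_percent / 100
    return guest_fraction * vcpus / host_logical_cpus


# Hypothetical VM: 2 vCPUs at 100 percent inside the guest, on a 16-CPU host.
print(f"{host_cpu_fraction(100, 2, 16):.1%}")  # 12.5%
```

A guest that looks saturated from the inside may therefore be a small consumer from the host's point of view, which is why both views are needed.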

Monitoring resources can be tricky, and it requires a good knowledge of the applications running in the VMs in your environment. If you are a new vSphere administrator, it's worth spending some time using vCenter Server's performance charts to establish some baseline behaviors. This helps you become much more familiar with the normal operation of the VMs so that when something unusual or out of the ordinary does occur, you'll be more likely to spot it.

The Bottom Line

Use alarms for proactive monitoring. vCenter Server offers extensive alarms for alerting vSphere administrators to excessive resource consumption or potentially negative events. You can create alarms on virtually any type of object found within vCenter Server, including datacenters, clusters, ESXi hosts, and VMs. Alarms can monitor for resource consumption or for the occurrence of specific events. Alarms can also trigger actions, such as running a script, migrating a VM, or sending a notification email.

Master It What are the questions a vSphere administrator should ask before creating a custom alarm?

Work with performance charts. vCenter Server's detailed performance charts are the key to unlocking the information necessary to determine why an ESXi host or VM is performing poorly. The performance charts expose a large number of performance counters across a variety of resource types, and vCenter Server offers functionality to save customized chart settings, export performance graphs as graphic figures or Excel workbooks, and view performance charts in a separate window.

Master It You find yourself using the Chart Options link in the Advanced layout of the Performance tab to set up the same chart over and over again. Is there a way to save yourself some time and effort so that you don't have to keep re-creating the custom chart?

Understanding vCenter Operations Manager. vCenter Operations Manager adds to the functionality of vSphere by assisting in both performance monitoring and troubleshooting. The vC Ops analytics VM collates metrics from hosts and vCenter to calculate badges. The Web UI VM provides a standalone web interface and also integrates with the vSphere Web Client.

Master It All standard vCenter Server licenses come with vCenter Operations Manager Foundation. This version is limited to four main features. What are they?

Gather performance information using command-line tools. VMware supplies a few command-line tools that are useful in gathering performance information. For VMware ESXi hosts, resxtop provides real-time information about CPU, memory, network, or disk utilization. You should run resxtop from the VMware vMA. Finally, the vm-support tool can gather performance information that can be played back later using resxtop.

Master It Know how to run resxtop from the VMware vMA command line.

Monitor CPU, memory, network, and disk usage by ESXi hosts and VMs. Monitoring usage of the four key resources—CPU, memory, network, and disk—can be difficult at times. Fortunately, the various tools supplied by VMware within vCenter Server can lead the vSphere administrator to the right solution. In particular, using customized performance charts can expose the right information that will help a vSphere administrator uncover the source of performance problems.

Master It A junior vSphere administrator is trying to resolve a performance problem with a VM. You've asked this administrator to see if it is a CPU problem, and the junior administrator keeps telling you that the VM needs more CPU capacity because the CPU utilization is high within the VM. Is the junior administrator correct, based on the information available to you?
