Resource Monitoring with SyMON Software

The SyMON software can be used in several ways to monitor the resource usage of systems, which is useful in both manually and automatically managed resource control environments. This section illustrates how to configure and use the SyMON software to perform basic monitoring operations and to discover the resources that are available in a system. The system used in the examples is an Ultra Enterprise 4000 configured with two 167 MHz CPUs and about 300 disks.

The SyMON Health Monitor

The SyMON software includes a system health monitoring module that you can use in a resource management scenario to see if a system has enough resources to run comfortably. For example, if the CPU state is reported as red, that system probably needs either less work or more CPU power. Similarly, if the memory rule reports red, then the system may need more memory.

The health monitor is based on a set of complex rules developed over several years by Adrian Cockcroft, one of the authors of this book. The rules have become known as the “virtual adrian” rules as this is the name of the SE Toolkit script that first implemented them. The health monitor is not enabled by default when you first install the SyMON software because only the basic modules are loaded into the agent.

To load the health monitor module, start the SyMON program with the default administrative domain. Select the system and pop up a menu. Then select the Load Module option from the menu. In the example shown in FIGURE 12-2, a single red alert is present because one of the file systems on this machine is 98 percent full.

Figure 12-2. The SyMON Software Console


Next, scroll down and choose the Health Monitor module. It may already be loaded if the SyMON software has been pre-configured. Otherwise, select it and press the OK button.

Figure 12-3. Load Health Monitor Module


Now any Health Monitor alerts will be logged for this system. We could drill down to the subsystem that caused the alert, but we don't expect any Health Monitor alerts yet. Since there is already an unrelated disk space alert on this system, we will select the system and bring up the detailed view. The Details menu option was shown in FIGURE 12-2. When you select the Details option, a second window opens that is specific to the system being monitored.

The Browser tab of the host Details window shows the modules that are loaded. Under local applications (which opens up if you click on the bullet next to it) you will find the Health Monitor module. Inside it, you find the eight rules that are implemented based on virtual adrian. Each rule shows a few variables. The RAM rule that is displayed in FIGURE 12-4 shows that the current scan rate is zero so the rule value is a white box. If the ratio of scan rate to handspread went too high and the page residence time dropped below the pre-set threshold, this box would turn red and the red state would propagate up the hierarchy. To view and edit the rule attributes and thresholds, pop up a menu over the rule value.
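
The logic of the RAM rule described above can be sketched roughly as follows. This is a minimal illustration in Python, not the SyMON or SE Toolkit implementation: the function name, the "green" state label, and the 30-second threshold are assumptions made for the example. It treats page residence time as handspread divided by scan rate, so a higher ratio of scan rate to handspread means a shorter residence time.

```python
def ram_rule_state(scan_rate, handspread_pages, min_residence_secs=30.0):
    """Classify memory pressure from page scanner activity.

    scan_rate           pages scanned per second
    handspread_pages    gap between the scanner's two hands, in pages
    min_residence_secs  hypothetical threshold, not a SyMON default
    """
    if scan_rate <= 0:
        # Scanner is idle, so there is no sign of memory pressure.
        return "white"
    # Approximate time a page stays in memory before it can be reclaimed.
    residence_secs = handspread_pages / scan_rate
    if residence_secs < min_residence_secs:
        return "red"
    # "green" is an assumed label for "scanning, but within limits".
    return "green"


print(ram_rule_state(0, 8192))     # idle scanner
print(ram_rule_state(1000, 8192))  # about 8 seconds of residence time
```

With these assumed numbers, the first call reports white and the second reports red, because a residence time of about 8 seconds falls below the example threshold.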

Figure 12-4. Host Details Window


The best way to use these rules is to start with a system that is performing well and increase the thresholds until there are no warnings in normal use. Then, as the load increases, you will get warnings that indicate which subsystem is likely to be the bottleneck. If you have a system that is not performing well to start with, these rules can help you eliminate some problem areas and indicate which subsystems to concentrate on.
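
The tuning step can be pictured with a small sketch. The helper below is hypothetical and is not part of the SyMON software: it assumes a metric where higher values are worse, takes samples recorded during normal use on a healthy system, and proposes a threshold slightly above the highest value seen. The 10 percent headroom is an arbitrary assumption.

```python
def tuned_threshold(baseline_samples, current_threshold, headroom=0.10):
    """Suggest a threshold that normal-use samples would not trip.

    baseline_samples   values observed while the system performed well
    current_threshold  the threshold currently configured for the rule
    headroom           assumed safety margin above the observed peak
    """
    if not baseline_samples:
        # Nothing observed yet, so leave the threshold alone.
        return current_threshold
    candidate = max(baseline_samples) * (1.0 + headroom)
    # Only ever raise the threshold, matching the advice in the text.
    return max(current_threshold, candidate)


print(tuned_threshold([40, 55, 62], current_threshold=50))
```

The value is only a suggestion; the threshold itself would still be edited through the rule's pop-up menu as described earlier.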

This browser mode can be used to explore all the operating system measurements supplied by the kernel reader for this system, including CPU usage, paging rates, and disk utilization.

Handling Alarms in SyMON Software

When a simple rule or one of the more complex health monitoring rules generates an alarm, it is logged by the SyMON software. At the domain level console, only the worst alarm state for each monitored system is counted. This means that with only one system being monitored, only one alarm will be indicated. In this case, it is in the red state.
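
The counting behavior can be illustrated with a short sketch. It is not SyMON code: the function name, the host name, and the "yellow" and "blue" states with their ordering are assumptions for the example. Only red is taken from the text.

```python
from collections import Counter

# Assumed severity ordering; only "red" (critical) comes from the text.
SEVERITY = {"blue": 1, "yellow": 2, "red": 3}


def domain_summary(alarms):
    """Count the worst alarm state per host.

    alarms is an iterable of (host, state) pairs.
    """
    worst = {}
    for host, state in alarms:
        if host not in worst or SEVERITY[state] > SEVERITY[worst[host]]:
            worst[host] = state
    # One entry per host, keyed by that host's worst state.
    return Counter(worst.values())


alarms = [("server1", "blue"), ("server1", "red"), ("server1", "yellow")]
print(domain_summary(alarms))
```

With one host carrying several alarms, the summary contains a single red entry, which is why the domain console shows a count of one.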

If you click on the red indicator, shown in FIGURE 12-5 with a '1' next to it, a new window opens that shows all the alarms for all the systems in this domain.

Figure 12-5. The SyMON Software Domain Console


FIGURE 12-6 displays only systems or other network components that are in the red (critical) state. In this case, one of the file systems contains too many small files, so it is almost out of capacity.

Figure 12-6. Domain Status Details Window


If you either double-click on the alarm or select the alarm and press the Details… button, the Details window for that system opens with its alarm display tab selected, as shown in FIGURE 12-7. This shows that there are in fact three alarms on this system, with only the most important one (red) being shown at the domain level.

Figure 12-7. Alarm Details Window


Next, select one or all of the alarms and acknowledge them by pressing the Acknowledge button. It is best to select all the alarms by clicking on the first one and sliding the cursor down, then acknowledge them all at once. It takes some time to perform the acknowledgment because it involves communicating all the way back to the agent on the server being monitored. Once an alarm is acknowledged, a tick mark appears by it, as shown in FIGURE 12-8.

Figure 12-8. Acknowledged Alarms


If you close the Details window and return to the Domain Status window, it may not have changed. Press the Refresh Now button, and the alarm entry will go away, as shown in FIGURE 12-9. You can now close the Domain Status window as well. Looking back at the Domain Console, as shown in FIGURE 12-10, you will see that the server no longer has a red marker on it, and the Domain Status summary is all zeroes.

Figure 12-9. Refreshed Domain Status Details Window


Figure 12-10. The SyMON Software Domain Console with No Alarms


Process Monitoring with SyMON Software

You can use the SyMON software to monitor processes and groups of processes that form workloads. This is described in detail in Chapter 5.

Browsing Configuration Information

The SyMON software includes detailed knowledge of the configuration of Sun hardware. This includes color pictures of components, so you can look inside a large server system to see exactly how the system boards are configured. This is a useful availability feature because you do not have to turn off a system and take it apart to find out what components are present. Any failed components are clearly indicated and the errors are logged to assist in problem diagnosis, which saves time and reduces the chance of accidentally changing the wrong component.

From the host details window, select the configuration tab. The initial view, FIGURE 12-11, shows a list of the main hardware resources in this Enterprise Server system.

Figure 12-11. Detailed Configuration Window


On the left, there are three options: Resources, Physical View and Logical View. Select the Logical View and wait while the configuration information is loaded. This takes longer on larger and more complex systems.

The right pane changes to show the device hierarchy, as shown in FIGURE 12-12. If you press the Show Details button, a second pane appears that shows the properties of each device. A portion of the device tree shows a slot containing an I/O board that has an SBus with a Fibre Channel serial optical controller (soc) plugged into it, and a SPARCstorage™ Array (SUNW,pln) containing 18 drives. The first drive is selected, and its properties show that this is device ssd96, which is also known as c2t0d0: controller two, target zero, device zero.
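
The controller, target, and device numbers can be read directly from a name such as c2t0d0. The sketch below is a small illustrative parser, not part of the SyMON software, and the function name is hypothetical. It only splits the name into its parts; the link between ssd96 and c2t0d0 comes from the system configuration and cannot be computed from the names.

```python
import re

# Matches names of the form c<controller>t<target>d<device>.
_CTD_PATTERN = re.compile(r"^c(\d+)t(\d+)d(\d+)$")


def parse_ctd(name):
    """Split a disk name such as 'c2t0d0' into its numeric parts."""
    match = _CTD_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a controller/target/device name: {name!r}")
    controller, target, device = (int(part) for part in match.groups())
    return {"controller": controller, "target": target, "device": device}


print(parse_ctd("c2t0d0"))
```

A name in another form, such as the instance name ssd96, is rejected with a ValueError rather than guessed at.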

Figure 12-12. Logical View of Hardware Configuration Tree


You can also view the physical layout of the system. Selecting Physical View on the left changes the pane on the right to show a picture of the front view of the system cabinet. A menu option allows you to change to the rear view of the system. Pressing the Show Details button displays the level of detail shown in FIGURE 12-13.

Figure 12-13. Physical View of Rear of System


Move the cursor over a component to highlight it. Its properties are displayed in the right pane. In FIGURE 12-13, one of the SBus card slots is highlighted to show that it is the soc card we saw in the Logical View.

Note the Dynamic Reconfiguration button. Dynamic Reconfiguration makes it possible to unconfigure an I/O board, so that it can be unplugged from a running system without a reboot. Conversely, an additional I/O board can be added. With the Solaris 7 software, it is also possible to add and remove CPU or memory boards on these midrange Enterprise Servers. The high end Sun Enterprise 10000 server also uses Dynamic Reconfiguration, which is described in Chapter 8.

Click on the I/O board end plate itself to see a view of the board and the components that are plugged into it, as shown in FIGURE 12-14. Unfortunately, the SyMON software does not yet have explicit support for all the possible storage options, so the physical view stops here. If a Sun StorEdge A5000 enclosure were connected instead of the SPARCstorage Array, its physical configuration would be shown.

Figure 12-14. Board Level Physical View

