Chapter 4. Troubleshooting Cisco Unified Communications Manager Availability Issues

Cisco Unified Communications Manager (CUCM) is an IP-based private branch exchange (PBX). Its core function is call control. However, it is also the interface to other applications and services that augment the traditional voice functionality to add media resources, applications, integrations, and much more.

CUCM is now entirely virtualized, as are most of the applications and services that it uses to provide additional functionality. This includes CUCM Instant Messaging and Presence (CUCM IM&P), Cisco Unity Connection (CUC), Cisco Unified Contact Center Enterprise (UCCE), Cisco Unified Contact Center Express (UCCX), Cisco Video Communications Server (VCS), Cisco Expressway, Cisco Conductor, Cisco TelePresence Server, and the list goes on.

When CUCM becomes unavailable, it can cause major issues. This chapter focuses on the scenarios in which CUCM may not be responsive and how to troubleshoot these issues.

Chapter Objectives

Upon completing this chapter, you will be able to

• Describe the possible causes and recommended actions to take when the CUCM system is not responding

• Describe the possible causes and recommended actions to take when the CUCM Administration web page is not displayed

• Describe the possible causes and recommended actions to take when the CUCM server response is slow

How to Troubleshoot When Cisco Unified Communications Manager Is Not Responding

When CUCM is not responding, you need to ask a number of questions and check several things. First, how is it not responding? Is it reachable? Is it powered on? Is it simply not acting as a call control agent? Is it reachable but phones are not registered to it? This section explores a number of these causes.

Endpoints Failing to Register to Primary CUCM Node

CUCM is designed to provide multiple call control options to every endpoint. This redundancy is defined by the Unified CM Group (formerly known as the CallManager Group) assigned in the device pool of the endpoint. Up to three CUCM nodes can be specified in the Unified CM Group for call control. Figure 4-1 shows the Unified CM Group Configuration page.

Figure 4-1. Unified CM Group Configuration Page

Image

The Unified CM Group has two fields regarding CM Group membership. These fields specify CUCM nodes available and CUCM nodes selected for membership in the group. The order of the nodes listed in the Selected Cisco Unified Communications Managers box (marked by a rectangle in the figure) determines the order of call control priority each node will take for devices associated with this group. The Unified CM Group is specified in the device pool configuration, as is the Survivable Remote Site Telephony (SRST) instance to which phones should fail over as a last resort. In this case, there are three CUCM nodes, which is the maximum number that can populate a single Unified CM Group. In larger deployments, all three nodes will be Subscriber nodes because the Publisher node may be configured to not participate in call control activities.

The three CUCM nodes specified in the Unified CM Group plus the SRST reference in the device pool provide four layers of call control resilience for phones and other devices. This allows a great degree of flexibility in terms of where the CUCM nodes are placed and how the device load is spread across the available CUCM nodes. In Figure 4-1, two subscribers are listed: cucmsub and cucmsub2. Half of the phones in the deployment can use this Unified CM Group, whereas the second half uses another group specifying cucmsub2, cucmsub as the preferred order. That spreads the entire device load equally across the available Subscriber nodes. Each has the potential to carry the full load only when there is a failure. So, the nodes should be sized with that potential in mind.
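The load-sharing arrangement just described can be modeled in a few lines. The following is a minimal sketch, not a representation of how CUCM itself selects a node: the node names come from Figure 4-1, whereas the group names, the phone counts, and the simplified rule that a phone lands on the first reachable node in its group (and on SRST only when none is reachable) are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical Unified CM Groups. Node names are taken from Figure 4-1;
# the group names are invented for this sketch.
GROUPS = {
    "Group-A": ["cucmsub", "cucmsub2"],
    "Group-B": ["cucmsub2", "cucmsub"],
}


def active_node(group, down=frozenset()):
    """Return the first node in priority order that is not down, else 'SRST'."""
    for node in GROUPS[group]:
        if node not in down:
            return node
    return "SRST"


def registrations(phone_groups, down=frozenset()):
    """Count phones per call control node for a given set of failed nodes."""
    return Counter(active_node(group, down) for group in phone_groups)


# Hypothetical deployment: 1000 phones split evenly across the two groups.
phones = ["Group-A"] * 500 + ["Group-B"] * 500

print(dict(registrations(phones)))
print(dict(registrations(phones, down={"cucmsub2"})))
print(dict(registrations(phones, down={"cucmsub", "cucmsub2"})))
```

With both nodes up, each carries half of the phones. With one node down, the survivor carries all of them, which is why each node should be sized for the full load.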

Troubleshooting issues related to registration to a particular node begins with device pool assignment because that is where the Unified CM Group is chosen. Obviously, the Unified CM Group(s) must be configured ahead of time. With the layers of resilience built in, it is unlikely that the user community will report an issue of a phone, or other device, not registering to its primary call control node. That is, of course, by design. You want the system to maintain unfettered functionality even when failures are occurring behind the scenes.

Using the Real Time Monitoring Tool (RTMT), you will see that registrations are failing for a given node. RTMT was covered, in some detail, in Chapter 3, “Using Troubleshooting and Monitoring Tools.” However, it is worth revisiting some of the topics discussed there. Figure 4-2 shows the Device Summary page in the Voice/Video category of the RTMT.

Figure 4-2. RTMT Device Summary Page

Image

Notice that two devices currently are registered to each of the CUCM nodes: 172.16.100.2 and 172.16.100.8. If one of these nodes experiences a failure, the number of registrations associated with both will change. Figure 4-3 shows what happens when an active call control node goes inactive.

Figure 4-3. RTMT Device Summary Page, Continued

Image

Notice, in Figure 4-3, that the graph has changed. The total number of devices has not been altered, but the node registration graph has changed. They went from each serving half of the load to one serving the full load (172.16.100.2) and one serving none of the load (172.16.100.8). This is visible both on the graph and in the numerical table below the graphs. There would have also been alerts on the Alert Central page and, if configured, e-mail notifications sent to the appropriate individuals. Aside from the loss of the critical service, there would likely have been errors reported by the other CUCM nodes that communication was lost. Figure 4-4 illustrates the RTMT Alert Central alarms showing SDLLinkOutOfService. Most likely, those are the actionable alerts that will get the attention of the collaboration administration personnel.

Figure 4-4. RTMT Alert Central Page Reporting SDLLinkOutOfService

Image

You should manage CUCM nodes in a proactive manner. Managing them reactively means that your users have noticed the outage. At that point, it becomes a much larger issue with added pressure on multiple fronts. It is best if the user community is not aware of a failure because the design and resilience of the collaboration system architecture are such that there are layers of redundancy for all services.

Because of the redundant call control design, there have been no service-impacting outages. The reason is that there are still two functioning nodes in the Unified CM Group. Those nodes are already listed in the device configuration files of all devices using those Unified CM Groups.

The first order of business is to figure out what happened and ascertain how it might impact additional call control nodes. What would cause a call control node to disappear suddenly? Options include the following:

Server Hardware Failure: Although CUCM is a virtual appliance, it still depends on the underlying hardware, such as the Cisco Unified Computing System (UCS) and the Cisco Tested Reference Configurations (TRC) built on that UCS hardware. If other virtual servers are co-resident with the downed CUCM node, a quick check of their reachability and health is in order. If they’re operational, it’s probably not a failure on that level.

Server Software Crash: The CUCM server could have experienced an issue that sent it into a Code Yellow and then crashed it. The crash could be due to a central processing unit (CPU) panic, a core dump, sun spots, extraneous electromagnetic interference, gremlins, or even a software bug. At this point, it is time to collect the log files and contact the Technical Assistance Center (TAC) to investigate the cause.

Cisco CallManager Service Stopped: Whether due to human interaction or software-related causes, the service can stop unexpectedly. Proceed to the Cisco Unified Serviceability Control Center – Feature Services page and check the status of the Cisco CallManager Service on that node. If it’s stopped, select the radio button next to it and then click Start. It is important to know why the service terminated. Check the RTMT logs and the audit log. One or both will point to (or rule out) potential causes (at least as far as determining whether it was caused by a human or by software). If the service shows as Started, often a restart of the service will remedy issues.

CPU Hog: A process on that node could have consumed all available CPU resources. A restart of the node may be in order, if it didn’t restart itself, of course.

Memory Hog: A process might not be coexisting with its peer processes. If a process is consuming memory and not properly releasing it back to the resource pool, the system will eventually run out of available memory and experience issues. This is called a memory leak, and it also causes excessive paging to the hard drive. Paging to and from disk is much slower than reading and writing physical memory. This, in turn, causes congestion that hinders or crashes the server.

CUCM is based on Red Hat Enterprise Linux. However, no access to the underlying shell is possible. So, it is an appliance, by all definitions. You can view some things from the command-line interface (CLI), however. To view the system log, open the CLI on the CUCM node in question (either via the VMware Console or Secure Shell [SSH], preferably SSH) and enter file view activelog /syslog/messages. It is much easier to view the output from RTMT. If the Cisco CallManager service crashed, the log will show the following message:

The Cisco CallManager service terminated unexpectedly. It has done this 1 time. The following corrective action will be taken in 60000 ms. Restart the service.

When a connection attempt to the Cisco CallManager service times out, the message will read as follows:

Timeout 3000 milliseconds waiting for Cisco CallManager service to connect.

If the service attempted to start or restart but failed, the log will show this message:

The service did not respond to the start or control request in a timely fashion.
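When the system log has been saved to a text file, the three messages above can be picked out with a short script. The following is a minimal sketch that assumes the log has already been exported as plain text. The patterns are built only from the message wording quoted in this section, and the function and label names are hypothetical.

```python
import re
from collections import Counter

# Patterns derived from the three messages quoted in this section.
PATTERNS = {
    "crashed": re.compile(
        r"service terminated unexpectedly\.\s+It has done this (\d+) time"),
    "connect_timeout": re.compile(
        r"Timeout (\d+) milliseconds waiting for (.+?) service to connect"),
    "start_failed": re.compile(
        r"did not respond to the start or control request"),
}


def classify_message(line):
    """Return a label for a recognized service message, or None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return None


def summarize(log_text):
    """Count how often each recognized message appears in the log text."""
    labels = (classify_message(line) for line in log_text.splitlines())
    return Counter(label for label in labels if label is not None)


sample_log = """\
The Cisco CallManager service terminated unexpectedly. It has done this 1 time. The following corrective action will be taken in 60000 ms. Restart the service.
Timeout 3000 milliseconds waiting for Cisco CallManager service to connect.
The service did not respond to the start or control request in a timely fashion.
An unrelated line that matches nothing.
"""

print(dict(summarize(sample_log)))
```

A summary like this only points to which of the three conditions occurred. The surrounding log entries are still needed to determine the cause.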

Another useful and quick way to view the status of various services on the system is to use the utils service list command on the CLI. Retrieving the status in this way, assuming the node is reachable and functioning to some degree, tends to be a bit faster than opening a browser to the Serviceability page. Example 4-1 shows the service list on the CUCM node that ceased processing calls.

Example 4-1. CLI Service Status


admin:
admin:utils service list

Requesting service status, please wait...
System SSH [STARTED]
Cluster Manager [STARTED]
Name Service Cache [STARTED]
Entropy Monitoring Daemon [STARTED]
Cisco SCSI Watchdog [STARTED]
Service Manager [STARTED]
HTTPS Configuration Download [STOPPED] Service Activated
Service Manager is running
Getting list of all services
>> Return code = 0
A Cisco DB[STARTED]
A Cisco DB Replicator[STARTED]
Cisco AMC Service[STARTED]
Cisco AXL Web Service[STARTED]
Cisco Audit Event Service[STARTED]
Cisco CAR DB[STOPPED]  Commanded Out of Service
Cisco CAR Scheduler[STOPPED]  Commanded Out of Service
Cisco CDP[STARTED]
Cisco CDP Agent[STARTED]
Cisco CDR Agent[STARTED]
Cisco CDR Repository Manager[STOPPED]  Commanded Out of Service
Cisco CTIManager[STARTED]
Cisco CallManager[STOPPED]  Commanded Out of Service
Cisco CallManager Admin[STARTED]
Cisco CallManager SNMP Service[STARTED]
Cisco CallManager Serviceability[STARTED]
Cisco CallManager Serviceability RTMT[STARTED]
Cisco Certificate Change Notification[STARTED]
Cisco Certificate Expiry Monitor[STARTED]
Cisco Change Credential Application[STARTED]
Cisco DRF Local[STARTED]
Cisco DRF Master[STOPPED]  Commanded Out of Service
Cisco Database Layer Monitor[STARTED]
Cisco Dialed Number Analyzer[STARTED]
Cisco Dialed Number Analyzer Server[STARTED]
Cisco Directory Number Alias Lookup[STARTED]
Cisco E911[STARTED]
Cisco ELM Client Service[STARTED]
Cisco Extended Functions[STARTED]
Cisco Extension Mobility[STARTED]
Cisco Extension Mobility Application[STARTED]
Cisco IP Voice Media Streaming App[STARTED]
Cisco License Manager[STOPPED]  Commanded Out of Service
Cisco Location Bandwidth Manager[STARTED]
Cisco Log Partition Monitoring Tool[STARTED]
Cisco Prime LM Admin[STARTED]
Cisco Prime LM DB[STARTED]
Cisco Prime LM Server[STARTED]
Cisco RIS Data Collector[STARTED]
Cisco RTMT Reporter Servlet[STARTED]
Cisco SOAP - CallRecord Service[STOPPED]  Commanded Out of Service
Cisco Serviceability Reporter[STARTED]
Cisco Syslog Agent[STARTED]
Cisco Tomcat[STARTED]
Cisco Tomcat Stats Servlet[STARTED]
Cisco Trace Collection Service[STARTED]
Cisco Trace Collection Servlet[STARTED]
Cisco Trust Verification Service[STARTED]
Cisco UXL Web Service[STARTED]
Cisco User Data Services[STARTED]
Cisco WebDialer Web Service[STARTED]
Host Resources Agent[STARTED]
MIB2 Agent[STARTED]
Platform Administrative Web Service[STARTED]
SNMP Master Agent[STARTED]
SOAP - Diagnostic Portal Database Service[STARTED]
SOAP -Log Collection APIs[STARTED]
SOAP -Performance Monitoring APIs[STARTED]
SOAP -Real-Time Service APIs[STARTED]
System Application Agent[STARTED]
Cisco Bulk Provisioning Service[STOPPED]  Service Not Activated
Cisco CAR Web Service[STOPPED]  Service Not Activated
Cisco CTL Provider[STOPPED]  Service Not Activated
Cisco Certificate Authority Proxy Function[STOPPED]  Service Not Activated
Cisco DHCP Monitor Service[STOPPED]  Service Not Activated
Cisco DirSync[STOPPED]  Service Not Activated
Cisco Directory Number Alias Sync[STOPPED]  Service Not Activated
Cisco IP Manager Assistant[STOPPED]  Service Not Activated
Cisco Intercluster Lookup Service[STOPPED]  Service Not Activated
Cisco Prime LM Resource API[STOPPED]  Service Not Activated
Cisco Prime LM Resource Legacy API[STOPPED]  Service Not Activated
Cisco SOAP - CDRonDemand Service[STOPPED]  Service Not Activated
Cisco TAPS Service[STOPPED]  Service Not Activated
Cisco Tftp[STOPPED]  Service Not Activated
Cisco Unified Mobile Voice Access Service[STOPPED]  Service Not Activated
Self Provisioning IVR[STOPPED]  Service Not Activated
Primary Node =false

admin:
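If the output of utils service list is captured from an SSH session, the status and reason columns are easy to parse. The following is a minimal sketch that assumes the output format shown in Example 4-1. The function names are hypothetical, and the sample text is a small excerpt of that listing.

```python
import re

# One service per line: a name, a bracketed state, and an optional reason.
LINE = re.compile(r"^(?P<name>.+?)\s*\[(?P<state>[A-Z]+)\]\s*(?P<reason>.*)$")


def parse_service_list(text):
    """Return (name, state, reason) tuples for every service line in text."""
    services = []
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match:
            services.append((match["name"], match["state"], match["reason"]))
    return services


def commanded_out(services):
    """Names of services that are stopped because they were commanded out."""
    return [name for name, state, reason in services
            if state == "STOPPED" and reason == "Commanded Out of Service"]


# Excerpt of the Example 4-1 output.
SAMPLE = """\
System SSH [STARTED]
HTTPS Configuration Download [STOPPED] Service Activated
>> Return code = 0
Cisco CTIManager[STARTED]
Cisco CallManager[STOPPED]  Commanded Out of Service
Cisco Tftp[STOPPED]  Service Not Activated
Primary Node =false
"""

services = parse_service_list(SAMPLE)
print(commanded_out(services))
```

Filtering on the reason text separates services that were deliberately stopped from those that were simply never activated on the node.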

Notice that the Cisco CallManager service not only shows that it is stopped but also shows why the service was terminated. It was commanded out of service. This indicates that the service was stopped through human interaction. To start the service, you need to go to the Serviceability page. In the same way that some services may be stopped only from the CLI, only a few services can be started from the CLI. Example 4-2 shows the result of trying to start the Cisco CallManager service from the CLI, along with the list of services that can be started from the CLI.

Example 4-2. Starting Services from the CLI


admin:utils service start Cisco CallManager

Executed command unsuccessfully
Invalid service name for start/stop, valid names are:
    System SSH
    Cluster Manager
    Name Service Cache
    Entropy Monitoring Daemon
    Cisco SCSI Watchdog
    Service Manager
    HTTPS Configuration Download
    Service Manager
    A Cisco DB
    Cisco CallManager Serviceability
    Cisco CallManager Serviceability RTMT
    Cisco CAR DB
    Cisco Database Layer Monitor
    Cisco Directory Number Alias Lookup
    Cisco Directory Number Alias Sync
    Cisco DRF Local
    Cisco DRF Master
    Cisco Prime LM DB
    Cisco Prime LM Resource API
    Cisco Prime LM Resource Legacy API
    Cisco Prime LM Server
    Cisco Tomcat
    Platform Administrative Web Service
    SNMP Master Agent

admin:

Remember that when you are starting, stopping, or restarting services from the CLI, the names are case sensitive and must be entered exactly as they appear in the list.

It’s not always the fault of the Cisco CallManager service when phones unregister or simply do not register. If a node is experiencing issues and lacks resources to function normally, those resource issues can be identified via RTMT. Figure 4-5 shows the RTMT CPU and Memory page.

Figure 4-5. RTMT CPU and Memory Page

Image

In Figure 4-5, you can see a couple of CPU spikes. Because they are spikes and not sustained excessive utilization, they really don’t cause much concern. If utilization remained very high for an extended period of time, it might indicate an imminent crash or similar service impact. The nodes spiking in Figure 4-5 are the CUCM IM&P node and the CUCM publisher.

Sustained excessive memory utilization would also be cause for concern. In Figure 4-5, the CPU spiked, but the memory utilization remained the same. When excessive buffering or paging occurs, it is not uncommon to see increasing utilization of both CPU and memory. If the upward trend continues for a significant amount of time without adjusting, a system impact is quite probable after one or both resources are exhausted.
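The difference between a momentary spike and sustained utilization can be expressed as a simple rule over a series of samples. The following is a minimal sketch. The 90 percent threshold, the requirement of five consecutive samples, and the sample values are all assumptions chosen for illustration, not Cisco-recommended values.

```python
def longest_run_above(samples, threshold):
    """Length of the longest run of consecutive samples at or above threshold."""
    longest = current = 0
    for value in samples:
        current = current + 1 if value >= threshold else 0
        longest = max(longest, current)
    return longest


def classify_utilization(samples, threshold=90, sustained_samples=5):
    """Label a utilization series as 'normal', 'spike', or 'sustained'."""
    run = longest_run_above(samples, threshold)
    if run == 0:
        return "normal"
    return "sustained" if run >= sustained_samples else "spike"


# Hypothetical utilization percentages collected at a fixed interval.
brief_spikes = [20, 95, 30, 25, 97, 22]
climbing = [40, 91, 93, 95, 96, 98, 99]

print(classify_utilization(brief_spikes))
print(classify_utilization(climbing))
```

The same rule applies to CPU and memory samples alike. Only the threshold and the number of consecutive samples would need tuning against the baseline of a given node.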

Much of this information is available from the CLI. CPU information is available in the output of a number of commands. They include the following:

show stats io: Displays information regarding average CPU usage and disk input/output (IO)

show perf query class Processor: Displays perfmon counters regarding CPU usage

show perf query counter Process "% CPU Time": Shows a snapshot of all processes running and the percentage of CPU they are using

show process load: Displays current load on the system including CPU, memory, swap file, and disk utilization

show status: Shows a snapshot of the node’s CPU and memory utilization, as well as the hostname, date, time, full product version, and uptime

show process using-most cpu: Shows the process using the most CPU cycles at the time the command is entered

show process using-most memory: Shows the process using the most memory at the time the command is entered

utils diagnose test: Runs all diagnostic tests but does not attempt to fix any errors found

Example 4-3 shows the output of the show stats io command.

Example 4-3. show stats io Command Output


admin:show stats io

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.42    0.01    9.84    0.25    0.00   79.49

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              35.59        24.47       597.24   31263287  763131606
admin:

In the Example 4-3 output, the CPU was 79.49 percent idle when the command was entered. Use this command to build a general baseline during both working and nonworking hours so that you have a good idea of what the general utilization of the CPU should be during a given 24-hour period.
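When building such a baseline, it helps to reduce each capture to a few numbers. The following is a minimal sketch that assumes the output layout shown in Example 4-3 and that the text was captured from an SSH session. The function name is hypothetical.

```python
def parse_avg_cpu(text):
    """Return the avg-cpu columns of 'show stats io' output as a dict of floats."""
    lines = text.splitlines()
    for index, line in enumerate(lines[:-1]):
        if line.strip().startswith("avg-cpu:"):
            names = line.split()[1:]  # ['%user', '%nice', ...]
            values = [float(v) for v in lines[index + 1].split()]
            return dict(zip(names, values))
    raise ValueError("no avg-cpu header found")


# Output copied from Example 4-3.
SAMPLE = """\
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.42    0.01    9.84    0.25    0.00   79.49

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda              35.59        24.47       597.24   31263287  763131606
"""

cpu = parse_avg_cpu(SAMPLE)
busy = round(100 - cpu["%idle"], 2)
print(cpu["%idle"], busy)
```

Recording the idle and busy figures with a timestamp at regular intervals gives a simple picture of what normal looks like for the node.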

Example 4-4 shows the output of the show perf query class Processor command.

Example 4-4. show perf query class Processor Command Output


admin:show perf query class Processor
==>query class :

 - Perf class (Processor) has instances and values:
    0               -> % CPU Time                     = 49
    0               -> IOwait Percentage              = 5
    0               -> Idle Percentage                = 51
    0               -> Irq Percentage                 = 0
    0               -> Nice Percentage                = 3
    0               -> Softirq Percentage             = 1
    0               -> Steal Percentage               = 0
    0               -> System Percentage              = 21
    0               -> User Percentage                = 18
    _Total          -> % CPU Time                     = 49
    _Total          -> IOwait Percentage              = 5
    _Total          -> Idle Percentage                = 51
    _Total          -> Irq Percentage                 = 0
    _Total          -> Nice Percentage                = 3
    _Total          -> Softirq Percentage             = 1
    _Total          -> Steal Percentage               = 0
    _Total          -> System Percentage              = 21
    _Total          -> User Percentage                = 18

admin:

The command in Example 4-4 shows average CPU usage over time. Using it is a good way of determining whether high CPU utilization is a factor in any current issue. This example shows an average Idle Percentage of 51. So, the CPU has not been generally stressed.
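The perfmon output in Example 4-4 follows a regular instance, counter, value layout, so it can be turned into a lookup table. The following is a minimal sketch that assumes that layout. The function name is hypothetical, and the sample is an excerpt of the example output.

```python
import re

# Each counter row looks like: "<instance> -> <counter name> = <integer>".
ROW = re.compile(
    r"^\s*(?P<instance>\S+)\s+->\s+(?P<counter>.+?)\s+=\s+(?P<value>-?\d+)")


def parse_perf_class(text):
    """Return {instance: {counter: value}} from 'show perf query class' output."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            counters = result.setdefault(match["instance"], {})
            counters[match["counter"]] = int(match["value"])
    return result


# Excerpt of the Example 4-4 output.
SAMPLE = """\
==>query class :

 - Perf class (Processor) has instances and values:
    0               -> % CPU Time                     = 49
    0               -> Idle Percentage                = 51
    _Total          -> % CPU Time                     = 49
    _Total          -> IOwait Percentage              = 5
    _Total          -> Idle Percentage                = 51
"""

perf = parse_perf_class(SAMPLE)
print(perf["_Total"]["% CPU Time"], perf["_Total"]["Idle Percentage"])
```

In this sample, % CPU Time and Idle Percentage for the _Total instance add up to 100, so either counter can be tracked when watching for a CPU-bound node.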

Example 4-5 shows the output of the show perf query counter Process "% CPU Time" command.

Example 4-5. show perf query counter Process "% CPU Time" Command Output


admin:show perf query counter Process "% CPU Time"

 - Perf class Process(% CPU Time) has values:
    AuditEventRespo -> % CPU Time                     = 0
    AuditLog        -> % CPU Time                     = 0
    BPS             -> % CPU Time                     = 0
    CCMDirSync      -> % CPU Time                     = 0
    CTIManager      -> % CPU Time                     = 1
    CiscoDRFLocal   -> % CPU Time                     = 0
    CiscoDRFMaster  -> % CPU Time                     = 0
    CiscoLicenseMgr -> % CPU Time                     = 0
    CiscoSyslogSubA -> % CPU Time                     = 0
    ElmSrvr         -> % CPU Time                     = 0
    LpmTool         -> % CPU Time                     = 0
    PnPLauncher     -> % CPU Time                     = 0
    RisDC           -> % CPU Time                     = 3
    TAPS            -> % CPU Time                     = 0
    UserSyncService -> % CPU Time                     = 0
    _pluto_adns     -> % CPU Time                     = 0
    _plutoload      -> % CPU Time                     = 0
    _plutorun#1     -> % CPU Time                     = 0
    _plutorun       -> % CPU Time                     = 0
    acpid           -> % CPU Time                     = 0
    aio/0           -> % CPU Time                     = 0
    amc             -> % CPU Time                     = 2
    arpmond         -> % CPU Time                     = 0
    async/mgr       -> % CPU Time                     = 0
    ata_aux         -> % CPU Time                     = 0
    ata_sff/0       -> % CPU Time                     = 0
    audispd         -> % CPU Time                     = 1
    auditd          -> % CPU Time                     = 1
    bdi-default     -> % CPU Time                     = 0
    caroninit#1     -> % CPU Time                     = 0
    caroninit#10    -> % CPU Time                     = 0
    caroninit#11    -> % CPU Time                     = 0
    caroninit#12    -> % CPU Time                     = 0
    caroninit#13    -> % CPU Time                     = 0
    caroninit#14    -> % CPU Time                     = 0
    caroninit#15    -> % CPU Time                     = 0
    caroninit#16    -> % CPU Time                     = 0
    caroninit#2     -> % CPU Time                     = 0
    caroninit#3     -> % CPU Time                     = 1
    caroninit#4     -> % CPU Time                     = 0
    caroninit#5     -> % CPU Time                     = 0
    caroninit#6     -> % CPU Time                     = 0
    caroninit#7     -> % CPU Time                     = 0
    caroninit#8     -> % CPU Time                     = 0
    caroninit#9     -> % CPU Time                     = 0
    caroninit       -> % CPU Time                     = 0
    carschlr        -> % CPU Time                     = 1
    ccm             -> % CPU Time                     = 1
    ccmAgt          -> % CPU Time                     = 0
    cdpAgt          -> % CPU Time                     = 0
    cdpd            -> % CPU Time                     = 0
    cdragent        -> % CPU Time                     = 0
    cdrrep          -> % CPU Time                     = 0
    cef             -> % CPU Time                     = 0
    certM           -> % CPU Time                     = 0
    certSync        -> % CPU Time                     = 0
    cgroup          -> % CPU Time                     = 0
    cliscript.sh    -> % CPU Time                     = 0
    clm             -> % CPU Time                     = 0
    cmoninit#1      -> % CPU Time                     = 0
    cmoninit#10     -> % CPU Time                     = 0
    cmoninit#11     -> % CPU Time                     = 0
    cmoninit#12     -> % CPU Time                     = 0
    cmoninit#13     -> % CPU Time                     = 0
    cmoninit#14     -> % CPU Time                     = 0
    cmoninit#15     -> % CPU Time                     = 0
    cmoninit#16     -> % CPU Time                     = 0
    cmoninit#17     -> % CPU Time                     = 0
    cmoninit#18     -> % CPU Time                     = 0
    cmoninit#19     -> % CPU Time                     = 0
    cmoninit#2      -> % CPU Time                     = 0
    cmoninit#20     -> % CPU Time                     = 0
    cmoninit#3      -> % CPU Time                     = 0
    cmoninit#4      -> % CPU Time                     = 0
    cmoninit#5      -> % CPU Time                     = 0
    cmoninit#6      -> % CPU Time                     = 0
    cmoninit#7      -> % CPU Time                     = 0
    cmoninit#8      -> % CPU Time                     = 0
    cmoninit#9      -> % CPU Time                     = 0
    cmoninit        -> % CPU Time                     = 2
    crond           -> % CPU Time                     = 0
    crypto/0        -> % CPU Time                     = 0
    ctftp           -> % CPU Time                     = 1
    dbcfs#1         -> % CPU Time                     = 0
    dbcfs#2         -> % CPU Time                     = 0
    dbcfs#3         -> % CPU Time                     = 0
    dbcfs#4         -> % CPU Time                     = 0
    dbcfs#5         -> % CPU Time                     = 0
    dbcfs#6         -> % CPU Time                     = 0
    dbcfs           -> % CPU Time                     = 0
    dblrpc          -> % CPU Time                     = 0
    dbmon           -> % CPU Time                     = 1
    dbus-daemon     -> % CPU Time                     = 0
    deferwq         -> % CPU Time                     = 0
    dnaserver       -> % CPU Time                     = 0
    events/0        -> % CPU Time                     = 0
    ext4-dio-unwrit#1 -> % CPU Time                     = 0
    ext4-dio-unwrit#2 -> % CPU Time                     = 0
    ext4-dio-unwrit#3 -> % CPU Time                     = 0
    ext4-dio-unwrit -> % CPU Time                     = 0
    flush-8:0       -> % CPU Time                     = 2
    hald            -> % CPU Time                     = 0
    hald-addon-acpi -> % CPU Time                     = 0
    hald-addon-inpu -> % CPU Time                     = 0
    hald-runner     -> % CPU Time                     = 0
    haproxy         -> % CPU Time                     = 0
    host_agent.pl   -> % CPU Time                     = 0
    hostagt         -> % CPU Time                     = 0
    ilsd            -> % CPU Time                     = 1
    init            -> % CPU Time                     = 0
    ipprefsd        -> % CPU Time                     = 0
    iproduct_impl#1 -> % CPU Time                     = 0  invalid
    iproduct_impl   -> % CPU Time                     = 0  invalid
    ipvmsd          -> % CPU Time                     = 1
    java            -> % CPU Time                     = 1
    jbd2/sda1-8     -> % CPU Time                     = 0
    jbd2/sda2-8     -> % CPU Time                     = 0
    jbd2/sda3-8     -> % CPU Time                     = 0
    jbd2/sda6-8     -> % CPU Time                     = 2
    kacpi_hotplug   -> % CPU Time                     = 0
    kacpi_notify    -> % CPU Time                     = 0
    kacpid          -> % CPU Time                     = 0
    kauditd         -> % CPU Time                     = 0
    kblockd/0       -> % CPU Time                     = 1
    kdmremove       -> % CPU Time                     = 0
    khelper         -> % CPU Time                     = 0
    khubd           -> % CPU Time                     = 0
    khugepaged      -> % CPU Time                     = 0
    khungtaskd      -> % CPU Time                     = 0
    kintegrityd/0   -> % CPU Time                     = 0
    kipvmsMixer     -> % CPU Time                     = 5
    kipvmsd         -> % CPU Time                     = 0
    kpsmoused       -> % CPU Time                     = 0
    kseriod         -> % CPU Time                     = 0
    ksmd            -> % CPU Time                     = 0
    ksoftirqd/0     -> % CPU Time                     = 0
    kstriped        -> % CPU Time                     = 0
    ksuspend_usbd   -> % CPU Time                     = 0
    kswapd0         -> % CPU Time                     = 0
    kthreadd        -> % CPU Time                     = 0
    kthrotld/0      -> % CPU Time                     = 0
    lbm             -> % CPU Time                     = 1
    linkwatch       -> % CPU Time                     = 0
    logger          -> % CPU Time                     = 0
    master          -> % CPU Time                     = 0
    md/0            -> % CPU Time                     = 0
    md_misc/0       -> % CPU Time                     = 0
    mib2_agent.pl   -> % CPU Time                     = 0
    mib2agt         -> % CPU Time                     = 0
    migration/0     -> % CPU Time                     = 0
    mingetty#1      -> % CPU Time                     = 0
    mingetty#2      -> % CPU Time                     = 0
    mingetty#3      -> % CPU Time                     = 0
    mingetty#4      -> % CPU Time                     = 0
    mingetty#5      -> % CPU Time                     = 0
    mingetty        -> % CPU Time                     = 0
    mpt/0           -> % CPU Time                     = 0
    mpt_poll_0      -> % CPU Time                     = 0
    nbslogpd        -> % CPU Time                     = 0
    netns           -> % CPU Time                     = 0
    ntp_start.sh#1  -> % CPU Time                     = 0
    ntp_start.sh    -> % CPU Time                     = 0
    ntpd            -> % CPU Time                     = 0
    pciehpd         -> % CPU Time                     = 0
    pickup          -> % CPU Time                     = 0
    picli           -> % CPU Time                     = 2
    pluto           -> % CPU Time                     = 0
    pm              -> % CPU Time                     = 0
    portreserve     -> % CPU Time                     = 0
    postmaster#1    -> % CPU Time                     = 0
    postmaster#2    -> % CPU Time                     = 0
    postmaster#3    -> % CPU Time                     = 0
    postmaster#4    -> % CPU Time                     = 0
    postmaster#5    -> % CPU Time                     = 0
    postmaster#6    -> % CPU Time                     = 0
    postmaster#7    -> % CPU Time                     = 0
    postmaster      -> % CPU Time                     = 0
    qmgr            -> % CPU Time                     = 0
    rngd            -> % CPU Time                     = 0
    rsyslogd        -> % CPU Time                     = 0
    rtmtreporter    -> % CPU Time                     = 0
    sapp_agent.pl   -> % CPU Time                     = 0
    sappagt         -> % CPU Time                     = 0
    scsi-watchdog.s#1 -> % CPU Time                     = 0
    scsi-watchdog.s -> % CPU Time                     = 0
    scsi_eh_0       -> % CPU Time                     = 0
    scsi_eh_1       -> % CPU Time                     = 0
    scsi_eh_2       -> % CPU Time                     = 0
    sedispatch      -> % CPU Time                     = 0
    servM           -> % CPU Time                     = 0
    sh              -> % CPU Time                     = 0
    sleep           -> % CPU Time                     = 0
    snmp_master_age -> % CPU Time                     = 0
    snmpdm          -> % CPU Time                     = 0
    sshd#1          -> % CPU Time                     = 0
    sshd#2          -> % CPU Time                     = 0
    sshd#3          -> % CPU Time                     = 0
    sshd#4          -> % CPU Time                     = 0
    sshd#5          -> % CPU Time                     = 0
    sshd#6          -> % CPU Time                     = 0
    sshd#7          -> % CPU Time                     = 0
    sshd#8          -> % CPU Time                     = 0
    sshd            -> % CPU Time                     = 0
    startcliscript. -> % CPU Time                     = 0
    stopper/0       -> % CPU Time                     = 0
    sudo#1          -> % CPU Time                     = 0
    sudo#10         -> % CPU Time                     = 0
    sudo#11         -> % CPU Time                     = 0
    sudo#12         -> % CPU Time                     = 0
    sudo#13         -> % CPU Time                     = 0
    sudo#14         -> % CPU Time                     = 0
    sudo#15         -> % CPU Time                     = 0
    sudo#16         -> % CPU Time                     = 0
    sudo#17         -> % CPU Time                     = 0
    sudo#18         -> % CPU Time                     = 0
    sudo#2          -> % CPU Time                     = 0
    sudo#3          -> % CPU Time                     = 0
    sudo#4          -> % CPU Time                     = 0
    sudo#5          -> % CPU Time                     = 0
    sudo#6          -> % CPU Time                     = 0
    sudo#7          -> % CPU Time                     = 0
    sudo#8          -> % CPU Time                     = 0
    sudo#9          -> % CPU Time                     = 0
    sudo            -> % CPU Time                     = 0
    sync_supers     -> % CPU Time                     = 0
    tomcat#1        -> % CPU Time                     = 0
    tomcat          -> % CPU Time                     = 0
    tracecollection -> % CPU Time                     = 0
    tvs             -> % CPU Time                     = 0
    udevd#1         -> % CPU Time                     = 0
    udevd#2         -> % CPU Time                     = 0
    udevd           -> % CPU Time                     = 0
    usbhid_resumer  -> % CPU Time                     = 0
    vmmemctl        -> % CPU Time                     = 0
    vmtoolsd        -> % CPU Time                     = 0
    watchdog/0      -> % CPU Time                     = 0

admin:

The command in Example 4-5 is an excellent candidate for showing how SSH is a better access methodology for the CLI than the VMware console. The output is rather lengthy by design. It is a detailed look at what system processes are running and using CPU resources. This command helps in narrowing down which process or processes are using more than their share of the available CPU resources. In this output, nothing is particularly misbehaving. The RisDC (real-time information service data collector) is using 3 percent and the kipvmsMixer is using 5 percent.

Example 4-6 shows the output of the show process load command.

Example 4-6. show process load Command Output


admin:show process load
top - 17:28:39 up 16 days,  2:01,  1 user,  load average: 0.81, 0.80, 0.89
Tasks: 236 total,   1 running, 235 sleeping,   0 stopped,   0 zombie
Cpu(s): 10.4%us,  9.4%sy,  0.0%ni, 79.5%id,  0.2%wa,  0.0%hi,  0.4%si,  0.0%st
Mem:   3925820k total,  3780808k used,   145012k free,    31492k buffers
Swap:  4095996k total,  1152884k used,  2943112k free,   536608k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14616 certbase  20   0  675m  68m 3828 S  1.9  1.8  37:56.85 certSync
14672 drf       20   0  671m  60m 2132 S  1.9  1.6  37:36.41 CiscoDRFLocal
15570 ccmservi  20   0  145m  51m 7000 S  1.9  1.3 269:45.90 CTIManager
16590 ccmservi  20   0  419m  50m 3028 S  1.9  1.3  33:40.45 UserSyncService
16771 root      20   0  243m  17m 6224 S  1.9  0.4 278:33.57 ilsd
32558 admin     30  10  425m  64m  11m S  1.9  1.7   0:06.20 java
    1 root      20   0 19496 1548 1128 S  0.0  0.0   1:15.69 init
    2 root      20   0     0    0    0 S  0.0  0.0   0:00.00 kthreadd
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
    4 root      20   0     0    0    0 S  0.0  0.0   2:42.89 ksoftirqd/0
admin:

In the command shown in Example 4-6, the information of most relevance is presented in the %CPU and %MEM columns. Comparing them to your baseline is a good indication as to whether something is out of line for the behavior of your particular system.
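The baseline comparison described above can be sketched in Python. This is a minimal illustration only, assuming the output has been saved as text and that you keep your own per-process baseline. The function names, the baseline dictionary, and the margin value are hypothetical and are not part of any Cisco tool.

```python
# Sample rows copied from the show process load output in Example 4-6.
SAMPLE = """\
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
14616 certbase  20   0  675m  68m 3828 S  1.9  1.8  37:56.85 certSync
15570 ccmservi  20   0  145m  51m 7000 S  1.9  1.3 269:45.90 CTIManager
    1 root      20   0 19496 1548 1128 S  0.0  0.0   1:15.69 init
"""

def parse_process_rows(text):
    """Return one dict (pid, cpu, mem, command) per process row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A process row has at least 12 columns and starts with a numeric PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        rows.append({
            "pid": int(fields[0]),
            "cpu": float(fields[8]),    # %CPU column
            "mem": float(fields[9]),    # %MEM column
            "command": " ".join(fields[11:]),
        })
    return rows

def over_baseline(rows, baseline, margin=5.0):
    """Return rows whose %CPU exceeds the recorded baseline by more than margin.

    baseline maps a command name to its usual %CPU (hypothetical data that you
    collect yourself); commands without an entry are compared against 0.
    """
    return [r for r in rows
            if r["cpu"] - baseline.get(r["command"], 0.0) > margin]
```

With the sample rows and the default margin of 5 percentage points, nothing is flagged, which matches the healthy system shown in the example.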

Example 4-7 shows the output of the show status command. It is useful for gaining a quick snapshot overview of CPU and memory utilization for the node on which it is issued.

Example 4-7. show status Command Output


admin:show status

Host Name          : cucmpub
Date               : Mon Jan 4, 2016 17:38:18
Time Zone          : Central Standard Time (America/Chicago)
Locale             : en_US.UTF-8
Product Ver        : 11.0.1.20000-2
Unified OS Version : 6.0.0.0-2

Uptime:
 17:38:21 up 18 days,  2:11,  1 user,  load average: 5.88, 2.21, 1.16

CPU Idle:   82.80%  System:   07.53%    User:   08.61%
  IOWAIT:   00.00%     IRQ:   00.00%    Soft:   01.08%

Memory Total:        3925820K
        Free:         154756K
        Used:        3771064K
      Cached:         570276K
      Shared:         187332K
     Buffers:          31176K

                        Total            Free            Used
Disk/active         14511984K        1569372K       12793848K (90%)
Disk/inactive       14512048K        1573904K       12789380K (90%)
Disk/logging        50858016K       19463000K       28804840K (60%)


admin:

In Example 4-7, the hostname of the node, its product version, the CPU usage overview, and the memory usage overview are highlighted. Also visible are the current statistics for the disk utilization, including total available space, free space, and used space.
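If you capture this output regularly, the disk figures are easy to check automatically. The following Python sketch assumes the Disk/ lines keep the layout shown in Example 4-7. The function names and the 85 percent threshold are illustrative assumptions, not Cisco-defined values.

```python
import re

# Disk lines copied from the show status output in Example 4-7.
SAMPLE = """\
                        Total            Free            Used
Disk/active         14511984K        1569372K       12793848K (90%)
Disk/inactive       14512048K        1573904K       12789380K (90%)
Disk/logging        50858016K       19463000K       28804840K (60%)
"""

DISK_RE = re.compile(r"^(Disk/\S+)\s+(\d+)K\s+(\d+)K\s+(\d+)K\s+\((\d+)%\)")

def disk_usage(text):
    """Map each partition name to its total, free, and used KB and used percent."""
    usage = {}
    for line in text.splitlines():
        match = DISK_RE.match(line.strip())
        if match:
            name, total, free, used, pct = match.groups()
            usage[name] = {
                "total_k": int(total),
                "free_k": int(free),
                "used_k": int(used),
                "pct": int(pct),
            }
    return usage

def partitions_over(usage, threshold_pct):
    """Return the sorted names of partitions at or above threshold_pct used."""
    return sorted(name for name, u in usage.items() if u["pct"] >= threshold_pct)
```

Calling partitions_over(disk_usage(SAMPLE), 85) on the sample returns the active and inactive partitions, both of which are at 90 percent in Example 4-7.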

Example 4-8 shows the output of the show process using-most cpu command. This command provides a detailed look at the processes currently consuming the bulk of the CPU cycles relative to other processes.

Example 4-8. show process using-most cpu Command Output


admin:show process using-most cpu
PCPU PID CPU NICE STATE CPUTIME  ARGS
%CPU   PID CPU  NI S     TIME COMMAND
 6.4 14664   -   0 S 06:15:44 /home/tomcat/tomcat -user tomcat -home
---output truncated---
 2.3 14460   -   - S 10:12:04 [kipvmsMixer]
 2.0   227   -   0 S 08:50:11 [flush-8:0]
 1.3 14834   -   0 S 05:59:56 /usr/local/cm/bin/RisDC
 1.3 22499   -  10 S 00:00:06 java -DConsoleRows=31 -DConsoleColumns=123 -DCommonFileSystem="disk_full=false,inode_full=false,no_write=false,internal_error=false" -DJvmStartTime=1451950659 -XX:-UseSplitVerifier sdMain name=admin priv=4 master

admin:

In Example 4-8, a few processes are listed. They include the Tomcat service, kipvmsMixer (for the IP Media Streaming Application Service), and the RisDC. Each line includes the current CPU utilization for the process, a process identifier, total CPU time, and the process name.

Example 4-9 shows a similar picture in terms of memory utilization with the output of the show process using-most memory command.

Example 4-9. show process using-most memory Command Output


admin:show process using-most memory
MEM(K) PID  ARGS
102640 14845 /usr/local/cm/bin/amc /usr/local/cm/conf/amc/amcCfg.xml
108844 15379 /usr/local/cm/bin/PnPLauncher /usr/local/cm/conf/PnPcfg.xml
120988 15016 /usr/local/cm/bin/carschlr /usr/local/cm/conf/car/carschlrCfg.xml
129692 22192 /usr/local/cm/bin/cmoninit -w
1560372 14664 /home/tomcat/tomcat -user tomcat -home /usr/local/thirdparty/java/j2sdk ----output truncated---

admin:

The command output lists the processes using the most memory, along with the memory used by each, its process identifier, and the process name or config file in use. The output of this command can be written to a file by specifying a filename as an optional parameter at the end of the command.

Example 4-10 shows the output of the utils diagnose test command. This command runs a battery of tests on the system, including disk space, service validation, network validation, and so on.

Example 4-10. utils diagnose test Command Output


admin:utils diagnose test

Log file: platform/log/diag1.log

Starting diagnostic test(s)
===========================
test - disk_space          : Passed (available: 1533 MB, used: 12494 MB)
skip - disk_files          : This module must be run directly and off hours
test - service_manager     : Passed
test - tomcat              : Passed
test - tomcat_deadlocks    : Passed
test - tomcat_keystore     : Passed
test - tomcat_connectors   : Passed
test - tomcat_threads      : Passed
test - tomcat_memory       : Passed
test - tomcat_sessions     : Passed
skip - tomcat_heapdump     : This module must be run directly and off hours
test - validate_network    : Passed
test - raid                : Passed
test - system_info         : Passed (Collected system information in diagnostic log)
test - ntp_reachability    : Warning
The host 204.235.61.9 is not reachable, or its NTP service is down.
The host 173.49.198.27 is not reachable, or its NTP service is down.

Some of the configured external NTP servers are not reachable.
It is recommended that for better time synchronization all of
the NTP servers be reachable.

Please use the OS Admin GUI to add/remove NTP servers.

test - ntp_clock_drift     : Passed
test - ntp_stratum         : Failed
The reference NTP server is a stratum 5 clock.
NTP servers with stratum 5 or worse clocks are deemed unreliable.
Please consider using an NTP server with better stratum level.

Please use OS Admin GUI to add/delete NTP servers.

skip - sdl_fragmentation   : This module must be run directly and off hours
skip - sdi_fragmentation   : This module must be run directly and off hours

Diagnostics Completed


 The final output will be in Log file: platform/log/diag1.log


 Please use 'file view activelog platform/log/diag1.log' command to see the output

admin:

In the output, a few of the tests are highlighted. The disk space test passed and reported the available and used disk space. The Tomcat service tests passed, as did network validation. The NTP reachability test resulted in a warning because two of the configured NTP servers were not reachable, and the NTP stratum test failed because the reference NTP server is a stratum 5 clock. In both cases, the output suggests a fix. Finally, the command created a log file with the output from these tests. This log is useful if you’re viewing this information from the VMware console. The VMware console does not allow for paging, so all you can usually see is the last page of output. Having the file available for viewing makes this a moot issue. The command, when issued, takes time to run. For each test, it provides a countdown to completion.
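Because the output is long, it can help to reduce it to a per-status summary. The following Python sketch assumes each result line follows the "test - name : Status" or "skip - name : reason" layout shown in Example 4-10. The function name and the sample subset are illustrative assumptions.

```python
import re

# A subset of the result lines from Example 4-10.
SAMPLE = """\
test - disk_space          : Passed (available: 1533 MB, used: 12494 MB)
skip - disk_files          : This module must be run directly and off hours
test - tomcat              : Passed
test - ntp_reachability    : Warning
The host 204.235.61.9 is not reachable, or its NTP service is down.
test - ntp_clock_drift     : Passed
test - ntp_stratum         : Failed
The reference NTP server is a stratum 5 clock.
"""

LINE_RE = re.compile(r"^(test|skip)\s+-\s+(\S+)\s*:\s*(\S+)")

def summarize(text):
    """Group diagnostic module names by outcome."""
    summary = {"Passed": [], "Warning": [], "Failed": [], "Skipped": []}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # explanatory text under a warning or failure
        kind, name, status = match.groups()
        if kind == "skip":
            summary["Skipped"].append(name)
        elif status in summary:
            summary[status].append(name)
    return summary
```

For the sample lines, the summary lists ntp_stratum under Failed and ntp_reachability under Warning, which are the two results that need follow-up.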

The CLI is quite extensive. Though there is no escaping it entirely, you can avoid it for the majority of the services by using the Cisco Unified Serviceability tool, which was discussed in some detail in Chapter 3, “Using Troubleshooting and Monitoring Tools.” However, the focus for this portion of the discussion is endpoint registration. If endpoints cannot register, key services and parameters must be checked. They include the Cisco CallManager service and the Cisco TFTP service. The Cisco CallManager service provides call control. The Cisco TFTP service provides configuration files for each device. Those configuration files include the information in the Unified CM Group, which specifies the CUCM node(s) to use for primary, secondary, and tertiary call control, as well as the SRST reference, if enabled, which is set in the device pool. The TFTP server address is typically specified by the Option 150 parameter in the DHCP scope serving the voice VLAN, or it can be manually configured on a per-device basis.

To check the Cisco CallManager and Cisco TFTP service status, proceed to the Cisco Unified Serviceability page. Figure 4-6 shows the Cisco Unified Serviceability Service Activation page.

Figure 4-6. Cisco Unified Serviceability Service Activation Page

In Figure 4-6, the boxes indicating the Cisco CallManager service and the Cisco TFTP service both show that the services are activated. However, activation does not necessarily mean running. Figure 4-7 shows the Cisco Unified Serviceability Control Center – Feature Services page.

Figure 4-7. Cisco Unified Serviceability Control Center – Feature Services Page

Figure 4-7 shows that the Cisco CallManager and Cisco TFTP services are started. If one of the services is stopped, click the radio button next to it and then click the Start button.

If the services are started, and the phones are still unable to register, verify the TFTP server address being provided by the DHCP scope for the voice VLAN in use by the phone(s) in question. Example 4-11 shows the configuration for DHCP on a Cisco router.

Example 4-11. Cisco IOS DHCP Configuration


!
ip dhcp pool VOICE
 network 172.16.0.0 255.255.0.0
 default-router 172.16.1.1
 dns-server 172.16.100.10 8.8.8.8
 option 150 ip 172.16.100.1
!

The option 150 ip 172.16.100.1 command specifies that the Cisco TFTP server is located at 172.16.100.1. Of course, it might also be that the phones are not getting their IP addresses. That may be indicative of a router configuration error. If the phones must cross a router or Layer 3 switch to reach the DHCP server, an ip helper-address <dhcp server ip address> command may be needed on the local Layer 3 interface serving the phones. DHCP requests are sent out as broadcasts, and routers do not forward broadcasts. The ip helper-address command relays a select few UDP broadcasts, including DHCP, as unicasts to the specified address.
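Some DHCP servers expect vendor options such as Option 150 to be entered as a raw hexadecimal value rather than as dotted addresses. The following Python sketch shows one way to produce that value, assuming the option payload is simply each TFTP server's IPv4 address packed as 4 bytes, in order. The function names are hypothetical.

```python
import ipaddress

def option150_payload(addresses):
    """Pack one or more TFTP server IPv4 addresses into raw option bytes."""
    return b"".join(ipaddress.IPv4Address(addr).packed for addr in addresses)

def option150_hex(addresses):
    """Return the payload as a lowercase hex string for servers that need hex."""
    return option150_payload(addresses).hex()
```

For the TFTP server in Example 4-11, option150_hex(["172.16.100.1"]) yields ac106401. Confirm the exact format your DHCP server expects before relying on this value.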

How to Troubleshoot When Cisco Unified Communications Manager Administration Web Page Is Not Displayed

Few issues are more annoying than being in the midst of an outage and unable to reach the CUCM administrative interface to troubleshoot the issue. A number of symptoms arise when reaching the CCMAdmin page becomes difficult.

The first issue to rule out is a bad bookmark or a mistyped URL. The URL should be

https://<cucm IP Address>/ccmadmin

If the browser returns a message, such as “The page cannot be displayed” (Internet Explorer) or “There was no response. The server could be down or is not responding” (Firefox), there are some common causes to explore.

Certainly, check the usual suspects when it comes to connectivity verification.

1. Clear the browser’s cache and restart it. Then try the URL again.

2. Check the hosts file on the local workstation to ensure that it doesn’t contain an invalid entry for the CUCM (C:\Windows\System32\drivers\etc\hosts on PC, /etc/hosts on Mac/Linux).

3. Use nslookup to verify that DNS resolves the short and long names using your configured DNS server.

4. Enter the URL with the IP address rather than the DNS name.

5. Ping the CUCM publisher.

6. Traceroute to the CUCM publisher (see where the last working hop is recorded to narrow troubleshooting scope).

7. Check to see if the CUCM is listening on port 443 by issuing a telnet command to the CUCM on port 443: telnet 172.16.100.1 443. If the port is open, the response will be a blank screen with a blinking cursor rather than an error message.

8. Check to ensure that no access lists are configured to block HTTPS traffic between your workstation and the CUCM.

9. Understand that excessively high CPU utilization will slow, or even preclude, CUCM’s response to HTTPS traffic.
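The name resolution and port checks in the list above can also be scripted from the workstation. The following Python sketch uses only the standard library. The function names and the timeout value are illustrative assumptions, and a successful TCP connection shows only that something is listening on the port, not that the CCMAdmin application itself is healthy.

```python
import socket

def resolve(name):
    """Return the sorted IPv4 addresses for name, or [] if resolution fails."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET, socket.SOCK_STREAM)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})

def port_open(host, port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, resolve("cucmpub.uclab.us") tests the long name built from the hostname and domain shown in this chapter's examples, and port_open("172.16.100.1") is the scripted equivalent of the telnet test in step 7.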

Assuming network connectivity is verified, some issues will keep CUCM from responding to web requests. SSH to CUCM, or open the VMware console if it won’t respond to SSH, and enter the utils service list command as shown in Example 4-12.

Example 4-12. utils service list Command Output


admin:utils service list

Requesting service status, please wait...
System SSH [STARTED]
Cluster Manager [STARTED]
Name Service Cache [STOPPED] Service Not Activated
Entropy Monitoring Daemon [STARTED]
Cisco SCSI Watchdog [STARTED]
Service Manager [STARTED]
HTTPS Configuration Download [STARTED]
Service Manager is running
Getting list of all services
>> Return code = 0
A Cisco DB[STARTED]
A Cisco DB Replicator[STARTED]
Cisco AMC Service[STARTED]
Cisco AXL Web Service[STARTED]
---output truncated---
Cisco CallManager[STARTED]
Cisco CallManager Admin[STARTED]
Cisco CallManager SNMP Service[STARTED]
Cisco CallManager Serviceability[STARTED]
Cisco CallManager Serviceability RTMT[STARTED]
---output truncated---
Cisco Tftp[STARTED]
Cisco Tomcat[STOPPED] Commanded out of Service
Cisco Tomcat Stats Servlet[STARTED]
Cisco Trace Collection Service[STARTED]
Cisco Trace Collection Servlet[STARTED]
Primary Node =true
admin:
admin:
admin:
admin:
admin:
admin:utils service start cisco tomcat

Executed command unsuccessfully
Invalid service name for start/stop, valid names are:
    System SSH
    Cluster Manager
    Name Service Cache
    Entropy Monitoring Daemon
    Cisco SCSI Watchdog
    Service Manager
    HTTPS Configuration Download
    Service Manager
    A Cisco DB
    Cisco CallManager Serviceability
    Cisco CallManager Serviceability RTMT
    Cisco CAR DB
    Cisco Database Layer Monitor
    Cisco Directory Number Alias Lookup
    Cisco Directory Number Alias Sync
    Cisco DRF Local
    Cisco DRF Master
    Cisco Prime LM DB
    Cisco Prime LM Resource API
    Cisco Prime LM Resource Legacy API
    Cisco Prime LM Server
   Cisco Tomcat
   Platform Administrative Web Service
   SNMP Master Agent

admin:utils service start Cisco Tomcat
Service Manager is running
Cisco Tomcat[STARTING]
Cisco Tomcat[STARTING]
Cisco Tomcat[STARTED]
admin:

Look at the status of the Cisco Tomcat service. If it is not [STARTED], try to start it by entering utils service start Cisco Tomcat. Remember, the service names are case sensitive. After the service shows [STARTING], the CCMAdmin page may begin responding. You may see something akin to what is shown in Figure 4-8.

Figure 4-8. CCMAdmin Page while Cisco Tomcat Is Starting

Even though the service shows as started, it may take a few minutes to begin fully responding.
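When the utils service list output is long, a short script can pick out the services that need attention. The following Python sketch assumes each service line ends with a bracketed state followed by an optional reason, as in Example 4-12. The function names are hypothetical, and services reported as "Service Not Activated" are ignored on the assumption that they are intentionally off.

```python
import re

# Lines copied from the utils service list output in Example 4-12.
SAMPLE = """\
System SSH [STARTED]
Name Service Cache [STOPPED] Service Not Activated
A Cisco DB[STARTED]
Cisco CallManager[STARTED]
Cisco Tftp[STARTED]
Cisco Tomcat[STOPPED] Commanded out of Service
Primary Node =true
"""

SERVICE_RE = re.compile(r"^(.+?)\s*\[([A-Z]+)\]\s*(.*)$")

def parse_services(text):
    """Map each service name to a (state, reason) tuple."""
    services = {}
    for line in text.splitlines():
        match = SERVICE_RE.match(line.strip())
        if match:
            name, state, reason = match.groups()
            services[name] = (state, reason)
    return services

def needs_attention(services):
    """Return activated services that are in any state other than STARTED."""
    return sorted(
        name for name, (state, reason) in services.items()
        if state != "STARTED" and reason != "Service Not Activated"
    )
```

For the sample lines, needs_attention(parse_services(SAMPLE)) returns only Cisco Tomcat, the service that was restarted in Example 4-12.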

How to Troubleshoot Slow Response of Cisco Unified Communications Manager Server

Sometimes CUCM is slow to respond to requests. This is unlikely to manifest as dropped calls, one-way audio, or anything else that affects calls in progress. When users do notice it, the symptoms may be delayed dialing, delayed or failed Extension Mobility login, call transfer difficulties, or slow response time of CUCM web-based applications. The dominant issues are signaling-related problems that arise when users try to set up new calls or invoke mid-call features.

The reason for these problems is largely that CUCM is involved only in call setup, teardown, media resource requests, or other things not directly associated with the endpoint-to-endpoint media flow.

When users report dialing delays, it is important to understand the nature of them. Are the delays occurring while the users are dialing a number or after the last digit of that number has been dialed? The same questions are asked when there are dialing failures, although a delay is a different problem. When numbers are dialed, digit stimulus is sent to CUCM and interpreted. If the digits are slow to dial or process, the speed could be indicative of either a delay with CUCM processing or a network latency issue. If the number dialing is responsive, but the processing thereafter is slow, it is most likely a CUCM-related delay as it goes through route path selection, gateway selection, media resource invocation, and so on. It is beneficial if you can replicate the issue or see the problem firsthand with the user reporting it. This way, you get the most complete understanding of what is going on and whether the end user is using the correct or relevant terms to describe the event.

In cases in which CUCM is determined to be the cause of the delay, investigate the CUCM node acting as call control for the phone(s) in question. You can see this by accessing the settings on the phone. Through the CUCM administrative interface, you can see the node to which the phone is registered. There are two aspects to investigate: the node to which the phone is registered and the node to which the phone is supposed to be registered. If these two things don’t match, it is very likely a CUCM problem. If a CUCM node has failed and the secondary call control node is struggling to manage the additional load, the situation can become significantly more difficult in rapid fashion. To see the node to which phones are registered, open CCMAdmin and click Device -> Phone -> Find. Alternatively, enter a query for the phone you wish to investigate. Figure 4-9 shows an example of the CUCM Find and List Phones page.

Figure 4-9. CUCM Find and List Phones Page

In Figure 4-9, you can see that phones are registered to CUCM nodes 172.16.100.2 and 172.16.100.8, as they should be based on the Unified CM Group configuration. If all the phones were registered to one node or the other, it would be clear that a CUCM node failure had occurred, and the failure must be investigated. Check the Cisco CallManager service state on that node, if it’s reachable. Check RTMT for alerts and other relevant information regarding the failure.

If users have latency issues, a timely resolution may be critical. Multitasking skills come in handy at this point. You will want to monitor the CPU and memory utilization of the functional node while simultaneously troubleshooting the cause of the issue(s) on the nonfunctioning node.

Another question to consider is, “Have there been any changes?” If so, what changed, and how did those changes come about? Network changes may be part of the problem. If port settings were updated on the switches to which the VMware hosts are connected, there could be major ramifications. This is especially true if the changes were speed and/or duplex related. If a speed or duplex setting was altered, it is possible that every virtual server on the VMware host in question is experiencing difficulties and delays. The need for multitasking may have just expanded to troubleshooting multiple services.

Luckily, duplex errors make themselves known very quickly and quite often. A mismatch will introduce some amount of latency, certainly. The LAN switch itself will generate errors every few seconds on the console, by default, and to any syslog or SNMP services configured to monitor it. Speed mismatch won’t always be so easily found, because it’s not really an error per se, if both sides are set to auto or even if just one side is set to auto. It’s simply a reduction in available bandwidth. If both sides are set to differing values, there is a big problem. Example 4-13 shows the interface status when there is a speed mismatch. It’s simply down.

Example 4-13. Show Interface Status—Speed Mismatch


UCSwitch01-3560CG# show interface gig 0/3 status

Port      Name               Status       Vlan       Duplex  Speed Type
Gi0/3     ***Uplink to BE600 notconnect   1            auto    100 10/100/1000BaseTX


!
interface GigabitEthernet0/3
 description ***Uplink to BE6000S Gig 0/0 Interface***
 speed 100
!

In Example 4-13, the speed is forced to 100 Mbps on the interface. The status shows as “notconnect.” A look at the interface of its peer device attached to interface Gig0/3 is shown in Example 4-14.

Example 4-14. Interface Speed Mismatch


!
interface GigabitEthernet0/0
 description ***Uplink to UCSwitch01-3560CG Int Gig0/3***
 ip address 192.168.1.240 255.255.255.0
 duplex full
 speed 1000
!

In Example 4-14, the speed is forced to 1000 Mbps. Because the two interfaces could not negotiate speed, neither would change state to up. Setting one, or both, to auto remedies the issue, as would setting them both to a common speed setting. Again, no errors are generated other than the interface changing state to down when the change causing the speed mismatch occurred.

Duplex mismatch is slightly different. The interfaces will still change state to up with a duplex mismatch; however, the throughput becomes hindered. Errors are also generated at regular intervals to report a duplex mismatch. Example 4-15 shows the interface configuration causing a duplex mismatch and the resulting error generated.

Example 4-15. Duplex Mismatch


!
interface GigabitEthernet0/0
 description ***Uplink to UCSwitch01-3560CG Int Gig0/3***
 ip address 192.168.1.240 255.255.255.0
 duplex half
 speed 1000
!
000250: Jan  5 15:46:47.754: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet0/0 (not full duplex), with UCSwitch01-3560CG GigabitEthernet0/3 (full duplex).
000251: Jan  5 15:47:36.870: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet0/0 (not full duplex), with UCSwitch01-3560CG GigabitEthernet0/3 (full duplex).
000252: Jan  5 15:48:34.946: %CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on GigabitEthernet0/0 (not full duplex), with UCSwitch01-3560CG GigabitEthernet0/3 (full duplex).

In Example 4-15, it becomes clear that the duplex mismatch error is quite persistent and difficult to ignore. Any change in speed or duplex on the switch interface will result in the interface going down momentarily and then attempting to re-establish communication and come back up. So, there should be a sufficient audit trail in finding out when the changes were made and what changes were made, assuming that some manner of TACACS or RADIUS service is in place.
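If the switch or router logs are forwarded to a syslog server, the mismatch messages can be counted per link. The following Python sketch assumes the message format shown in Example 4-15. The function name and the idea of keying on local interface, peer device, and peer interface are illustrative choices.

```python
import re
from collections import Counter

# Two of the log messages from Example 4-15.
SAMPLE = (
    "000250: Jan  5 15:46:47.754: %CDP-4-DUPLEX_MISMATCH: duplex mismatch "
    "discovered on GigabitEthernet0/0 (not full duplex), with "
    "UCSwitch01-3560CG GigabitEthernet0/3 (full duplex).\n"
    "000251: Jan  5 15:47:36.870: %CDP-4-DUPLEX_MISMATCH: duplex mismatch "
    "discovered on GigabitEthernet0/0 (not full duplex), with "
    "UCSwitch01-3560CG GigabitEthernet0/3 (full duplex).\n"
)

MISMATCH_RE = re.compile(
    r"%CDP-4-DUPLEX_MISMATCH: duplex mismatch discovered on (\S+) "
    r"\((.+?)\), with (\S+) (\S+) \((.+?)\)"
)

def count_mismatches(log_text):
    """Count mismatch messages per (local interface, peer, peer interface)."""
    counts = Counter()
    for match in MISMATCH_RE.finditer(log_text):
        local_if, _local_duplex, peer, peer_if, _peer_duplex = match.groups()
        counts[(local_if, peer, peer_if)] += 1
    return counts
```

A steadily increasing count for the same link is a strong hint that the duplex settings on the two ends still disagree.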

From the perspective of CUCM, you can check the status of the network interface most readily on the CLI. Example 4-16 shows the output of the show network eth0 command.

Example 4-16. show network eth0 Command Output


admin:show network eth0
Ethernet 0
DHCP         : disabled           Status     : up
IP Address   : 172.16.100.1       IP Mask    : 255.255.000.000
Link Detected: yes                Mode       : Auto disabled, Full, 10000 Mbits/s
Duplicate IP : no

DNS
Primary      : 172.16.100.10      Secondary  : 8.8.8.8
Options      : timeout:5 attempts:2
Domain       : uclab.us
Gateway      : 172.16.1.1 on Ethernet 0


admin:

In Example 4-16, you can see the speed, duplex, and other relevant settings for the network interface on the CUCM node. One other place to check, in terms of networking, is the VMware vSwitch settings. Open vCenter or vSphere and take a look at the host’s network configuration screen. This is shown in Figure 4-10.

Figure 4-10. vSphere Network Configuration Screen

In Figure 4-10, network interface parameters of the configured vSwitch are visible. You invoke the pop-up box by hovering the mouse pointer over the small dialog box to the right of the physical adapter—in this case, vmnic0.

Chapter Summary

Troubleshooting CUCM and the associated services and devices can be a daunting task. It is important to stay calm and think through the issues at hand. Context is exceedingly critical when dealing with the issues that can arise in a collaboration system architecture. A small thing such as whether a failure occurred during or after dialing makes all the difference. The more detail that you can accumulate regarding an event, error, or issue, the better chance you have to find the root cause.

Being able to view logs, collect traces, isolate errors, track CPU and memory usage, and determine when there is an issue, as well as its severity or potential impact, comes with knowing your particular deployment. Size, methodology, peer systems, additional functionality, and users (types and volume) all factor into the big picture. Understanding both the collaboration system deployment and the network infrastructure is key to being able to troubleshoot and narrow down causes of slow responsiveness, dialing issues, and even server crashes.

References

For additional information, refer to the following:

• CUCM Troubleshooting Guides:

http://www.cisco.com/c/en/us/support/unified-communications/unified-communications-manager-callmanager/products-troubleshooting-guides-list.html

• CUCM Troubleshooting TechNotes:

http://www.cisco.com/c/en/us/support/unified-communications/unified-communications-manager-callmanager/products-tech-notes-list.html

• CUCM Maintain and Operate Guides (by version, includes CLI Reference Guides):

http://www.cisco.com/c/en/us/support/unified-communications/unified-communications-manager-callmanager/products-maintenance-guides-list.html

Review Questions

Use these questions to review what you’ve learned in this chapter. The answers appear in Appendix A, “Answers to Chapter Review Questions.”

1. Call control redundancy is defined using which of the following configurations?

a. Call Routing

b. Unified CM Group

c. CUCM CLI

d. Media Resources

2. The SRST instance to be utilized by IP phones and other endpoints is specified where?

a. Unified CM Group

b. Device Pool

3. Which of the following tools is used for system monitoring and trace collection?

a. RTMT

b. Dialed Number Analyzer

c. Cisco Unified Serviceability

d. Syslog Server

4. Within the Real Time Monitoring Tool, where will system alarms be listed and detailed?

a. AuditLog Viewer

b. Performance Log Viewer

c. Alert Central

d. Analysis Manager

5. Which CUCM CLI command will show services and the status of each?

a. utils service list

b. show status

c. utils service show

d. show service list

6. Which CUCM CLI command displays average CPU and disk IO?

a. show stats io

b. show perf query class Processor

c. show process load

d. show status

7. Which CUCM CLI command initiates the running of multiple system tests, including disk space, service validation, and network validation?

a. show status

b. show system statistics

c. utils diagnose test

d. utils system restart

8. Which DHCP option provides the TFTP server to Cisco IP phones?

a. Option 150

b. Option 63

c. Option 56

d. Option 67

9. Which service provides access to the CUCM Web Administration tool?

a. Cisco CallManager

b. Cisco TFTP

c. Cisco Tomcat

d. Cisco CallManager

10. Which CUCM CLI command shows the speed, duplex, and other network-related information for CUCM?

a. show network status

b. show network eth0

c. show network route

d. show network cluster
