Chapter 10. Troubleshooting Audio and Video Quality Issues in Cisco Collaboration Solutions

In any network troubleshooting effort, it is imperative that the underlying technologies and infrastructure be understood. The same holds true for audio and video quality issues. Understanding the end-to-end call signaling and media flows, voice gateway and endpoint operations, and media resources within the Cisco collaboration solution is a must. This chapter examines voice quality issues as well as how to isolate and address them.

Chapter Objectives

Upon completing this chapter, you will be able to

• Identify voice quality issues in Cisco collaboration systems

• Identify and isolate voice and video quality problems

• Troubleshoot Layer 2 quality problems

• Troubleshoot voice quality issues on a gateway

• Troubleshoot quality issues at endpoints

• Identify one-way audio and video issues

Voice Quality Issues in Cisco Collaboration Systems

Voice quality issues can happen for a number of reasons. Data networks can be bursty and somewhat unpredictable. Sometimes it seems that the traffic flows are trying to explode beyond the confines of what the physical transmission medium can handle. On other occasions, the traffic flows are well behaved. With the proper application of quality of service (QoS), traffic flows of varying types and criticality can be managed and prioritized. Even so, it is still common to encounter the opinion that a “big, fat pipe” waives any need to configure QoS.


Note

LAN bandwidth is a significantly lower-cost resource than WAN bandwidth. To manage and conserve WAN bandwidth, an organization may deploy a number of mechanisms (beyond the scope of this book), and at times, improper planning or configuration of these mechanisms can lead to voice and video quality issues. To read more about bandwidth management tools, refer to Implementing Cisco IP Telephony and Video, Part 1 and 2.


Five general areas contribute to the potential for problems in a converged network infrastructure:

• Insufficient Bandwidth

• Fixed Delay

• Variable Delay (Jitter)

• Packet Loss

• Traffic Prioritization

Let’s take a closer look at each of these areas because they are general in scope and impact.

Insufficient Bandwidth

So, a big, fat pipe isn’t the answer? What is the answer then? Well, the big, fat pipe concept is part of the answer, just not entirely. Consider the types of traffic traversing a typical data network. It might include graphics files of significant size, multimedia of various sorts, file transfer, backup/restore, voice/video signaling, and media; the list can go on and on.

It’s a common misconception to look at bandwidth as a single, all-inclusive element of the network. In fact, it is composed of many elements end-to-end between the source and destination of a given traffic type. In short, the available bandwidth is that which is available on the slowest or most utilized segment between source and destination. Think of the idiom “a chain is only as strong as its weakest link.” The weakest may be that which is under the most load or simply lacks sufficient structural integrity to hold under load. Figure 10-1 shows an example of how end-to-end bandwidth can impact traffic flows.

Figure 10-1. End-to-End Bandwidth Illustration

In Figure 10-1, the network topology shows that there is indeed a weakest link, in terms of maximum bandwidth, between the source and destination. What is not shown is the load of each link. It is feasible that the 512 kbps link is heavily utilized and is actually weaker, in terms of available capacity, than the 256 kbps link. Congestion on wide area network (WAN) or local area network (LAN) interfaces can cause delayed and/or dropped traffic. WAN loss is generally produced by tail drops, whereas LAN loss can occur as a result of congested input or output buffers on an interface. Some traffic types can handle such delays and drops. This is not the case with voice and video traffic. While the signaling traffic is Transmission Control Protocol (TCP) based, the media traffic is User Datagram Protocol (UDP) based.
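The weakest-link point can be made concrete with a short calculation. The following is only a sketch: the two capacities echo the 512 kbps and 256 kbps links mentioned above, while the other links and all load figures are hypothetical values chosen for illustration.

```python
# Each link is (capacity_kbps, current_load_kbps).
# Capacities of 512 and 256 kbps come from the discussion above;
# the remaining links and every load value are invented for illustration.

def available_kbps(capacity_kbps, load_kbps):
    """Headroom left on a single link, never below zero."""
    return max(capacity_kbps - load_kbps, 0)

def path_available_kbps(links):
    """End-to-end headroom is set by the most constrained link on the path."""
    return min(available_kbps(cap, load) for cap, load in links)

path = [(100_000, 20_000), (512, 450), (256, 100), (100_000, 5_000)]
print(path_available_kbps(path))  # 62: the busy 512 kbps link is the bottleneck
```

With these assumed loads, the 512 kbps link leaves less headroom (62 kbps) than the nominally slower 256 kbps link (156 kbps), which is exactly the situation the figure cannot show.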

Routers may also drop packets for other reasons, such as the following:

Input queue drop: The input queue is full, causing a traffic drop.

Ignore: The buffer space is full, causing packets to be ignored.

Overrun: The central processing unit (CPU) is congested, and it can’t assign a free buffer to new packets.

Frame errors (cyclic redundancy check [CRC], runts, giants): Hardware detected an error in the frame.

WAN links: There are frame slips or carrier discards.

Properly configured QoS is the mechanism that protects traffic on the LAN and WAN links. It provides priority on input buffers, processing, and output buffers on the routing and switching interfaces. The big, fat pipe concept doesn’t protect the traffic in any way. It leaves all traffic types to fend for themselves as they traverse the devices comprising the network.

Properly designing the network, including the implementation of best practice QoS deployment, protects vital traffic types. The vital (business-critical) traffic may not always have voice and interactive video at the highest levels. Some traffic types are vital to operation of the business and may be configured to override voice/video traffic. Preventing drops of sensitive application traffic involves the following:

• Increase link capacity to eliminate and/or prevent congestion and utilize low-latency queuing (LLQ).

• Configure predictable packet dropping for less critical applications or drop-insensitive traffic flows using weighted random early detection (WRED) to drop traffic before a congestive condition can occur.

• Confirm proper trunk sizing and QoS throughout the network (LAN and WAN).

• Use traffic shaping on WAN links to minimize loss of voice and video packets in the network and prevent a particular traffic type from monopolizing interface resources.

Fixed Delay

Delay is not an issue that is subject to a single definition. In concept, it can be defined. But the sources of the delay vary significantly. The degree of control available over those sources varies accordingly.

Some sources of delay can be mitigated. Others simply are what they are, and there is no way around them. This is the case for fixed delay. Fixed delay is that which is inherent in every traffic flow despite the best efforts of architectural design. Fixed delay includes serialization delay, propagation delay, and to some degree, processing and queuing delay. Figure 10-2 shows fixed delay in concept.

Figure 10-2. Fixed Delay Illustration

Propagation delay is caused by physics. The speed of light is simply inexorably limiting. The speed of transmission across physical media tends to be around 6 microseconds per kilometer over copper media. Propagation delay is generally overlooked because it seems rather insignificant. However, it does build. The distance from New York to Los Angeles is around 4500 kilometers. That equates to roughly 27,000 microseconds, or 27 ms, one-way. Suddenly, it is significant when design parameters for voice dictate an acceptable maximum one-way delay of 150 ms. Consider Cisco Unified Communications Manager (CUCM) clustering over the WAN with a maximum one-way delay of 40 ms (80 ms two-way) between nodes. In terms of cluster architecture, it just became very significant.
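The arithmetic above is easy to reproduce. This sketch uses the approximate 6 microseconds per kilometer quoted in the text; the function name is illustrative only.

```python
US_PER_KM = 6  # approximate propagation figure quoted in the text

def propagation_delay_ms(distance_km, us_per_km=US_PER_KM):
    """One-way propagation delay in milliseconds."""
    return distance_km * us_per_km / 1000

delay = propagation_delay_ms(4500)  # New York to Los Angeles, roughly
print(delay)                        # 27.0
print(delay <= 150)                 # True: inside the one-way voice budget
print(delay <= 40)                  # True: but most of the 40 ms clustering budget is gone
```

Propagation alone consumes about two-thirds of the 40 ms one-way clustering budget in this case, before any serialization, processing, or queuing delay is added.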

Serialization delay is the time it takes to clock all the bits of a frame onto the physical medium, including framing and line coding. In other words, every bit has to be placed on the wire, in order, before the frame is fully transmitted. For a given frame size, this is a fixed value determined by the speed of the link.
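As a rough illustration, serialization delay can be estimated as the frame size in bits divided by the link rate. The 1500-byte frame size below is an assumption picked to represent a large data packet, and the function name is illustrative.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time to clock one frame onto the link, in milliseconds."""
    return frame_bytes * 8 / link_bps * 1000

print(serialization_delay_ms(1500, 64_000))   # 187.5
print(serialization_delay_ms(1500, 768_000))  # 15.625
```

A single large data frame on a 64 kbps link holds the interface for longer than the entire 150 ms one-way voice budget, which is why slow links need special treatment.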

Processing and queuing delay within a router or switch are caused by a wide variety of conditions. These are somewhat hybridized in terms of fixed versus variable delay. They are fixed in that delay exists regardless of the transmission type. They are not so fixed in that the extent of delay experienced is variable per traffic type, depending on how the queueing structure is configured.

Variable Delay (Jitter)

Delay is a fact of life. It can be predicted—sometimes more accurately than others. However, conditions may cause the delay in a particular traffic flow to fluctuate. Variable delay causes problems in voice and video traffic flows. Figure 10-3 shows the concept of jitter.

Figure 10-3. Variable Delay (Jitter) Illustration

In Figure 10-3, the traffic flow is left to right. The left side is well ordered and predictable. The right side is less so. The packets are unevenly spaced (timed) in their departure and therefore arrival. This can occur because of network congestion, improper queuing, or configuration errors. Figure 10-3 represents a potentially erroneous concept in terms of the media flow for voice/video traffic. If multiple equal-cost paths exist through the network, UDP datagrams can arrive out of order. Each path has its own set of delay variables in play. Recall that TCP segments flow in a session. That is, they all take the same path between source and destination. UDP datagrams are independently routed on their own merits. So, the delay variation can be significant in that packets are not only delayed for differing amounts of time but also can arrive out of order. They then have to be properly reordered, which takes time.

In seeking out jitter, the place to begin is on router interfaces. These are the most typical places where jitter is introduced. Additionally, for WAN interfaces, there is a great deal of control over these interfaces in terms of traffic management and shaping. Alternately, jitter can be displayed on the source and destination endpoints by viewing the call information while a session is active. The display shows codec, Tx (transmit) and Rx (receive) packets, jitter, and other call statistics.
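For a sense of how a jitter figure can be derived, the sketch below applies the running interarrival jitter estimator defined in RFC 3550 for RTP. Whether a given endpoint computes its displayed value in exactly this way is an assumption, and the timestamps are invented sample data in milliseconds.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """Running jitter estimate in the style of RFC 3550: J += (|D| - J) / 16."""
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # D is the change in transit time between consecutive packets.
        d = (recv_times_ms[i] - recv_times_ms[i - 1]) - (
            send_times_ms[i] - send_times_ms[i - 1]
        )
        jitter += (abs(d) - jitter) / 16
    return jitter

sent = [0, 20, 40, 60]            # packets leave every 20 ms
steady = [50, 70, 90, 110]        # constant 50 ms transit time
uneven = [50, 70, 95, 110]        # third packet arrives 5 ms late

print(interarrival_jitter(sent, steady))      # 0.0
print(interarrival_jitter(sent, uneven) > 0)  # True
```

A constant delay, however large, produces no jitter; only the variation between consecutive packets moves the estimate.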

On WAN-facing interfaces, make use of LLQ configuration. Queuing is usually not the cause of jitter, unless there is a misconfiguration or misclassification of traffic flow. In cases in which jitter becomes significant, the use of de-jitter buffering may become a necessity to smooth out the traffic flow. On slower interfaces (768 kbps or less), utilize link fragmentation and interleaving to help break up the bigger traffic and minimize jitter.

Digital signal processors (DSPs) are the call handlers. They are in charge of voice packetization and can handle some degree of jitter, but they also can be overrun by it. Of course, this results in low, or degraded, voice/video quality.

Packet Loss

Packet loss occurs for numerous reasons. Typically, it is a result of congestion on an interface. Most applications utilizing TCP experience a slowdown because of the protocol adjusting to network resources. As TCP flows accelerate and the window size opens, an occasional (and intentional) packet drop forces the TCP window to close. This is more a flow control mechanism than an actual problem. Other intentional packet drops are performed with a WRED configuration for congestion avoidance. UDP-based applications cannot adjust to the conditions of the network in this manner. TCP simply retransmits the lost information. UDP does not retransmit. In fact, from a voice/video perspective, retransmissions cause more problems due to the latency involved.

When packets must be dropped, it’s typically considered better to control what is dropped and when. Preferably, the drops will occur with noncritical traffic types and before a congestive condition exists.
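The idea of dropping early and selectively can be sketched with the commonly documented WRED drop curve: no drops below a minimum threshold, a linearly rising drop probability up to a maximum threshold, and tail drop above it. The threshold values and the mark probability denominator below are hypothetical, not recommended settings.

```python
def wred_drop_probability(avg_queue_depth, min_threshold, max_threshold,
                          mark_prob_denominator=10):
    """Drop probability for one traffic class at a given average queue depth."""
    if avg_queue_depth < min_threshold:
        return 0.0                      # queue is short: never drop
    if avg_queue_depth > max_threshold:
        return 1.0                      # queue is too long: drop everything
    max_probability = 1 / mark_prob_denominator
    span = max_threshold - min_threshold
    return max_probability * (avg_queue_depth - min_threshold) / span

# Hypothetical thresholds of 20 and 40 packets for a low-priority class.
for depth in (10, 30, 40, 45):
    print(depth, wred_drop_probability(depth, 20, 40))
```

Giving less critical classes lower thresholds than critical ones means their packets start dropping first, before the queue is full.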

Sometimes packet drops occur under circumstances beyond your control. This might be on the carrier side of the connection. Such drops can occur when traffic flow exceeds the agreed-upon transmission speeds or when the carrier is simply having issues. All in all, it’s generally agreed that being in control of what gets dropped is the best course of action.

Traffic Prioritization

QoS enables network control and predictability under a varied array of situations and traffic profiles. It allows control of routing and switching resources including bandwidth, equipment, WAN facilities, and such by traffic type. QoS is often erroneously seen as only an output queuing mechanism. It is less widely understood that it is an input, processing, and output mechanism. It ensures that WAN resources are properly utilized (traffic shaping/traffic policing).

More importantly, it allows prioritization of mission-critical application traffic, such as voice and video signaling and media. Minimizing delay for these traffic types is crucial, as discussed earlier. In prioritizing the critical traffic, the remaining traffic types are also ensured fair treatment and access to available resources so as to minimize delay.

When deciding on the manner in which QoS should be deployed, consider the business or application that needs addressing or solving. Each of the three types of service offered in a QoS configuration is appropriate for certain conditions. They include

Best Effort Service: A single service model in which an application transmits as its needs dictate. This is essentially a first in, first out (FIFO) service.

Integrated Service: Multiple traffic definitions allow applications to request a specific kind of service prior to transmission. This is the mechanism utilized by the Resource Reservation Protocol (RSVP).

Differentiated Service: A model meeting the needs of multiple traffic flows and QoS requirements but no prior permission or resource request is utilized. This is used in most environments for providing mission-critical end-to-end QoS.

QoS is a somewhat in-depth topic all by itself. It encompasses a number of mechanisms, including traffic classification, congestion management, congestion avoidance, traffic policing, traffic shaping, and more. For more information regarding QoS, check out the Quality of Service home page on CCO:

http://www.cisco.com/c/en/us/products/ios-nx-os-software/quality-of-service-qos/index.html

Identify and Isolate Voice and Video Quality Problems

The interesting aspect (or not, depending on the situation) is that voice quality is entirely subjective. What sounds fine to one person may sound terrible to another. Reproducing the problem can sometimes be an exercise in futility, but it must be tried. It is the first step in any troubleshooting exercise. User reports simply lack accuracy at times.

When working to identify quality issues, think about the needs of the call versus the actual ability of the network infrastructure to provide for those needs. Consider the following factors:

• Understand the codec characteristics and bandwidth required by that codec per call.

• Understand the network topology and WAN technologies in use.

• Optimally deploy QoS techniques that allow voice and video to be identified and prioritized.

• Use link fragmentation and interleaving (LFI) and compressed RTP (cRTP) on slow WAN links when voice traffic is sharing the link with other traffic flows.

• Look for the following when troubleshooting:

• Interface drops on switches and routers

• Buffer drops

• Policy-map drops

• Interface congestion

Bandwidth calculations can get interesting, at times, where voice and video traffic are concerned. You should keep in mind multiple parameters aside from the codec rate. In the calculations, you must take into account the payload size, transport layer overhead (UDP/RTP headers), whether or not cRTP is in use, network layer overhead (Internet Protocol [IP] header), and data link layer overhead (framing). In terms of transported payload, the overhead for voice over IP (VoIP) is exceedingly high. This requires an intricate, end-to-end perspective because the per-call bandwidth varies as the underlying data link layer technology changes.
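The per-call calculation described above can be sketched as follows. The header sizes are commonly cited values (40 bytes for IP/UDP/RTP, 2 bytes for cRTP without a UDP checksum, 18 bytes for Ethernet, 6 bytes for PPP) and should be treated as assumptions, as should the 20 ms packetization interval; the function name is illustrative.

```python
IP_UDP_RTP_BYTES = 40  # 20 IP + 8 UDP + 12 RTP
CRTP_BYTES = 2         # assumed compressed header size, no UDP checksum

def call_bandwidth_kbps(codec_bps, packet_ms, l2_overhead_bytes, crtp=False):
    """Bandwidth of one direction of one call, including all headers."""
    payload_bytes = codec_bps * packet_ms / 1000 / 8
    header_bytes = CRTP_BYTES if crtp else IP_UDP_RTP_BYTES
    packets_per_second = 1000 / packet_ms
    frame_bits = (payload_bytes + header_bytes + l2_overhead_bytes) * 8
    return frame_bits * packets_per_second / 1000

print(call_bandwidth_kbps(64_000, 20, 18))            # G.711 over Ethernet
print(call_bandwidth_kbps(8_000, 20, 18))             # G.729 over Ethernet
print(call_bandwidth_kbps(8_000, 20, 6, crtp=True))   # G.729 over PPP with cRTP
```

Under these assumptions, an 8 kbps codec costs roughly 31 kbps on Ethernet, which illustrates how high the overhead is relative to the payload and how much it shifts with the data link layer.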

Frame Relay presents particularly interesting challenges due to the diversity of service-level agreements available from various providers. A Frame Relay circuit with a set committed information rate (CIR) and no ability to burst presents an interesting scenario. If the link is running at full utilization, voice may be challenged in getting across. However, there is the capability to implement QoS on the link and prioritize voice traffic. If the link has some capacity to burst over CIR, you should take care to ensure that voice traffic does not exceed the CIR threshold. The reason is that any traffic exceeding that rate is flagged as discard eligible (DE). Voice traffic should not be configured to cross any Frame Relay connection purchased with CIR=0. Such circuits are often purchased as a cost-saving measure, but they result in all traffic being flagged as DE. That isn’t acceptable for voice, especially during peak traffic times.

When working with slower links, consider using LFI. Both PPP and Frame Relay have fragmentation options. For Multilink Point-to-Point Protocol (MLPPP), LFI can be enabled on Point-to-Point (P2P) links. Frame Relay links make use of the FRF.12 specification for fragmentation. This allows for more than prioritization: when the larger data packets get transmission time, they can be broken up and interspersed with higher-priority traffic.

When troubleshooting comes to the point of digging down into the network infrastructure in detail, most of the other tools available have usually been exhausted. For this level of digging, it’s going to require a hop-by-hop, manual analysis. Examine the following:

• Interface drop (both inbound and outbound)

• Buffer drops (inbound/outbound)

• Policy map drops

• Interface congestion (inbound/outbound)

• Link congestion end-to-end

Use tools like Cisco’s extended ping, setting the datagram size to match the voice packet size for the codec in question and, where possible, the type of service (ToS) value to match the voice marking. A regular ping may give you an idea of latency, but it is simply data traffic and will not flow like voice traffic. Keep in mind that ping is ICMP, so it approximates the voice path rather than reproducing an RTP stream, and that it reports round-trip time. Ensure that the one-way latency does not exceed 150 ms. If it does, it’s time to find out why. Follow the path hop-by-hop to find the culprit.

When troubleshooting anything reported by users, you must ask standard questions aside from “Did you turn it off and turn it back on?” These questions might include

• Has it ever worked?

• What is your phone number and the phone number you called?

• What were the exact digits you dialed?

• When did it fail—when dialing or after dialing the entire number?

• Have you ever called this number before?

• Are you allowed to call this number?

• Were any error messages displayed on the phone?

• Did you hear any error messages (annunciator) on the phone?

• What time did this occur?

• Can you call other destinations similar to this one? If it was local/long-distance/international, can you dial other local/long-distance/international numbers?

• Which phone were you using (desk phone versus soft phone versus mobile client)?

Following are some questions not necessarily applicable to the end user but valuable to answer nonetheless:

• Are you having problems with the site or subnet?

• Is a network outage or other convergence event in progress?

• Is this a reachability issue?

• Have changes been made within the network, dial plan, or elsewhere in the overall architecture?

Collect as much information as possible from the users. Even if the user is not necessarily accurate in description, you can listen for key concepts and phrases. Keep asking questions until you have at least some mental picture of the direction to pursue.

With all of this in mind, including network diagrams, a particularly useful troubleshooting methodology is to draw out the call flow with translations and transformations on a whiteboard or note pad to better visualize the end-to-end path.

Gather available information and try to reproduce the issue both using the dialed number analyzer and the actual user’s phone. Being able to reproduce the issue will give you a better feel for where the issue might lie. Of course, make use of the other tools discussed throughout the book:

• IP Phone:

• Status button to show RTP statistics

• Quality Report Tool

• Cisco TelePresence Endpoints:

• Call Status

• Switches:

• Show commands

• Console messages

• Debug output

• Routers:

• Show commands

• Console messages

• Debug output

• CUCM:

• RTMT Trace Files

• RTMT System Logs

• RTMT Quality Report Tool

• RTMT Alarms

• Call Detail Records (CDR)

Echo

Perhaps the most interesting, and possibly most subjective, issue encountered is echo. Echo is only a problem when it is annoying, and that depends on the ear of the beholder. The key in the discussion is the question of what makes echo annoying. There are two factors: volume (amplitude) and delay. As both increase, so do the annoyance factors. If both are very low, the echo might not even be noticeable.

Echo typically occurs at time-division multiplexing (TDM) touch points. It occurs most on gateways where analog and/or digital circuits provide ingress/egress for calls. Echo is caused by an impedance mismatch in the two-/four-wire hybrid in a phone. In other words, it is leakage between the transmit and receive paths. This is why echo generally occurs only at TDM touch points: it is rather difficult to have leakage between transmit and receive paths in Real-time Transport Protocol (RTP) streams. IP Phone–to–IP Phone echo really only occurs when the phones are too close together and the sound generates feedback between the speakers and mics of the phones. Most people would call that feedback rather than echo, although technically it is echo.

You can deal with echo in a number of ways. Echo suppression and cancelation mechanisms are built in to Cisco internetwork operating system (IOS) gateway interfaces. They are configurable according to needs of the cancelation level required.

The two types of echo are talker-echo and listener-echo. Talker-echo is the more prevalent. It’s essentially when the speaking party hears his or her voice reverberating in the phone’s handset. When timed properly, no issue occurs. However, once the echo interval exceeds 25 ms, it starts to become distracting.

For MGCP gateways, echo issues can usually be corrected in the gateway settings in CUCM. For H.323 and SIP gateways, manual intervention is necessary.

In Cisco IOS, there are quite a few options in terms of tweaking the echo canceler settings on the gateway. First, it is important to understand the settings and what is relevant to tweaking them. The Cisco IOS echo canceler has a number of configurable options, including coverage and suppression. The coverage interval is a buffer. It creates a replica of the outbound wave form shifted 180 degrees out of phase (precisely inverted) and overlays it in order to cancel any repetition of that wave form. Echo suppression is a function of the echo canceler that is invoked during the first 2–3 seconds of a call prior to convergence. It covers the echo while the echo canceler is building its wave form and getting ready to monitor for echo.

To see what is happening on the voice port, issue the show call active voice command. Example 10-1 shows an example of this command.

Example 10-1. show call active voice Command Output


DAL_HQ_GW# show call active voice
Telephony call-legs: 1
SIP call-legs: 0
H323 call-legs: 1
MGCP call-legs: 0
Total call-legs: 2

GENERIC:

---Truncated---

VOIP:
ConnectionId[0xE3980CAC 0x14F511CC 0x8014B209 0x2A4ADF68]
IncomingConnectionId[0xE3980CAC 0x14F511CC 0x8014B209 0x2A4ADF68]
RemoteIPAddress=11.1.1.200
RemoteUDPPort=19394
RemoteSignallingIPAddress=11.1.1.200
RemoteSignallingPort=11009
RemoteMediaIPAddress=11.1.1.200
RemoteMediaPort=19394
RoundTripDelay=0 ms
SelectedQoS=best-effort

---Truncated---

TELE:
ConnectionId=[0xE3980CAC 0x14F511CC 0x8014B209 0x2A4ADF68]
IncomingConnectionId=[0xE3980CAC 0x14F511CC 0x8014B209 0x2A4ADF68]
TxDuration=367010 ms
VoiceTxDuration=7690 ms
FaxTxDuration=0 ms
CoderTypeRate=g711ulaw
NoiseLevel=-61
ACOMLevel=16
OutSignalLevel=-43
InSignalLevel=-53
InfoActivity=2
ERLLevel=16
SessionTarget=
ImgPages=0
CallerName=
CallerIDBlocked=False
OriginalCallingNumber=6804
OriginalCallingOctet=0x80
OriginalCalledNumber=6702
OriginalCalledOctet=0x81

---Truncated---

The ACOMLevel, OutSignalLevel, InSignalLevel, and ERLLevel fields in the TELE section of the output are most relevant to echo. Echo is measured as echo return loss (ERL). The performance of the echo canceler is measured as echo return loss enhancement (ERLE). The combined loss (ACOM), which is the total signal loss of the echo, is the sum of the ERL and ERLE. In the example, ACOMLevel=16, OutSignalLevel=-43, InSignalLevel=-53, and ERLLevel=16. What does all of that mean?

In looking at the OutSignal and InSignal, the difference is 10 dB. That is, the ERL should be 10 dB, but it’s measuring as 16 dB. This seems to mean that the echo canceler has not yet converged. When the difference between the in and out signal is equal to the ERL, convergence is complete. That’s technically the way it should work. However, remember this is a bleed of signal across audio Tx/Rx paths. It may vary somewhat. If ERL is too low, the echo canceler won’t be able to fully cancel the echo. So, generally the ERL should be at least 15 dB (16 dB in the example), preferably closer to 20 dB to fully suppress echo under all conditions. Additionally, if the ERL is too low, the signal that returns may be too loud (within 6 dB of the talker signal). This causes the echo canceler to consider it as double-talk instead of echo. Consequently, the signal is not canceled. The ERL needs to be around 6 dB higher in order for the echo canceler to engage.
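The comparisons in this discussion can be expressed as a small check. This is only a sketch of the reasoning in the text, not of how the echo canceler itself works; the function names and the wording of the results are illustrative, while the 15 dB and 6 dB figures come from the paragraph above.

```python
TARGET_ERL_DB = 15         # minimum ERL suggested in the text
DOUBLE_TALK_MARGIN_DB = 6  # echo this close to the talker level may look like double-talk

def level_difference_db(out_signal_db, in_signal_db):
    """Difference between the outbound and returning signal levels."""
    return out_signal_db - in_signal_db

def assess_erl(erl_db):
    """Classify an ERL value using the thresholds from the text."""
    if erl_db < DOUBLE_TALK_MARGIN_DB:
        return "likely treated as double-talk"
    if erl_db < TARGET_ERL_DB:
        return "canceler may not fully cancel echo"
    return "adequate"

print(level_difference_db(-43, -53))  # 10
print(assess_erl(16))                 # adequate
print(assess_erl(10))                 # canceler may not fully cancel echo
```

Using the values from Example 10-1, the reported ERL of 16 clears the suggested minimum, whereas the 10 dB level difference on its own would not.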

Adjusting the signal levels on the gateway voice ports allows the numbers to be tweaked. Adjust output attenuation using positive values and input gain using negative values. Input gain is applied before the echo canceler processes the echo signal, whereas output attenuation is applied after. Also, remember that echo is caused by an impedance mismatch. So, tweaking the impedance settings on the voice port is also a potential remedy. The default impedance is 600 ohms real. It is consistent with most public switched telephone network (PSTN) connections and private branch exchanges (PBX). So, it generally won’t need to be changed, but it is there as an option. If you happen to alter any of these settings, it is necessary to issue shutdown/no shutdown commands at the voice port for them to take effect.

Again, this is by no means an exhaustive, or even a marginally complete, analysis. It merely gives an idea of what values are important in dealing with echo.

Troubleshoot Layer 2 Quality Problems

Layer 2, or the data link layer, of the OSI model represents an interesting twist to the bigger picture. The reason is largely that there are so many possibilities in terms of LAN and WAN options. The number of options has come down greatly over the past couple of decades. That trend is likely to continue in years to come.

In a campus environment, the Layer 2 topology is generally Ethernet, be it 10 Mbps (yes it’s still out there), 100 Mbps, 1 Gbps, 10 Gbps, or faster. This means that bandwidth is usually not at a premium or difficult to come by. However, that is not always the case. Figure 10-4 shows a couple of instances in which bandwidth becomes an issue.

Figure 10-4. Ethernet Bandwidth Issues

In Figure 10-4, it quickly becomes evident why QoS is so important, even in the presence of high bandwidth connectivity. Both scenarios in Figure 10-4 show an overload in the making. When it comes down to it, the voice traffic is going to have to be protected and allowed into, across, and out of the switch regardless of how much load is on the interface. QoS prioritizes the input buffers, processor time, output buffers, and more. It’s not simply a need on the WAN. And, lots of bandwidth doesn’t buy you priority across the buffers and CPU of the switch fabric. Only properly applied QoS does that. This is the exact situation that should be discussed when the “big, fat pipe” types start going on about the lack of any real need for QoS. When you have 47 gigabit interfaces on a 48-port switch all vying for egress out a single gigabit interface, there is going to be a good bit of contention. Bandwidth is generally not a huge concern in LAN environments. However, buffer congestion is a significant issue.

On LAN switches at the access and distribution layers, take a look at the ingress and egress queueing configuration. The same needs for traffic prioritization exist at the core layer. But the bulk of the issues tend to show at the access layer and on uplinks to the distribution layer. The inbound and outbound queueing mechanisms vary from switch to switch, so be sure to know the mechanisms in use on the particular switch that is the target of the troubleshooting efforts. Verify the following:

• The configuration ensures low latency for voice and video traffic (signaling and media).

• Voice and video traffic is mapped to a queue threshold that minimizes drops.

• Outgoing voice and video are mapped to the correct outgoing marker.

• If rate limiting is used, ensure that its impact on voice/video is minimal.

Poor quality is actually less likely on the LAN than the WAN. It’s a simple question of resource availability. However, that doesn’t excuse skipping the configuration of QoS. Voice and video traffic must be serviced with priority, regardless of bandwidth. During congestive periods, queues fill up and sometimes overflow. When this happens, frames are dropped. WRED can be used to avoid this in a manner similar to that discussed for WAN links. If drops must occur, choose the traffic that will be dropped. Also, drop that traffic before a congestive situation can emerge to endanger other traffic types. QoS should be deployed in a like manner on every switch in the network, especially on the uplinks, to protect voice/video traffic.

Troubleshoot Voice Quality Issues on a Gateway

Gateway troubleshooting is a frequent topic within this book. The means and methods of troubleshooting voice quality issues are largely the same, in terms of show and debug commands on the gateway. When voice becomes choppy, clipped, or otherwise degraded, the troubleshooting shifts from call flow, dial plan, and such to the network and current conditions. The reason is that the call has actually set up and established a media flow. So, dial plan and digit manipulation can be ruled out. Going with the common theme of this chapter so far, it is time to begin seeking the cause of the issue.

Common issues include congestion on the LAN and/or WAN. Insufficient bandwidth availability (inadequate QoS) may also be to blame. Why mention inadequate QoS in terms of insufficient bandwidth? Shouldn’t call admission control (CAC) have dealt with that? No. CAC only tracks the amount of bandwidth configured between sites and monitors its usage. It does not have visibility into the actual utilization of the links or into whether the configured bandwidth was actually made available through QoS reservation. The QoS configuration still has to be there to protect the traffic. CAC only keeps excessive calls from setting up and exceeding the allocated bandwidth.

Low-latency queuing (LLQ) has the capability to distinguish between up to 64 traffic types in the configuration. It also has the capability to carve out bandwidth for each, if desired. One or more of the traffic classes may have a priority queue assigned for its use. In troubleshooting issues associated with bandwidth not being properly allocated, check the LLQ configuration used to classify voice/video traffic and provide that bandwidth on the egress interface.

Other common issues arise with duplex and speed mismatches on LAN ports. This causes packet loss. Duplex mismatch occurs when one side is set for full duplex and the other side is set for half duplex. It’s becoming less common, but it still occurs. Check the LAN switch connections between source and destination. It will be quite clear when the correct switch is found, if there is a mismatch. The switch will be squawking about it every few seconds.

There has been significant discussion of LFI and that it should be utilized on slower WAN links (768 kbps or less). It allows voice traffic to be interspersed with larger data packets that have been fragmented. If this is not utilized, voice/video traffic may experience excessive delays, thereby compromising voice quality. In choosing a fragmentation size, whether on MLPPP or Frame Relay, pick a fragment size that results in serialization delay of 10–20 ms. As a rule, divide the link speed in bits per second by 800 to obtain the fragmentation size in bytes for a 10 ms delay. For example, if the link speed is 64,000 bps, the fragment size that results in 10 ms latency is 64,000/800 = 80 bytes (640 bits). If you use the same formula, a 768,000 bps link results in a 960-byte fragmentation size.
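The divide-by-800 rule can be checked with a few lines of Python. This is a minimal sketch, not a Cisco tool; the function names `lfi_fragment_size_bytes` and `serialization_delay_ms` are illustrative and do not come from the text.

```python
def lfi_fragment_size_bytes(link_bps: int, target_delay_ms: float = 10.0) -> int:
    """Largest fragment (in bytes) that serializes within target_delay_ms."""
    bits_per_fragment = link_bps * target_delay_ms / 1000.0
    return int(bits_per_fragment // 8)


def serialization_delay_ms(frame_bytes: int, link_bps: int) -> float:
    """Time needed to clock frame_bytes onto a link running at link_bps."""
    return frame_bytes * 8 * 1000.0 / link_bps


if __name__ == "__main__":
    for speed_bps in (64_000, 256_000, 768_000):
        print(speed_bps, "bps ->", lfi_fragment_size_bytes(speed_bps), "bytes")
    # An unfragmented 1500-byte frame on a 64 kbps link, for comparison.
    print(serialization_delay_ms(1500, 64_000), "ms")
```

With the default 10 ms target, the calculation reduces to link speed divided by 800. Passing `target_delay_ms=20.0` gives the upper end of the 10–20 ms range.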

The Layer 2 framing makes a significant difference as well. On Frame Relay links, the less bursting (Bc), the better. In fact, with voice/video traffic, eliminate excess burst (Be) altogether. Make sure CIR = MinCIR. Also, never deploy voice over CIR=0 circuits. If voice becomes flagged as DE traffic, it can, and will, be discarded if the carrier network gets too busy. When monitoring a serial interface for load and congestion, use the show interface command as shown in Example 10-2.

Example 10-2. show interface Command Output


BE6000S# sh interface serial 0/1/0
Serial0/1/0 is up, line protocol is up (spoofing)
  Hardware is DSX1
  MTU 1500 bytes, BW 256 Kbit/sec, DLY 20000 usec,
     reliability 255/255, txload 243/255, rxload 180/255
  Encapsulation Frame Relay, crc 16, loopback not set
  Keepalive set (10 sec)
  Last input 00:00:05, output 00:00:05, output hang never
  Last clearing of "show interface" counters never
  Input queue: 0/75/0/0 (size/max/drops/flushes); Total output drops: 0
  Queueing strategy: class-based queueing
  Output queue: 897/1000/963 (size/max total/drops)
  5 minute input rate 0 bits/sec, 0 packets/sec
  5 minute output rate 0 bits/sec, 0 packets/sec
     45969 packets input, 137907 bytes, 0 no buffer
     Received 0 broadcasts (0 IP multicasts)
     15 runts, 0 giants, 0 throttles
     95 input errors, 95 CRC, 0 frame, 0 overrun, 0 ignored, 2 abort
     46064 packets output, 138192 bytes, 0 underruns
     0 output errors, 0 collisions, 1 interface resets
     0 unknown protocol drops
     0 output buffer failures, 0 output buffers swapped out
     1 carrier transitions
     DCD=up   DSR=up   DTR=up  RTS=up   CTS=up
BE6000S#

In Example 10-2, it is clear that the interface is in distress. The transmit load is 243 of 255 (95.3 percent utilization), and the output queue shows 963 dropped packets. If those drops include voice packets, voice quality suffers as a result.
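When many interfaces need checking, the same reading of the output can be scripted. The following is a minimal sketch that pulls the load and drop counters out of text shaped like Example 10-2. The function name `interface_health` and the regular expressions are illustrative assumptions, and output formats vary between IOS releases.

```python
import re

# Relevant lines copied from Example 10-2.
SAMPLE = """\
     reliability 255/255, txload 243/255, rxload 180/255
  Output queue: 897/1000/963 (size/max total/drops)
     95 input errors, 95 CRC, 0 frame, 0 overrun, 0 ignored, 2 abort
"""


def interface_health(show_interface_text: str) -> dict:
    """Extract load, output-queue, and CRC counters from show interface text."""
    load = re.search(r"txload (\d+)/255, rxload (\d+)/255", show_interface_text)
    queue = re.search(r"Output queue: (\d+)/(\d+)/(\d+)", show_interface_text)
    errors = re.search(r"(\d+) input errors, (\d+) CRC", show_interface_text)
    return {
        # Load is reported as a fraction of 255.
        "tx_load_pct": round(int(load.group(1)) / 255 * 100, 1),
        "rx_load_pct": round(int(load.group(2)) / 255 * 100, 1),
        "output_queue_depth": int(queue.group(1)),
        "output_queue_max": int(queue.group(2)),
        "output_drops": int(queue.group(3)),
        "crc_errors": int(errors.group(2)),
    }


if __name__ == "__main__":
    print(interface_health(SAMPLE))
```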

When using LLQ, check for policy map drops using the show policy-map interface interface command. Example 10-3 shows the output of that command.

Example 10-3. show policy-map interface Command Output


Class-map: VOICE (match-all)
    5987462 packets, 419122340 bytes
    5 minute offered rate 98010 bps, drop rate 45 bps
    Match: ip dscp ef (46)
    Priority: 106 kbps, burst bytes 2650, b/w/ exceed drops 32
    Compress:
       Header ip rtp
       UDP/RTP (compression on, Cisco, RTP)
          Sent:   1020 total, 979 compressed
                      41957 bytes saved, 17983 bytes sent
                      Rate 5000 bps

In Example 10-3, drops are shown in the output. The class of traffic called VOICE is configured with “strict priority” and is allowed to use up to 106 kbps of the configured bandwidth. The 32 b/w exceed drops show that this priority bandwidth has been exceeded. It is conceivable that there is nonvoice traffic entering the queue (misconfiguration) or that there is simply too much voice traffic in the mix. The priority command enables a strict priority queue within class-based weighted fair queuing (CBWFQ). With LLQ, delay-sensitive traffic (voice/video) is dequeued and sent first. The priority queue is policed to ensure that the fair queuing mechanism is not compromised or starved for bandwidth. When the interface reaches a congestive state, the priority queue is serviced until the load reaches the configured kbps value in the priority statement. Excess traffic is dropped to avoid starving lower-priority queues. Using default or older queuing mechanisms can result in quality issues through packet loss or jitter. To verify that LLQ is configured properly on the egress interface, use the show policy-map interface interface command and look for priority.
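The policing behavior of the priority queue can be pictured as a token bucket. The sketch below is a simplified model rather than the IOS implementation. It assumes the interface is always congested, so the policer is always active. The class name `PriorityPolicer` is hypothetical, and the rate and burst values are taken from Example 10-3.

```python
class PriorityPolicer:
    """Token-bucket model of a policed priority queue (always-congested case)."""

    def __init__(self, rate_bps: int, burst_bytes: int):
        self.rate_bps = rate_bps
        self.burst_bytes = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last_time_s = 0.0
        self.exceed_drops = 0

    def offer(self, now_s: float, packet_bytes: int) -> bool:
        """Return True if the packet is sent, False if it is dropped."""
        elapsed = now_s - self.last_time_s
        self.last_time_s = now_s
        # Refill at the configured rate, never beyond the burst size.
        refill = elapsed * self.rate_bps / 8
        self.tokens = min(self.burst_bytes, self.tokens + refill)
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True
        self.exceed_drops += 1
        return False


if __name__ == "__main__":
    policer = PriorityPolicer(rate_bps=106_000, burst_bytes=2650)
    # Fifty 60-byte packets arriving at the same instant exceed the burst.
    sent = sum(policer.offer(0.0, 60) for _ in range(50))
    print("sent:", sent, "exceed drops:", policer.exceed_drops)
```

In this model, a burst larger than the bucket produces exceed drops even though the long-term offered rate may sit below the configured priority rate. That is one way a class can show drops while its five-minute offered rate looks acceptable.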

When using cRTP and LFI, verify that fragmentation is occurring by issuing the show frame-relay fragment command. Example 10-4 shows the output of this command.

Example 10-4. show frame-relay fragment Command Output


BE6000S# show frame-relay fragment
interface         dlci    frag-type    size    in-frag   out-frag   dropped-frag
Se0/1/0.100       101     end-to-end   320     14        22            0
Se0/1/0.101       102     end-to-end   640     38        42            0
BE6000S#

In Example 10-4, you can see the fragmentation size for each of the permanent virtual circuits (PVC) configured on the interface. Obviously, they are configured for differing speeds. The larger the fragment size, the higher the bandwidth of the PVC. Fragmentation should be configured on slow-speed WAN links, that is, those 768 kbps or less. Additionally, cRTP is recommended on such links. Break the traffic flow into three to five classes, at most, and assign voice to the high-priority queue. Video is generally not recommended over slow links due to the higher bandwidth requirements that come with it.
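The relationship between fragment size and PVC speed can be worked backward from the command output. This is a minimal sketch that parses rows shaped like Example 10-4 and reports the link speed each fragment size implies, assuming a 10 ms serialization target. The function name `fragment_report` and the row pattern are illustrative assumptions, not part of any Cisco tooling.

```python
import re

# Rows copied from Example 10-4.
SAMPLE = """\
interface         dlci    frag-type    size    in-frag   out-frag   dropped-frag
Se0/1/0.100       101     end-to-end   320     14        22            0
Se0/1/0.101       102     end-to-end   640     38        42            0
"""

GAP = r"[ \t]+"
ROW = re.compile(
    r"^(\S+)" + GAP + r"(\d+)" + GAP + r"(\S+)" + GAP + r"(\d+)"
    + GAP + r"(\d+)" + GAP + r"(\d+)" + GAP + r"(\d+)[ \t]*$",
    re.MULTILINE,
)


def fragment_report(show_output: str, target_delay_ms: float = 10.0) -> list:
    """Summarize each PVC row and the link speed its fragment size implies."""
    report = []
    for intf, dlci, _frag_type, size, _in, _out, dropped in ROW.findall(show_output):
        size_bytes = int(size)
        report.append({
            "interface": intf,
            "dlci": int(dlci),
            "fragment_bytes": size_bytes,
            # Speed at which size_bytes takes exactly target_delay_ms to send.
            "implied_link_bps": int(size_bytes * 8 * 1000 / target_delay_ms),
            "dropped_fragments": int(dropped),
        })
    return report


if __name__ == "__main__":
    for row in fragment_report(SAMPLE):
        print(row)
```

If the implied speed is far from the actual speed of the PVC, the fragment size deserves a second look.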

A medium-speed link is one between 768 kbps and 1.544 Mbps (T1) or 2.048 Mbps (E1). Voice can be assigned to a low-latency queue that does not exceed 33 percent of the total available bandwidth of the interface. LFI and cRTP are optional on these interfaces. They generally have sufficient speed to handle the traffic load.

High-speed links, above T1/E1 speeds, do not need LFI, and cRTP is not recommended due to the principle of diminishing returns. Because the link is significantly faster, the cost of compression in terms of CPU and latency outweighs the bandwidth saved, and it is better to simply let the traffic flow. High-speed links can handle 5 to 11 traffic classes, generally.
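The three link-speed tiers can be condensed into a small lookup. The sketch below is only a summary of the guidance in this section. The function name `wan_link_recommendation` is hypothetical, and treating 2048 kbps (E1) as the upper edge of the medium tier is an assumption made for illustration.

```python
def wan_link_recommendation(link_kbps: float) -> dict:
    """Map a WAN link speed to the LFI/cRTP guidance for its tier."""
    if link_kbps <= 768:
        return {
            "tier": "slow",
            "lfi": "recommended",
            "crtp": "recommended",
            "traffic_classes": "3 to 5",
            "priority_ceiling_kbps": None,  # no figure given for this tier
        }
    if link_kbps <= 2048:  # assumption: E1 is the top of the medium tier
        return {
            "tier": "medium",
            "lfi": "optional",
            "crtp": "optional",
            "traffic_classes": None,  # no figure given for this tier
            # Low-latency queue should not exceed 33 percent of the link.
            "priority_ceiling_kbps": round(link_kbps * 0.33, 1),
        }
    return {
        "tier": "high",
        "lfi": "not needed",
        "crtp": "not recommended",
        "traffic_classes": "5 to 11",
        "priority_ceiling_kbps": None,  # no figure given for this tier
    }


if __name__ == "__main__":
    for speed_kbps in (256, 1544, 10_000):
        print(speed_kbps, wan_link_recommendation(speed_kbps))
```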

The two LFI mechanisms in widespread use on Cisco IOS platforms are

• Multilink PPP (MLPPP): Most commonly used LFI. Used on PPP links.

• FRF.12: Frame Relay Forum 12 specification used with Frame Relay virtual circuits.

On a PPP link, the debug ppp multilink fragments command is a good one to use when monitoring MLPPP LFI. Example 10-5 shows the output from this command.

Example 10-5. debug ppp multilink fragments Command Output


BE6000S# debug ppp multilink fragments
Multilink fragments debugging is on

Oct 17 20:03:08.995: Se0/0 MLP-FS: I seq C0004264 size 70
Oct 17 20:03:09.015: Se0/0 MLP-FS: I seq 80004265 size 160
Oct 17 20:03:09.035: Se0/0 MLP-FS: I seq 4266 size 160
Oct 17 20:03:09.075: Se0/0 MLP-FS: I seq 4267 size 160
Oct 17 20:03:09.079: Se0/0 MLP-FS: I seq 40004268 size 54
Oct 17 20:03:09.091: Se0/0 MLP-FS: I seq C0004269 size 70
Oct 17 20:03:09.099: Se0/0 MLP-FS: I seq C000426A size 70
Oct 17 20:03:09.103: Mu1 MLP: Packet interleaved from queue 24
Oct 17 20:03:09.107: Se0/0 MLP-FS: I seq C000426B size 70
Oct 17 20:03:09.119: Se0/0 MLP-FS: I seq C000426C size 70
Oct 17 20:03:09.123: Mu1 MLP: Packet interleaved from queue 24
Oct 17 20:03:09.131: Mu1 MLP: Packet interleaved from queue 24
Oct 17 20:03:09.135: Se0/0 MLP-FS: I seq C000426D size 70
Oct 17 20:03:09.155: Se0/0 MLP-FS: I seq C000426E size 70

The serial interface, the fragment sequence numbers, and the fragment sizes are visible in the output, along with the voice packets being interleaved on the multilink bundle.
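The seq field in each debug line can be decoded to see which fragments begin or end a packet. The sketch below assumes the field is printed as the multilink header in the long sequence number format of RFC 1990: the top bit marks the beginning fragment, the next bit marks the ending fragment, and the low 24 bits carry the sequence number. It also assumes that I denotes an inbound fragment. The function name `decode_fragment` is illustrative.

```python
import re

LINE = re.compile(r"(\S+) MLP-FS: ([IO]) seq ([0-9A-Fa-f]+) size (\d+)")

BEGIN_BIT = 0x80000000  # (B)eginning fragment bit, RFC 1990 long format
END_BIT = 0x40000000    # (E)nding fragment bit
SEQ_MASK = 0x00FFFFFF   # 24-bit sequence number


def decode_fragment(debug_line: str):
    """Decode one MLP-FS debug line; return None for other lines."""
    match = LINE.search(debug_line)
    if match is None:
        return None
    interface, direction, seq_hex, size = match.groups()
    header = int(seq_hex, 16)
    return {
        "interface": interface,
        "direction": "input" if direction == "I" else "output",
        "sequence": header & SEQ_MASK,
        "begin": bool(header & BEGIN_BIT),
        "end": bool(header & END_BIT),
        "size": int(size),
    }


if __name__ == "__main__":
    lines = [
        "Oct 17 20:03:08.995: Se0/0 MLP-FS: I seq C0004264 size 70",
        "Oct 17 20:03:09.015: Se0/0 MLP-FS: I seq 80004265 size 160",
        "Oct 17 20:03:09.035: Se0/0 MLP-FS: I seq 4266 size 160",
        "Oct 17 20:03:09.079: Se0/0 MLP-FS: I seq 40004268 size 54",
        "Oct 17 20:03:09.103: Mu1 MLP: Packet interleaved from queue 24",
    ]
    for line in lines:
        print(decode_fragment(line))
```

Read this way, the 70-byte entries carry both the begin and end bits, meaning they were small enough to pass unfragmented. The run from 4265 through 4268 is one larger packet split into 160-byte pieces with a shorter final fragment.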

Troubleshoot Quality Issues at Endpoints

During a call, you can view the call statistics. The means of viewing this information varies by phone model. On Cisco 7900 series phones, you view it by pressing the “i” button twice in rapid succession. On Cisco 9900/8800/7800 series phones, call information is available by pressing the Settings button > Administrative Settings > Call Statistics while the call is active. Figure 10-5 shows the call statistic screens of phones with calls in progress.

Figure 10-5. Call Statistics

In Figure 10-5, the call properties are visible. This includes the codec in use for the call, payload size, Rx packets, Tx packets, jitter, mean opinion score (MOS), and more. MOS is a subjective measure of human perception of quality, expressed as a numerical score. It is based on the opinions of individuals presented with playback of audio utilizing various codecs. If web access is enabled on the phones, this information is seen there as well.

In terms of troubleshooting, this information is useful because it shows transmit and receive packets, but also packet loss and jitter. If call detail recording (CDR) is enabled, reports are available for retrieval of call statistics once the call has ended. CDR also logs call failures, such as busy signals, fast busy, and so on. If enabled in the Cisco CallManager Service Parameters page in CCMAdmin, CDR can also log calls with a duration of less than one second.
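The packet counters on the call statistics screen can be turned into a loss percentage for easier comparison between calls. This is a minimal sketch; the function name `receive_loss_percent` and the sample counter values are hypothetical, and the exact counter labels differ by phone model.

```python
def receive_loss_percent(rx_packets: int, lost_packets: int) -> float:
    """Share of expected packets that never arrived, as a percentage."""
    expected = rx_packets + lost_packets
    if expected == 0:
        return 0.0  # no media received yet, so nothing to report
    return 100.0 * lost_packets / expected


if __name__ == "__main__":
    # Hypothetical counters read from a phone during an active call.
    print(receive_loss_percent(rx_packets=990, lost_packets=10), "percent lost")
```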

One-Way Audio and Video Issues

One-way audio presents an interesting scenario. It is characterized by a call successfully set up, but one in which only one party can hear the other. These are generally short calls, for obvious reasons. The possibilities are few, in terms of what could cause such issues. They include

• IP reachability is one-way.

• cRTP is not configured on both ends of the link.

• NAT is in the path blocking voice/video.

• An access list is blocking traffic in one direction.

In terms of IP reachability, recall that voice/video media is UDP based. That is, each UDP datagram is routed on its own merits. It is conceivable that the traffic may successfully route from source to destination, but not in the reverse path. In cases such as this, make sure that the relevant IP subnets are being advertised for both sides of the call. A quick ping from the source switch to the destination switch allows for validation of the bidirectional network path.

If RTP header compression is enabled, make sure it’s enabled on both sides of all WAN links between the source and destination. If not, one-way audio can result.

Often, Network Address Translation (NAT) borders or firewalls exist between the phones and their call control nodes. This may also be the case between source and destination phones. If the RTP/RTCP traffic is not permitted bidirectionally, it may be dropped in one direction or the other. Network Address Translation with Port Address Translation (PAT) is especially troublesome with voice traffic. Skinny Client Control Protocol (SCCP), for example, embeds IP addresses in the payload of its signaling messages in order to signal the IP address to which RTP packets should be sent. If the device performing NAT/PAT is not voice-aware, these embedded addresses are not translated, and one-way audio may result.

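Additionally, firewalls can be a problem if the traffic flows for signaling and media are not open bidirectionally. On Cisco PIX firewalls, the fixup protocol command (superseded by the inspect command in later firewall software) makes sure that the dynamically negotiated UDP port numbers are passed through the firewall.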

In terms of troubleshooting tools at this level, a packet capture software package is definitely the preferred means of troubleshooting. Some available freeware packet analyzers, such as Wireshark, excel at troubleshooting voice/video traffic flows.
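Once a capture has been reduced to a list of source and destination addresses for the RTP packets, spotting a one-way stream is mechanical. The sketch below assumes that export step has already been done by the analyzer. The function name `one_way_flows` and the sample addresses are hypothetical.

```python
from collections import Counter


def one_way_flows(packets):
    """Return (src, dst) pairs that have media with no traffic coming back."""
    counts = Counter((src, dst) for src, dst in packets)
    return sorted(
        (src, dst) for (src, dst) in counts if (dst, src) not in counts
    )


if __name__ == "__main__":
    # Hypothetical RTP packets exported from a capture as (source, destination).
    capture = (
        [("10.1.1.10", "10.2.2.20")] * 3
        + [("10.3.3.30", "10.4.4.40"), ("10.4.4.40", "10.3.3.30")]
    )
    for src, dst in one_way_flows(capture):
        print("media seen only from", src, "to", dst)
```

A flow reported here only shows what the capture point saw. Capturing at both ends of the path helps determine where the return stream is being lost.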

Chapter Summary

Voice quality issues are caused by insufficient bandwidth, excessive delay, jitter, and/or echo. When you are building a delay budget for voice calls, be aware of fixed sources of delay and variable sources of delay. Fixed sources include packetization/serialization delay and propagation delay. Variable delay sources include compression and other processing mechanisms. One-way delay for voice traffic should be 150 ms or less end-to-end.

In LAN environments, QoS is vital. QoS is the mechanism that prioritizes voice on the input buffers, across the processor, and through the output buffers on the LAN switch. A big, fat pipe is simply not sufficient in this day and age. Issues can occur relating to speed and duplex. These can cause packet loss and should be corrected quickly.

WAN links present a special set of issues related to voice. On slower speed links, enable LFI and cRTP (on both ends) to combat latency constraints and reduce overhead. The size of the fragmentation should be based on the speed of the link in question. cRTP must be enabled on both sides of the WAN link; otherwise, one-way audio may result.

NAT/PAT and firewalls must be voice-aware to function properly with voice/video networks. They collectively represent the most common causes for one-way and no-way audio.

References

For additional information, refer to the following:

• Troubleshooting and Debugging VoIP Call Basics

http://www.cisco.com/c/en/us/support/docs/voice/h323/14081-voip-debugcalls.html

• Cisco IOS Voice Troubleshooting and Monitoring Guide

http://docwiki.cisco.com/wiki/Cisco_IOS_Voice_Troubleshooting_and_Monitoring_Guide

• Wireshark Training and Use

https://www.wireshark.org/#learnWS

• How to Troubleshoot Voice Quality Issues in a UCM Environment

https://supportforums.cisco.com/document/101961/how-troubleshoot-voice-quality-issues-ucm-environment-bad-sound-no-audio

Review Questions

Use these questions to review what you’ve learned in this chapter. The answers appear in Appendix A, “Answers to Chapter Review Questions.”

1. Which of the following contribute to voice quality issues?

a. Excessive bandwidth

b. Proper QoS configuration

c. Variable delay

d. Best practice design

2. A packet may be dropped by network devices for a number of reasons. Which of the following are ways to prevent or minimize such drops?

a. Decrease bandwidth availability

b. Implement traffic shaping and/or policing policies on WAN links

c. Input queue size reduction

d. Increase MTU size

3. Which is an example of fixed delay in a network?

a. Jitter

b. Weighted fair queuing

c. Propagation delay

d. Framing error

4. From a design perspective, what is the maximum one-way delay threshold for acceptable voice quality?

a. 80 ms

b. 40 ms

c. 1500 ms

d. 150 ms

5. In terms of QoS/traffic prioritization design, which of the following are valid service categories for various traffic types? (Choose three.)

a. Best Effort

b. Integrated

c. Differentiated

d. Vital

6. Which of the following are means of minimizing or eliminating echo? (Choose two.)

a. Echo suppression

b. Echo cancelation

c. Echo enhancement

d. Echo perception

7. Which of the following might be a source of voice quality issues at Layer 2 of the OSI model?

a. Insufficient bandwidth

b. Buffer congestion

c. Codec selection

d. Voice gateway selection

8. On what speed links would Link Fragmentation and Interleaving (LFI) be recommended for use?

a. 1024 Kbps and higher

b. 768 Kbps and higher

c. 768 Kbps and lower

d. 1.544 Mbps and lower

9. Which Cisco IOS command shows drops on an output queue?

a. show interface

b. show diag

c. show inventory

d. show cdp neighbor

10. One-way audio can be caused by which of the following?

a. cRTP configured on both ends of a WAN link

b. SCCP phone connecting to an SIP phone

c. cRTP configured on only one end of a WAN link

d. Codec mismatch
