Chapter 19
Network Troubleshooting

THE FOLLOWING COMPTIA NETWORK+ EXAM OBJECTIVES ARE COVERED IN THIS CHAPTER:

✓ 3.2 Compare and contrast business continuity and disaster recovery concepts.

  • NIC teaming

✓ 5.1 Explain the network troubleshooting methodology

  • Identify the problem
    • Gather information
    • Duplicate the problem, if possible
    • Question users
    • Identify symptoms
    • Determine if anything has changed
    • Approach multiple problems individually
  • Establish a theory of probable cause
    • Question the obvious
    • Consider multiple approaches
      • Top-to-bottom/bottom-to-top
      • OSI model
      • Divide and conquer
  • Test the theory to determine the cause
    • Once the theory is confirmed, determine the next steps to resolve the problem
    • If the theory is not confirmed, reestablish a new theory or escalate
  • Establish a plan of action to resolve the problem and identify potential effects
  • Implement the solution or escalate as necessary
  • Verify full system functionality and, if applicable, implement preventive measures
  • Document findings, actions, and outcomes

✓ 5.3 Given a scenario, troubleshoot common wired connectivity and performance issues

  • Attenuation
  • Latency
  • Jitter
  • Crosstalk
  • EMI
  • Open/short
  • Incorrect pin-out
  • Incorrect cable type
  • Bad port
  • Transceiver mismatch
  • TX/RX reverse
  • Duplex/speed mismatch
  • Damaged cables
  • Bent pins
  • Bottlenecks
  • VLAN mismatch
  • Network connection LED status indicators

✓ 5.4 Given a scenario, troubleshoot common wireless connectivity and performance issues

  • Reflection
  • Refraction
  • Absorption
  • Latency
  • Jitter
  • Attenuation
  • Incorrect antenna type
  • Interference
  • Incorrect antenna placement
  • Channel overlap
  • Overcapacity
  • Distance limitations
  • Frequency mismatch
  • Wrong SSID
  • Wrong passphrase
  • Security type mismatch
  • Power levels
  • Signal-to-noise ratio

✓ 5.5 Given a scenario, troubleshoot common network service issues

  • Names not resolving
  • Incorrect gateway
  • Incorrect netmask
  • Duplicate IP addresses
  • Duplicate MAC addresses
  • Expired IP address
  • Rogue DHCP server
  • Untrusted SSL certificate
  • Incorrect time
  • Exhausted DHCP scope
  • Blocked TCP/UDP ports
  • Incorrect host-based firewall settings
  • Incorrect ACL settings
  • Unresponsive service
  • Hardware failure

There is no way around it. Troubleshooting computers and networks is a combination of art and science, and the only way to get really good at it is by doing it—a lot! So it’s practice, practice, and practice with the basic yet vitally important skills you’ll attain in this chapter. Of course, I’m going to cover all the troubleshooting topics you’ll need to sail through the Network+ exam, but I’m also going to add some juicy bits of knowledge that will really help you to tackle the task of troubleshooting successfully in the real world.

First, you’ll learn to check quickly for problems in the “super simple stuff” category, and then we’ll move into a hearty discussion about a common troubleshooting model that you can use like a checklist to go through and solve a surprising number of network problems. We’ll finish the chapter with a good briefing about some common troubleshooting resources, tools, tips, and tricks to keep up your sleeve and equip you even further.

I won’t be covering any new networking information in this chapter because you’ve gotten all the foundational background material you need for troubleshooting in the previous chapters. But no worries. I’ll go through each of the issues described in this chapter’s objectives, one at a time, in detail, so that even if you’ve still got a bit of that previous material to nail down yet, you’ll be good to get going and fix some networks anyway.

To find Todd Lammle CompTIA videos and practice questions, please see www.lammle.com/network+.

Narrowing Down the Problem

When initially faced with a network problem in its entirety, it’s easy to get totally overwhelmed. That’s why it’s a great strategy to start by narrowing things down to the source of the problem. To help you achieve that goal, it’s always wise to ask the right questions. You can begin doing just that with this list of questions to ask yourself:

  • Did you check the super simple stuff (SSS)?
  • Is hardware or software causing the problem?
  • Is it a workstation or server problem?
  • Which segments of the network are affected?
  • Are there any cabling issues?

Did You Check the Super Simple Stuff?

Yes—it sounds like a snake’s hiss (appropriate for a problem, right?), but exactly what’s on the SSS list that you should be checking first, and why? Well, as the saying goes, “All things being equal, the simplest explanation is probably the correct one,” so you probably won’t be stunned and amazed when I tell you that I’ve had people call me in and act like the sky is falling when all they needed to do was check to make sure their workstation was plugged in or powered on. (I didn’t say “super simple stuff” for nothing!) Your SSS list really does include things that are this obvious—sometimes so obvious no one thinks to check for them. Even though anyone experienced in networking has their own favorite “DUH” events to tell about, almost everyone can agree on a few things that should definitely be on the SSS list:

  • Check to verify login procedures and rights.
  • Look for link lights and collision lights.
  • Check all power switches, cords, and adapters.
  • Look for user errors.

The Correct Login Procedure and Rights

You know by now that if you’ve set up everything correctly, your network’s users absolutely have to follow the proper login procedure to the letter (or number, or symbol) in order to successfully gain access to the network resources they’re after. If they don’t do that, they will be denied access, and considering that there are truly tons of opportunities to blow it, it’s a miracle, or at least very special, that anyone manages to log in to the network correctly at all.

Think about it. First, a user must enter their username and password flawlessly. Sounds easy, but as they say, “in a perfect world . . .” In this one, people mess up, don’t realize it, and freak out at you about the “broken network” or the imaginary IT demon that changed their password on them while they went to lunch and now they can’t log in. (The latter could be true—you may have done exactly that. If you did, just gently remind them about that memo you sent about the upcoming password-change date and time that they must have spaced about due to the tremendous demands on them.)

Anyway, it’s true. By far, the most common problem is bad typing—people accidentally enter the wrong username or password, and they do that a lot. With some operating systems, a slight brush of the Caps Lock key is all it takes: The user’s username and password are case sensitive, and suddenly, they’re trying to log in with what’s now all in uppercase instead—oops.

Plus, if you happen to be running one of the shiny new operating systems around today, you can also restrict the times and conditions under which users can log in, right? So, if your user spent an unusual amount of time in the bathroom upon returning from lunch, or if they got distracted and tried to log in from their BFF’s workstation instead of their own, the network’s operating system would’ve rejected their login request even though they still can type impressively well after two martinis.

And remember—you can also restrict how many times a user can log in to the network simultaneously. If you’ve set that up, and your user tries to establish more connections than you’ve allowed, access will again be denied. Just know that most of the time, if a user is denied access to the network and/or its resources, they’re probably going to interpret that as a network problem even though the network operating system is doing what it should.

Real World Scenario

Can the Problem Be Reproduced?

The first question to ask anyone who reports a network or computer problem is, “Can you show me what ‘not working’ looks like?” This is because if you can reproduce the problem, you can identify when it happens, which may give you all the information you need to determine the source of the problem and maybe even solve it in a snap. The hardest problems to solve are those of the random variety that occur intermittently and can’t be easily reproduced.

Let’s pause for a minute to outline the steps to take during any user-oriented network problem-solving process:

  1. Make sure the username and password is being entered correctly.
  2. Check that Caps Lock key.
  3. Try to log in yourself from another workstation, assuming that doing this doesn’t violate the security policy. If it works, go back to the user-oriented login problems, and go through them again.
  4. If none of this solves the problem, check the network documentation to find out whether any of the aforementioned kinds of restrictions are in place; if so, find out whether the user has violated any of them.

Remember, if intruder detection is enabled on your network, a user will get locked out of their account after a specific number of unsuccessful login attempts. If this happens, either they’ll have to wait until a predetermined time period has elapsed before their account will unlock and give them another chance or you’ll have to go in and manually unlock it for them.

Network Connection LED Status Indicators

The link light is that little light-emitting diode (LED) found on both the network interface card (NIC) and the switch. It’s typically green and labeled Link or some abbreviation of that. If you’re running 100BaseT, a link light indicates that the NIC and switch are making a logical (Data Link layer) connection. If the link lights are lit up on both the workstation’s NIC and the switch port to which the workstation is connected, it’s usually safe to assume that the workstation and switch are communicating just fine.

The link lights on some NICs don’t activate until the driver is loaded. So, if the link light isn’t on when the system is first turned on, you’ll just have to wait until the operating system loads the NIC driver. But don’t wait forever!

The collision light is also a small LED, but it’s typically amber in color, and it can usually be found on both Ethernet NICs and hubs. When lit, it indicates that an Ethernet collision has occurred. If you’ve got a busy Ethernet network on which collisions are somewhat common, understand that this light is likely to blink occasionally; if it stays on continuously, though, it could mean that there are way too many collisions happening for legitimate network traffic to get through. Don’t assume this is really what’s happening without first checking that the NIC, or other network device, is working properly because one or both could simply be malfunctioning.

Don’t confuse the collision light with the network-activity or network-traffic light (which is usually green) because the latter just indicates that a device is transmitting. This particular light should be blinking on and off continually as the device transmits and receives data on the network.

The Power Switch

Clearly, to function properly, all computer and network components must be turned on and powered up first. Obvious, yes, but if I had a buck for each time I’ve heard, “My computer is on, but my monitor is all dark,” I’d be rolling in money by now.

When this kind of thing happens, just keep your cool and politely ask, “Is the monitor turned on?” After a little pause, the person calling for help will usually say, “Ohhh . . . ummmm . . . thanks,” and then hang up ASAP. The reason I said to be nice is that, embarrassing as it is, this, or something like it, will probably happen to you, too, eventually.

Most systems include a power indicator (a Power or PWR light). The power switch typically has an On indicator, but the system or device could still be powerless if all the relevant power cables aren’t actually plugged in—including the power strip.

Remember that every cable has two ends, and both must be plugged into something. If you’re thinking something like, “Sheesh—a four-year-old knows that,” you’re probably right. But again, I can’t count the times this has turned out to be the root cause of a “major system failure.”

The best way to go about troubleshooting power problems is to start with the most obvious device and work your way back to the power-service panel. There could be a number of power issues between the device and the service panel, including a bad power cable, bad outlet, bad electrical wire, tripped circuit breaker, or blown fuse, and any of these things could be the actual cause of the problem that appears to be device-death instead.

Operator Error

Or, the problem may be that you’ve got a user who simply doesn’t know how to be one. Maybe you’re dealing with someone who doesn’t have the tiniest clue about the equipment they’re using or about how to perform a certain task correctly—in other words, the problem may be due to something known as operator error (OE). Here’s a short list of the most common types of OEs and their associated acronyms:

  • Equipment exceeds operator capability (EEOC)
  • Problem exists between chair and keyboard (PEBCAK)
  • ID Ten T error (an ID10T)

A word of caution here, though—assuming that all your problems are user related can quickly make an ID10T error out of you.

Although it can be really tempting to take the easy way out and blow things off, remember that the network’s well-being and security are ultimately your responsibility. So, before you jump to the operator-error conclusion, ask the user in question to reproduce the problem in your presence, and pay close attention to what they do. Understand that doing this can require a great deal of patience, but it’s worth your time and effort if you can prevent someone who doesn’t know what they’re doing from causing serious harm to pricey devices or leaving a gaping hole in your security. You might even save the help desk crew’s sanity from the relentless calls of a user with the bad habit of flipping off the power switch without following proper shutdown procedures. You just wouldn’t know they always do that if you didn’t see it for yourself, right?

And what about finding out that that pesky user was, in fact, trained really badly by someone and that they aren’t the only one? This is exactly the kind of thing that can turn the best security policy to dust and leave your network and its resources as vulnerable to attack as that goat in Jurassic Park.

The moral here is, always check out the problem thoroughly. If the problem and its solution aren’t immediately clear to you, try the procedure yourself, or ask someone else at another workstation to do so. Don’t just leave the issue unsettled or make the assumption that it is user error or a chance abnormality because that’s exactly what the bad guys out there are hoping you’ll do.

This is only a partial list of super simple stuff. No worries. Rest assured you’ll come up with your own expanded version over time.

Is Hardware or Software Causing the Problem?

A hardware problem often rears its ugly head when some device in your computer skips a beat and/or dies. This one’s pretty easy to discern because when you try to do something requiring that particular piece of hardware, you can’t do it and instead get an error telling you that you can’t do it. Even if your hard disk fails, you’ll probably get warning signs before it actually kicks, like a Disk I/O error or something similar.

Other problems drop out of the sky and hit you like something from the wrong end of a seagull. No warning at all—just splat! Components that were humming along fine a second ago can and do suddenly fail, usually at the worst possible time, leaving you with a mess of lost data, files, everything—you get the idea.

Solutions to hardware problems usually involve one of three things:

  • Changing hardware settings
  • Updating device drivers
  • Replacing dead hardware

If your hardware has truly failed, it’s time to get out your tools and start replacing components. If this isn’t one of your skills, you can either send the device out for repair or replace it. Your mantra here is “back up, back up, back up,” because in either case, a system could be down for a while—anywhere from an hour to several days—so it’s always good to keep backup hardware around. And I know everyone and your momma has told you this, but here it is one more time: Back up all data, files, hard drive, everything, and do so on a regular basis.

Software problems are muddier waters. Sometimes you’ll get General Protection Fault messages, which indicate a Windows or Windows program (or other platform) error of some type, and other times the program you’re working in will suddenly stop responding and hang. At their worst, they’ll cause your machine to randomly lock up on you. When this type of thing happens, I’d recommend visiting the manufacturer’s support website to get software updates and patches or searching for the answer in a knowledge base.

Sometimes you get lucky and the ailing software will tell the truth by giving you a precise message about the source of the problem. Messages saying the software is missing a file or a file has become corrupt are great because you can usually get your problem fixed fast by providing that missing file or by reinstalling the software. Neither solution takes very long, but the downside is that whatever you were doing before the program hosed will probably be at least partially lost; so again, back up your stuff, and save your data often.

Please reread Chapter 17, “Troubleshooting Tools,” and Chapter 18, “Software and Hardware Tools,” and use the software and hardware tools discussed in those two chapters to help you troubleshoot network problems.

It’s time for you to learn how to troubleshoot your workstations and servers.

Is It a Workstation or a Server Problem?

The first thing you’ve got to determine when troubleshooting this kind of problem is whether it’s only one person or a whole group that’s been affected. If the answer is only one person (think, a single workstation), solving the issue will be pretty straightforward. More than that, and your problem probably involves a chunk of the network, like a segment. A clue that the source of your grief is the latter case is if there’s a whole bunch of users complaining that they can’t discover neighboring devices/nodes.

So either way, what do you do about it? Well, if it’s the single-user situation, your first line of defense is to try to log in from another workstation within the same group of users. If you can do that, the problem is definitely the user’s workstation, so look for things like cabling faults, a bad NIC, power issues, and OSs.

But if a whole department can’t access a specific server, take a good, hard look at that particular server, and start by checking all user connections to it. If everyone is logged in correctly, the problem may have something to do with individual rights or permissions. If no one can log in to that server, including you, the server probably has a communication problem with the rest of the network. And if the server has totally crashed, either you’ll see messages telling you all about it on the server’s monitor or you’ll find its screen completely blank—screaming indicators that the server is no longer running. And keep in mind that these symptoms do vary among network operating systems.

Which Segments of the Network Are Affected?

Figuring this one out can be a little tough. If multiple segments are affected, you may be dealing with a network-address conflict. If you’re running Transmission Control Protocol/Internet Protocol (TCP/IP), remember that IP addresses must be unique across an entire network. So, if two of your segments have the same static IP subnet addresses assigned, you’ll end up with duplicate IP errors—an ugly situation that can be a real bear to troubleshoot and can make it tough to find the source of the problem.

If all of your network’s users are experiencing the problem, it could be a server everyone accesses. Thank the powers that be if you nail it down to that because if not, other network devices like your main router or hub may be down, making network transmissions impossible and usually meaning a lot more work on your part to fix.

Adding wide area network (WAN) connections to the mix can complicate matters exponentially, and you don’t want to go there if you can avoid it, so start by finding out if stations on both sides of a WAN link can communicate. If so, get the champagne—your problem isn’t related to the WAN—woo hoo! But if those stations can’t communicate, it’s not a happy thing: You’ve got to check everything between the sending station and the receiving one, including the WAN hardware, to find the culprit. The good news is that most of the time, WAN devices have built-in diagnostics that tell you whether a WAN link is working okay, which really helps you determine if the failure has something to do with the WAN link itself or with the hardware involved instead.

Is It Bad Cabling?

Back to hooking up correctly . . . Once you’ve figured out whether your plight is related to one workstation, a network segment, or the whole tamale (network), you must then examine the relevant cabling. Are the cables properly connected to the correct port? More than once, I’ve seen a Digital Subscriber Line (DSL) modem connection to the wall cabled all wrong—it’s an easy mistake to make and an easy one to fix.

And you know that nothing lasts forever, so check those patch cables running between a workstation and a wall jack. Just because they don’t come with expiration dates written on them doesn’t mean they don’t expire. They do go bad—especially if they get moved, trampled on, or tripped over a lot. (I did tell you that it’s a bad idea to run cabling across the office floor, didn’t I?) Connection problems are the tell here—if you check the NIC and there is no link light blinking, you may have a bad patch cable to blame.

It gets murkier if your cable in the walls or ceiling is toast or hasn’t been installed correctly. Maybe you’ve got a user or two telling you the place is haunted because they only have problems with their workstations after dark when the lights go on. Haunted? No . . . some genius probably ran a network cable over a fluorescent light, which is something that just happens to produce lots of electromagnetic interference (EMI), which can really mess up communications in that cable.

Next on your list is to check the medium dependent interface/medium dependent interface-crossover (MDI/MDI-X) port setting on small, workgroup hubs and switches. This is a potential source of trouble that’s often overlooked, but it’s important because this port is the one that’s used to uplink to a switch on the network’s backbone.

First, understand that the port setting has to be set to either MDI or MDI-X depending on the type of cable used for your hub-to-hub or switch-to-switch connection. For instance, the crossover cables I talked about way back in Chapter 3, “Networking Topologies, Connectors, and Wiring Standards,” require that the port be set to MDI, and a standard network patch cable requires that the port be set to MDI-X. You can usually adjust the setting via a regular switch or a dual inline package (DIP) switch, but to be sure, if you’re still using hubs, check out the hub’s documentation. (You did keep that, right?)

Other Important Cable Issues You Need to Know About

They may be basic, but they’re still vital—an understanding of the physical issues that can happen on a network when a user is connected via cable (usually Ethernet) is critical information to have in your troubleshooting repertoire.

Because many of today’s networks still consist of large amounts of copper cable, they suffer from the same physical issues that have plagued networking since the very beginning. Newer technologies and protocols have helped to a degree, but they haven’t made these issues a thing of the past yet. Some physical issues that still affect networks are listed and defined next:

Incorrect Pinout/TX/RX Reverse/Damaged Cable The first things to check when working on cabling are the cable connectors to make sure they haven’t gone bad. After that, look to make sure the wiring is correct on both ends by physically checking the cable pinouts. Important to remember is that if you have two switches, you need a crossover cable where you cross pins 1 and 2 with 3 and 6. On the other hand, if you have a PC going into a switch, you need a straight-through cable where pins 1 and 2 correspondingly connect to pins 1 and 2 on each side—the same with 3 and 6. Finally, make sure the termination pins on both ends are the correct type for the kind of cable you’re using.

Bad Port In some cases, the issue is not the cable but the port into which the cable is connected. On many devices ports have LEDs that can alert you to a bad port. For example, a Cisco router or switch will have an LED for each port and the color of the LED will indicate its current state. In most cases, a lack of any light whatsoever indicates an issue with the port. Loopback plugs can be used to test the functionality of a port. These devices send a signal out and then back in the port to test it.

Transceiver Mismatch Interfaces that send and receive are called transceivers. When a NIC is connected to a port, the two transceivers must have certain settings the same or issues will occur. These settings are the duplex and the speed setting. If the speed settings do not match, there will be no communication. If the duplex settings are incorrect, there may be functionality but the performance will be poor.

Crosstalk Again, looking back to Chapter 3, remember that crosstalk is what happens when there’s signal bleed between two adjacent wires that are carrying a current. Network designers minimize crosstalk inside network cables by twisting the wire pairs together, putting them at a 90-degree angle to each other. The tighter the wires are twisted, the less crosstalk you have, and newer cables like Cat 6 cable really make a difference. But like I said, not completely—crosstalk still exists and affects communications, especially in high-speed networks. This is often caused by using the wrong category of cable or by mismatching the member of one pair with a member of another when terminating a cable.

Near-End/Far-End Crosstalk Near-end crosstalk is a specific type of crosstalk measurement that has to do with the EMI bled from a wire to adjoining wires where the current originates. This particular point has the strongest potential to create crosstalk issues because the crosstalk signal itself degrades as it moves down the wire. If you have a problem with it, it’s probably going to show up in the first part of the wire where it’s connected to a switch or a NIC. Far-end crosstalk is the interference between two pairs of a cable measured at the far end of the cable with respect to the interfering transmitter.

This condition is often caused by improperly terminating a cable. For example, it’s important to maintain the twist right up to the punch-down or crimp connector. In the case of crimp connectors, it’s critical to select the correct grade of connector even though one grade may look identical to another.

Attenuation/Db Loss/Distance Limitation As a signal moves through any medium, the medium itself will degrade the signal—a phenomenon known as attenuation that’s common in all kinds of networks. True, signals traversing fiber-optic cable don’t attenuate as fast as those on copper cable, but they still do eventually. You know that all copper twisted-pair cables have a maximum segment distance of 100 meters before they’ll need to be amplified, or repeated, by a hub or a switch, but single-mode fiber-optic cables can sometimes carry signals for miles before they begin to attenuate (degrade). If you need to go far, use fiber, not copper. Although there is attenuation/Db loss in fiber, it can go much farther distances than copper cabling can before being affected by attenuation.

Latency Latency is the delay typically incurred in the processing of network data. A low-latency network connection is one that generally experiences short delay times, while a high-latency connection generally suffers from long delays. Many security solutions may negatively affect latency. For example, routers take a certain amount of time to process and forward any communication. Configuring additional rules on a router generally increases latency, thereby resulting in longer delays. An organization may decide not to deploy certain security solutions because of the negative effects they will have on network latency.

Auditing is a great example of a security solution that affects latency and performance. When auditing is configured, it records certain actions as they occur. The recording of these actions may affect the latency and performance.

Jitter Jitter occurs when the data flow in a connection is not consistent; that is, it increases and decreases in no discernable pattern. Jitter results from network congestion, timing drift, and route changes. Jitter is especially problematic in real-time communications like IP telephony and videoconferencing.

Collisions A network collision happens when two devices try to communicate on the same physical segment at the same time. Collisions like this were a big problem in the early Ethernet networks, and a tool known as Carrier Sense Multiple Access with Collision Detection (CSMA/CD) was used to detect and respond to them in Ethernet_II. Nowadays, we use switches in place of hubs because they can separate the network into multiple collision domains, learn the Media Access Control (MAC) addresses of the devices attached to them, create a type of permanent virtual circuit between all network devices, and prevent collisions.

Shorts Basically, a short circuit, or short, happens when the current flows through a different path within a circuit than it’s supposed to; in networks, they’re usually caused by some type of physical fault in the cable. You can find shorts with circuit-testing equipment, but because sooner is better when it comes to getting a network back up and running, replacing the ailing cable until it can be fixed (if it can be) is your best option.

Open Impedance Mismatch (echo) Open impedance on cable-testing equipment tells you that the cable or wires connect into another cable and there is an impedance mismatch. When that happens, some of the signal will bounce back in the direction it came from, degrading the strength of the signal, which ultimately causes the link to fail.

Interference/Cable Placement EMI and radio frequency interference (RFI) occur when signals interfere with the normal operation of electronic circuits. Computers happen to be really sensitive to sources of this, such as TV and radio transmitters, which create a specific radio frequency as part of their transmission process. Two other common culprits are two-way radios and cellular phones.

Your only way around this is to use shielded network cables like shielded twisted-pair (STP) and coaxial cable (rare today) or to run EMI/RFI-immune but pricey fiber-optic cable throughout your entire network.

Split Pairs A split pair is a wiring error where two connections that are supposed to be connected using the two wires of a twisted pair are instead connected using two wires from different pairs. Such wiring causes errors in high-rate data lines. If you buy your cables precut, you won’t have this problem.

TX/RX Reverse When connecting from a PC-type device into a switch, for the PC use pins 1 and 2 to transmit and 3 and 6 for receiving a digital signal. This means that the pins must be reversed on the switch, using pins 1 and 2 for receiving and 3 and 6 for transmitting the digital signal. If your connection isn’t working, check the cable end pinouts.

Bent Pins Many of the connectors you will encounter have small pins on the end that must go into specific holes on the interface to which they plug. If these pins get bent, either they won’t go into the correct hole or they won’t go into a hole period. When this occurs, the cable either will not work at all or will not work correctly. Taking care not to bend these fragile pins when working with these cable types will prevent this issue from occurring.

Bottlenecks Bottlenecks are areas of the network where the physical infrastructure is not capable of handling the traffic. In some cases, this is a temporary issue caused by an unusual burst of traffic. In other scenarios, it is a wake-up call to upgrade the infrastructure or reorganize the network to alleviate the bottleneck. The obvious result of a bottleneck is poor performance.

Fiber Cable Issues

Fiber is definitely the best kind of wiring to use for long-distance runs because it has the least attenuation at long distances compared to copper. The bad news is that it’s also the hardest to troubleshoot. Here are some common fiber issues to be aware of.

SFP/GBIC (Cable Mismatch) The small form-factor pluggable (SFP) is a compact, hot-pluggable transceiver used for networking and other types of equipment. It interfaces a network device motherboard for a switch, router, media converter, or similar device to a fiber-optic or copper networking cable. Due to its smaller size, SFP obsolesces the formerly ubiquitous gigabit interface converter (GBIC), so SFP is sometimes referred to as a mini-GBIC. Always make sure you have the right cable for each type of connector type and that they are not mismatched.

Bad SFP/GBIC (Cable or Transceiver) If your link is down, verify that your cable or transceiver hasn’t gone bad. I covered the pricey equipment you need to get this done in Chapter 18. It can provide exact locations the problem has originated.

Wavelength Mismatch One of the more confusing terms used regarding fiber networks is wavelength. Though it sounds very complicated and scientific, it’s actually just the term used to define what we think of as the color of light. Wavelength mismatch occurs when two different fiber transmitters at each end of the cable are using either a longer or shorter wavelength. This means you’ve got to make sure your transmitters match on both ends of the cable.

Fiber Type Mismatch Fiber type mismatches, at each of the transceivers, can cause wavelength issues, massive attenuation, and Db loss.

Dirty Connectors It’s important to verify your connectors to make sure no dirt or dust has corrupted the cable end. You need to polish your cable ends with a soft cloth, but do not look into the cable if the other end is transmitting—it could damage your eyes!

Connector Mismatch Just because it fits doesn’t mean it works. Make sure you have precisely the right connectors for each type of cable end or transceiver.

Bend Radius Limitations Fiber, whether it is made of glass or plastic, can break. You need to make sure you understand the bend radius limitations of each type of fiber you purchase and that you don’t exceed the specifications when installing fiber in your rack.

Distance Limitations The pros of fiber are that it’s completely immune to EMI and RFI, and that it can transmit up to 40 kilometers—about 25 miles! Add some repeater stations and you can go between continents. But all fiber types aren’t created equally. For example, single mode can perform at much greater distances than multi-mode can. And again, make sure you have the right cable for the distance you’ll require to run your fiber!

Unbounded Media Issues (Wireless)

Now let’s say your problem-ridden user is telling you they only use a wireless connection. Well, you can definitely take crosstalk and shorts off the list of suspects, but don’t get excited, because with wireless, you’ve got a whole new bunch of possible Physical layer problems to sort through.

Wireless networks are really convenient for the user but not so much for administrators. They can require a lot more configuration, and understand that with wireless networks, you don’t just get to substitute one set of challenges for another—you pretty much add all those fresh new issues on top of the wired challenges you already have on your plate.

The following list includes some of those new wireless challenges:

Interference Because wireless networks rely on radio waves to transmit signals, they’re more subject to interference, even from other wireless devices like Bluetooth keyboards, mice, or cell phones that are all close in frequency ranges. Any of these—even microwave ovens!—can cause signal bleed that can slow down or prevent wireless communications. Factors like the distance between a client and a wireless access point (WAP) and the stuff between the two can also affect signal strength and even intensify the interference from other signals. So, careful placement of that WAP is a must.

Device Saturation/Bandwidth Saturation Clearly it’s important to design and implement your wireless network correctly. Be sure to understand the number of hosts that will be connecting to each AP that you’ll be installing. If you have too much device saturation on an AP, it will result in low available bandwidth. Just think about when you’re in a hotel and how slow the wireless is. This is directly due to device/bandwidth saturation for each AP. And more APs don’t always solve the problem—you need to design correctly!

Simultaneous Wired/Wireless Connections It’s not unusual to find that a laptop today will have both a wired and wireless connection at the same time. Typically this doesn’t create a problem, but don’t think you get more bandwidth or better results because of it. It’s possible that the configurations can cause a problem, although that’s rare today. For instance, if each provides a DNS server with a different address, it can cause name resolution issues, or even default gateway issues. Most of the time, it just causes confusion in your laptop, which will make it work harder to determine the correct DNS or default gateway address to use. And it’s possible for the laptop to give up and stop communicating completely! Because of this, you need to remind the user to turn off their wireless when they take it into their office and connect it to their dock.

Configurations Mistakes in the configuration of the wireless access point or wireless router or inconsistencies between the settings on the AP and the stations can also be the source of problems. The following list describes some of the main sources of configuration problems.

  • Incorrect Encryption/Security Type Mismatch You know that wireless networks can use encryption to secure their communications and that different encryption flavors are used for wireless networks, like Wired Equivalent Privacy (WEP) and Wi-Fi Protected Access 2 (WPA2) with Advanced Encryption Standard (AES). To ensure the tightest security, configure your wireless networks with the highest encryption protocol that both the WAP and the clients can support. Oh, and make sure the AP and its clients are configured with the same type of encryption. This is why it’s a good idea to disable security before troubleshooting client problems, because if the client can connect once you’ve done that, you know you’re dealing with a security configuration error.

    Incorrect, Overlapping, or Mismatched Channels Wireless networks use many different frequencies within the 2.4 GHz or 5 GHz band, and I’ll bet you didn’t know that these frequencies are sometimes combined to provide greater bandwidth for the user. You actually do know about this—has anyone heard of something called a channel? Well, that’s exactly what a channel is, and it’s also the reason some radio stations come in better than others—they have more bandwidth because their channel has more combined frequencies. You also know what happens when the AP and the client aren’t quite matching up. Have you ever hit the scan button on your car’s radio and only kind of gotten a station’s static-ridden broadcast? That’s because the AP (radio station) and the client (your car’s radio) aren’t quite on the same channel. Most of the time, wireless networks use channel 1, 6, or 11, and because clients auto-configure themselves to any channel the AP is broadcasting on, it’s not usually a configuration issue unless someone has forced a client onto an incorrect channel. Also, be sure not to use the same channel on APs within the same area. Overlapping channels cause your signal-to-noise ratio to drop because you’ll get a ton of interference and signal loss!

    Incorrect Frequency/Incompatibilities So, setting the channel sets the frequency or frequencies that wireless devices will use. But some devices, such as an AP running 802.11g/n, allow you to tweak those settings and choose a specific frequency such as 2.4 GHz or 5 GHz. As with any relationship, it works best if things are mutual. So if you do this on one device, you’ve got to configure the same setting on all the devices with which you want to communicate, or they won’t—they’ll argue, and you don’t want that. Incorrect-channel and frequency-setting problems on a client are rare, but if you have multiple APs and they’re in close proximity, you need to make sure they’re on different channels/frequencies to avoid potential interference problems.

    Wrong Passphrase When a passphrase is used as an authentication method, the correct passphrase must be entered when authenticating to the AP or to the controller. When an incorrect passphrase is provided, access will be denied. This is another issue that will impair functionality.

    SSID Mismatch When a wireless device comes up, it scans for service set identifiers (SSIDs) in its immediate area. These can be basic service set identifiers (BSSIDs) that identify an individual access point or extended service set identifiers (ESSIDs) that identify a set of APs. In your own wireless LAN, you clearly want the devices to find the ESSID that you’re broadcasting, which isn’t usually a problem: Your broadcast is closer than the neighbor’s, so it should be stronger—unless you’re in an office building or apartment complex that has lots of different APs assigned to lots of different ESSIDs because they belong to lots of different tenants in the building. This can definitely give you some grief because it’s possible that your neighbor’s ESSID broadcast is stronger than yours, depending on where the clients are in the building. So if a user reports that they’re connected to an AP but still can’t access the resources they need or authenticate to the network, you should verify that they are, in fact, connected to your ESSID and not your neighbor’s. This is very typical in an open security wireless network. You can generally just look at the information tool tip on the wireless software icon to find this out. However, you can easily solve this problem today by making the office SSID the preferred network in the client software.

    Wireless Standard Mismatch As you found out in Chapter 12, “Wireless Networking,” wireless networks have many standards that have evolved over time, like 802.11a, 802.11b, 802.11g, and 802.11n. Standards continue to develop that make wireless networks even faster and more powerful. The catch is that some of these standards are backward compatible and others aren’t. For instance, most devices you buy today can be set to 802.11a/b/g/n, which means they can be used to communicate with other devices of all four standards. So, make sure the standards on the AP match the standards on the client, or that they’re at least backward compatible. It’s either that or tell all your users to buy new cards for their machines. Be sure to understand the throughput, frequency, distance capabilities, and available channels for each standard you use.

    Untested Updates It’s really important to push updates to the APs in your wireless network, but not before you test them. Just like waiting for an update from Microsoft or Apple to become available for weeks or months before you update, you need to wait for the OS or patch updates for your AP. Then, you need to test the updates thoroughly on your bench before pushing them to your live network.

Distance/Signal Strength/Power Levels Location, location, location. You’ve got only two worries with this one: Your clients are either not far enough away or too far from the AP. If your AP doesn’t seem to have enough power to provide a connectivity point for your clients, you can move it closer to them, increase the distance that the AP can transmit by changing the type of antenna it uses, or use multiple APs connected to the same switch or set of switches to solve the problem. If the power level or signal is too strong, and it reaches out into the parking area or farther out to other buildings and businesses, place the AP as close as possible to the center of the area it’s providing service for. And don’t forget to verify that you’ve got the latest security features in place to keep bad guys from authenticating to and using your network.

Latency and Overcapacity When wireless users complain that the network is slow (latency) or that they are losing their connection to applications during a session, it is usually a capacity or distance issue. Remember, 802.11 is a shared medium, and as more users connect, all user throughput goes down. If this becomes a constant problem as opposed to the occasional issue where 20 guys with laptops gather for a meeting every six months in the conference room, it may be time to consider placing a second AP in the area. When you do this, place the second AP on a different non-overlapping channel from the first and make sure the second AP uses the same SSID as the first. In the 2.4 GHz frequency (802.11b and 802.11g), the three non-overlapping channels are 1, 6, and 11. Now the traffic can be divided between them and users will get better performance. It is also worth noting that when clients move away from the AP, the data rate drops until at some point it is insufficient to maintain the connection.

Bounce For a wireless network spanning large geographical distances, you can install repeaters and reflectors to bounce a signal and boost it to cover about a mile. This can be a good thing, but if you don’t tightly control signal bounce, you could end up with a much bigger network than you wanted. To determine exactly how far and wide the signal will bounce, make sure you conduct a thorough wireless site survey. However, bounce can also refer to multipath issues, where the signal reflects off objects and arrives at the client degraded because it is arriving out of phase. The solution is pretty simple. APs use two antennas, both of which sample the signal and use the strongest signal and ignore the out-of-phase signal. However, 802.11n takes advantage of multipath and can combine the out-of-phase signals to increase the distance hosts can be from the AP.

Incorrect Antenna Type or Switch Placement Most of the time, the best place to put an AP and/or its antenna is as close to the center of your wireless network as possible. But you can position some antennas a distance from the AP and connect to it with a cable—a method used for a lot of the outdoor installations around today. If you want to use multiple APs, you’ve also got to be a little more sophisticated about deciding where to put them all; you can use third-party tools like the packet sniffers Wireshark and AirMagnet on a laptop to survey the site and establish how far your APs are actually transmitting. You can also hire a consultant to do this for you—there are many companies that specialize in assisting organizations with their wireless networks and the placement of antennas and APs. This is important because poor placement can lead to interference and poor performance, or even no performance at all.

Environmental Factors It’s vital to understand your environmental factors when designing and deploying your wireless network. Do you have concrete walls, window film, or metal studs in the walls? All of these will cause a degradation of Db or power level and result in connectivity issues. Again—plan your wireless network carefully!

Reflection When a wave hits a smooth object that is larger than the wave itself, depending on the media the wave may bounce in another direction. This behavior is categorized as reflection.

Reflection can be the cause of serious performance problems in a WLAN. As a wave radiates from an antenna, it broadens and disperses. If portions of this wave are reflected, new wave fronts will appear from the reflection points. If these multiple waves all reach the receiver, the multiple reflected signals cause an effect called multipath.

Multipath can degrade the strength and quality of the received signal or even cause data corruption or canceled signals. APs mitigate this behavior by using multiple antennas and constantly sampling the signal to avoid a degraded signal.

Refraction Refraction is the bending of an RF signal as it passes through a medium with a different density, thus causing the direction of the wave to change. RF refraction most commonly occurs as a result of atmospheric conditions.

In long-distance outdoor wireless bridge links, refraction can be an issue. An RF signal may also refract through certain types of glass and other materials that are found in an indoor environment.

Absorption Some materials will absorb a signal and reduce its strength. While there is not much that can be done about this, this behavior should be noted during a site survey, and measures such as additional APs or additional antenna types may be called for.

Signal-to-Noise Ratio Signal-to-noise ratio (SNR) is the difference in decibels between the received signal and the background noise level (noise floor).

If the amplitude of the noise floor is too close to the amplitude of the received signal, data corruption will occur and result in Layer 2 retransmissions, negatively affecting both throughput and latency. An SNR of 25 dB or greater is considered good signal quality, and an SNR of 10 dB or lower is considered very poor signal quality.

Now that you know all about the possible physical network horrors that can befall you on a typical network, it’s a good time for you to memorize the troubleshooting steps that you’ve got to know to ace the CompTIA Network+ exam.

Troubleshooting Steps

In the Network+ troubleshooting model, there are seven steps you’ve got to have dialed in:

  1. Identify the problem.
  2. Establish a theory of probable cause.
  3. Test the theory to determine cause.
  4. Establish a plan of action to resolve the problem and identify potential effects.
  5. Implement the solution or escalate as necessary.
  6. Verify full system functionality, and if applicable, implement preventative measures.
  7. Document findings, actions, and outcomes.

To get things off to a running start, let’s assume that the user has called you yet again, but now they’re almost in tears because they can’t connect to the server on the intranet and they also can’t get to the Internet. (By the way, this happens a lot, so pay attention—it’s only a matter of time before it happens to you!)

Absolutely, positively make sure you memorize this seven-step troubleshooting process in the right order when studying for the Network+ exam!

Step 1: Identify the Problem

Before you can solve the problem, you’ve got to figure out what it is, right? Again, asking the right questions can get you far along this path and really help clarify the situation. Identifying the problem involves steps that together constitute information gathering.

Gather Information by Questioning Users

A good way to start is by asking the user the following questions:

  • Exactly which part of the Internet can’t you access? A particular website? A certain address? A type of website? None of it at all?
  • Can you use your web browser?
  • Is it possible to duplicate the problem?
  • If the hitch has to do with an internal server to the company, ask the user if they can ping the server and talk them through doing that.
  • Ask the user to try to telnet or FTP to an internal server to verify local network connectivity; if they don’t know how, talk them through it.
  • If there are multiple complaints of problems occurring, look for the big stuff first, then isolate and approach each problem individually.

Here’s another really common trouble ticket that just happens to build on the last scenario: Now let’s say you’ve got a user who’s called you at the help desk. By asking the previous questions, you found out that this user can’t access the corporate intranet or get out to any sites on the Internet. You also established that the user can use their web browser to access the corporate FTP site, but only by IP address, not by the FTP server name. This information tells you two important things: that you can rule out the host and the web browser (application) as the source of the problem and that the physical network is working.

Duplicate the Problem, If Possible

When a user reports an issue, you should attempt to duplicate the issue. When this is possible, it will aid in discovering the problem. When you cannot duplicate the issue, your challenge becomes harder because you are dealing with an intermittent problem. These issues are difficult to solve because they don’t happen consistently.

Determine If Anything Has Changed

Moving right along, if you can reproduce the problem, your next step is to verify what has changed and how. Drawing on your knowledge of networking, you ask yourself and your user questions like these:

Were you ever able to do this? If not, then maybe it just isn’t something the hardware or software is designed to do. You should then tell the user exactly that, as well as advise them that they may need additional hardware or software to pull off what they’re trying do.

If so, when did you become unable to do it? If, once upon a time, the computer was able to do the job and then suddenly could not, whatever conditions surrounded and were involved in this turn of events become extremely important. You have a really good shot at unearthing the root of the problem if you know what happened right before things changed. Just know that there’s a high level of probability that the cause of the problem is directly related to the conditions surrounding the change when it occurred.

Has anything changed since the last time you could do this? This question can lead you right to the problem’s cause. Seriously—the thing that changed right before the problem began happening is almost always what caused it. It’s so important that if you ask it and your user tells you, “Nothing changed…it just happened,” you should rephrase the question and say something like, “Did anyone add anything to your computer?” or “Are you doing anything differently from the way you usually do it?”

Were any error messages displayed? These are basically arrows that point directly at the problem’s origin; error messages are designed by programmers for the purpose of pointing them to exactly what it is that isn’t working properly in computer systems. Sometimes error messages are crystal clear, like Disk Full, or they can be cryptically annoying little puzzles in and of themselves. If you pulled the short straw and got the latter variety, it’s probably best to hit the software or hardware vendor’s support site, where you can usually score a translation from the “programmerese” in which the error message is written into plain English so you can get back to solving your riddle.

Are other people experiencing this problem? You’ve got to ask this one because the answer will definitely help you target the cause of the problem. First, try to duplicate the problem from your own workstation because if you can’t, it’s likely that the issue is related to only one user or group of users—possibly their workstations. (A solid hint that this is the case is if you’re being inundated with calls from a bunch of people from the same workgroup.)

Is the problem always the same? It’s good to know that when problems crop up, they’re almost always the same each time they occur. But their symptoms can change slightly as the conditions surrounding them change. A related question would be, “If you do x, does the problem get better or worse?” For example, ask a user, “If you use a different file, does the problem get better or worse?” If the symptoms lighten up, it’s an indication that the problem is related to the original file that’s being used. It’s important to try to duplicate the problem to find the source of the issue as soon as possible!

Understand that these are just a few of the questions you can use to get to the source of a problem.

Okay, so let’s get back to our sample scenario. So far, you’ve determined that the problem is unique to one user, which tells you that the problem is specific to this one host. Confirming that is the fact that you haven’t received any other calls from other users on the network.

And when watching the user attempt to reproduce the problem, you note that they’re typing the address correctly. Plus, you’ve got an error message that leads you to believe that the problem has something to with Domain Name Service (DNS) lookups on the user’s host. Time to go deeper…

Identify Symptoms

I probably don’t need to tell you that computers and networks can be really fickle—they can hum along fine for months, suddenly crash, and then continue to work fine again without ever seizing in that way again. That’s why it’s so important to be able to reproduce the problem and identify the affected area to narrow things down so you can cut to the chase and fix the issue fast. This really helps—when something isn’t working, try it again, and write down exactly what is and is not happening.

Most users’ knee-jerk reaction is to straight up call the help desk the minute they have a problem. This is not only annoying but also inefficient, because you’re going to ask them exactly what they were doing when the problem occurred and most users have no idea what they were doing with the computer at the time because they were focused on doing their jobs instead. This is why if you train users to reproduce the problem and jot down some notes about it before calling you, they’ll be much better prepared to give you the information you need to start troubleshooting it and help them.

So with that, here we go. The problem you’ve identified results in coughing out an error message to your user when they try to access the corporate intranet. It looks like Figure 19.1.

Figure 19.1 Cannot connect

Imahge described by caption and surrounding text

And when this user tries to ping the server using the server’s hierarchical web name, it fails, too (see Figure 19.2).

Figure 19.2 Host could not be found.

Imahge described by caption and surrounding text

You’re going to respond by checking to see whether the server is up by pinging the server by its IP address (see Figure 19.3).

Figure 19.3 Successful ping

Imahge described by caption and surrounding text

Nice—that worked, so the server is up, but you could still have a server problem. Just because you can ping a host, it doesn’t mean that host is 100 percent up and running, but in this case, it’s a good start.

And you’re in luck because you’ve been able to re-create this problem from this user’s host machine. By doing that, you now know that the URL name is not being resolved from Internet Explorer, and you can’t ping it by the name either. But you can ping the server IP address from your limping host, and when you try this same connection to the internal .lammle.com server from another host nearby, it works fine, meaning the server is working fine. So, you’ve succeeded in isolating the problem to this specific host—yes!

It is a huge advantage if you can watch the user try to reproduce the problem themselves because then you know for sure whether the user is performing the operation correctly. It’s a really bad idea to assume the user is typing in what they say they are.

Approach Multiple Problems Individually

You should never mix possible solutions when troubleshooting. When multiple changes are made, interactions can occur that muddy the results. Always attack one issue at a time and make only a single change at a time. When a change does not have a beneficial effect, reverse the change before making another change.

Great—now you’ve nailed down the problem. This leads us to step 2

Step 2: Establish a Theory of Probable Cause

After you observe the problem and identify the symptoms, next on the list is to establish its most probable cause. (If you’re stressing about it now, don’t, because though you may feel overwhelmed by all this, it truly does get a lot easier with time and experience.)

You must come up with at least one possible cause, even though it may not be completely on the money. And you don’t always have to come up with it yourself. Someone else in the group may have the answer. Also, don’t forget to check online sources and vendor documentation.

Again, let’s get back to our scenario, in which you’ve determined the cause is probably an improperly configured DNS lookup on the workstation. The next thing to do is to verify the configuration (and probably reconfigure DNS on the workstation; we’ll get to this solution later, in step 4).

Understand that there are legions of problems that can occur on a network—and I’m sorry to tell you this, but they’re typically not as simple as the example we’ve been using. They can be, but I just don’t want you to expect them to be. Always consider the physical aspects of a network, but look beyond them into the realm of logical factors like the DNS lookup issue we’ve been using.

Question the Obvious

The probable causes that you’ve got to thoroughly understand to meet the Network+ objectives are as follows:

  • Port speed
  • Port duplex mismatch
  • Mismatched MTU
  • Incorrect virtual local area network (VLAN)
  • Incorrect IP address/duplicate IP address
  • Wrong gateway
  • Wrong DNS
  • Wrong subnet mask
  • Incorrect interface/interface misconfiguration
  • Duplicate MAC addresses
  • Expired IP address
  • Rogue DHCP server
  • Untrusted SSL certificate
  • Incorrect time
  • Exhausted DHCP scope
  • Blocked TCP/UDP ports
  • Incorrect host-based firewall settings
  • Incorrect ACL settings
  • Unresponsive service

Let’s talk about these logical issues, which can cause an abundance of network problems. Most of these happen because a device has been improperly configured.

Port Speed Because networks have been evolving for many years, there are various levels of speed and sophistication mixed into them—often within the same network. Most of the newest NICs can be used at 10 Mbps, 100 Mbps, and 1,000 Mbps. Most switches can support at least 10 Mbps and 100 Mbps, and an increasing number of switches can also support 1,000 Mbps. Plus, many switches can also autosense the speed of the NIC that’s connected and use different speeds on various ports. As long as the switches are allowed to autosense the port speed, it’s rare to have a problem develop that results in a complete lack of communication. But if you decide to set the port speed manually, make positively sure to set the same speed on both sides of a link.

Port Duplex Mismatch There are generally three duplex settings on each port of a network switch: full, half, and auto. In order for two devices to connect effectively, the duplex setting has to match on both sides of the connection. If one side of a connection is set to full and the other is set to half, they’re mismatched. More elusively, if both sides are set to auto but the devices are different, you can also end up with a mismatch because the device on one side defaults to full and the other one defaults to half.

Duplex mismatches can cause lots of network and interface errors, and even the lack of a network connection. This is partially because setting the interfaces to full duplex disables the CSMA/CD protocol. This is definitely not a problem in a network that has no hubs (and therefore no shared segments in which there could be collisions), but it can make things really ugly in a network where hubs are still being used. This means the settings you choose are based on the type of devices you have populating your network. If you have all switches and no hubs, feel free to set all interfaces to full duplex, but if you’ve got hubs in the mix, you have shared networks, so you’re forced to keep the settings at half duplex. With all new switches produced today, leaving the speed and duplex setting to auto (the default on both switches and hosts) is the recommended way to go.

Mismatched MTU Ethernet LANs enforce what is called a maximum transmission unit (MTU). This is the largest size packet that is allowed across a segment. In most cases, this is 1,500 bytes. Left alone this is usually not a problem, but it is possible to set the MTU on a router interface, which means it is possible for a mismatch to be present between two router interfaces. This can cause problems with communications between the routers, resulting in the link failing to pass traffic. To check the MTU on an interface, execute the command show interface.

Incorrect VLAN Switches can have multiple VLANs each, and they can be connected to other switches using trunk links. As you now know, VLANs are often used to represent departments or the occupations of a group of users. This makes the configurations of security policies and network access lists much easier to manage and control. On the other hand, if a port is accidentally assigned to the wrong VLAN in a switch, it’s as if that client was magically transported to another place in the network. If that happens, the security policies that should apply to the client won’t anymore, and other policies will be applied to the client that never should have been. The correct VLAN port assignment of a client is as important as air; when I’m troubleshooting a single-host problem, this is the first place I look.

It’s pretty easy to tell if you have a port configured with a wrong VLAN assignment. If this is the case, it won’t be long before you’ll get a call from some user screaming something at you that makes the building shake, like, “I can get to the Internet but I can’t get to the Sales server, and I’m about to lose a huge sale. DO SOMETHING!” When you check the switch, you will invariably see that this user’s port has a membership in another VLAN like Marketing, which has no access to the Sales server.

Incorrect IP Address/Duplicate IP Address The most common addressing protocol in use today is IPv4, which provides a unique IP address for each host on a network. Client computers usually get their addresses from Dynamic Host Configuration Protocol (DHCP) servers. But sometimes, especially in smaller networks, IP addresses for servers and router interfaces are statically assigned by the network’s administrator. An incorrect or duplicate IP address on a client will keep that client from being able to communicate and may even cause a conflict with another client on the network, and a bad address on a server or router interface can be disastrous and affect a multitude of users. This is exactly why you need to be super careful to set up DHCP servers correctly and also when configuring the static IP addresses assigned to servers and router interfaces.

Wrong Gateway A gateway, sometimes called a default gateway or an IP default gateway, is a router interface’s address that’s configured to forward traffic with a destination IP address that’s not in the same subnet as the device itself. Let me clarify that one for you: If a device compares where a packet wants to go with the network it’s currently on and finds that the packet needs to go to a remote network, the device will send that packet to the gateway to be forwarded to the remote network. Because every device needs a valid gateway to obtain communication outside its own network, it’s going to require some careful planning when considering the gateway configuration of devices in your network.

If you’re configuring a static IP address and default gateway, you need to verify the router’s address. Not doing so is a really common “wrong gateway” problem that I see all the time.

Wrong DNS DNS servers are used by networks and their clients to resolve a computer’s hostname to its IP addresses and to enable clients to find the server they need to provide the resources they require, like a domain controller during the login and authentication process. Most of the time, DNS addresses are automatically configured by a DHCP server, but sometimes these addresses are statically configured instead. Because lots of applications rely on hostname resolution, a botched DNS configuration usually causes a computer’s network applications to fail just like the user’s applications in our example scenario.

If you can ping a host using its IP address but not its name, you probably have some type of name-resolution issue. It’s probably lurking somewhere within a DNS configuration.

Wrong Subnet Mask When network devices look at an IP address configuration, they see a combination of the IP address and the subnet mask. The device uses the subnet mask to establish which part of the address represents the network address and which part represents the host address. So clearly, a subnet mask that is configured wrong has the same nasty effect as a wrong IP address configuration does on communications. Again, a subnet mask is generally configured by the DHCP server; if you’re going to enter it manually, make sure the subnet mask is tight or you’ll end up tangling with the fallout caused by the entire address’s misconfiguration.

Incorrect Interface/Interface Misconfiguration If a host is plugged into a misconfigured switch port, or if it’s plugged into the wrong switch port that’s configured for the wrong VLAN, the host won’t function correctly. Make sure the speed, duplex, and correct Ethernet cable is used. Get any of that wrong and either you’ll get interface errors on the host and switch port or, worse, things just won’t work at all!

Duplicate MAC Addresses There should never be duplicate MAC addresses in your environment. Each interface vendor is issued an organizationally unique identifier (OUI), which will match on all interfaces produced by that vendor, and then the vendor is responsible for ensuring unique MAC addresses. That means duplicate MAC addresses usually indicate a MAC spoofing attack, in which some malicious individual changes their MAC address, which can be done quite easily in the properties of the NIC.

Expired IP Address In almost all cases, when DHCP is used to allocate IP configurations to devices, the configuration is supplied to the DHCP client is on a temporary basis. The lease period is configurable, and when the lease period and a grace period transpire, the lease is expired. The effect of an expired lease is the next time that client computer starts, it must enter the initialization state and obtain new TCP/IP configuration information from a DHCP server. There is nothing, however, to prevent the client from obtaining a new lease for the same IP address.

Rogue DHCP Server Dynamic Host Configuration Protocol (DHCP) is used to automate the process of assigning IP configurations to hosts. When configured properly, it reduces administrative overload, reduces the human error inherent in manual assignment, and enhances device mobility. But it introduces a vulnerability that when leveraged by a malicious individual can result in an inability of hosts to communicate (constituting a DoS attack) and peer-to-peer attacks.

When an illegitimate DHCP server (called a rogue DHCP server) is introduced to the network, unsuspecting hosts may accept DHCP Offer packets from the illegitimate DHCP server rather than the legitimate DHCP server. When this occurs, the rogue DHCP server will not only issue the host an incorrect IP address, subnet mask, and default gateway address (which makes a peer-to-peer attack possible), it can also issue an incorrect DNS server address, which will lead to the host relying on the attacker’s DNS server for the IP addresses of websites (such as major banks) that lead to phishing attacks. An example of how this can occur is shown in Figure 19.4.

Figure 19.4 Rogue DHCP

Images show rogue DHCP server that has DHCP client and DHCP server are connected through LAN and DHCP rogue server.

In Figure 19.4, after receiving an incorrect IP address, subnet mask, default gateway, and DNS server address from the rogue DHCP server, the DHCP client uses the attacker’s DNS server to obtain the IP address of his bank. This leads the client to unwittingly connect to the attacker’s copy of the bank’s website. When the client enters his credentials to log in, the attacker now has the client’s bank credentials and can proceed to empty out his account.

Untrusted SSL Certificate Reception of an untrusted SSL certificate error message can be for several reasons. In Figure 19.5, the possible reasons for the warning message are shown. The first reason, “The Security certificate presented by this website was not issued by a trusted certificate authority,” means the CA that issued the certificate is not trusted by the local machine. This will occur if the certificate of the CA that issued the certificate is not found in the Trusted Root Certification Authorities Folder on the local machine.

Figure 19.5 Certificate error

Image shows screen of certificate error that has contents such as “security certificate presented by this website was not issued by trusted certificate authority…”

The second reason this might occur is that the certificate is not valid. It may be that the certificate was presented before the validity period begins, or it may have expired, meaning the validity period is over.

The third reason is that the name on the certificate does not match the name listed on the certificate.

Incorrect Time Incorrect time on a device can be the cause of several issues. First, in a Windows environment using Active Directory, a clock skew of more than 5 minutes between a client and server will prevent communication between the two.

Second, when certificates are in use, proper time synchronization is critical for successful operation.

Finally, when system logs are sent to a central server such as a Syslog server, proper time synchronization is critical to understand the order of events.

Exhausted DHCP Scope When a DHCP server is implemented, it is configured with a limited number of IP addresses. When the IP addresses in a scope are exhausted, any new DHCP clients will be unable to obtain an IP address and will be unable to function on the network.

DHCP servers can be set up to provide backup to another DHCP server for a scope. When this is done, it is important to ensure that while the two DHCP servers service the same scope, they do not have any duplicate IP addresses.

Blocked TCP/UDP Ports When the ports used by common services and applications are blocked, either on the network firewall or on the personal firewall of a device, it will be impossible to make use of the service or application. One easy way to verify the open ports on a device is to execute the netstat command. An example of the output is shown in Figure 19.6.

Figure 19.6 Netstat -a

Image shows output window of netstat -a that has “C: 
etstat -a Active connections, Proto, Local Address, Foreign Address, State, and so on.”

Incorrect Host-Based Firewall Settings As you saw in the explanation of blocked TCP/UDP ports, incorrect host-based firewall settings can either prevent transmissions or allow unwanted communications. Neither of these outcomes is desirable. One of the best ways to ensure that firewall settings are consistent and correct all the time is to control these settings with a group policy. When you do this, the settings will be checked and reset at every policy refresh interval.

Incorrect ACL Settings Access control lists are used to control which traffic types can enter and exit ports on the router. When mistakes are made either in the construction of the ACLs or in their application, many devices may be affected. The creation and application of these tools should only be done by those who have been trained in their syntax and in the logic ACLs use in their operation.

Unresponsive Service Services can fail for several reasons. Many services depend on other services for their operation. Therefore, the failure of one service sometimes causes a domino effect, taking down other services that depend on it. You can use the Services applet in Control Panel to identify these dependencies as well as start and stop services. To identify the services upon which a particular service depends, use the Dependencies tab on the Services applet, as shown in Figure 19.7.

Figure 19.7 Service dependencies

Imahge described by caption and surrounding text

In Figure 19.7, the spooler service is selected and the Dependencies tab is displayed. Here we see that the spooler service depends on the HTTP and the RPC services. Therefore, if the spooler service will not start, we may need to restart one of these two services first.

With all that in mind, let’s move on with our troubleshooting steps.

Consider Multiple Approaches

There are two standard approaches that you can use to establish a theory of probable cause. Let’s take a look at them next.

Top-to-Bottom/Bottom-to-Top OSI Model As its name implies, when you apply a top-down approach to troubleshooting a networking problem, you start with the user application and work your way down the layers of the OSI model. If a layer is not in good working condition, you inspect the layer below it. When you know that the current layer is not in working condition and you discover that a lower layer works, you can conclude that the problem is within the layer above the lower working layer. Once you’ve determined which layer is the lowest layer with problems, you can begin identifying the cause of them from within that layer.

The bottom-up approach to troubleshooting a networking problem starts with the physical components of the network and works its way up the layers of the OSI model. If you conclude that all the elements associated with a particular layer are in good working order, move on to inspect the elements associated with the next layer up until the cause(s) of the problem is/are identified. The downside to the bottom-up approach is that it requires you to check every device, interface, and so on. In other words, regardless of the nature of the problem, the bottom-up approach starts with an exhaustive check of all the elements of each layer, starting with the physical layer and working its way up from there.

Divide and Conquer Unlike when opting for the top-down and bottom-up troubleshooting strategies, the divide-and-conquer approach to network troubleshooting doesn’t always begin the investigation at a particular OSI layer. When using the divide-and-conquer approach, you select a layer, test its health, and based on the results, you can move up or down through the model from the layer you began scrutinizing.

Step 3: Test the Theory to Determine Cause

Once you’ve gathered information and established a plausible theory, you’ve got to determine the next steps to resolve your problem. If you can’t confirm your theory during this step, you must formulate a new theory or escalate the problem.

Let’s look into the matter by first checking the IP configuration of the host that just happens to include DNS information. You use the ipconfig /all command to show the IP configuration. The /all switch will give you the DNS information you need, as shown in Figure 19.8.

Figure 19.8 ipconfig/all

Imahge described by caption and surrounding text

Check out the DNS entries: 1.1.1.1 and 2.2.2.2. Is this right? What are they supposed to be? You can find this out by checking the addresses on a working host, but let’s check the settings on your troubled host’s adapter first. Click Start, then Control Panel, then Network And Sharing Center, and then Manage Network Connections on the left side of the screen, which will take you to the screen shown in Figure 19.9.

Figure 19.9 Manage Network Connections

Imahge described by caption and surrounding text

Now, click the interface in question, and click Properties. You receive this screen, shown in Figure 19.10.

Figure 19.10 Interface properties

Imahge described by caption and surrounding text

From here, you highlight Internet Protocol Version 4, and click Properties (or just double-click). From Figure 19.11, do you see what may be causing the problem?

Figure 19.11 IP properties

Imahge described by caption and surrounding text

As I said, you’re using DHCP, right? But DNS is statically configured on this host. Interesting enough, when you set a static DNS entry on an interface, it will override the DHCP-provided DNS entry.

Once the theory is confirmed, determine the next steps to resolve the problem When the testing of the theory is complete, you will have determined if the suggested cause is correct. If you find you are correct, the next steps (next section) is to establish a plan of action to resolve the problem and identify potential effects.

If the theory is not confirmed, reestablish a new theory or escalate If you find that the suggested theory is not the cause of the issue, then you should move on to test any other theories you may have developed. In the event you have exhausted all theories you may have developed, it is advisable to escalate the issue to a more senior technician or, when it involves a system with which you are unfamiliar, the system owner or manager.

Step 4: Establish a Plan of Action to Resolve the Problem and Identify Potential Effects

Now that you’ve identified some possible changes, you’ve got to follow through and test your solution to see if you really solved the problem. In this case, you ask the user to try to access the intranet server (because that’s what they called about). Basically, you just ask the user to try doing whatever it was they couldn’t do when they called you in the first place. If it works—sweet—problem solved. If not, try the operation yourself.

Now you can test the proposed solution on the computer of the user who is still waiting for a solution. To do that, you need to check the DNS configuration on your host. But first, let me point out something about the neglected user’s network. All hosts are using DHCP, so it’s really weird that a single user is having a DNS resolution issue.

So, to fix the problem and get your user back in the game, just click Obtain DNS Server Address Automatically, and then click OK (see Figure 19.12). Voilà!

Figure 19.12 Obtain DNS Server Address Automatically

Imahge described by caption and surrounding text

Let’s take a look at the output of ipconfig /all in Figure 19.13 and see if you received new DNS server addresses.

Figure 19.13 ipconfig /all

Imahge described by caption and surrounding text

All good; you did. And you can test the host by trying to use HTTP to connect to a web page on the intranet server and even pinging by hostname. Congratulations on solving your first trouble ticket!

If things hadn’t worked out so well, you would go back to step 2, select a new possible cause, and redo step 3. If this happens, keep track of what worked and what didn’t so you don’t make the same mistakes twice.

It’s pretty much common sense that you should change settings like this only when you fully understand the effect your changes will have, or when you’re asked to by someone who does. The incorrect configuration of these settings will disable the normal operation of your workstation, and, well, it seems that someone (the user, maybe?) did something they shouldn’t have or you wouldn’t have had the pleasure of solving this problem.

You have to be super careful when changing settings and always check out a troubled host’s network settings. Don’t just assume that because they’re using DHCP, someone has screwed up the static configuration.

Step 5: Implement the Solution or Escalate as Necessary

Although it’s true that CompTIA doesn’t expect you to fix every single network problem that could possibly happen in the universe, they actually do expect you to get pretty close to determining exactly what the problem is. And if you can’t fix it, you’ll be expected to know how to escalate it and to whom. You are only as good as your resources—be they your own skill set, a book like this one, other more reference-oriented technical books, the Internet, or even a guru at a call center.

I know it seems like I talked to death physical and logical issues that cause problems in a network, but trust me, with what I’ve taught you, you’re just getting started. There’s a galaxy of networking evils that we have not even touched on because they’re far beyond the objectives for Network+ certification and, therefore, the scope of this book. But out there in the real world, you’ll get calls about them anyway, and because you’re not yet equipped to handle them yourself, you need to escalate these nasties to a senior network engineer who has the additional experience and knowledge required to resolve the problems.

Some of the calamities that you should escalate are as follows:

  • Switching loops
  • Missing routes
  • Routing loops
  • Routing problems
  • MTU black hole
  • Bad modules
  • Proxy Address Resolution Protocol (ARP)
  • Broadcast storms
  • NIC Teaming misconfiguration
  • Power failures/power anomalies

If you can’t implement a solution and instead have to escalate the problem, there is no need for you to go on with steps 6 and 7 of the seven-step troubleshooting model. You now need to meet with the emergency response team to determine the next step.

And just as with other problems, you have to be able to identify these events because if you can’t do that, how else will you know that you need to escalate them?

Switching Loops Today’s networks often connect switches with redundant links to provide for fault tolerance and load balancing. Protocols such as Spanning Tree Protocol (STP) prevent switching loops and simultaneously maintain fault tolerance. If STP fails, it takes some expertise to reconfigure and repair the network, so you just need to be concerned with being able to identify the problem so you can escalate it. Remember, when you hear users complaining that the network works fine for a while, then unexpectedly goes down for about a minute, and then goes back to being fine, it’s definitely an STP convergence issue that’s pretty tough to find and fix. Escalate this problem ASAP!

Missing Routes Routers must have routes either configured or learned to function. There are a number of issues that can prevent a router from learning the routes that it needs. To determine if a router has the route to the network in question, execute the show ip route command and view the routing table. This can save a lot of additional troubleshooting if you can narrow the problem to a missing route.

Routing Loops Routing protocols are often used on networks to control traffic efficiently while preventing routing loops that happen when a routing protocol hasn’t been configured properly or network changes didn’t get the attention they deserved. Routing loops can also happen if you or the network admin blew the static configuration and created conflicting routes through the network. This evil event affects the traffic flow for all users, and because it’s pretty complicated to fix, again, it’s up, up, and away with this one. You can expect routing loops to occur if your network is running old routing protocols like Routing Information Protocol (RIP) and RIPv2. Just upgrading your routing protocol to Enhanced Interior Gateway Routing Protocol (EIGRP), Open Shortest Path First (OSPF), or Intermediate System-to-Intermediate System (IS-IS) will usually take care of the problem once and for all. Anyway, escalate this problem to the router group—which hopefully is soon to be you.

Routing Problems Routing packets through the many subnets of a large enterprise while still maintaining security can be a tremendous challenge. A router’s configuration can include all kinds of stuff like access lists, Network Address Translation (NAT), Port Address Translation (PAT), and even authentication protocols like Remote Authentication Dial In User Service (RADIUS) and Terminal Access Controller Access-Control System (TACACS). Particularly diabolical, errant configuration changes can trigger a domino effect that can derail traffic down the wrong path or even cause it to come to a grinding halt and stop traversing the network completely. To identify routing problems, check to see if someone has simply set a wrong default route on a router. This can easily create routing loops. I see it all the time. These configurations can be highly complex and specific to a particular device, so they need to be escalated to the top dogs—get the problem to the best sys admin in the router group.

MTU Black Hole On a WAN connection, communication routes may fail if an intermediate network segment has an MTU that is smaller than the maximum packet size of the communicating hosts—and if the router does not send an appropriate Internet Control Message Protocol (ICMP) response to this condition. If ICMP traffic is allowed, the routers will take care of this problem using ICMP messages. However, as ICMP traffic is increasingly being blocked, this can create what is called a black hole. This will probably be an issue you will escalate.

Bad Modules Some multilayer switches and routers have slots available to add new features. The hardware that fits in these slots is called a module. These modules can host fiber connections, wireless connections, and other types as well. A common example is the Cisco Small Form-Factor Pluggable (SFP) Gigabit Interface Ethernet Converter (GBIC). This is an input-output device that plugs into an existing Gigabit Ethernet port or slot, providing a variety of additional capabilities to the device hosting the slot or port. Conversions can happen in all types of cables, from Ethernet to fiber, for example. Like any piece of hardware made by humans, the modules can fail. It is always worth checking if there are no other reasons a link is not functioning.

Proxy ARP Address Resolution Protocol (ARP) is a service that resolves IP addresses to MAC addresses. Proxy ARP is just wrong to use in today’s networks, but hosts and routers still have it on by default. The idea of Proxy ARP was to solve the problem of a host being able to have only one configured default gateway. To allow redundancy, Proxy ARP running on a router will respond to an ARP broadcast from a host that’s sending a packet to a remote network—but the host doesn’t have a default gateway set. So the router responds by being the proxy for the remote host, which in turn makes the local host think the remote host is really local; as a result, the local host sends the packets to the router, which then forwards the packets to the remote host. Most of the time, in today’s networks, this does not work well, if at all. Disable Proxy ARP on your routers, and make sure you have default gateways set on all your hosts. If you need router redundancy, there are much better solutions available than Proxy ARP! This is another job for the routing group.

Broadcast Storms When a switch receives a broadcast, it will normally flood the broadcast out all the ports except for the one the broadcast came in on. If STP fails between switches or is disabled by an administrator, it’s possible that the traffic could continue to be flooded repeatedly throughout the switch topology. When this happens, the network can get so busy that normal traffic can’t traverse it—an event referred to as a broadcast storm. As you can imagine, this is a particularly gruesome thing to have to troubleshoot and fix because you need to find the one bad link that is causing the mess while the network is probably still up and running—but at a heavily congested crawl. Escalate ASAP to experts!

NIC Teaming Misconfiguration NIC Teaming, also known as load balancing/failover (LBFO), allows multiple network interfaces to be placed into a team for the purposes of bandwidth aggregation and/or traffic failover to prevent connectivity loss in the event of a network component failure. The cards can be set to active-active state, where both cards are load balancing, or active-passive, where one card is on standby in case the primary card fails. Most of the time, the NIC team will use a multicast address to send and receive data, but it can also use a broadcast address so all cards receive the data at the same time. If these are not configured correctly, either they will operate at a severely diminished capacity or, worse, neither card will work at all!

Power Failure/Power Anomalies When you have power issues, whether it’s a full-blown power outage or intermittent power surges, it can cause some serious issues with your network devices. Your servers and core network devices require a fully functional UPS system.

Step 6: Verify Full System Functionality, and If Applicable, Implement Preventative Measures

A trap that any network technician can fall into is solving one problem and thinking it’s all fixed without stopping to consider the possible consequences of their solution. The cure can be worse than the disease, and it’s possible that your solution falls into this category. So before you fully implement the solution to a problem, make sure you totally understand the ramifications of doing so—clearly, if it causes more problems than it fixes, you should toss it and find a different solution that does no harm.

Many people update a router’s operating system or firmware just because a new version of code is released from the manufacturer. Do not do this on your production routers—just say no! Always test any new code before upgrading your production routers: Like a bad solution, sometimes the new code provides new features but creates more problems, and the cons outweigh the pros.

Step 7: Document Findings, Actions, and Outcomes

I can’t stress enough how vital network documentation is. Always document problems and solutions so that you have the information at hand when a similar problem arises in the future. With documented solutions to documented problems, you can assemble your own database of information that you can use to troubleshoot other problems. Be sure to include information like the following:

  • A description of the conditions surrounding the problem
  • The OS version, the software version, the type of computer, and the type of NIC
  • Whether you were able to reproduce the problem
  • The solutions you tried
  • The ultimate solution

Real World Scenario

Network Documentation

I don’t know how many times I’ve gone into a place and asked where their documentation was only to be met with a blank stare. I was recently at a small business that was experiencing network problems. The first question I asked was, “Do you have any kind of network documentation?” I got the blank stare. So, we proceeded to search through lots of receipts and other paperwork—anything we could find to help us understand the network layout and figure out exactly what was on the network. It turned out they had recently bought a WAP, and it was having trouble connecting—something that would’ve taken me five minutes to fix instead of searching through a mess for a couple hours!

Documentation doesn’t have to look like a sleek owner’s manual or anything—it can consist of a simple three-ring binder with an up-to-date network map; receipts for network equipment; a pocket for owner’s manuals; and a stack of loose-leaf paper to record services, changes, network-addressing assignments, access lists, and so on. Just this little bit of documentation can save lots of time and money and prevent grief, especially in the critical first few months of a new network install.

Troubleshooting Tips

Now that you’ve got the basics of network troubleshooting down pat, I’m going to go over a few really handy troubleshooting tips for you to arm yourself with even further in the quest to conquer the world’s networking evils.

Don’t Overlook the Small Stuff

The super simple stuff (SSS) I referred to at the beginning of this chapter should never be overlooked—ever! Here’s a quick review: Just remember that problems are often caused by little things like a bad power switch; a power switch in the wrong position; a card or port that’s not working, indicated by a link light that’s not lit; or simply operator error (OE). Even the most experienced system administrator has forgotten to turn on the power, left a cable unplugged, or mistyped a username and password—not me, of course, but others . . .

And make sure that users get solid training for the systems they use. An ounce of prevention is worth a pound of cure, and you’ll experience dramatically fewer ID10T errors this way.

Prioritize Your Problems

Being a network administrator or technician of even a fairly small network can keep you hopping, and it’s pretty rare that you’ll get calls for help one at a time and never be interrupted by more coming in. Closer to reality is receiving yet another call when you already have three people waiting for service. So, you’ve got to prioritize.

You start this process by again asking some basic questions to determine the severity of the problem being reported. Clearly, if the new call is about something little and you already have a huge issue to deal with, you should put the new call on hold or get their info and get back to them later. If you establish a good set of priorities, you’ll make much better use of your time. Here’s an example of the rank you probably want to give to networking problems, from highest priority to lowest:

  • Total network failure (affects everyone)
  • Partial network failure (affects small groups of users)
  • Small network failure (affects a small, single group of users)
  • Total workstation failure (single user can’t work at all)
  • Partial workstation failure (single user can’t do most tasks)
  • Minor issue (single user has problems that crop up now and then)

Mitigating circumstances can, of course, change the order of this list. For example, if the president of the company can’t retrieve email, you’d take the express elevator to their office as soon as you got the call, right? And even a minor issue can move up the ladder if it’s persistent enough.

Don’t fall prey to thinking that simple problems are easier to deal with because even though you may be able to bring up a crashed server in minutes, a user who doesn’t know how to make columns line up in Microsoft Word could take a chunk out of your day. You’d want to put the latter problem toward the bottom of the list because of the time involved—it’s a lot more efficient to solve problems for a big group of people than to fix this one user’s problem immediately.

Some network administrators list all network-service requests on a chalkboard or a whiteboard. They then prioritize them based on the previously discussed criteria. Some larger companies have written support-call tracking software whose only function is to track and prioritize all network and computer problems. Use whatever method makes you comfortable, but prioritize your calls.

Check the Software Configuration

Often, network problems can be traced to software configuration, like our DNS configuration scenario; so when you’re checking for software problems, don’t forget to check types of configurations:

  • DNS configuration/misconfiguration
  • DHCP configuration/misconfiguration
  • WINS configuration
  • HOSTS file
  • The Registry

Software-configuration settings love to hide in places like these and can be notoriously hard to find (especially in the Registry).

Also, look for lines that have been commented out either intentionally or accidentally in text-configuration files—another place for clues. A command such as REM or REMARK, or asterisk or semicolon characters, indicates comment lines in a file.

In the HOSTS file, a pound sign (#) is used to indicate a comment line.

Don’t Overlook Physical Conditions

You want to make sure that from a network-design standpoint, the physical environment for a server is optimized for placement, temperature, and humidity. When troubleshooting an obscure network problem, don’t forget to check the physical conditions under which the network device is operating. Check for problems like these:

  • Excessive heat
  • Excessive humidity (condensation)
  • Low humidity (leads to electrostatic discharge [ESD] problems)
  • EMI/RFI problems
  • ESD problems
  • Power problems
  • Unplugged cables

Don’t Overlook Cable Problems

Cables, generally speaking, work fine once they are installed properly. If the patch cable isn’t the problem, use a cable tester (not a tone generator and locator) to find the source of the problem.

One of the easiest mistakes to make, especially if cables are not labeled, is to use a crossover cable where a straight-through cable should be used or vice versa. In either case, when you do this it causes TX/RX reversal. What’s that? That’s when the transmit wire is connected to Transmit and the receive wire to Receive. That sounds good, but it needs to be Transmit to Receive. See more about straight-through and crossover cables in Chapter 3, “Networking Topologies, Connectors, and Wiring Standards.”

Wires that are moved can be prone to breaking or shorting, and a short can happen when the wire conductor comes in contact with another conductive surface, changing the path of the electrical signal. The signal will go someplace else instead of to the intended recipient. You can use cable testers to test for many types of problems:

  • Broken cables
  • Incorrect connections
  • Interference levels
  • Total cable length (for length restrictions)
  • Cable shorts
  • Connector problems
  • Testing the cable at all possible data rates

As a matter of fact, cable testers are so sophisticated that they can even indicate the exact location of a cable break, accurate to within 6 inches or better.

Check for Viruses

People overlook scanning for viruses because they assume that the network’s virus-checking software has already picked them off. But to be effective, the software must be kept up-to-date, and updates are made available pretty much daily. You’ve got to run the virus-definition update utility to keep the virus-definition file current.

If you are having strange, unusual, irreproducible problems with a workstation, try scanning it with an up-to-date virus-scan utility. You’d be surprised how many times people have spent hours and hours troubleshooting a strange problem only to run a virus-scan utility, find and clean out one or more viruses, and have the problem disappear like magic.

Summary

In this chapter, you learned about all things troubleshooting, and you now know how to sleuth out and solve a lot of network problems. You learned to first check all the SSS and about how to approach problem resolution by eliminating what the problem is not. You learned how to narrow the problem down to its basics and define it.

Next, you learned a systematic approach using a seven-step troubleshooting model to troubleshoot most of the problems you’ll run into in networking. And you also learned about some resources you can use to help you during the troubleshooting process. In addition, you learned how important documentation is to the health of your network.

Finally, I gave you a bunch of cool tips to further equip you, tips about prioritizing issues, checking for configuration issues, considering environmental factors—even hunting down viruses. As you venture out into the real world, keep these tips in mind; along with your own personal experience, they’ll really help make you an expert troubleshooter.

Exam Essentials

Know the seven troubleshooting steps, in order. The steps, in order, are as follows:

  1. Identify the problem.
  2. Establish a theory of probable cause.
  3. Test the theory to determine cause.
  4. Establish a plan of action to resolve the problem and identify potential effects.
  5. Implement the solution or escalate as necessary.
  6. Verify full system functionality, and if applicable, implement preventative measures.
  7. Document findings, actions, and outcomes.

Be able to identify a link light. A link light is the small, usually green LED on the back of a network card. This LED is typically found next to the media connector on a NIC and is usually labeled Link.

Understand how proper network use procedures can affect the operation of a network. If a user is not following a network use procedure properly (for example, not logging in correctly), that user may report a problem where none exists. A good network troubleshooter should know how to differentiate between a network hardware/software problem and a “lack of user training” problem.

Know how to narrow down a problem to one specific area or cause. Most problems can be traced to one specific area or cause. You must be able to determine if a problem is specific to one user or a bunch of users, specific to one computer or a bunch of computers, and related to hardware or software. The answers to these questions will give you a very specific problem focus.

Know how to detect cabling-related problems. Generally speaking, most cabling-related problems can be traced by plugging the suspect workstation into a known, working network port. If the problem disappears (or at the very least changes significantly), it is related to the cabling for that workstation.

Written Lab

In this section, write the answers to the following questions. You can find the answers in Appendix A.

  1. What is step 3 of the seven-step troubleshooting model?

  2. What is step 7 of the seven-step troubleshooting model?

  3. How is crosstalk minimized in twisted-pair cabling?

  4. If you plug a host into a switch port and the user cannot get to the server or other services they need to access despite a working link light, what could the problem be?

  5. What is it called when a cable has two wires of a twisted pair connected to two wires from a different pair?

  6. When a signal moves through any medium, the medium itself will degrade the signal. What is this called?

  7. What is step 4 of the seven-step troubleshooting model?

  8. What is step 5 of the seven-step troubleshooting model?

  9. What are some of the problems that, if determined, should be escalated?

  10. What cable issues should you know and understand for network troubleshooting?

Review Questions

You can find the answers to the review questions in Appendix B.

  1. Which of the following are not steps in the Network+ troubleshooting model? (Choose all that apply.)

    1. Reboot the servers.
    2. Identify the problem.
    3. Test the theory to determine the cause.
    4. Implement the solution or escalate as necessary.
    5. Document findings, actions, and outcomes.
    6. Reboot all the routers.
  2. You have a user who cannot connect to the network. What is the first thing you could check to determine the source of the problem?

    1. Workstation configuration
    2. Connectivity
    3. Patch cable
    4. Server configuration
  3. When wireless users complain that they are losing their connection to applications during a session, what is the source of the problem?

    1. Incorrect SSID
    2. Latency
    3. Incorrect encryption
    4. MAC address filter
  4. Several users can’t log in to the server. Which action would help you to narrow the problem down to the workstations, network, or server?

    1. Run tracert from a workstation.
    2. Check the server console for user connections.
    3. Run netstat on all workstations.
    4. Check the network diagnostics.
  5. A user can’t log in to the network. She can’t even connect to the Internet over the LAN. Other users in the same area aren’t experiencing any problems. You attempt to log in as this user from your workstation with her username and password and don’t experience any problems. However, you cannot log in with either her username or yours from her workstation. What is a likely cause of the problem?

    1. Insufficient rights to access the server
    2. A bad patch cable
    3. Server down
    4. Wrong username and password
  6. A user is experiencing problems logging in to a Unix server. He can connect to the Internet over the LAN. Other users in the same area aren’t experiencing any problems. You attempt logging in as this user from your workstation with his username and password and don’t experience any problems. However, you cannot log in with either his username or yours from his workstation. What is a likely cause of the problem?

    1. The Caps Lock key is pressed.
    2. The network hub is malfunctioning.
    3. You have a downed server.
    4. You have a jabbering NIC.
  7. You receive a call from a user who is having issues connecting to a new VPN. Which is the first step you should take?

    1. Find out what has changed.
    2. Reboot the workstation.
    3. Document the solution.
    4. Identify the symptoms and potential causes.
  8. A workstation presents an error message to a user. The message states that a duplicate IP address has been detected on the network. After establishing what has changed in the network, what should be the next step using the standard troubleshooting model?

    1. Test the result.
    2. Select the most probable cause.
    3. Create an action plan.
    4. Identify the results and effects of the solution.
  9. You have gathered information on a network issue and determined the affected areas of the network. What is your next step in resolving this issue?

    1. You should implement the best solution for the issue.
    2. You should test the best solution for the issue.
    3. You should check to see if there have been any recent changes to this affected part of the network.
    4. You should consider any negative impact to the network that might be caused by a solution.
  10. A user calls you, reporting a problem logging in to the corporate intranet. You can access the website without problems using the user’s username and password. At your request, the user has tried logging in from other workstations but has been unsuccessful. What is the most likely cause of the problem?

    1. The user is logging in incorrectly.
    2. The network is down.
    3. The intranet server is locked up.
    4. The server is not routing packets correctly to that user’s workstation.
  11. You have just implemented a solution and you want to celebrate your success. But what should you do next before you start your celebration?

    1. Gather more information about the issue.
    2. Document the issue and the solution that was implemented.
    3. Test the solution and identify other effects it may have.
    4. Escalate the issue.
  12. You can ping the local router and web server that a local user is trying to reach, but you cannot reach the web page that resides on that server. From step 2 of the troubleshooting model, what is a possible problem that would lead to this situation?

    1. Your network cable is unplugged.
    2. There is a problem with your browser.
    3. Your NIC has failed.
    4. The web server is unplugged.
  13. When troubleshooting an obscure network problem, what physical conditions should be reviewed to make sure the network device is operating correctly? (Choose all that apply.)

    1. Excessive heat
    2. Low/excessive humidity
    3. ESD problems
    4. None of the above
  14. Which of the following is not a basic physical issue that can occur on a network when a user is connected via cable?

    1. Crosstalk
    2. Shorts
    3. Open impedance mismatch
    4. DNS configurations
  15. You are troubleshooting a LAN switch and have identified the symptoms. What is the next step you should take?

    1. Escalate the issue.
    2. Create an action plan.
    3. Implement the solution.
    4. Determine the scope of the problem.
  16. A user calls you, complaining that he can’t access the corporate intranet web server. You try the same address, and you receive a Host Not Found error. Several minutes later, another user reports the same problem. You can still send email and transfer files to another server. What is the most likely cause of the problem?

    1. The hub is unplugged.
    2. The server is not routing protocols to your workstation.
    3. The user’s workstation is not connected to the network.
    4. The web server is down.
  17. You have implemented and tested a solution and identified any other effects the solution may have. What is your next step?

    1. Create an action plan.
    2. Close the case and head home for the day.
    3. Reboot the Windows server.
    4. Document the solution.
  18. Users are reporting that they can access the Internet but not the internal company website. Which of the following is the most likely problem?

    1. The DNS entry for the server is non-authoritative.
    2. The intranet server is down.
    3. The DNS address handed out by DHCP is incorrect.
    4. The default gateway is incorrect.
  19. Several users have complained about the server’s poor performance as of late. You know that the memory installed in the server is sufficient. What could you use to determine the source of the problem?

    1. Server’s NIC link light
    2. Protocol analyzer
    3. Performance-monitoring tools
    4. Server’s system log file
  20. You lose power to your computer room and the switches in your network do not come back up when everything is brought online. After you have identified the affected areas, established the cause, and escalated this problem, what do you do next?

    1. Start to implement a solution to get those users back online ASAP.
    2. Create an action plan and solution.
    3. Meet with the emergency response team to determine the next step.
    4. Copy all the working routers’ configurations to the nonworking switches.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.123.147