To many who are not familiar with computers, that whirring, humming box sitting under their desk is an enigma. They know what shows up on the screen, where the power button is, where to put DVDs, and what not to spill on their keyboard. But the insides are shrouded in mystery.
Fortunately for them, we’re around. We can tell the difference between a hard drive and a motherboard and have a pretty good idea of what each part inside that box is supposed to do. When the computer doesn’t work like it’s supposed to, we can whip out our trusty screwdriver, crack the case, and perform surgery. And most of the time, we can get the system running just as good as new.
In the following sections, we’re going to focus our troubleshooting efforts on the key hardware components inside the case, but we’re also going to include monitors (which are attached to video cards, which are inside the case, so close enough for our purposes). We will start off with motherboards, processors, memory, and power. Then we will look at storage devices, and finish off the discussion with video and display issues.
These components are the brains, backbone, and nervous system of your computer. Without a network card, you won’t be able to surf the Web. Without a processor, well, you won’t be able to surf the Web. Or do much of anything else for that matter. So we’ll get started with these components.
As you continue to learn and increase your troubleshooting experience, your value will increase as well. This is because, if nothing else, it will take you less time to accomplish common repairs. Your ability to troubleshoot by past experiences and gut feelings will make you more efficient and more valuable, which in turn will allow you to advance and earn a better income. We will give you some guidelines you can use to evaluate common hardware issues that you’re sure to face.
Before we get into specific components, let’s take a few minutes to talk about hardware symptoms and causes at a general level. This discussion can apply to a lot of different hardware components.
Some hardware issues are pretty easy to identify. If there are flames shooting out of the back of your computer, then it’s probably the power supply. If the power light on your monitor doesn’t turn on, it’s the monitor itself, the power cord, or your power source. Other hardware symptoms are a bit more ambiguous. We’ll now look at some hardware-related symptoms and their possible causes.
Electronic components produce heat; it’s a fact of life. While they’re designed to withstand a certain amount of the heat that’s produced, excessive heat can drastically shorten the life of components. There are two common ways to reduce heat-related problems in computers: heat sinks and case fans.
Any component with its own processor will have a heat sink. Typically these look like big, finned hunks of aluminum or other metal attached to the processor. Their job is to dissipate heat from the component so it doesn’t become too hot. Never run a processor without a heat sink!
Case fans are designed to take hot air from inside the case and blow it out of the case. There are many different designs, from simple fans to high-tech liquid-cooled models. Put your hand up to the back of your computer at the fan and you should feel warm air. If there’s nothing coming out, you need to clean or replace the fan; if it’s the power supply fan, that usually means replacing the power supply. Some cases come with additional cooling fans to help dissipate heat.
We’ve mentioned dust before and now is a good time to bring it up again. Dust, dirt, grime, paint, smoke, and other airborne particles can become caked on the inside of the components. This is most common in automotive and manufacturing environments. The contaminants create a film that coats the components, causing them to overheat and/or conduct electricity on their surface. Blowing out these exposed systems with a can of compressed air from time to time can prevent damage to the components. While you’re cleaning the components, be sure to clean any cooling fans in the power supply or on the heat sink.
One way to ensure that dust and grime don’t find their way into your computer is to always leave the blanks in the empty slots on the back of your box. Blanks are the pieces of metal or plastic that come with the case and cover the expansion slot openings. They are designed to keep dirt, dust, and other foreign matter out of the computer. They also maintain proper airflow within the case to ensure that the computer doesn’t overheat.
Have you ever been working on a computer and heard a noise that resembles fingernails on a chalkboard? If so, you will always remember that sound, along with the sense of impending doom as the computer stopped working.
Some noises on a computer are normal. The POST beep (which we’ll talk about in a few pages) is a good sound. The whirring of your hard drive and power supply fan is a familiar sound. Some techs get so used to their “normal” system noises that if anything is slightly off pitch, they go digging for problems even if none are readily apparent.
For the most part, the components that can produce noise problems are those that move. Hard drives have motors that spin the platters. Power supply fans spin. CD and DVD drives spin the disks. If you’re hearing excessive noise, these are the likely culprits.
If you hear a whining sound and it seems to be fairly constant, it’s more than likely a fan. Either it needs to be cleaned (desperately) or replaced. Power supplies that are failing can also sound louder and quieter intermittently because a fan will run at alternating speeds.
The “fingernails on a chalkboard” squealing could be an indicator that the hard drive heads have crashed into the platter. This thankfully doesn’t seem to be as common today as it used to be, but it still happens. Note that this type of sound can also be caused by a power supply fan’s motor binding up. A rhythmic ticking sound is also likely to be the hard drive.
Problems with the CD-ROM or DVD-ROM drive tend to be the easiest to diagnose. Those drives aren’t constantly spinning unless you put some media in them. If you put a disc in and the drive makes a terrible noise, you have a good idea what the problem is.
So what do you do if you hear a terrible noise from the computer? If it’s still responsive, shut it down normally as soon as possible. If it’s not responsive, then shut off the power as quickly as you can. Examine the power supply to see if there are any obvious problems such as excessive dust, and clean as needed. Power the system back on. If the noise was caused by the hard drive, odds are that the drive has failed and the system won’t boot normally. You may need to replace some parts.
If the noise is mildly annoying but doesn’t sound drastic, boot up the computer with the case off and listen. By getting up close and personal with the system you can often tell where the noise is coming from and then troubleshoot or fix the appropriate part.
Bad smells or smoke coming from your computer are never good things. While it normally gets pretty warm inside a computer case, it should never be hot enough inside there to melt plastic components, but it does happen from time to time. And power problems can sometimes cause components to get hot enough to smoke.
If you smell an odd odor or see smoke coming from a computer, shut it down immediately. Open the case and start looking for visible signs of damage. Things to look for include melted plastic components and burn marks on circuit boards. If components appear to be damaged, it’s best to replace them before returning the computer to service.
Many hardware devices have status light indicators that can help you identify when there is a problem. Obviously, when you power on a system you expect the power light to come on. If it doesn’t, you have a problem. The same holds true for other external devices, such as wireless routers, external hard drives, and printers. In situations in which the power light doesn’t come on and the device has no power, always obey the first rule of troubleshooting: Check your connections first!
Beyond power indicators, several types of devices have additional lights that can help you troubleshoot. If you have a hub, switch, or other connectivity device, you should have an indicator for each port that lights up when there is a connection. Some devices will give you a green light for a good connection and a yellow or red light if they detect a problem. A lot of connectivity devices will also have an indicator that blinks or flashes when traffic is going through the port. Sometimes it’s the same light that indicates a connection, but other times it’s a separate indicator.
If you have a device with lights and you’re not sure what they mean, it’s best to check the manual or the manufacturer’s website to learn.
An alert is a message generated by a hardware device. In some cases, the device has a display panel that will tell you what the alert is. A good example of this is an office printer. Many have an LCD display that can tell you if something is wrong.
Other alerts will pop up on the computer screen. If the device is attached to a specific computer, the alert will generally pop up on that computer’s screen. Some devices can be configured to send an alert to a specific user account or system administrator, so the administrator will get the alert regardless of which computer they are logged in to.
The good news about visible damage is that you can usually figure out which component is damaged pretty quickly. The bad news is it often means you need to replace parts.
Visible damage to the outside of the case or the monitor casing might not matter much as long as the device still works. But if you’re looking inside a case and see burn marks or melted components, that’s a sure sign of a problem. Replace damaged circuit boards or melted plastic components immediately. After replacing the part, it’s a good idea to monitor the new component for a while too, because the power supply may have caused the original damage. If the new part fries quickly too, it’s time to replace the power supply as well.
Every computer has a diagnostic program built into its basic input/output system (BIOS) called the power-on self-test (POST). When you turn on the computer, it executes this set of diagnostics. Many steps are involved in the POST, but they happen very quickly, they’re invisible to the user, and they vary among BIOS versions. The steps include checking the CPU, checking the RAM, checking for the presence of a video card, and verifying basic hardware functionality. The main reason to be aware of the POST’s existence is that if it encounters a problem, the boot process stops. Being able to determine at what point the problem occurred can help you troubleshoot.
If the computer doesn’t POST as it should, one way to determine the source of a problem is to listen for a beep code. This is a series of beeps from the computer’s speaker. A successful POST generally produces a single beep. If there’s more than one beep, the number, duration, and pattern of the beeps can sometimes tell you what component is causing the problem. However, beep codes differ by BIOS manufacturer and version, so you must look up the beep code in a chart for your particular BIOS. AMI BIOS, for example, relies on a raw number of beeps for some codes and on patterns of short and long beeps for others.
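Because the same beeps mean different things on different BIOSs, a beep code only makes sense when paired with the vendor’s chart. The Python sketch below shows that lookup step, writing a pattern as a string of L (long) and S (short) beeps. The vendor name ExampleBIOS and every meaning in the chart are hypothetical placeholders; only the single-beep entry reflects the general rule stated above.

```python
# Hypothetical beep-code chart: the vendor name and meanings are placeholders.
# Copy real entries from your BIOS manufacturer's chart before relying on them.
BEEP_CHARTS = {
    "ExampleBIOS": {
        "S": "single beep: POST completed",
        "SSS": "placeholder meaning A",
        "LSSS": "placeholder meaning B",
    },
}

def normalize(pattern: str) -> str:
    """Accept 'L S S S', 'l-s-s-s', or 'LSSS' and return 'LSSS'."""
    return "".join(ch for ch in pattern.upper() if ch in "LS")

def lookup_beep_code(vendor: str, pattern: str) -> str:
    """Look up a beep pattern in the chart for one BIOS vendor."""
    chart = BEEP_CHARTS.get(vendor)
    if chart is None:
        return f"no chart loaded for {vendor}"
    return chart.get(normalize(pattern), "pattern not in chart")

print(lookup_beep_code("ExampleBIOS", "L S S S"))
```

The normalization step lets you type the pattern the way you heard it; the vendor key is what keeps one BIOS’s meaning from being applied to another’s beeps.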
Another way to determine a problem during the POST routine is to use a POST card. This is a circuit board that fits into an ISA or PCI expansion slot in the motherboard and reports numeric codes as the boot process progresses. Each of those codes corresponds to a particular component being checked. If the POST card stops at a certain number, you can look up that number in the manual that came with the card to determine the problem.
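Reading a POST card comes down to noting the last code it shows and finding that code in the manual. The Python sketch below models that step. The code values and meanings in POST_CARD_MANUAL are hypothetical; real values depend on the BIOS and are listed in the documentation that came with the card.

```python
# Hypothetical code table standing in for the POST card's manual.
# Real code values and meanings depend on the BIOS.
POST_CARD_MANUAL = {
    0x10: "hypothetical: checking CPU",
    0x20: "hypothetical: checking RAM",
    0x30: "hypothetical: checking video",
}

def where_post_stopped(codes_shown: list[int]) -> str:
    """Report the last code the card displayed and what the manual says about it."""
    if not codes_shown:
        return "no codes displayed"
    last = codes_shown[-1]  # the card stops at the code for the stage where POST halted
    meaning = POST_CARD_MANUAL.get(last, "not listed in manual")
    return f"{last:02X}: {meaning}"

print(where_post_stopped([0x10, 0x20]))
```

Codes are shown here as two hex digits because that is how many POST cards display them; check your card’s manual for its format.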
Because we just talked about the POST routine, which is a function of the BIOS, let’s look at a few other BIOS issues as well. First, computer BIOSs don’t go bad; they just become out-of-date. This isn’t necessarily a critical issue—they will continue to support the hardware that came with the box. It does, however, become an issue when the BIOS doesn’t support some component that you would like to install—a larger hard drive, for instance.
Most of today’s BIOSs are written to an EEPROM and can be updated through the use of software. This process is called flashing the BIOS. Each manufacturer has its own method for accomplishing this. Check the documentation for complete details.
A fairly common issue with the BIOS is when it fails to retain your computer’s settings, such as time and date and hard drive configuration. The BIOS uses a small battery (much like a watch battery) on the motherboard to help it retain settings when the system power is off. If this battery fails, the BIOS won’t retain its settings. Simply replace the battery to solve the problem.
Finally, remember that your BIOS also contains the boot sequence for your system. You probably boot to the first hard drive in your system, but you can also set your BIOS to boot from the CD-ROM, the floppy drive (if you have one), or the network. If your computer is attempting to boot from the wrong device, you need to change the boot sequence in the BIOS. To do this, reboot the system, and look for the message telling you to press a certain key to enter the BIOS (usually something like F2). Once you’re in the BIOS, find the menu with the boot sequence and set it to the desired order. If the changes don’t hold the next time you reboot, check the battery!
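To see why the order matters, here is a simplified Python model of how a boot sequence is applied: the firmware walks the list in order and boots from the first device that holds bootable media. The device names are hypothetical, and real firmware does more than this (boot menus, timeouts, fallbacks), so treat it as a sketch of the idea only.

```python
from typing import Optional

def pick_boot_device(boot_order: list[str], bootable: set[str]) -> Optional[str]:
    """Return the first device in the boot order that holds bootable media."""
    for device in boot_order:
        if device in bootable:
            return device
    return None  # nothing bootable was found

order = ["cdrom", "hdd0", "network"]  # hypothetical device names
print(pick_boot_device(order, {"hdd0"}))           # boots from the hard drive
print(pick_boot_device(order, {"cdrom", "hdd0"}))  # a bootable disc wins
```

The second call shows a common surprise: with the CD-ROM first in the sequence, a bootable disc left in the drive takes priority over the hard drive.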
Most motherboard and CPU problems manifest themselves by the system appearing to be completely dead. However, “completely dead” can be a symptom of a wide variety of problems, not only with the CPU or motherboard but also with the RAM or the power supply. Other times, a failing motherboard or CPU will cause the system to completely lock up, or “hang,” requiring a hard reboot, or cause continuous reboots. A POST card may be helpful in narrowing down the exact component that is faulty.
When a motherboard fails, it’s usually because it has been damaged. Most technicians can’t repair motherboard damage; the motherboard must be replaced. Motherboards can become damaged due to physical trauma, exposure to electrostatic discharge (ESD), or short-circuiting. To minimize the risk, observe the following rules:
A CPU may fail because of physical trauma or short-circuiting, but the most common cause for a CPU not to work is failure to install it properly. With a PGA- or LGA-style CPU, ensure that the CPU is oriented correctly in the socket. With an SECC-style CPU, make sure the CPU is completely inserted into its slot.
Input/output (I/O) ports are most often built into the motherboard and include legacy parallel and serial, USB, and FireWire ports. All of them are used to connect external peripherals to the motherboard. When a port doesn’t appear to be functioning, make sure the following conditions are met:
If you suspect it’s the port, you can purchase a loopback plug to test its functionality. If you suspect that the cable, rather than the port, may be the problem, swap out the cable with a known good one. If you don’t have an extra cable, you can test the existing cable with a multimeter by setting it to ohms and checking the resistance between one end of the cable and the other.
Use a pin-out diagram, if available, to determine which pin matches up to which at the other end. There is often—but not always—an inverse relationship between the ends. In other words, at one end pin 1 is at the left, and at the other end it’s at the right on the same row of pins. You see this characteristic with D-sub connectors where one end of the cable is male and the other end is female.
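The multimeter test above can be organized as a simple checklist: for each pin on one end, take the matching pin on the other end from the pin-out diagram, read the resistance, and flag any wire that reads open. The Python sketch below assumes a threshold of 1 ohm as a rule of thumb (a good wire reads close to 0 ohms) and uses a hypothetical four-wire straight-through pin-out. It checks continuity only; a full test would also look for shorts between neighboring pins.

```python
# Assumed threshold: a good conductor reads close to 0 ohms. Treat anything
# above 1 ohm (or an open-circuit reading) as a broken wire.
OPEN_THRESHOLD_OHMS = 1.0

def broken_wires(pinout: dict[int, int], readings: dict[tuple[int, int], float]) -> list[int]:
    """Return end-A pins whose wire to the matching end-B pin fails the test.

    pinout   -- from the pin-out diagram: end-A pin -> end-B pin
    readings -- measured resistance for each (end-A pin, end-B pin) pair;
                use float("inf") for an open-circuit reading
    """
    bad = []
    for pin_a, pin_b in sorted(pinout.items()):
        ohms = readings.get((pin_a, pin_b), float("inf"))  # unmeasured counts as bad
        if ohms > OPEN_THRESHOLD_OHMS:
            bad.append(pin_a)
    return bad

# Hypothetical 4-wire example: the wire for pin 3 reads open.
pinout = {1: 1, 2: 2, 3: 3, 4: 4}
readings = {(1, 1): 0.2, (2, 2): 0.3, (3, 3): float("inf"), (4, 4): 0.2}
print(broken_wires(pinout, readings))
```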
Isolating memory issues on a computer is one of the more difficult tasks to do properly because so many memory problems manifest themselves as software issues. For example, memory problems can cause applications to fail and produce error messages such as general protection faults (GPFs). Memory issues can also cause a fatal error in Windows, producing the infamous Blue Screen of Death (BSOD) that we discussed in Chapter 19. Sometimes these are caused by the physical memory failing. Other times they are caused by bad programming, when an application writes into a memory space reserved for the operating system or another application.
In short, memory problems can cause system lockups, unexpected shutdowns or reboots, or the errors mentioned in the preceding paragraph. They can be challenging to pin down. If you do get an error message related to memory, be sure to write down the memory address if the error gives you one. If the error happens again, write down the memory address again. If it’s the same or a similar address, then it’s very possible that the physical memory is failing. You can also use one of several hardware- or software-based RAM testers to see if your memory is working properly.
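Comparing the addresses you’ve written down is easy to do by hand, but a small Python sketch makes the rule explicit. It treats two addresses as “similar” when they fall within 4 KiB of each other; that window is an illustrative assumption, not a standard, and a match is only a hint that points you toward running a RAM tester.

```python
# Assumption: "similar" means within 4 KiB of each other. This window is an
# illustrative choice, not a standard.
SIMILAR_WINDOW = 0x1000

def addresses_repeat(recorded: list[str], window: int = SIMILAR_WINDOW) -> bool:
    """Return True if any two recorded fault addresses are identical or close together."""
    values = sorted(int(addr, 16) for addr in recorded)  # accepts '0x...' or bare hex
    # After sorting, the closest pair is always adjacent, so comparing neighbors is enough.
    return any(b - a < window for a, b in zip(values, values[1:]))

print(addresses_repeat(["0x0040F3A0", "0x0040F3B8"]))  # close together
print(addresses_repeat(["0x0040F3A0", "0x7FF00000"]))  # far apart
```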
Power supply problems can manifest themselves as a system that doesn’t respond in any way when the power is turned on. If the power supply has failed completely, the fix is to open the case, remove the power supply, and replace it with a new one. Partial failures, or intermittent power supply problems, are much harder to diagnose. Keep in mind, too, that a completely failed power supply gives the same symptoms as a malfunctioning wall socket, uninterruptible power supply (UPS), or power strip; a power cord that is not securely seated; or some motherboard shorts (such as those caused by an improperly seated expansion card, memory stick, CPU, and the like). Rule out those items before you replace the power supply; otherwise, you may find you still have the same problem you started with. Be aware that different cases have different types of on/off switches. Replacing a power supply is a lot easier if you purchase a replacement with the same switch mechanism.
If you’re curious as to the state of your power supply, you can buy hardware-based power supply testers online starting at about $10 and running up to several hundred dollars. Multimeters are also effective devices for testing your power supplies.
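When you test with a multimeter, you compare each reading against the supply’s allowed range. The Python sketch below assumes the tolerances commonly cited from the ATX12V power supply design guide (plus or minus 5 percent on the +3.3 V, +5 V, and +12 V rails, plus or minus 10 percent on the -12 V rail); check your power supply’s own specification before relying on them. The sample readings are made up for illustration.

```python
# Assumed nominal voltages and tolerances, as commonly cited from the ATX12V
# design guide: +/-5% on +3.3 V, +5 V, and +12 V; +/-10% on -12 V.
RAILS = {
    "+3.3V": (3.3, 0.05),
    "+5V": (5.0, 0.05),
    "+12V": (12.0, 0.05),
    "-12V": (-12.0, 0.10),
}

def rails_out_of_spec(measured: dict[str, float]) -> list[str]:
    """Return the rails whose reading falls outside the allowed range."""
    bad = []
    for rail, (nominal, tolerance) in RAILS.items():
        if rail not in measured:
            continue  # rail not measured; nothing to judge
        # sorted() keeps low <= high even for the negative -12 V rail
        low, high = sorted((nominal * (1 - tolerance), nominal * (1 + tolerance)))
        if not low <= measured[rail] <= high:
            bad.append(rail)
    return bad

readings = {"+3.3V": 3.31, "+5V": 4.70, "+12V": 12.10, "-12V": -11.90}
print(rails_out_of_spec(readings))
```

Here the +5 V reading of 4.70 falls below the 4.75 V floor, so only that rail is flagged. A supply that reads fine with no load can still sag under load, so readings taken with the system running are the more telling ones.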
A PC that works for a few minutes and then locks up is probably experiencing overheating because of a heat sink or fan not functioning properly. To troubleshoot overheating, first check all fans inside the PC to ensure they’re operating, and make sure any heat sinks are firmly attached to their chips.
In a properly designed, properly assembled PC case, air flows in a specific path driven by the power supply fan and using the power supply’s vent holes. Make sure you know the direction of flow and that there are limited obstructions and no dust buildup. Cases are also designed to cool by making the air flow in a certain way. Therefore, operating a PC with the cover removed can make a PC more susceptible to overheating, even though it’s “getting more air.”
Similarly, operating a PC with expansion-slot covers removed can inhibit a PC’s ability to cool itself properly because the extra holes change the airflow pattern from what was intended by its design.
Although CPUs are the most common component to overheat, occasionally other chips on the motherboard, such as the chipset, or chips on other devices, particularly video cards, may also overheat. Extra heat sinks or fans may be installed to cool these chips.
Liquid cooling systems have their own set of issues. The pump that moves the liquid through the tubing and heat sinks can become obstructed or simply fail. If this happens, the liquid’s temperature will eventually equalize with that of the CPU and other components, and those components will be damaged. Dust in the heat sinks has the same effect as with nonliquid cooling systems, so keep these components clean as you would any such components. Check regularly for signs of leaks that might be starting and try to catch them before they result in damage to the system.
Exercise 20.1 walks you through the steps of troubleshooting a few specific hardware problems. The exercise will probably end up being a mental one for you, unless you have the exact problem that we’re describing here. As practice, you can write down the steps you would take to solve the problem and then check to see how close you came to our steps. Clearly, there are several ways to approach a problem, so you might use a slightly different process, but the general approach should be similar. Finally, when you have found the problem, you can stop. This exercise assumes that each step didn’t solve the issue so you need to move on to the next step.
Issue One: Blank screen on bootup. You turn the computer on, and there’s nothing on the screen.
Issue Two: The power supply fan spins, but no other devices have power.
Storage devices present unique problems simply due to their nature. They’re devices with moving parts, which means they are more prone to mechanical failure than a motherboard or a stick of RAM. In the following sections, we’ll discuss hard disk problems, including RAID arrays. Then we’ll take a quick look at CD-ROM/DVD and floppy drive issues.
Hard disk system problems usually stem from one of three causes:
The first and last causes are easy to identify, because in either case the symptom will be obvious: the drive won’t work. You won’t be able to get the computer to communicate with the disk drive.
However, if the problem is a bad disk drive, the symptoms aren’t as obvious. As long as the POST routines can communicate with the disk drive, they’re usually satisfied. But the POST routines may not uncover problems related to storing information. Even with healthy POST results, you may find that you’re permitted to save information to a bad disk, but when you try to read it back, you get errors. Or the computer may not boot as quickly as it used to because the disk drive can’t read the boot information successfully every time.
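The “saves fine, reads back badly” symptom suggests a simple check: write known data, read it back, and compare. The Python sketch below does that with a temporary file in a directory you choose (for example, a folder on the suspect drive). One caveat: the operating system may serve the read from its cache rather than from the disk itself, so a pass doesn’t prove the drive surface is healthy; this illustrates the write-then-verify idea, not a replacement for the manufacturer’s diagnostic tools.

```python
import hashlib
import os
import tempfile

def write_read_verify(directory: str, size: int = 1024 * 1024) -> bool:
    """Write random data to a temporary file, read it back, and compare hashes."""
    payload = os.urandom(size)
    expected = hashlib.sha256(payload).hexdigest()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data out to the drive
        try:
            with open(path, "rb") as f:
                actual = hashlib.sha256(f.read()).hexdigest()
        except OSError:
            return False  # the read itself failed
        return actual == expected
    finally:
        os.remove(path)  # clean up the test file either way

print(write_read_verify(tempfile.gettempdir()))
```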
Let’s take a look at some specific hard-drive-related issues, the likely culprits, and actions to take:
If you are using a Redundant Array of Independent Disks (RAID) system, you have additional challenges to deal with. First, you have more disks, so the chance of having a single failure increases. Second, you more than likely have one or more additional hard disk controllers, so again you introduce more parts that can fail. Third, there will likely be a software component that manages the RAID array.
Boiling it down, though, dealing with RAID issues is just like dealing with a single hard drive issue, except you have more parts that make up the single storage unit. If your RAID array isn’t found or stops working, try to narrow down the issue. Has one disk failed, or is the whole array down, indicating a problem with a controller or the software? Most external RAID systems, which sit in their own enclosures and require a separate connection to the computer, have status indicators and troubleshooting utilities to help you identify problems. Definitely use those to your advantage.
Finally, the problem could be dependent on the type of RAID you’re using. If you are using RAID 0 (disk striping), you actually have more points of failure than a single device, and your fault tolerance has decreased. One drive failure will cause the set to fail. RAID 1 (disk mirroring) increases your fault tolerance; if one drive fails, the other has an exact replica of the data. You’ll need to replace the failed drive, but unless both drives unexpectedly fail, you shouldn’t lose any data. If you’re using RAID 5 (disk striping with parity), a single drive failure usually means that your data will be fine, provided you replace the failed drive.
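The fault-tolerance differences above, and the reason RAID 5 can lose a drive without losing data, can be shown in a few lines of Python. The sketch encodes the simplified survival rule for each level described here and then demonstrates XOR parity: any one missing block can be rebuilt by XORing the surviving blocks with the parity block. Real RAID 5 rotates the parity across all the drives and works on much larger blocks; this keeps a single stripe for clarity.

```python
from functools import reduce

def array_survives(level: int, disks: int, failed: int) -> bool:
    """Simplified rule for whether the data survives `failed` drive failures."""
    if level == 0:  # striping: no redundancy, any failure breaks the set
        return failed == 0
    if level == 1:  # mirroring: data survives while at least one copy remains
        return failed < disks
    if level == 5:  # striping with parity: tolerates one failed drive
        return failed <= 1
    raise ValueError(f"RAID level {level} is not modeled in this sketch")

def xor_parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte, as RAID 5 parity does."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# One stripe: three data blocks plus the parity block computed from them.
data = [b"RAID", b"FIVE", b"DEMO"]
parity = xor_parity(data)

# Lose the second drive, then rebuild its block from the survivors and the parity.
rebuilt = xor_parity([data[0], data[2], parity])
print(rebuilt)
```

This is also why replacing the failed drive promptly matters: the rebuild needs every other block in the stripe, so a second failure before the rebuild finishes takes the array down.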
CD-ROM and DVD problems are normally media related. Although compact disc technology is much more reliable than that for floppy disks, it’s not perfect. One factor to consider is the cleanliness of the disc. On many occasions, if a disc is unreadable, cleaning it with an approved cleaner and a lint-free cleaning towel will fix the problem. The next step might be to use a commercially available scratch-removal kit. If that fails, you always have the option to send the disc to a company that specializes in data recovery.
If the operating system doesn’t see the drive, start troubleshooting by determining whether the drive is receiving power. If the tray will eject, you can assume there is power to it. Next, check BIOS Setup to make sure the drive (SATA or IDE) has been detected. If it hasn’t been detected and it’s an IDE drive, check the master/slave jumper on the drive, and make sure the IDE adapter is set to Auto, CD-ROM, or ATAPI in BIOS Setup. Once inside the case, ensure that the data cable is securely connected at both the drive and motherboard ends; with an IDE ribbon cable, also make sure it’s properly aligned with pin 1.
To play movies, a DVD drive must have MPEG decoding capability. These days this is usually built in to the drive, video card, or sound card, but it may require a software decoder. If the drive reads DVD data discs but won’t play movies, suspect a problem with the MPEG decoding.
If a CD-RW or DVD drive works normally as a regular CD-ROM drive but doesn’t perform its special capability (doesn’t read DVD discs or doesn’t write to blank discs), perhaps you need to install software to work with it. For example, with CD-RW drives, unless you’re using an operating system that supports CD writing, you must install CD-writing software to write to CDs.
Like CD-ROM/DVD drives, most floppy-drive problems result from bad media. Your first troubleshooting technique with floppy drive issues should be to try a new disk.
One of the most common problems that develops with floppy drives is misaligned read-write heads. The symptoms are fairly easy to recognize: you can read and write to a floppy on one machine but not on any others. This is normally caused by the mechanical arm in the floppy drive becoming misaligned. Because the heads were out of position when the disk was formatted and written, its tracks aren’t where other floppy drives expect to find them, so those drives can’t read it.
Numerous commercial tools are available to realign floppy drive read-write heads. They use a preformatted floppy disk to reposition the mechanical arm. In most cases, though, this fix is temporary: the arm will move out of place again fairly soon. Because floppy drives are inexpensive, the best solution is to spend a few dollars and replace the drive.
Another problem you may encounter is a phantom directory listing. For example, suppose you display the contents of a floppy disk and then you swap to another floppy disk but the listing stays the same. This is almost always a result of a faulty ribbon cable; a particular wire in the ribbon cable signals when a disk swap has taken place, and when that wire breaks, this error occurs.
Troubleshooting video problems is usually fairly straightforward because there are a limited number of issues you might face. You can sum up nearly all video problems with two simple statements:
In the vast majority of cases when you have a video problem, a good troubleshooting step is to check the monitor by transferring it to another machine that you know is working. See if it works there. If the problem persists, you know it’s the monitor. If it goes away, you know it’s the video card (or possibly the driver). Is the video card seated properly? Is the newest driver installed?
Let’s take a look at some common symptoms and their causes:
Other graphics issues can be attributed to the memory installed on the video card. This is the storage location of the screens of information in queue to be displayed by the monitor. Problems with the memory modules on the video card have a direct correlation to how well it works. It follows, then, that certain unacceptable video-quality issues can be remedied by adding additional memory to a video card. Doing so generally results in an increase in both quality and performance. If you can’t add memory to the video card, you can upgrade to a new one.
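As a rough illustration of why video memory matters, the memory needed just to hold one screen of information is the width times the height times the bytes per pixel. The Python sketch below works that out for a 1920 × 1080 display at 32-bit color, with and without a second screen queued behind the one being displayed. It is a simplification: real video cards also use their memory for textures and other data, so actual requirements are higher.

```python
def frame_buffer_bytes(width: int, height: int, bits_per_pixel: int, buffers: int = 1) -> int:
    """Memory needed to hold `buffers` full screens at the given resolution and color depth."""
    return width * height * bits_per_pixel // 8 * buffers

single = frame_buffer_bytes(1920, 1080, 32)             # one screen
double = frame_buffer_bytes(1920, 1080, 32, buffers=2)  # one displayed, one queued
print(f"{single:,} bytes ({single / 2**20:.1f} MiB); double-buffered: {double / 2**20:.1f} MiB")
```

One screen at that resolution takes 8,294,400 bytes (about 7.9 MiB), and each additional queued screen adds the same amount again, which is why higher resolutions and color depths call for more memory on the card.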