Chapter 27. Responding to Security Incidents

The CERT Coordination Center (CERT-CC) reports that, despite increased awareness, the first time many organizations start thinking about how to handle a computer security incident is after an intrusion has occurred. Obviously, this isn’t a great approach. You need a plan for how you’re going to respond to a computer security incident at your site, and you need to develop that plan well before an incident occurs.

There isn’t room here to detail everything you need to know to deal with a security incident: attacks are many and varied and change constantly; responding to them can involve a byzantine assortment of legal and technical issues. This chapter is intended to give you an outline of the issues involved and the practical steps you can take ahead of time to smooth the process. Appendix A, provides a list of resources that may provide additional help.

Responding to an Incident

This section discusses a number of steps you’ll need to take when you respond to a security incident. You won’t necessarily need to follow these steps in the order they’re given, and not all of these steps are appropriate for all incidents. But, we recommend that you at least contemplate each of them when you find yourself dealing with an incident.

In Section 27.4, later in this chapter, we’ll look again at each of these steps and help you figure out how to work them into the overall response plan that you should develop before an incident actually occurs.

Evaluate the Situation

The first step in responding to a security incident is to decide what response, if any, needs to be made immediately. Ask these questions:

Has an attacker succeeded in getting into your systems?

If so, you have a genuine emergency on your hands, whether or not the attacker is currently active.

Is the attack currently in progress?

If so, you need to decide how you’re going to react right now. If the attack isn’t currently in progress, you may not be in such a hurry.

If the incident looks like an aggressive attack on your system, you probably want to take strong steps quickly. These steps might include shutting down the system or your Internet connection until you figure out how to deal with the situation.

On the other hand, if the incident is a less aggressive one — perhaps someone has just opened a Telnet connection to your machine and is trying various login/password pairs — then you may want to move more slowly. If you’re reasonably confident that the attack won’t succeed (e.g., you can see that the attacker is trying passwords that consist of all lowercase letters, and you know for certain that no account on the system has such a password), you might want to leave things alone and just watch for a while to see what the attacker does. This may give you an opportunity to trace the attack. (However, see the Section 27.3 section, later in this chapter, for a discussion of the issues involved in tracing an attack.)

Whatever you do, remember Rule 1: Don’t panic!

Start Documenting

As soon as you determine that you actually have a problem that you need to respond to, start documenting what’s going on. You don’t need to get fancy at this point (you don’t have time to, until you’ve taken the next step), but you should at least start a log by making a note of what time it is.

Disconnect or Shut Down, as Appropriate

Once you’ve evaluated the situation, your next priority is to give yourself the time to respond without risking your systems further. The least disruptive alternative is usually to disconnect the affected machine from all networks; this will shut down any active connections. Shutting down active connections may make it harder to trace the intruder, but it will allow the rest of the people at your site to continue to do their work, and it will leave the intruder’s programs running. This may help you to identify who the intruder might be.

If you’re afraid that other machines have been compromised or are vulnerable to the same attack, you’ll probably want to disconnect as many machines as you can as a unit. This may mean taking down your connection to the Internet, if possible. If your Internet connection is managed elsewhere in your organization, you may need to detach just your portion of the network, but you’ll also need to talk to other parts of your organization as soon as possible to let them know what’s happening.

In some situations, you may want to shut down the compromised system. However, this action should be a last resort for a number of reasons:

  • It destroys information you may need.

  • You won’t be able to analyze or fix the machine while it’s down; you’ll have to disconnect it from the network eventually anyway to bring it back up again.

  • It’s even more disruptive to legitimate users than removing the network connection.

  • It protects only one machine at a time. (It’s much easier to cleanly disconnect a set of systems than to cleanly shut them down.)

Even if you’re responding to an incident that has already ended, you still might want to disconnect or shut down the system, or at least close it to users, while you analyze what happened and make any changes necessary to keep it from happening again. This will keep you from being confused by things users are doing, and it will prevent the intruder from returning before you’re done.

Analyze and Respond

Your next priority is to start to fix what’s gone wrong. The first step in actually correcting the problem is to relax, think for a while, and make sure you really understand what’s happening and what you’re dealing with. The last thing you want to do is make the situation worse by doing something rash and ill considered. Whatever corrective actions you’re contemplating, think them through carefully. Will they really solve the problem? Will they, in turn, cause other problems?

When you’re working in an unusual, high-stress situation like this, the chances increase of making a major error. Because you’re probably going to be working with system privileges (for example, working as root on a Unix system), the consequences of an error could be serious.

There are several ways you can reduce the chances of making an error. One good way is to work with a partner; each of you can check the other’s commands after they’re typed but before they’re executed. Even if you’re working alone, many people find that reading commands aloud and checking the arguments in reverse order before executing them helps avoid mistakes. Resist the temptation to try to work fast. You will go home sooner if you work slowly and carefully.

Try not to let your users get in the way of your response. You may want to give someone the specific job of dealing with user inquiries so the rest of your response team can concentrate on responding to the incident.

Also, try to keep your responders from tripping over each other. Make it clear which system managers and investigators are working on which task, so they won’t step on each other’s toes (or wind up unintentionally chasing each other as part of the investigation!).

Make “Incident in Progress” Notifications

You’re not the only person who needs to know what’s going on. A number of other people — in a number of different places — have to be kept informed.

Your own organization

Within your own organization are people who need to know that something is happening: management, users, and staff. At the very least, let them know that you are busy responding to an incident and that you may not be available to them for other matters. They usually need to know why they’re being inconvenienced and what they should do to speed recovery (even if the only thing they can do is to go away and leave you alone).

It is particularly important that management and other staff know what’s going on. Otherwise, you risk having them act in opposition to you. For instance, if you’ve disconnected the Internet connection, the chances are high that somebody’s going to notice the service outage and try to fix it. That’s a problem if it’s another staff member, but it can be a disaster if it turns into a management requirement.

If people call management to complain about some side effect of your response, and the manager they get has been briefed about what’s going on, the chances are that the manager will defend your need to make a response. At worst, the manager will make a reasoned decision about the importance of incident response versus other needs of the company. However, if the manager doesn’t know what’s going, he or she will probably respond the same way the manager would to any other network outage: “Gee, that’s terrible, we’ll fix it as soon as possible.” The manager has then promised the user something, and the chances are very small that the manager will go back on that promise. Instead, your response will be curtailed by the need to restore service as soon as possible.

Depending on the nature of your site and the incident in question, you may also need to inform your legal, audit, public relations, and security departments. You will always want to contact the security department if:

  • You want to involve law enforcement agencies.

  • You suspect an insider is involved.

  • You suspect physical access is involved.

If multiple computer facilities are at your site, you’ll need to inform the other facilities as soon as possible; they are likely sources and future targets for similar attacks.

CERT-CC or other incident response teams

If your organization is served by an incident response team such as CERT-CC, or has its own such team, let them know what’s going on and try to enlist their aid. (For instructions on how to contact CERT-CC or another response team, see Appendix A.) What steps response teams can take to help you will depend on the charter and resources of the response team. Even if they can’t help you directly, they can tell you whether the attack on your site looks as if it is part of a larger pattern of incidents. In that case, they may be able to coordinate your response with the responses of other sites.

Vendors and service providers

You might want to get in touch with your vendor support contacts or your Internet service provider(s) if you think they might be able to help or should be aware of the situation. For example, if the attackers appear to be exploiting an operating system bug, you should probably contact the vendor to see if they know about it and have a fix for it. At the very least, they’ll be able to warn other sites about the bug. Similarly, your Internet provider is unlikely to be able to do much about your immediate problem, but they may be able to warn other customers. There is also a possibility that your Internet provider has itself been compromised, in which case, they need to know immediately. Your vendors and service provider may have special contacts or procedures for security incidents that will yield much faster results than going through normal support channels.

You may get little or no visible response when you make these reports. This might be because you’re being ignored or because companies are putting self-defense before the interests of their customers. On the other hand, it’s often due to sensible precautions that are intended to make certain that problems are not publicized before fixes are available (jeopardizing places not yet under attack), that the fixes that are made are appropriate to the problem, and that attackers don’t get valuable information by pretending to be sites under attack. You might as well give your suppliers the benefit of the doubt, since it’s almost impossible to tell which of these is going on.

Other sites

Finally, if the incident appears to involve other sites — that is, if the attack appears to be coming from a particular site, or if it looks as if the attackers have gone after that site after breaking into yours — you should inform those other sites. These sites are usually easy to identify as the sources or destinations of connections. It’s often much harder to figure out how to find an actual human being with some responsibility for the computer in question, who is awake and reachable and has a common language with you.

Once again, you may get little or no apparent response for any number of different reasons, some of them annoying and reprehensible, and some of them perfectly sensible. The other site may not care whether their users are attacking you, or they may care desperately but have no way of telling you about it without revealing information to the attackers. While it’s always nice to get somebody who makes an immediate, visibly effective response and thanks you promptly for the information, don’t expect it and don’t be upset when you don’t get it.

If you don’t know who to inform, talk to your response team (or CERT-CC). They will probably either know or know how to find out, and they have experience in calling strangers to tell them they have security problems.

Snapshot the System

Another early step to take is to make a “snapshot” of each compromised system. You might do so by doing a full backup to tape or by copying the whole system to another disk. In the latter case, if your site maintains its own spare parts inventory, you might consider using one of the spares for this purpose, instead of a disk that is already in use and might itself turn out to have been compromised.

The snapshot is important for several reasons:

  • If you misdiagnose the problem or blow the recovery, you can always get back to the time of the snapshot.

  • The snapshot may be vital for investigative and legal proceedings. It lets you get on with the work of recovering the system without fear of destroying evidence.

  • You can examine the snapshot later, after you’re back in operation, to determine what happened and why.

Because the snapshot may become important for legal proceedings, you need to secure the evidence trail. Here are some guidelines:[1]

  • Uniquely identify (label) the snapshot media and put the date, time, your name, and your signature on it.

  • Write-protect the media — permanently, if possible.

  • Safeguard the media against tampering (for example, put it in a locked container) so that if and when you hand it over to law-enforcement or other authorities, you can tell them whose custody the media has been in and why you’re certain it hasn’t been tampered with since it was first created.

It’s a good idea to set aside an adequate supply of fresh media just for snapshots because you never know when you’re going to need to produce one. It’s very frustrating to respond to an incident, and be ready to do the snapshot, only to discover that the last blank tape got used for backups the day before and the new order hasn’t come in yet.

Restore and Recover

Finally, you’re at the point of actually dealing with the incident. What do you do? It depends on the circumstances. Here are some possibilities:

  • If the attacker didn’t succeed in compromising your system, you may not need to do much. You may decide not to bother reacting to casual attempts. You may also find that your incident was actually something perfectly innocent, and you don’t need to do anything at all.

  • If the attack was a particularly determined one, you may want to increase your monitoring (at least temporarily), and you’ll probably want to inform other people to watch out for future attempts.

  • If the attacker became an intruder (that is, he or she actually managed to get into your computers), you’re going to need to at least plug the hole the intruder used, and check to make certain he hasn’t damaged anything or left anything behind.

At worst, you may need to rebuild your system from scratch. Sometimes you end up doing this because the intruder damaged things, purposefully or accidentally. More often, you’ll rebuild your system because it’s the only way to ensure you have a clean system that hasn’t been booby-trapped. Most intruders start by making sure they’ll be able to get back into your system, even if you close their initial entry point. As a result, your systems may be compromised even if the intruder was present for only a short time.

Tip

Always assume that intruders have created back doors into your system so that they can get back in again easily. It’s one of the first things many intruders do when they break in to a system.

If you need to rebuild your system, first ensure that your hardware is working properly. You want to make sure it passes all relevant self-tests and diagnostics; you don’t want to restore onto a flaky system. A reinstall may reveal previously unnoticed hardware problems. For instance, a disk may have bad spots that are in unused files. When you reinstall the operating system, you will attempt to write over the bad parts, and the problem will suddenly become apparent.

Next, make sure you are using trusted media and programs, not necessarily your last backup, to restore the system. Unless you are absolutely sure that you can accurately date the first time the intruder accessed your system, you don’t know whether or not programs had already been modified at the time the backups happened. It’s often best to rebuild your system from vendor distribution media (that is, the tapes or CD-ROM your operating system release came on) and then reload only user data (not programs that multiple users share) from your backup tapes.

If you need programs you didn’t get from your vendor (for instance, packages from the Internet), then do one of the following:

  • Rebuild and reinstall these programs from a trusted backup (one you’re absolutely positive contains a clean copy).

  • Obtain and install fresh copies from the site you got the packages from in the first place.

Do not recompile software until you’ve reinstalled the operating system, including the compiler; you don’t know whether the compiler itself, and the libraries it depends on, have been compromised.

This implies that if you’re heavily customizing your system or installing a lot of extra software beyond what your vendor gives you, you need to work out a way of archiving those customizations and packages that you’re sure can’t be tampered with by an attacker. This way, you can easily restore those customizations and packages if you need to. One good way is to make a special backup tape of new software immediately after it’s installed and configured, before an attacker has a chance to modify it.

You may have programs that were locally written, and in these cases, you may not be able to find even source code that’s guaranteed to be uncontaminated. In this situation, someone — preferably the original author — will need to look through the source code. People rarely bother to modify source code, and when they do, they aren’t particularly subtle most of the time. That’s because they don’t need to be; almost nobody actually bothers to look at the source before recompiling it.

In one case, a programmer installed a back door into code he expected would run on only one machine, as a personal convenience. The program turned out to be fairly popular and was adopted in a number of different sites within his university. Years after he wrote it, and long after the original machine was running a version without the back door, he discovered that the back door was still present on all the other sites, despite the fact that it was clearly marked and commented and within the first page of code. You can’t make a comprehensive search of a large program, but you can at least avoid humiliation by looking for obvious changes.

Document the Incident

Life gets very confusing when you’re discovering, investigating, and recovering from a security incident. A good chain of communication is important in keeping people informed and preventing them from tripping over each other. Keeping a written (either hardcopy or electronic) record of your activities during the incident is also important. Such a record serves several purposes:

  • It can help keep people informed (and thereby help them to resolve the incident more quickly).

  • It tells you what you did and when, in responding, so that you can analyze your response later on (and maybe do better next time).

  • It will be vital if you intend to pursue any legal action.

From a legal standpoint, the best records are hardcopy records generated and identified at the time of occurrence. Just about anything else (particularly anything kept online) could be tampered with or falsified fairly easily — or at least a judge and jury could be convinced of that. You need to produce records on pieces of paper, label, date, and sign them. Furthermore, unless the pages are actually bound together, so that pages can’t be inserted or removed without indication, you’ll need to date and sign every page. (And you thought continuous tractor-feed paper was useless these days!)

You need to have legal documentation even if you aren’t completely certain you’re going to need it. An incident that initially looks fairly simple may turn out to be serious. Don’t assume it isn’t going to be worth bringing in the police.

For both legal and practical reasons, it’s useful to put in exact times when things occurred. Legally, this helps to show that entries were being made in order. Practically, it’s extremely helpful when you need to correlate multiple sources of information (for instance, when you need to compare your logs against event logs on computers or against somebody else’s actions).

Here are several useful documentation methods you might want to consider:

  • Notebooks — carbon copy lab notebooks are especially useful because you can write a note, tear it out and give it to someone and still have a copy of the note. Another benefit is that the pages are usually numbered, so you can determine later on whether any pages have been removed or added.

  • Terminals running with attached printers or old-fashioned printing terminals.

  • A shell running under the Unix script command, with the resulting typescript immediately printed and identified.

  • A personal computer terminal program running in “capture” mode, with the resulting typescript immediately printed and identified.

  • A microcassette recorder for verbal notes.

You will probably want to use multiple methods, one to record what’s happening online and one to record what’s happening outside of the computer. For example, you might have a typescript of the commands you were typing, but a handwritten log for phone calls.

It’s easy to decide what to record online; you simply record everything you do. Remember to use the terminal or session that’s being recorded. (With some methods, like script, you can record every session you’ve got going; just make sure you record each session in a separate file.) It’s harder to decide what to record of the events that don’t just get automatically captured. You certainly want to record at least this much:

  • Who you called, when, and why.

  • A summary of what you told them.

  • A summary of what they told you. (That summary may end up being “see above” some of the time, but you still want to be able to figure out who you were talking to, and when and why.)

  • Meetings and important decisions and actions that aren’t captured online (e.g., the time at which you disconnected the network).

In addition to the journal, a log of time spent for everyone working on an incident can be invaluable. You may need to justify some level of “loss” in order for some law enforcement agencies to be able to open an investigation, and if the intruder didn’t do any damage to the machines, the time that was spent cleaning up is the main loss.

Time logs may also be useful if you are having difficulty in convincing management that the organization needs to allocate additional resources to be prepared to deal with incidents. It’s a way of showing how much these incidents cost. It’s particularly helpful if you can show which areas could have been anticipated and mitigated by planning.

What to Do After an Incident

There are a variety of things you’ll need to take care of after you finish responding to an incident. Don’t relax just yet.

First and foremost, you want to figure out what happened and how to keep it from happening again. Now is the time to examine the snapshot you made of your system before you started the recovery process. When you’ve figured out what happened, you obviously want to take steps to keep it from happening again. You also need to think about anything you or others did during the response (for example, enabling or disabling certain software) that now needs to be undone, fixed, or documented and made permanent.

In addition to analyzing the incident, this is the time to analyze your response to the incident. In this phase, it’s important to concentrate on critiquing the response, not on assigning blame for the original incident. Don’t be confrontational but talk to any folks involved with, or affected by, the response. With them, try to determine what you did right, what you did wrong, what worked and didn’t work, what other tools or resources would have helped, how to respond better next time, and what you’ve all learned from the experience.

If you made “incident in progress” notifications to various people and organizations, now is probably the time to tell them that the incident is over. Be sure to follow up with appropriate information about what happened, how you responded, and how you plan to keep it from happening again.

Pursuing and Capturing the Intruder

If you discover a security incident — particularly one in progress — you’re going to be tempted to go gunning for the bad guys who are invading your system.

Going after the bad guys has a certain emotional appeal, but it’s generally not very practical. There are a variety of approaches you can take, but there are also a variety of technical and legal hurdles.

For an appreciation of the problems involved in hunting down an intruder, see Cliff Stoll’s book, The Cuckoo’s Egg (Doubleday, 1989). In the late 1980s, Cliff was a system manager at Lawrence Berkeley Labs. While tracking down a minor inconsistency in the accounting system LBL used for computer time billing, he discovered evidence that an intruder had broken in over the Internet. He spent many months on an odyssey trying to chase down the attackers. Although Cliff succeeded admirably (and wrote an entertaining and useful book to boot), few of us are going to be able to emulate his feats. Most sites just don’t have the time and resources to track their attackers the way Cliff did; most are going to have to be satisfied with simply getting them off their systems.

Tracking down intruders for legal action is always a long and involved process. Having it take months is actually unusually fast! In general, the process of prosecuting an intruder, or group of intruders, takes a year or more. Be prepared for a long and frustrating process. It can be extremely educational—you may learn more about the legal system and the phone system than you really wanted to know. It also can be a complete let-down; you call three law-enforcement agencies, two of which can’t figure out how to do anything about a computer break-in, and the third of which says something noncommittal and takes down contact information. This might mean, as you will probably suspect, that they don’t care and are going to ignore you, or it might mean that they are already nine months into an investigation of this intruder with the help of other sites and don’t want to give you any information that might somehow get back to the intruder. You might find out when they call back and ask you to testify or to estimate damages. You might find out in the newspaper. It is worth doing your best to report these incidents anyway, but don’t expect much from the experience.

There are two main problems in tracking down intruders: one is technical and the other is legal. The first problem is that tracking an attack back to its ultimate source is usually technically difficult. It’s usually easy to tell what site an attack came from (simply by looking at the IP addresses the attacker’s packets are coming from), but once you find the apparent source of the attack, you usually find out that the attack isn’t really being carried out by a user from that site. Instead, it’s very likely that the site has itself been broken into, and it’s being used as a base by the person who attacked you.

If that site traces its own break-in, it will usually discover the same thing: the attacker isn’t wherever the attacks appear to be coming from. Moreover, where the attacks appear to be coming from is simply another site in the chain that’s been broken into. Each link in the chain between the attacker and you involves more sites and more people. There is a practical limit to how far back you can trace someone in a reasonable period of time. Eventually, you’re probably going to run into a site in the chain that you can’t get in touch with, or that doesn’t have the time or expertise to pursue the matter, or that simply doesn’t care about the attack or about you. As Figure 27.1 illustrates, these are many links in any network connection.

A network connection has many links
Figure 27.1. A network connection has many links

Furthermore, at some link in the chain, you are likely to discover that the attacker is coming in over a telephone line, and tracing telephone calls involves whole new realms of technical and legal problems.

You may well find the same attacker coming in from multiple sites. In one incident, responders kept correcting each other, until they realized that nobody was confused; one set of people was referring to SFU (Simon Fraser University, in Canada) and the other to FSU (Florida State University). The similarity of the abbreviations had momentarily concealed the fact that two separate sites, physically distant from each other, were being concurrently used by the same attacker. The attacker had not started from either of them, and when SFU and FSU closed down access, identical attacks starting occurring from other sites.

Be wary when using email or voicemail to contact administrators at other sites when tracking down an intruder. How do you know it’s really the administrator that’s receiving and responding to your messages and not the intruder? Even if you’re sure that you’re talking to the administrator of a site, maybe the intruder is the administrator.

The second problem is legal. You might contemplate leaving your site “open”, even after you’re aware of the attack, in hopes of tracking down the attackers while they’re using your site. This may seem like a clever idea; after all, if you shut down your system or disconnect it from the Internet, the attackers will know they’ve been discovered, and it will be much harder for you to track them down.

The problem is this: leaving your site open doesn’t just risk further loss or damage at your own site. What the attacker is probably doing at your site is using it as a base to attack other sites. If you’re aware of it, and do nothing to prevent it, those other sites might have grounds to sue you and your organization for negligence or for aiding the attacker.

If you’re dealing with someone who’s attacking your system unsuccessfully, there’s less risk. It’s polite to inform the site that the attacker is apparently coming from, so the system administrators there can do their own checking. It also lets you straighten out people who aren’t really trying to break in, but are just very confused. For example, attempts to log in as “anonymous”, even extremely persistent ones, usually come from people who have confused FTP with Telnet and simply need better advice. Most sites are grateful to be told that attacks are coming from them, but don’t be surprised if universities seem somewhat bored to hear the news. Although they will usually follow up, large universities see these incidents all the time.

You’ll also find that the occasional site is uninterested, hostile, or incapable of figuring out what you’re talking about, and it’s not worth your time to worry about it unless the attacks from the site are persistent, determined, and technically competent enough to have a chance of succeeding. In this case, you should enlist the assistance of a response team.

Planning Your Response

All of the actions we’ve outlined in the previous sections sound fine in theory, but you can’t actually do any of them reliably without an incident response plan. You may personally be able to mount a sensible response to an attack, but you aren’t necessarily going to be the person who discovers one. You may not even be available at the time. How will your organization react if someone attacks your system? Unless you have an incident response plan in place, the people involved will waste valuable time trying to figure out what to do first.

If you already have a plan in place for disaster or emergency response of any kind (e.g., fire, earthquake, electrical problems), you’re probably not going to have to change it significantly to meet your security needs. If you don’t have such a plan already, you can probably use your security incident response plan with only minor modifications for most emergencies.

Your incident response plan need not be an elaborate document, but you need to have something, even if it’s only an email message that records and confirms the details you’ve all worked out over lunch at the local sushi bar. You’ll be better off than many sites even if you do nothing more than think about the issues and discuss them with the relevant people.

What’s in your plan?

The response plan is primarily concerned with two issues: authority and communication. For each part of the incident response, the plan should say who’s in charge and who they’re supposed to talk to. Although you’ll specify a few steps people will take, incidents vary so much that the response plan mostly specifies who’s going to make decisions, and who they’re going to contact after they’ve decided — not what they’re going to decide. This section summarizes the different parts of a response plan.

Planning for Detection

An incident starts when somebody detects an intruder or attacker. That person might be a system administrator, but more often it’s someone with no official responsibility. If you’ve properly educated the people who use your computers, they know they’re supposed to report weird events. Somebody then needs to sort run-of-the-mill peculiarities from a security incident in progress. Who are the users going to report to? Who are those people going to report to if they’re still not sure? What are they authorized to do if they are sure?

The two cases you really want to plan for are these:

  • Somebody notices a real security incident in progress at 3 A.M.

  • Somebody notices one of your perfectly legitimate users who happens to be doing vital work from halfway across the globe at 3 A.M. local time. (In Australia, where the user is consulting at the moment, it’s a reasonable 5 P.M.)

In the first case, you need a procedure that is going to reliably start a full incident response immediately. Don’t waste any time. It’s going to be embarrassing and expensive if you don’t actually get around to doing anything until your senior security person arrives in the next morning, takes in enough caffeine to become able to think, and gets around to looking at some report. (And that’s if there is a report in the first place; without a response plan, it may be weeks before anyone actually tells someone who can begin to do something about the situation.)

In the second case, it’s going to be embarrassing and expensive if you disconnect the network and get five people out of bed, all to prevent somebody from doing the work they’re paid to do.

Either way, it’s not a decision you probably want made by a night operator, or by a user acting alone because he or she can’t figure out how to call somebody who knows how to tell a real incident from a false alarm.

At a small site, you might want to simply post a number that users can call to get help outside of office hours (for instance, a pager number). Users might be encouraged to shut down personal machines if they suspect an attack and know how to shut the machine down gracefully. You want to be very cautious about this, however, because an ungraceful shutdown, particularly of a multi-user machine, may be more damaging than an intruder.

At a larger site, one that has on-site support after hours, you should instruct the on-site support people to call a senior person if they see a possible security incident. They should be told explicitly not to do anything more than that unless circumstances are extreme, but to keep trying to contact senior personnel until they get somebody who can take a look at what’s going on.

Planning for Evaluation of the Incident

Who’s going to decide that you don’t just have a suspicious situation — you actually have a security problem? You need to designate one specific person who will have responsibility for making the important decisions. It’s tempting to pick one specific person in advance and put his or her name in your plan. But, what if that person isn’t available in the event of an actual incident? Who, then, will have the responsibility?

Teamwork is great, but emergencies call for leadership. You don’t want to have everybody doing their own thing and nobody in charge, and you certainly can’t afford to stand around arguing about it. If your senior technical person is absent, do you want someone less senior but more technical to do the evaluation, or do you want someone more senior but less technical? How much time are you going to spend searching for the senior technical person when you have an emergency to deal with, before proceeding to your next candidate for the hot seat?

At a small site, you may not have a lot of options; if only one person has the skills necessary to do something about an attack, your policy will simply list that person as the one in charge in case of a security incident. If that person is unavailable, authority should go to somebody levelheaded and calm who can take stopgap actions and arrange for assistance (for example, from a relevant response team). In this situation, technical skills would be nice, but resourcefulness and calm are more important.

At a larger site, probably more than one person could be in charge. Your plan may want to say that the most senior will be in charge by default or that whoever is specified as being on call will be in charge. Either way, the plan should state that if the default person in charge is unavailable, the first of the other possible people to respond is in charge. Specifying what order they’re going to be contacted in is probably overkill; let whoever is trying to reach these people use his or her knowledge of the situation. If none of those people are available, you’ll usually want to work up the organizational hierarchy rather than down. (A manager, particularly a technical one, is probably better equipped to cope than an operator.)

In a small organization, you will pick your fallback candidates by name. In a large one, you will usually specify fallbacks by job title. If job title is your criterion, it’s important to base your decision on the characteristics of the job, not of the person currently in it. Don’t write into your plan that the janitor should decide, on the theory that the current janitor also is the most sensible and technical of those who aren’t system administrators. The next janitor might be an airhead with a mop.

Planning for Disconnecting or Shutting Down Machines

Your response plan needs to specify what kind of situation warrants disconnecting or shutting down, and who can make the decision to do it. Most importantly, as we’ve discussed in “Pursuing and Capturing the Intruder”, are you ever willing to allow a known intruder to remain connected to your systems? If you’re not, are you going to take down the system, or are you going to disconnect from the network altogether?

If you are at a site with multiple computer facilities, do you want to take the entire site off the Internet if one facility has been compromised, or is it better (or even possible) to take just that facility off the Internet?

At most sites, the reasonable plan is to disconnect the site as a whole from the network as soon as you know for sure that you have an intruder connected to your systems. You may have a myriad of internal connections, with a triply redundant, diversely cabled, UPS-protected routing mesh, which can make “disconnecting” a daunting prospect (the system keeps “fixing” itself). On the other hand, you probably have only one (or a small handful) of connections to the outside world, which can be more easily severed.

Your plan needs to say how to disconnect the network, and how the machines should be shut down. Be very careful about this. You do not want to tell people to respond to a mildly suspicious act by hitting the circuit breakers and powering off every machine in the machine room. On the other hand, if an intruder is currently removing all the files on the machine, you don’t want them to give that intruder a 15-minute warning for a graceful shutdown.

This is one case in which you need clear, security-specific instructions in your plan. Here’s what we recommend you do:

  • In most security emergencies, the correct way to shut down the machine is to do an immediate but graceful shutdown, with no explanations or warnings sent. Your plan should state that and specify the appropriate commands to issue.

  • If the intruder is actively destroying things, you want people to shut the machine down by the fastest method possible. If they are physically near the machine, cutting off the power to the machine or the disk drive is completely appropriate, despite the damage it may cause. This implies that the relevant power switches must be easy to locate; a master switch for each machine is a good idea.

Whoever is going to disconnect the network needs to know how to do that. The safest and easiest way often is to unplug cables and clean up the side effects afterwards. With networks, this tends to result in voluminous error messages but to cause no actual damage. You do have to unplug the relevant cables, however, and the voluminous error messages may make it difficult to determine whether or not the cables that were unplugged were actually the correct ones. Your plan needs to tell people what to unplug and how to make things functional afterwards.

Planning for Notification of People Who Need to Know

Your incident response plan needs to specify who you’re going to notify, who’s going to do the notification, when they’re going to do it, and what method they’re going to use. As we described earlier in this chapter, you may need to notify:

  • People within your own organization

  • CERT-CC or other incident response teams

  • Vendors and service providers

  • People at other sites

Your own organization

To start with, you need to notify the people who are going to be involved in the response. You’ll have an urgent need to get hold of them, so you need telephone and pager numbers. Be sure you have all the relevant phone numbers; in addition to home and work numbers, check to see if people have mobile phones at which they might be reached. This list includes anybody who manages computers within your site and anybody who manages those people, plus anybody else who might be needed to provide resources (to sign off on emergency purchases or to unlock doors, for example). Ideally, the list — or at least the key portions of it — should be reduced down so it’s small enough to carry easily (for example, it might be laser-printed onto business card-sized stock). Obviously, the list isn’t much use unless it’s kept up to date.

If many people must be notified, you may wish to use a phone tree or an alert tree. In such a tree, shown in Figure 27.2, each person notifies two or three other people; it is a geometric progression, so a large number of people can be rapidly notified with relatively little work to any one person. Everybody should have a copy of the entire tree, so that if people are unavailable, their calls can be taken over by someone else (usually the person above them on the tree). It’s best to set it up so that as many calls as possible are toll-free, and so that people are notifying other people they know relatively well (which increases their chances of knowing how to get through). There’s no need for an alert tree to reflect an organizational chart or a chain of command.

An alert tree
Figure 27.2. An alert tree

Next, you’re going to notify other people within your organization who need to know, starting with the users of your computer facility. For that, you’ll use whatever your organization normally uses for relatively urgent notifications to everybody, whether that’s memos or electronic mail. Your plan should specify how to do it (system administrators rarely send memos to all personnel and may not know how).

Your plan should also show a sample notification message for the users of your systems, which can sometimes be tricky. Your message needs to contain enough information so that legitimate users understand what’s happening. They need to know:

  • What has been taken out of service

  • Why you’re making their lives miserable

  • Exactly which things that they normally do aren’t going to work

  • When service will be restored

  • What they’re supposed to do (including leave you alone so that you can concentrate on the response)

  • That you realize you’re making life unpleasant for them

  • That you’re doing everything possible to improve matters

  • That you’re going to tell them the details later

Things that are obvious to you may not be obvious to your users (e.g., they might not even understand why it’s so bad to have an intruder). Writing an appropriate message (see Figure 27.3) is not easy, particularly if you’re busy and tired.

A notification message
Figure 27.3. A notification message

For the remaining people within your organization — people from other computer facilities, legal, audit, public relations, or security — the plan needs to specify who gets notified. Do you need to call the legal department? If so, who should you talk to? Who are the administrators for other sites within your organization? During the Morris worm incident in 1988, at least one large government lab was reduced to having the guards hand out flyers at the gate to people as they came to work, asking “Are you a system administrator?” because they had no idea who all the system administrators were, much less how to contact them.

Think about how you are going to send your message. If you send it via electronic mail, remember that the intruder may see it. Even if you know that your own systems are clean, don’t assume that other people’s are. Don’t say anything in your message that you don’t want the attacker to know. Even better yet, use a telephone.

Some sites use a simple code phrase to announce a system attack that they can include in electronic mail. This can rapidly degenerate into bad spy fiction, but if you have an agreed-upon phrase that isn’t going to alert an intruder (and isn’t going to cause people who don’t know it or don’t remember it to give the game away by asking what on earth you’re talking about), it can be effective. Something like “We’re having a pizza party; call 3-4357 to RSVP” should serve the purpose.

Should you contact your organization’s security department? At some organizations, the security department is responsible only for physical security. You’ll want to have a contact number for them in case you need doors unlocked, for example, but they are unlikely to be trained in helping with an emergency of this kind, so you probably won’t need to notify them routinely of every computer security incident. However, if a group within your organization is responsible for computer security, you are probably required to notify that group. Find out ahead of time when the members of the group want to be notified and how, and put that information in the plan. Even if that group cannot help you respond to your particular type of incident (perhaps because they may be personal computer specialists or government security specialists), it’s advisable to at least brief them on the incident after you have finished responding to it.

CERT-CC and other incident response teams

Your plan should also specify what emergency response team, if any, you’re served by and how to contact them. CERT-CC and many teams in the FIRST have 24-hour numbers, and they prefer to be called immediately if a security incident occurs.

Vendors and service providers

Your plan should also contain the contact numbers for your vendors and Internet service providers. These people probably do not need to be called immediately, unless you need their help. However, if you have any reason to suspect that your Internet provider itself has been compromised, you should contact them immediately.

Many vendors and service providers have special contact procedures for security incidents. Using these procedures will yield much faster results than going through normal support channels. Be sure to research these procedures ahead of time and include the necessary information in your response plan.

Other sites

You will not ordinarily need to talk to other sites as part of the immediate incident response. Instead, you’ll call them after the immediate emergency is over, when you have time to work without needing everything written down in the plan. In addition, no plan could cover all the information needed to find out what other sites were involved and to contact them. Therefore, your plan doesn’t need to say much about informing other sites.

If you are providing Internet service for other sites, however, or have special network connections to other sites, you should have contact information in the plan and should contact them promptly. They need to know what happened to their service and to check that the attacker didn’t reach them through your site.

Planning for Snapshots

Your incident response plan should specify how you’re going to do snapshots of the compromised system. Make sure that your plan contains the answers to these questions:

  • Where are the necessary supplies and what program are you going to use?

  • How should the snapshot be labeled and where should it be stored?

  • How should snapshots be preserved against tampering, for possible later use in legal proceedings?

Planning for Restoration and Recovery

Different incidents are going to require different amounts of recovery. Your response plan should provide some general guidelines.

Reinstalling an operating system from scratch is time consuming, unpleasant, and often exposes underlying problems. For example, you may discover that you no longer know where some of your programs came from. For this reason, people are extremely reluctant to do it. Unless your incident response plan says explicitly that they need to reinstall the operating system, they probably won’t. The problem is, this leads to situations where you have to get rid of the same intruder over and over again because the system hasn’t been properly cleaned up. Your response plan should specify what’s acceptable proof that the operating system hasn’t been tampered with (for instance, a comparison against cryptographic checksums of an operating system known to be uncompromised). If you don’t have those tools, which are discussed in Chapter 10, or if you can’t pass the inspection, then you must install a clean operating system, and the plan should say so.

The plan should also provide the information needed to reinstall the operating system; for example:

  • Where are the distribution media kept?

  • How do you find out how to install the operating system?

  • Where are the backups, and how do you restore from them?

  • Where are the records that will let you reconstruct third-party or locally written programs?

Planning for Documentation

Your plan should include the basic instructions on what documentation methods you intend to use and where to find the supplies. If you might pursue legal action, your plan should also include the instructions on dating, labeling, signing, and protecting the documentation. Remember that you aren’t likely to know when you start out whether or not there will be legal action, so you will always need to document if you ever want to be able to take legal steps; this is not something you can go back and “fix” later on.

Periodic Review of Plans

However solid your security incident response plans may seem to be, make sure to review them periodically. Changes — in requirements, priorities, personnel, systems, data, and other resources — are inevitable, and you need to be sure that your response plans keep up with these changes. The right question to ask about each item isn’t “Has it changed?,” but "How has it changed?”

A good time to review your incident response plan is after a live drill, which may have exposed weaknesses or problems in the plan. (See Section 27.5.7 at the end of this chapter.) For example, a live drill may uncover any of the following:

  • That you’ve changed all your storage since the plan was written

  • That you can’t actually restore your operating system from scratch

  • That your plan relies on the ability to use the network to reach external sites, but at the same time instructs you to disconnect the network

Being Prepared

The incident response plan is not the only thing that you need to have ready in advance. You need to set up a number of practices and procedures so that you’ll be able to respond quickly and effectively when an incident occurs. Most of these procedures are general good practice; some of them are aimed at letting you recover from any kind of disaster; and a few are specific to security incidents.

Backing Up Your Filesystems

Your filesystem backups are probably the single most important part of your recovery plan. Before you do anything else (including writing your response plan), make sure that your site’s backup plan is a solid one and that it works. Don’t assume that it’s OK just because you haven’t had a problem yet. It is entirely possible to go for months without noticing that you have no backups at all, and it may take you years to notice that they’re only partially broken. Unfortunately, when you do notice, it’s often when you need the backups most, and the outcome is likely to be disastrous.

Backups are vital for two reasons:

  • If your site suffers serious damage and you have to restore your systems from scratch, you will need these backups.

  • If you aren’t sure of the extent of the damage, backups will help you to determine what changes were made to a system and when.

Every organization needs a backup plan and not just for security reasons. If you don’t have one, that’s probably a sign that your current backup system is not OK. When you are doing incident response planning, however, pay special attention to your backup plan.

For your security-critical systems (e.g., bastion hosts and servers), you might want to consider keeping your monthly or weekly backups indefinitely, rather than recycling them as you would your regular systems. If an incident does occur, you can use this archive of backup tapes to recover a “snapshot” of the system as of any of the dates of the backups. Snapshots of this kind can be helpful in investigating security incidents. For example, if you find that a program has been modified, going back through the snapshots will tell you approximately when the modification took place. That may tell you when the break-in occurred; if the modification happened before the break-in, it may tell you that it was an accident and not part of the incident at all.

If you’re not sure whether or not you should be worried, try testing your backup system. Play around and see what you can restore. Ask these questions:

  • Can you restore files from all of your tapes?

  • Can you do a restore of an entire filesystem?

  • If you pick a specific file, can you figure out how to restore it?

  • If you have a corrupt file and want a version from before it was corrupted, can you do that?

  • If all of your disks died (or were trashed by an attacker) simultaneously, would you be able to rebuild your computer facility?

Even the best backup system won’t work if the backup images aren’t safeguarded. Don’t rely on online backups and keep your media in a secure place separate from the data they’re backing up.

Tip

The design of backup systems is outside the scope of this book. This description, along with the description in Chapter 26, provides only a summary. If you’re uncertain about your backup system, you’ll want to look at a general system administration reference. See Appendix A for complete information on additional resources.

Labeling and Diagramming Your System

As organizations grow, they acquire hardware; they configure networking in different ways; and they add or change equipment of various kinds. Usually only one or two people really know what a site’s systems look like in any detail.

Information about system configuration may be crucial to investigating and controlling a security incident. While you may know exactly how everything works and fits together at your site, you may not be the person who has to respond to the incident. What if you’re on vacation? Think about what your managers or coworkers would need to know about each system in order to respond effectively to an incident involving that system.

Labels and diagrams are crucial in an emergency. System labels should indicate what a system is, what it does, what its physical configuration is (how much disk space, how much memory, etc.), and who is responsible for it. They should be attached firmly to the correct systems and easily legible. Use large type sizes and put at least minimal labels on the back as well as the front (the front of a machine may have more flat space, but you’re probably going to be looking at it from behind when you’re trying to work on it). Network diagrams should show how the various systems are connected, both physically and logically, as well as things like what kind of packet filtering is done where.

Be sure that labels are kept up to date as you move systems around; wrong labels are worse than no labels at all. It’s particularly important to label racked equipment and equipment with widely scattered pieces. There’s nothing more frustrating than turning off all the equipment in a rack, only to discover that some of it was actually part of the computer in the next rack over, which you meant to leave running.

Information that’s easily available when machines are working normally may be impossible to find if machines are not working. For example, you’ll need disk partition tables written down in order to reformat and reinstall disks, and you may need a printed copy of the host table in order to configure machines as they’re brought back up.

Keeping Secured Checksums

Once you’ve had a break-in, you need to know what’s been changed on your systems. The standard tools that come with your operating system won’t tell you; intruders can fake modification dates and match the trivial checksums most operating systems provide. You will need to install a cryptographic checksumming program (these are discussed in Chapter 10), make checksums of important files, and store them where an intruder can’t modify them (which generally means somewhere offline). You may not need to checksum every system separately if they’re all running the same release of the same operating system, although you should make sure that the checksum program is available on all your systems.

Keeping Activity Logs

An activity log is a record of any changes that have been made to a system, both before an incident and during the response to an incident. Normally, you’ll use an activity log to list programs you’ve installed, configuration files you’ve modified, or peripherals you’ve added. During an incident, you’ll be doing a lot more logging.

What is the purpose of an activity log? A log allows you to redo the changes if you have to rebuild the system. It also lets you determine whether any of the changes affect the incident or the response. Without a log, you may find mystery programs; you don’t know where they came from and what they were supposed to do, so you can’t tell whether or not the intruder installed them, if they still work the way they’re supposed to, or how to rebuild them. Figure 27.4 shows a sampling of routine log entries and incident log entries.

Activity logs
Figure 27.4. Activity logs

There are a variety of easy ways to keep activity logs, both electronic and manual; email, notebooks, and tape recorders can also be used. Some are better for routine logs (those that record your activities before an incident occurs). Others may be more appropriate for incident logs (those that keep track of your activities during an incident).

Email to an appropriate staff alias that also keeps a record of all messages is probably the simplest approach to keeping an activity log. Not only will email keep a permanent record of system changes, but it has the side benefit of letting everybody else know what’s going on as the changes are made. The email approach is good for routine logs, whereas manual methods are likely to work more reliably during an incident. During an actual security incident, your email system may be down, so any messages generated during the response may be lost. You may also be unable to reach existing online logs during an incident, so keep a printed copy of these email messages up to date in a binder somewhere.

Notebooks make a good incident log, but people must be disciplined enough to use them. For routine logs, notebooks may not be convenient because they may not be physically accessible when people actually make changes to the system. Some sites use a combination of electronic and paper logs for routine logs, with a paper logbook kept in the machine room for notes. This works as long as it’s clear which things should be logged where; having two sets of logs to keep track of can be confusing.

Pocket tape recorders make good incident logs, although they require that somebody transcribe them later on. They’re not reasonable for routine logging.

Keeping a Cache of Tools and Supplies

Well before a security incident, collect the tools and supplies that you are likely to need during that incident. You don’t want to be running around, begging and borrowing, when the clock is ticking.

Here are some of the things you’ll need in order to respond appropriately to an incident. (Actually, you ought to have these things around at all times; they come in handy in all sorts of disasters.)

  • Blank backup tapes and possibly spare disks as well.

  • Basic tools; you’ll need them if you disconnect your system from the external network, or if you need to rewire the internal network to disconnect compromised hosts. Make sure you have a ladder if your site uses in-ceiling cabling or tall equipment racks.

  • Spare networking equipment — at least cables.

Set aside basic supplies (e.g., a full backup’s worth of media, networking cables, the most critical tools, notebooks or tape recorders for incident logs) in a cache to be used only in case of disaster. This cache should be separate from your normal stock of spare parts and tools.

Testing the Reload of the Operating System

If a serious security incident occurs, you may need to restore your system from backups. In this case, you will need to load a minimal operating system before you can load the backups. Are you equipped to do this?

Make sure that you:

  • Understand your system’s operating system installation procedures

  • Understand the procedures for restoring from backups

  • Have all the materials (distribution media, manuals, etc.) available to restore the system

  • Test your reload plans and procedures before you really need them

Testing your ability to reload the operating system is a good idea, and too few organizations ever do it. You can learn a lot by doing this. While you’re trying to reload a dead system is not a good time to discover that you’ve got a bad copy of the distribution media. It’s also not a good time to discover that the people who have to do the reload can’t figure out how to do it. The best way to test is to designate the least experienced people who might have to do the work, and let them try out the reload well ahead of time.

Most organizations find that the first time they try to reinstall the operating system and restore on a completely blank disk, the operation fails. This can happen for a number of reasons, although the usual reason is a failure in the design of the backup system. One site found that people were doing their backups with a program that wasn’t distributed with the operating system, so they couldn’t restore from a fresh operating system installation. (After that, they made a tape of the restore program using the standard operating system tools; they could then load the standard operating system, recover their custom restore program, and reload their data from backups.)

Doing Drills

Don’t assume that responding to a security incident will come naturally. Like everything else, such a response benefits from practice. Test your own organization’s ability to respond to an incident by running occasional drills.

There are two basic types of drills:

  • In a paper (or “tabletop”) drill, you gather all the relevant people in a conference room (or over pizza at your local hangout), outline a hypothetical problem, and work through the consequences and recovery procedures. It’s important to go through all the details, step by step, to expose any missing pieces or misunderstandings.

  • In a live drill, you actually carry out a response and recovery procedure. A live drill can be performed, with appropriate notice to users, during scheduled system downtimes.

You might also test only parts of your response. For example, before configuring a new machine, use it to test your recovery procedures by recovering an existing machine onto it. If you have down time scheduled for your facility, you may be able to use it to test what happens when you disconnect from the network. Run your checksum comparison program before and after you install changes to the operating system to see what changes it catches when you think everything’s the same, and what it does about the things you know have changed. Coordinate with another site to see what messages are logged when various types of attacks occur (pick someone you know and trust and who’ll reliably tell you exactly what they did, or do it yourself). Try taking down all of your central machines at the same time and see whether they’ll all come back up in this situation. (Do this when you have a few hours to spare; if it doesn’t work, it often takes a while to figure out how to coax the machines past their interdependencies.)

This is all a lot of trouble, but a certain amount of perverse amusement can be had by playing around with fictitious disasters, and it’s much less stressful than having to improvise in a real disaster.



[1] See Computer Crime: A Crimefighter’s Handbook, by David Icove, Karl Seger, and William VonStorch (O’Reilly & Associates, 1995), for a detailed discussion of labeling and protecting evidence.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
13.58.150.59