CHAPTER 56

COMPUTER SECURITY INCIDENT RESPONSE TEAMS1

Michael Miora, M. E. Kabay, and Bernie Cowens

56.1 OVERVIEW

56.1.1 Description

56.1.2 Purpose

56.1.3 History and Background

56.1.4 Types of Teams

56.2 PLANNING THE TEAM

56.2.1 Mission and Charter

56.2.2 Establishing Policies and Procedures

56.2.3 Interaction with Outside Agencies and Other Resources

56.2.4 Establish Baselines

56.3 SELECTING AND BUILDING THE TEAM

56.3.1 Staffing

56.3.2 Involve Legal Staff

56.4 PRINCIPLES UNDERLYING EFFECTIVE RESPONSE TO COMPUTER SECURITY INCIDENTS

56.4.1 Baseline Assumptions

56.4.2 Triage

56.4.3 Technical Expertise

56.4.4 Training

56.4.5 Tracking Incidents

56.4.6 Telephone Hotline

56.5 RESPONDING TO COMPUTER EMERGENCIES

56.5.1 Observe and Evaluate

56.5.2 Begin Notification

56.5.3 Set Up Communications

56.5.4 Contain

56.5.5 Identify

56.5.6 Record

56.5.7 Return to Operations

56.5.8 Document and Review

56.5.9 Involving Law Enforcement

56.5.10 Need to Know

56.6 MANAGING THE CSIRT

56.6.1 Professionalism

56.6.2 Setting the Rules for Triage

56.6.3 Triage, Process, and Social Engineering

56.6.4 Avoiding Burnout

56.6.5 Many Types of Productive Work

56.6.6 Setting an Example

56.6.7 Notes on Shiftwork

56.6.8 Role of Public Affairs

56.6.9 Importance of Forensic Awareness

56.7 POSTINCIDENT ACTIVITIES

56.7.1 Postmortem

56.7.2 Continuous Process Improvement: Sharing Knowledge within the Organization

56.7.3 Sharing Knowledge with the Security Community

56.8 CONCLUDING REMARKS

56.9 FURTHER READING

56.10 NOTES

56.1 OVERVIEW.

No matter how good one's security, at some point a security measure will fail. Knowing that helps organizations to plan for security in depth, so that a single point of failure does not necessarily result in catastrophe. Furthermore, instead of trying to invent a response when every second counts, it makes sense to have a competent team in place, trained, and ready to act. The value of time is not constant. Spending an hour or a day planning, so that an emergency response is shortened by a few seconds, may save a life or prevent a business disaster.

An essential element of any effective information security program today is the ability to respond to computer emergencies. Although many organizations have some form of intrusion detection in place, far too few take full advantage of the capabilities those systems offer. Fewer still consistently monitor the data available to them from automated intrusion detection systems, let alone respond to what they see.

The key is to make beneficial use of the knowledge that something has happened, that something is about to happen, or that something is perhaps amiss. Intrusion detection systems can be costly to implement and maintain. It therefore makes little business sense to go to the trouble of implementing an intrusion detection capability if there is not, at the same time, a way to make use of the data produced by these systems.

Computer emergency quick-response teams are generally called computer security incident response teams (CSIRTs, the abbreviation used in this chapter) or computer incident response teams (CIRTs). Sometimes one sees the term “computer emergency response team” (CERT), but that term and acronym are increasingly reserved for the Computer Emergency Response Team Coordination Center (CERT/CC®) at the Software Engineering Institute of Carnegie Mellon University, as explained in Section 56.1.3 of this chapter.

CSIRTs can provide organizations with a measurable return on their investment in computer security mechanisms and intrusion detection systems. Intrusion detection can indicate that something occurred; CSIRTs can do something about that occurrence. Often their value to an organization can be felt in more subtle ways as well. Many times computer emergencies and incidents cast an organization in an unfavorable light, and they can erode confidence in that organization. Efficient handling of computer emergencies can lessen the erosion of confidence, can help speed the organization's recovery, and in some cases can help restore its image. In addition, CSIRT postmortems (see Section 56.7) can provide information for process improvement (as discussed in Section 56.7.2).

When an incident occurs, the intrusion detection system makes us aware of the incident in one manner or another. We make use of this knowledge by responding to the situation appropriately. “Appropriately” can mean something different in different situations. Therefore, a well-trained, confident, authoritative CSIRT is essential.

Intrusion detection systems are not the only means by which we learn about incidents. In a sense, every component of a system and every person who interacts with the system forms a part of the overall defense and detection system. End users are often the first to notice that something is different. They may not recognize a particular difference as an incident; however, proper awareness and training will encourage them to report such situations to those who can make a determination and act on the information.

56.1.1 Description.

CSIRTs are multifaceted, multitalented groups of individuals who are specially trained and equipped to respond quickly and effectively to computer emergencies.

CSIRTs come in a variety of forms and compositions. For example, while some teams are static, established groups, others are assembled dynamically to fit a specific mission or to deal with specific emergencies. Often the most effective teams are characterized as a mixture of these two approaches. These teams generally have a standing core membership made up of both technical and nontechnical members. When a situation arises that must be addressed by the CSIRT, additional members with specific skills are added to meet the requirements of handling the incident in progress. Once the incident is resolved, the team reverts to its core membership status.

56.1.2 Purpose.

CSIRTs provide the first reaction to an incident. Their immediate goal should be to take control of a situation in order to contain the scope of a potential compromise, to conduct damage control, and to prevent the possible spread of a compromise to adjacent systems. Containing the scope of a compromise also amounts to preventing or reducing loss.

Maintaining a dedicated CSIRT at the ready, 24 hours a day, is a costly proposition from many perspectives. Virtually any organization today, no matter its mission, will be hard pressed to justify funding such a team only to have the team stand by awaiting an emergency. The cost of maintaining a team of highly trained people with only emergency response roles is a most difficult issue to overcome. Therefore, it is important to make use of team members and their skills during nonemergency periods. In many organizations, the teams have important security and awareness roles as integral parts of their charters. Carefully selected, these additional roles can benefit team readiness while at the same time providing tangible and often visible value to the organization.

For example, some CSIRTs spend their nonemergency days and nights monitoring security issues and developments for the latest trends, threats, and countermeasures. They analyze threat data and prepare reports for various levels of the organization on such topics as virus protection, password security, and emerging technology. Members of the team spend a significant amount of time and effort developing and maintaining leading-edge technical skills. They hone their response skills and procedures through continuous training. Training often is conducted in the classroom and through dry runs using a variety of response scenarios.

Often, response teams provide and maintain awareness programs for the organization. This serves several purposes. First, awareness programs benefit an organization by pointing out risks and ways to avoid them. Next, delivering awareness programs makes the team members more visible. As a result, should something unusual or out of the ordinary occur, members of the organization are not only more likely to notice it, they are also more likely to report the information, so that it winds up in the hands of the response team.

The teams respond to emergencies or incidents. An incident does not always indicate something unwanted; it also can be something that is merely unexplained, or out of the ordinary. A response acts not only to defend, or to fight back, or to prevent further damage, but also to discover more information or to verify facts—in essence, it is part investigation and part education. To keep in step with the rapid pace of change in technology, quick response teams must be learning constantly. These teams should strive to remain abreast of each new development and technology that impacts, or has the potential to impact, the systems under their care. Therefore, responding to incidents, whether actual attacks or benign anomalies, should be seen as opportunities to sharpen the CSIRT's skills. This need is also served by the additional responsibilities the team holds in nonemergency times, typically including ongoing security research and evaluation.

If locks and other preventive measures were foolproof, intrusion detection and incident response would be unnecessary. Banks put huge vault doors, time locks, and other seemingly impenetrable defenses into their buildings. But they recognize that these measures are insufficient to prevent completely any loss of their money or other valuables. So they also install alarm systems to detect when one of the defensive barriers has been breached. But that knowledge is of little value if no one hears the alarm or, if having heard it, does not act on the information. Therefore, organizations also put into place guards, night watchers, and others, including law enforcement, to monitor systems and to respond. CSIRTs are the response part of the secure + monitor + detect + respond equation. Because connected systems are under constant passive and active attack virtually 24 hours a day, emergency response teams are a necessary part of the security equation.

56.1.3 History and Background.

CSIRTs can be traced back to one of the more notorious computer incidents of the late twentieth century. The infamous Morris Worm incident of November 2, 1988, wreaked havoc by disabling a significant portion, by some estimates as much as 10 percent, of the Internet. As organization after organization attempted to deal with the worm, it quickly became apparent that a coordinated response to such incidents would have helped to lessen the impact and speed recovery. There was no central place to report or disseminate information about the attack. Internally, few organizations were equipped with teams dedicated to responding to such attacks. As a result, they wasted time and resources duplicating efforts to identify the source of the attack, to formulate countermeasures, and finally to eradicate the worm.

At that time, the most common response was simply to disconnect from the Internet. That same response would carry unacceptable losses of revenues, confidence, and performance for today's commercial enterprises. As more commercial organizations, government agencies, and individuals became dependent on Internet-connected systems, the need for teams capable of responding to emergencies quickly and effectively became more critical.

The Morris Worm incident and several other attacks on the Internet in November and December 1988 highlighted the need for a coordinated response to widespread computer emergencies. As a result, in December 1988, under the direction of the Defense Advanced Research Projects Agency (DARPA), security experts established the Computer Emergency Response Team Coordination Center (CERT/CC®) at the Software Engineering Institute of Carnegie Mellon University in Pittsburgh, Pennsylvania. The role of the CERT/CC was to coordinate communication among organizations during computer emergencies. Its role has since expanded dramatically to include, among other things, assisting with the establishment of other CSIRTs, acting as a clearinghouse for threat and vulnerability data, and providing training and education programs relating to security incident handling. Many entities, both public and private, have since that time adopted CSIRT incident-handling procedures.2

Since its inception, CERT/CC has provided invaluable services to the world community of Internet users and especially to system and security administrators. In addition to the archives of security alerts and incident analyses available online and via free e-mail subscriptions, CERT/CC provides free electronic textbooks of great quality. One of these is the famous Handbook for Computer Security Incident Response Teams (CSIRTs), edited by Moira J. West-Brown and colleagues, which is now in its second edition.3 We strongly recommend this work to anyone concerned with establishing and managing a CSIRT.

West-Brown et al. describe the functions of the CSIRT in this way:

For a team to be considered a CSIRT, it must provide one or more of the incident-handling services: incident analysis, incident response on site, incident response support, or incident response coordination.

They explain in detail all aspects of these functions, and they summarize their research on the range of services that CSIRTs actually provide, whether by themselves or in cooperation with other teams in the information technology sector. They provided a “List of Common CSIRT Services” as Table 4 in their Handbook for CSIRTs; we have reformatted it here:

  • Reactive Services
    • Alerts and warnings
    • Incident handling
      • Incident analysis
      • Incident response on site
      • Incident response support
      • Incident response coordination
    • Vulnerability handling
      • Vulnerability analysis
      • Vulnerability response
      • Vulnerability response coordination
    • Artifact handling
      • Artifact analysis
      • Artifact response
      • Artifact response coordination
  • Proactive Services
    • Announcements
    • Technology watch
    • Security audits or assessments
    • Configuration and maintenance of security tools, applications, and infrastructures
    • Development of security tools
    • Intrusion detection services
    • Security-related information dissemination
  • Security Quality Management Services
    • Risk analysis
    • Business continuity and disaster recovery planning
    • Security consulting
    • Awareness building
    • Education and training
    • Product evaluation or certification

The only problematic term in this list is “artifact,” which the authors define as “any file or object found on a system that might be involved in probing or attacking systems and networks or that is being used to defeat security measures. Artifacts can include but are not limited to computer viruses, Trojan horse programs, worms, exploit scripts, and toolkits” (p. 28).

The specific combination of functions that a CSIRT will provide will be a function of personnel and budgetary resources, and of the maturity of the team. It is wise to focus a completely new CSIRT on essential services such as incident handling and analysis as their first priority. With time and experience, the team can add functions, such as coordinating with other security teams, with computer and network operations in the more proactive services, and with the security quality services that will lead to long-term reduction in security incidents and to lower damages and costs from such incidents.

56.1.4 Types of Teams.

The exact composition of a CSIRT depends on factors such as the size, type, complexity, budget, and location of its sponsoring organization. Some organizations might be able to justify and support a full-time, dedicated, in-house CSIRT with the very latest technology and training. Others might improvise teams once an incident occurs, or they might even hire outside expertise to handle computer emergencies on their behalf. Still others might use a combination of these approaches by having core CSIRT staff who can be augmented by other people as needed to manage incidents.

One viable alternative to developing in-house quick response teams is to take advantage of outsourced services in this area. Outsourced incident-handling services are becoming increasingly popular, and many security companies offer them to their customers. In some cases, this might be the most practical option for companies that lack the resources or desire to develop in-house response capabilities. Outsourcing computer emergency quick response efforts can be an effective, albeit somewhat costly, alternative to developing in-house response teams, for both short- and long-term incident handling.

However, for many organizations, establishing their own incident response capabilities can provide significant advantages. Internal teams generally know the organization and understand its goals, issues, and requirements. Outsourced responses are often mechanical and standardized. Vendors can take longer to respond since they are normally located off site and might even be a considerable distance away. Vendors may undergo frequent staff turnover, which can mean that those assigned to respond to incidents at an organization might be unfamiliar with that organization or its mission. As a result, outsiders might require precious time to “ramp up” before dealing effectively with a current incident or situation. The connected nature of today's organizations, and modern monitoring technologies, may make up for distance in some ways, but there is no substitute for an expert on site.

Some organizations are fortunate enough to have implemented formal, standing CSIRTs whose members are dedicated primarily to monitoring systems, preventing intrusions, and responding to computer emergency incidents. These teams are superior to ad hoc or outsourced teams in their ability to respond quickly with customized procedures acting from a deep and current knowledge base.

56.2 PLANNING THE TEAM.

Not every organization has a CSIRT already in place; not all CSIRTs are structured and managed in the most appropriate ways for an organization's specific needs. This section presents systematic approaches for rational design and implementation of a CSIRT.

Establishing a CSIRT is a complex process that must be given careful thought and must be based on comprehensive planning. Before establishing a CSIRT, the organization needs to determine exactly what it expects to accomplish. From this, the organization can decide on specific goals for the team, and perhaps most important, it can decide on policies that apply to the team. The team should be conceived and defined in terms of the organization to which it belongs. That is, the team should be tailored to achieving a specific mission. Clarity, vision, and focus are vital planning elements that ultimately will determine the success or failure of the CSIRT. Skimping during the planning stage will ensure failure. Devoting some extra effort to planning in this stage will help improve the chances of success.

56.2.1 Mission and Charter.

The CSIRT should include members from every sector of the organization; key members include operations, facilities, legal staff, public relations, information technology, and at least one respected and experienced manager with a direct line to top management. The CSIRT should establish good relations with law enforcement officials and should be prepared to gather forensic evidence. The organization should have a policy in place on how to decide whether to prosecute malefactors if they can be identified. The CSIRT should be prepared to respond not only to external attacks but also to criminal activities by insiders. Proper logging at the operating system level and from intrusion-detection systems can be useful to the CSIRT. The CSIRT plays an important role in disaster prevention, mitigation, and recovery planning.

Organizing people to respond to computer security incidents is worth the effort not only when an incident occurs but also because the analysis and interactions leading to establishment of the CSIRT bring benefits even without an emergency. A CSIRT can provide opportunities for improving institutional knowledge, contributing to continuous process improvement and offering challenging and satisfying work assignments to technical and managerial staff, thus contributing to reduced turnover. A well-trained, professional, courteous CSIRT can improve relations between the entire technical support infrastructure and the user community. The team and its members often serve as key elements of business continuity and disaster recovery teams.

A clear, written mission and a charter establishing the CSIRT are essential to its success. These documents should establish why the team exists and what the organization expects from the team at a high level. Although the current security landscape provides compelling reasons for establishing an incident response capability, identifying organization-specific goals and expectations for the team remains an essential task.

Effective policies are essential for any organization. The CSIRT's mission and charter should be based on organizational policies, especially information security policies. Establishing a team without having appropriate policies in place is ineffective and can put the team at odds with its own organization. Without formal policies upon which to base computer emergency response activities, the team can have no legitimate basis for deciding on courses of action that support the organization. During an incident, CSIRT decisions can be unpopular. Disconnecting systems from the Internet could prevent some, or perhaps all, of the organization from carrying out its mission. Without established policies both to define and to defend those decisions, often the team is viewed as an adversary.

Without this clear definition of mission and an idea of what can be expected from the CSIRT, internal cooperation and support for the team will be difficult to obtain and even more difficult to sustain. Without internal cooperation, the team's effectiveness will be diminished, which could exacerbate the impact of an incident or prevent the team from handling an incident in a timely manner.

The overarching goal of responding to an incident should always be to prevent further damage and to restore systems and operations to normal as expeditiously as possible, consistent with organizational policies. The CSIRT members must have a clear sense of the strategic and operational priorities of the organization, just as members of business continuity and disaster recovery teams do. Without a clear idea of what the organization expects the team to accomplish, the team is likely to waste the limited time and resources it usually has available.

For more information on business continuity planning, see Chapter 58 in this Handbook; for discussion of disaster recovery, see Chapter 59.

56.2.2 Establishing Policies and Procedures.

As the U.S. Government Defense Information Systems Agency (DISA) training course on CD-ROM about CSIRTs succinctly puts it, “policies and procedures are not merely bureaucratic red tape.”4 They are the scaffolding on which one can establish clear understanding and expectations for everyone involved in incident response. These living, evolving documents are (quoting the CD-ROM notes) tools that provide guidance on:

  • Roles and responsibilities
  • Priorities
  • Escalation criteria
  • Response provided
  • Orientation

Policies are the statements of desired goals; procedures are the methods for attaining those goals. Policies tend to be global and relatively stable; procedures can and should be relatively specific, and can be adapted quickly to meet changing conditions and to integrate knowledge from experience. Policies cannot be promulgated without the approval and support of appropriate authorities in the organization, so one of the first steps is to identify those authorities. Another step is to gain their support for the policy project.

All policies and especially CSIRT policies should be framed in clear, simple language so that everyone can understand them, and they should be made available in electronic form. Hypertext can make policies more understandable by providing pop-up comments or explanations of difficult sections or technical terms. Similarly, procedures show how to implement the policies in real terms. For example, a policy might stipulate: “All relevant information about the time and details of a computer incident shall be recorded with regard for the requirements of later analysis, and for possible use in a legal proceeding.” That policy might spawn a dozen procedures describing exactly how the information is to be recorded, named, stored, and maintained through a proper chain of custody. For example, one procedure might start: “Using the Incident-Report form in the CSIRT Database accessible to all CSIRT members, fill in every required field. Use the pull-down menus wherever possible in answering the questions.” Again, as the DISA CD-ROM points out, these procedures should minimize ambiguity and should help members of the team to provide a consistent level of service to the organization. A glossary of local acronyms and technical terms can be helpful as part of these procedures.
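The recording procedure described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory record; the class names, field names, and helper functions are illustrative inventions and do not describe the DISA course material or any particular CSIRT database. The sketch shows required fields being checked and each handling step being appended to a custody log.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib

# Hypothetical list of text fields the form treats as mandatory.
REQUIRED_FIELDS = ("incident_id", "reported_by", "summary")


@dataclass
class CustodyEntry:
    handler: str         # person who handled the evidence
    action: str          # "collected" or "transferred" in this sketch
    timestamp: datetime  # when the action took place


@dataclass
class IncidentReport:
    incident_id: str
    reported_by: str
    detected_at: datetime
    summary: str
    evidence_sha256: str = ""
    custody: list = field(default_factory=list)


def missing_fields(report: IncidentReport) -> list:
    """Return the names of required text fields that are still blank."""
    return [name for name in REQUIRED_FIELDS
            if not getattr(report, name).strip()]


def record_evidence(report: IncidentReport, data: bytes,
                    handler: str, when: datetime) -> None:
    """Store a digest of the evidence and open the custody log."""
    report.evidence_sha256 = hashlib.sha256(data).hexdigest()
    report.custody.append(CustodyEntry(handler, "collected", when))


def transfer_custody(report: IncidentReport, new_handler: str,
                     when: datetime) -> None:
    """Append a transfer so the chain of custody stays unbroken."""
    report.custody.append(CustodyEntry(new_handler, "transferred", when))
```

Storing a digest alongside the custody log is one possible way to show later that the evidence was not altered; a real procedure would follow the organization's own legal guidance.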

Whenever policies and procedures are changed in a way that may affect users, it is important to let people know about the changes so that their expectations can be adjusted. The DISA course recommends using several channels of communication to ensure that everyone gets the message; for example, send e-mail, use phone and phone messages, send broadcast voicemail, announce the changes at staff meetings, and use posters and Web sites.

For more information on promulgating security policies, see Chapters 44 and 50 in this Handbook.

56.2.3 Interaction with Outside Agencies and Other Resources.

No CSIRT can operate in a vacuum in an interconnected world. At some point, teams and their sponsors will require interaction with outside agencies, and even with other CSIRTs. Rather than wait until an emergency is under way, the team should establish and document contacts such as:

Internal Contacts

  • Management. The team should establish and maintain management contacts who hold sufficient authority to make the tough business decisions that will inevitably arise during an emergency situation.
  • Systems. The CSIRT should have a working relationship with those responsible for operating and maintaining the organization's information systems. These contacts will be necessary to allow CSIRT members appropriate system access during an emergency response situation.
  • Applications. As with systems personnel, the CSIRT should have preestablished contact with those who manage and maintain applications. These individuals will be able to provide CSIRT members with access to application logs, documentation, and accounts during an emergency.
  • Business units. CSIRT members should be familiar with and have established contacts with the various organizational business units they support. Having such contacts ahead of time will facilitate decision making and will avoid delays in gaining access to appropriate business personnel during an emergency.

External Contacts

  • CERT/CC. The CERT/CC can provide valuable advice and assistance to the response team during an attack. Knowing before an emergency whom to contact, and how to reach them, will speed the process.
  • Consultants. Often, organizations will rely on outside consultants to augment technical skills and knowledge. Therefore, it is important that the organization be able to contact these consultants during an emergency, in the event their expertise is required to respond to or to resolve a situation. Planning for such emergency contact ahead of time will avoid delays in responding to an emergency.
  • Vendors. Responding to an emergency may require specialized information about hardware or software features and about specifications that might be available only from a vendor. Additionally, backup systems or software may need to be acquired in order to return systems to operational status during or after an incident.
  • Law enforcement. Law enforcement agencies today frequently have specialized units capable of assisting an organization in tracing and identifying the perpetrator of an attack. However, it is important that law enforcement contacts be made in advance so that the organization can make sound decisions about whom to notify and how best to use such assistance.
  • Utilities. Obviously, electrical power and similar infrastructure services are essential to any organization. Power outages and similar emergencies can have a devastating effect on operations. Maintaining contact with local utilities will help the organization plan for and mitigate impact from power outages. In the event of an outage, team members will know whom to contact, and they can gather information about the cause and duration of an outage more quickly.
  • Internet service providers. Typically, vital connectivity is provided to the organization by one or more Internet service providers (ISPs). In many cases, ISPs are the first line of defense against some types of Internet-originated attacks, such as distributed denial of service. In addition, if an organization is attempting to track a suspected intruder, the Internet service provider will be pivotal. It is therefore vitally important that the CSIRT have preestablished contacts within the organization's ISPs in order to avoid wasting precious response time trying to get assistance during an emergency.
  • Other CSIRTs. Other CSIRTs may well have faced the situation or emergency that another team might be facing. Other CSIRTs can provide advice and assistance, and some may even share resources and expertise to help an organization respond to an emergency. There is also an opportunity to share knowledge and conduct joint training with other teams.
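One way to keep the internal and external contact lists above from developing gaps is to store them in a simple directory and check it periodically. The following is a minimal sketch; the category keys mirror the lists in this section, but the data layout and the function name are assumptions made for illustration only.

```python
# Contact categories taken from the internal and external lists above.
REQUIRED_CATEGORIES = {
    "internal": ["management", "systems", "applications", "business_units"],
    "external": ["cert_cc", "consultants", "vendors", "law_enforcement",
                 "utilities", "isps", "other_csirts"],
}


def missing_contacts(directory: dict) -> list:
    """Return 'scope/category' labels that have no documented contact."""
    gaps = []
    for scope, categories in REQUIRED_CATEGORIES.items():
        for category in categories:
            # An absent scope, an absent category, or an empty list
            # all count as a gap.
            if not directory.get(scope, {}).get(category):
                gaps.append(f"{scope}/{category}")
    return gaps
```

Running such a check during nonemergency periods, rather than discovering a missing contact during an incident, is the point of the exercise.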

56.2.4 Establish Baselines.

In order to be able to spot that which is out of the ordinary, the CSIRT must determine what “normal” looks like. False incidents have occurred because the observer did not have adequate knowledge to realize that the event was actually normal. Emergency response teams called into action without well-documented baselines, or detailed activity logs, must work very hard to determine whether the event is normal.

In either case, whether the triggering event turns out to be false or genuine, resources and time will have been wasted by this identification effort. A good baseline can reduce the resources expended on false positives, and can hasten the response to real events.
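As a rough illustration of how a documented baseline helps separate normal activity from events worth investigating, the sketch below summarizes past measurements and flags a new value that deviates strongly from them. The metric, the sample values, and the threshold of three standard deviations are assumptions chosen for the example, not recommendations from this chapter.

```python
from statistics import mean, pstdev


def build_baseline(samples: list) -> dict:
    """Summarize historical measurements of one metric."""
    return {"mean": mean(samples), "stdev": pstdev(samples)}


def is_out_of_ordinary(value: float, baseline: dict,
                       threshold: float = 3.0) -> bool:
    """Flag a value that lies more than `threshold` standard
    deviations from the baseline mean."""
    spread = baseline["stdev"]
    if spread == 0:
        # A perfectly constant history: any change is noteworthy.
        return value != baseline["mean"]
    return abs(value - baseline["mean"]) / spread > threshold


# Hypothetical hourly counts of failed logins observed in normal operation.
normal_hours = [10, 12, 11, 9, 13, 10, 12, 11]
baseline = build_baseline(normal_hours)
```

With this baseline, a reading of 12 would be treated as normal while a reading of 40 would be flagged for the team to examine. Real baselines usually cover many metrics and account for daily and weekly cycles.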

56.3 SELECTING AND BUILDING THE TEAM.

An effective CSIRT comprises these elements, dictated by the incident at hand:

  • People
  • Skills
  • Knowledge
  • Equipment
  • Access
  • Authority

The makeup of the team has everything to do with how effective and responsive it will be in an emergency. Careful selection of team members at the outset will provide for an effective, cohesive group with the right skills, authority, and knowledge to deal properly with a range of known and unknown incidents.

Frequently, the first inclination is to select the most technically knowledgeable individuals available as members of the team. While technical ability is essential to a CSIRT, this should not be the overriding characteristic. Given aptitude and motivation, appropriate technical skills can be learned. Indeed, during the course of an incident, adept handlers can draw on the technical expertise of people inside or outside the organization to augment their own skills and knowledge.

Maturity and the ability to work long hours under stress and intense pressure are crucial characteristics. Integrity in the response team members must be absolute, since these people will have access and authority exceeding that given them in normal operations.

Exceptional communications skills are required because, in an emergency, quick and accurate communications are needed. Inaccurate communications can cause the emergency to appear more serious than it is and therefore escalate a minor event into a crisis. Conversely, proper communications can galvanize others into immediate and effective activity, without creating a panic reaction.

56.3.1 Staffing.

The CSIRT may be a permanent, full-time assignment for a fixed group of experts, or it may be a part-time role formed dynamically as conditions require. In either case, or for any of the intermediate arrangements, certain fundamentals will dictate the choice of staff members.

The DISA course on CSIRT management also addresses the question of the technical level required by CSIRT staff. The course authors suggest using a scale from 1 to 10, with 1 representing the novice or support staff and 10 representing the technical wizard: Individuals in the 1-to-3 technical range should be sufficient to handle the initial triage process, which involves separating service requests into categories and directing them to the appropriate team member.

Information requests can be handled by team members in the 1-to-5 range. For example, a support staff person can send out publications, while someone with greater expertise would be required to address questions about identifying spoofed e-mail.

Team members in the 5-to-8 technical range are necessary to respond to actual incidents. This response can involve technical analysis and communicating with compromised sites, with law enforcement technical staff, and with other CSIRTs. In handling incidents that represent new attack types, it may be necessary to call “wizards” to help understand and analyze the activity.

Vulnerability handling requires the most proficient personnel, falling into the 8-to-10 range. These individuals must be able to work with software vendors, CSIRTs, and other experts to identify and resolve vulnerabilities. Many CSIRTs do not have access to this level of technical expertise.

CSIRT staff with the psychological flexibility to allow them to adapt quickly to changing requirements will do better than people who resist change or resent ambiguity. Ideally, the team will include problem-solvers with an intuitive grasp of the differences between observation and assumption, hypothesis, and deduction. As always, team players committed to getting the problem solved will contribute more than people interested in acquiring personal credit for achievements. Having at least one person on the team with a penchant for meticulous note taking is a real benefit (see Sections 56.4.5.2 and 56.5.6).

56.3.2 Involve Legal Staff.

As with any crisis event, every action carries with it a potential legal implication. This is especially true in a situation where an evidentiary chain may be required. Even if evidence is not a primary concern, due diligence requires that accurate records be kept of the incident costs, including response team costs, and of the scope of compromise and effect.

The corporate legal staff must play an important role in developing response team procedures, in training the response team, and in crisis resolution.

56.4 PRINCIPLES UNDERLYING EFFECTIVE RESPONSE TO COMPUTER SECURITY INCIDENTS.

Some general considerations underlie effectiveness for all incident-handling teams.

56.4.1 Baseline Assumptions.

The primary consideration in responding to any emergency situation must be given to preventing loss of human life. Following that, a comprehensive information security program will have included the identification and classification of the most sensitive data and systems. This classification should provide a clear prioritization of what should be protected first, in the event of an emergency. For example, a business's survival might depend on the confidentiality and integrity of some intellectual property, such as engineering diagrams. After the safety of personnel, those drawings, and the systems on which they are stored, would be the obvious first priority for protection. A network intrusion that threatened an e-mail server located on a separate network segment would likely not warrant immediately disconnecting the entire network from the Internet. If the particular server containing the company's engineering drawings, or the network segment on which it sits, were under direct attack, however, an appropriate first response might well be to disconnect from the Internet.

In any case, there must be an unambiguous sense among the CSIRT that those responsible for taking actions in good faith will not suffer reprisals as a result of taking those actions. For example, based on facts in evidence at one time or another, a member of the team might decide to disconnect an operational system from the Internet because it appears to be under attack or compromised. Should this turn out to be a false alarm of some sort, the individual authorizing the action should suffer no reprisal or sanction for taking what he or she believed was a legitimate action to stop, or to respond to, an attack.

With appropriate plans in place before incidents happen, recovery can be effected much more quickly and with less residual damage.

Appropriate responses depend on the systems involved, and should be documented and agreed to before those systems are connected to the Internet. If a router, or a firewall protecting the outer perimeter of the network, becomes compromised, it may be necessary to disconnect the entire system in order to contain the situation.

If a Web server in an isolated part of the network becomes compromised, though, disconnecting only that system should be sufficient. However, any business-related activities carried out using that server might no longer be available. A loss of revenue or image might result from disconnecting the server. Business units and others that depend on these systems must be made aware of, and must agree to accept, the impact of proposed responses the CSIRT might take during an incident.
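The idea of documented, agreed responses per system can be sketched as a simple lookup table. In the Python fragment below, the system names, the wording of the actions, and the `approved_by_owner` flag are hypothetical; the sketch only illustrates that each response should be recorded together with its business impact and the owner's prior agreement, and that anything not covered calls for escalation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlannedResponse:
    action: str              # what the CSIRT will do
    business_impact: str     # what the affected business unit must accept
    approved_by_owner: bool  # has the business unit agreed in advance?


# Hypothetical playbook entries modeled on the two cases in the text.
PLAYBOOK = {
    "perimeter firewall": PlannedResponse(
        action="disconnect the entire network from the Internet",
        business_impact="all Internet-facing services unavailable",
        approved_by_owner=True,
    ),
    "isolated web server": PlannedResponse(
        action="disconnect only the affected server",
        business_impact="services hosted on that server unavailable",
        approved_by_owner=True,
    ),
}


def select_response(system: str) -> Optional[PlannedResponse]:
    """Return the preapproved response for a compromised system,
    or None when no agreed response exists and escalation is needed."""
    response = PLAYBOOK.get(system)
    if response is None or not response.approved_by_owner:
        return None
    return response


print(select_response("isolated web server"))
print(select_response("mail server"))  # not in the playbook: escalate
```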

Planned responses, combined with the authority and confidence to execute them, can save the organization both time and money. As an example, if every incident requires the presence of senior business managers or executives, those leaders must be taken away from their normal duties during rehearsals and during actual events. If the CSIRT leader on duty or on call finds the incident to be routine, and if the incident has been well planned for, a simple notification of the facts can be sent to appropriate senior-level personnel, leaving them free to attend to their normal duties and to participate only in a major event at their appropriate level of management.

Planned, preapproved responses can speed reaction times, enhance security, and lessen the impact of a given breach or incident. In most cases, the CSIRT leader will follow a series of commonsense steps to handle an incident from identification through resolution. As the leader progresses through each step, he or she may choose from one of the preapproved responses to handle the incident, or the situation may require the involvement of other people and resources for resolution. In either case, the basic flow of events should be similar to this list:

  • Triage. Deciding how to direct calls for help or reports of a computer security incident
  • Technical expertise. Assembling the different kinds of knowledge required to support an effective response
  • Tracking incidents. Ensuring appropriate documentation to save time and reduce errors
  • Critical information. Laying the ground rules for collecting the kinds of data needed for effective decisions
  • Telephone hotline. Establishing protocol for real-time notification and response

56.4.2 Triage.

The word “triage” itself comes from a French root meaning “to sort.” In medicine, triage is “prioritization of patients for medical treatment: the process of prioritizing sick or injured people for treatment according to the seriousness of the condition or injury.”5 Similarly, anyone receiving calls about computer security incidents must be able to classify the call right away, so that the right resources can be called into play. As the DISA course on CSIRT management suggests:

The triage process recognizes and separates

  • new incidents,
  • new information for ongoing incidents,
  • vulnerability reports,
  • information requests, [and]
  • other service requests.6

We have altered the order of the original list to reflect a decreasing rank of importance for these factors in communicating and acting upon calls.

Triage is common to ordinary help desks as well as to emergency hotlines. In general, there are two models for staffing the phones for such front-line functions: the “dispatch” model and the “resolve” model.7

  1. Dispatch model. The dispatcher has just enough technical knowledge to collect appropriate information about an incident and to assign a team member for investigation.
  2. Resolve model. Someone with more expertise is assigned to answer the phone so that response can be even faster. However, the resolve model risks wasting resources because the more experienced staff member may end up doing largely clerical work instead of focusing on applying expertise to problem analysis and resolution.

To support triage, staff members need explicit training on data collection and priorities. They need to record: who is calling; how to reach that person; what the caller thinks is happening; what the caller has observed; how serious the consequences are; how many people or systems are affected; whether the incident is in progress or is over, as far as they know; and how the caller and others are responding. The CSIRT procedures should include guidance on assigning priorities to incidents; factors can include security classifications (e.g., SECRET or COMPANY CONFIDENTIAL data under attack), type of problem (e.g., breach of confidentiality, data corruption, loss of control, loss of authenticity, degradation of availability or utility), possible direct costs (e.g., personnel downtime, costs of recovery, or loss of business), possible indirect costs (e.g., damage to business reputation or legal liability) and so on, as appropriate for each organization.
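The data items and priority factors just listed can be captured in a small record structure. The following Python sketch is one possible shape: the field names are hypothetical, the category ranking follows the decreasing order of importance given for the triage list above, and the numeric weights are arbitrary assumptions that each organization would replace with its own rules.

```python
from dataclasses import dataclass
from enum import IntEnum


class Category(IntEnum):
    # Higher value = higher rank, following the order of the triage list.
    NEW_INCIDENT = 5
    ONGOING_INCIDENT_INFO = 4
    VULNERABILITY_REPORT = 3
    INFORMATION_REQUEST = 2
    OTHER_SERVICE_REQUEST = 1


@dataclass
class CallRecord:
    caller: str                  # who is calling
    contact: str                 # how to reach that person
    caller_belief: str           # what the caller thinks is happening
    observations: str            # what the caller has actually observed
    systems_affected: int        # how many people or systems are affected
    in_progress: bool            # still under way, as far as the caller knows
    response_so_far: str         # how the caller and others are responding
    category: Category
    classified_data_involved: bool = False


def priority(call: CallRecord) -> int:
    """Combine the recorded factors into a sortable score.
    The weights are illustrative assumptions only."""
    score = int(call.category) * 10
    if call.classified_data_involved:
        score += 20
    if call.in_progress:
        score += 10
    score += min(call.systems_affected, 10)  # cap the contribution of scale
    return score


intrusion = CallRecord("A. Admin", "x1234", "someone is in the file server",
                       "unknown account logged in at 02:00", 3, True,
                       "server left running, nothing touched",
                       Category.NEW_INCIDENT, classified_data_involved=True)
question = CallRecord("B. User", "x5678", "e-mail may be spoofed",
                      "odd sender address", 0, False, "none",
                      Category.INFORMATION_REQUEST)

# Open calls sorted so that the most urgent comes first.
queue = sorted([question, intrusion], key=priority, reverse=True)
print([c.caller for c in queue])
```

Keeping the caller's belief separate from the caller's observations in the record supports the distinctions discussed later in this chapter.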

Readers may find the work of John Howard relevant for such analysis; Dr. Howard has established a useful taxonomy for discussing computer security incidents that can serve as a framework for establishing priorities.8 See Chapter 8 in this Handbook.

We recommend an automated system for capturing information on all calls to the CSIRT. Using keywords “helpdesk software” and also “help desk software” in an online search brings up dozens of options for such programs. Readers with modest skills in database design can also create their own using a program such as MS-Access, but it may take a good deal of time to improve the home-grown system so that it matches commercial or freeware versions based on extensive experience of the user community. However, with appropriate locking strategies to permit safe concurrent access and with well-designed automated reports, your CSIRT can know and control the priorities of all the open incidents under investigation at any time.

For more about management perspectives on triage, see Section 56.6.3.

56.4.3 Technical Expertise.

The DISA CD-ROM course starts by classifying technical expertise in approximate ranges:

  • Low, suitable for the triage function which involves determining who should best handle a specific call
  • Medium, appropriate for answering requests for information
  • High, suitable for technical problem-solving
  • Expert, suitable for handling problems that others have been unable to resolve and especially for issues involving vulnerability analysis and real-time response to attacks

As the DISA writers point out, “Vulnerability handling requires your most proficient personnel…. These individuals must be able to work with software vendors, CIRTs, and other experts to identify and resolve vulnerabilities. Many CIRTs don't have access to this level of technical expertise.”

56.4.4 Training.

Teams that have no experience responding to incidents are of little value to an organization. Predictably, computer emergency responses by untrained or inexperienced teams result in loss or destruction of evidence, legal exposure by failing to properly protect individual rights, and failure to properly document and learn from the experience. To be most effective, training must be iterative (learn, exercise, review, analyze, repeat) and should involve as many realistic scenarios as possible, so that the CSIRT becomes exposed to a wide variety of potential emergency situations.

56.4.4.1 Rehearse Often.

Experience can be gained only by responding to incidents or through training with simulated attacks. While the time and resources required to practice responding to incidents might be costly, more costly still is the potential damage resulting from an uncontained or poorly handled breach of the system.

An excellent opportunity to practice response procedures, and to develop response teams, occurs during periodic security assessments, including penetration testing. Penetration tests simulate external and internal attacks on a system and offer a real yet controlled environment in which to exercise, train, and evaluate a CSIRT. The simulation is especially effective when an outside, independent team is engaged to conduct the penetration test and security assessment. With proper coordination, the test can provide an opportunity for the team to observe, and react to, many different types of incidents.

56.4.4.2 Perform Training Reviews.

At the conclusion of any training exercise, it is important to reassemble the team as soon as possible, not only to review management's view of its performance but to reveal its own perspectives as well. Each of the participants should be asked what went right and what went wrong. Were necessary resources (information, decision makers, tools, software, equipment) unavailable when the team needed them? Did the team have, or was it able to obtain in a timely manner, the physical and system access it needed? Did it have the right documentation and access to other company personnel? Were systems for communicating among the team and with other company or external personnel adequate and efficient?

Video cameras are a useful tool for recording events during the training sessions; many employees will have such equipment available and can make inexpensive recordings that can be analyzed during the training reviews.

See Section 56.7.1 on postmortem analysis of real incidents for more ideas that can be applied to training reviews.

56.4.5 Tracking Incidents.

This section focuses on some of the advantages, requirements, and tools for incident tracking. First, we establish why documentation in general is so important.

56.4.5.1 Will This Have to Be Done Again?

When one of the authors (MK) joined Hewlett-Packard (Canada) Ltd. in 1980, he arrived on the job armed with a small, green, hard-covered Daytimer® book prominently labeled LOGBOOK in big black letters. From his first day as a member of the systems engineering organization, he wrote down what he learned; he logged how he spent his time. When he met clients, he took notes. When he installed new versions of the MPE operating system, he kept a chronological record of everything he did—including mistakes. When he taught courses, he kept a list of questions he could not answer right away.

Pretty soon, people began asking him what he thought he was doing, writing a novel?

His colleagues may have been puzzled by what they perceived as a mania for record keeping, but he was equally astonished that record keeping was not a normal part of their way of doing work. His habit of keeping records automatically came from his years in scientific research, where logbooks with hard covers, numbered pages, and even waterproof paper were just usual parts of doing serious work. The idea of doing anything of importance without keeping a concurrent record simply did not occur to anyone in research. One could not reproduce an experiment without knowing exactly what sequence one had used in accomplishing the steps. Even adding salts to solutions had to be done in a particular order.

So he just kept on keeping his little green logbooks.

56.4.5.2 Why Document?

Documentation, far from being a sterile exercise done to conform to arbitrary requirements of nameless, faceless superiors, should be a vital part of any intellectual exercise. Documentation is simply writing down what we learn: the crucial step in human history that changed traditional cultures into civilizations. By keeping a record independent of any specific individual, we liberate our colleagues and our successors from dependence on our physical availability. Documentation is our assurance that work will continue without us; a kind of immortality, if you will.

We document what we do as a part of systematic problem solving. Writing forces us to identify the problem in words, instead of being content to define it in vague, unclear ideas. Writing down each idea we are in the process of testing helps us notice the ideas we missed the first time we tackled the problem. Keeping notes helps us pay attention to what we are doing.

Documenting what we do also helps us during training—both our own and that of the people we are helping to learn technical skills. Trainees can review their own notes on how to do something instead of relying entirely on someone else's description. If taking notes is viewed as a chance to engage one's mind more thoroughly in what we are learning, it can be stimulating and even fun.

Finally, accurate records can be a boon in legal wrangles. In a case one of us (MK) experienced, upper management seriously considered legal procedures against a supplier for supposed breach of contract. Careful records of exactly when meetings were held, and with whom, permitted us to analyze the problem and to resolve the issues by collaboration instead of by confrontation. Such records, if kept consistently, in good times and bad, can be accepted in a court of law as evidence—but only if everything points to a steady pattern of record keeping as events unfold. Records made long after a problem occurs are worthless.

56.4.5.3 Keep Electronic Records.

The best way of keeping records on specific problems is an easy-to-use database. Such records help team members remember and share information that can help in solving new problems as they arise. With easily accessible records, it is possible to solve problems without the presence of specific team members. Such shared knowledge speeds problem resolution, improves the competence of all team members with access to the knowledge base, and provides a sound basis for training and integration of new team members. Sharing knowledge can be a liberation for key members of any organization by sparing them from the sense of obligation to be present at all times; it also supports management policies that enforce security principles by requiring employees to take vacation time. For a discussion of the dangers of allowing any employee to become indispensable, see Chapter 45 in this Handbook on employment practices and policies.

56.4.5.4 Advantages for Technical Support and CSIRTs.

Keeping track of all technical support calls is essential for effective incident handling. Having details available to all members of the CSIRT in real time, and for research and analysis later, serves many functions:

  • Communication among team members. Having the details written down in one place means that team members can pass a case from one to another and share data efficiently.
  • Better client service. Callers become frustrated when they have to repeat the same information to several people in a row; a good incident-tracking system reduces that kind of irritation.
  • Documentation for effective problem solving. A good base of documented experience can help find the right procedure and the right solution quickly.
  • Institutional memory. When experience is written down and accessible, the organization's capacity to respond quickly and correctly to incidents improves over time.
  • Follow-up with clients. Managers can use the incident database to prepare management reports and to follow up with specific clients to understand and resolve difficulties or complaints.
  • Forensic evidence. Detailed, accurate, and correctly time-stamped notes can be a deciding element in successful prosecution of malefactors.

56.4.5.5 Requirements.

Some of the more obvious requirements of any incident-handling system are:

  • A unique identifier for each case.
  • Dates and times for all events.
  • Who currently controls the case: it should be instantly obvious who is in charge of solving the problem.
  • Keywords.
  • Contact information: Every person in the case should be listed with office phone, cell phone, e-mail, and fax numbers.
  • Handover of control: Whenever someone takes over control of the case, that handover should be noted in the record.
  • Technical details, including diagnostics and tests of hypotheses.
  • Resolution: What was the outcome? When was the case closed?
  • Search facilities: Full-text search capabilities.
  • Knowledge base: Ability to integrate vendor-supplied entries to speed research.
  • Industry-standard database engine: Easy to learn, maintain, and improve.
  • Accept input from comma-separated value (CSV) files: Import data from other systems.
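Several of the requirements above can be illustrated with a very small home-grown tracker. The Python sketch below uses the standard-library `sqlite3` module; the table and column names, the helper functions, and the sample data are all hypothetical, and a plain `LIKE` substring match stands in for true full-text search. It is not a substitute for the mature tools discussed in the next section.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE cases (
    case_id    INTEGER PRIMARY KEY,  -- unique identifier for each case
    opened_at  TEXT NOT NULL,
    owner      TEXT NOT NULL,        -- who currently controls the case
    keywords   TEXT,
    contact    TEXT,
    resolution TEXT,
    closed_at  TEXT
);
CREATE TABLE events (
    event_id    INTEGER PRIMARY KEY,
    case_id     INTEGER NOT NULL REFERENCES cases(case_id),
    occurred_at TEXT NOT NULL,       -- date and time of every event
    kind        TEXT NOT NULL,       -- e.g., diagnostic, hypothesis test, handover
    detail      TEXT NOT NULL
);
"""


def now():
    return datetime.now(timezone.utc).isoformat()


def open_case(db, owner, keywords, contact):
    cur = db.execute(
        "INSERT INTO cases (opened_at, owner, keywords, contact) VALUES (?, ?, ?, ?)",
        (now(), owner, keywords, contact),
    )
    return cur.lastrowid


def hand_over(db, case_id, new_owner):
    """Change the case owner and note the handover in the record."""
    old_owner = db.execute(
        "SELECT owner FROM cases WHERE case_id = ?", (case_id,)
    ).fetchone()[0]
    db.execute("UPDATE cases SET owner = ? WHERE case_id = ?", (new_owner, case_id))
    db.execute(
        "INSERT INTO events (case_id, occurred_at, kind, detail) VALUES (?, ?, 'handover', ?)",
        (case_id, now(), f"{old_owner} -> {new_owner}"),
    )


def import_csv(db, text):
    """Create one case per row of CSV text with owner, keywords, contact columns."""
    for row in csv.DictReader(io.StringIO(text)):
        open_case(db, row["owner"], row["keywords"], row["contact"])


def search(db, term):
    """Return ids of cases whose keywords or event details contain the term."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT DISTINCT c.case_id FROM cases c "
        "LEFT JOIN events e ON e.case_id = c.case_id "
        "WHERE c.keywords LIKE ? OR e.detail LIKE ? "
        "ORDER BY c.case_id",
        (pattern, pattern),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
case_id = open_case(db, "alice", "web server, defacement", "bob@example.com")
hand_over(db, case_id, "carol")
import_csv(db, "owner,keywords,contact\ndave,phishing,erin@example.com\n")
print(search(db, "phishing"))
```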

56.4.5.6 Tools.

There is a wide range of software available for tracking incidents. One can build one's own, but then proper documentation and training materials must also be created, because turnover is a constant problem for CSIRTs. In addition, unless analysts have experience with the CSIRT function, they are likely to miss useful features that have accumulated over the years in products used by thousands of people.9 Well-respected open-source tools are listed in the “Further Reading” section of this chapter.

All such tools can be complex. To prevent people from fumbling about in an emergency, a budget must be established adequate for staff training in implementation of the selected tool.

56.4.5.7 Get the Global Picture.

When gathering information about an incident, staff members should establish a clear picture of what people were doing when they realized that there was a problem. For example, it may be important to know that someone was accessing a rarely used account and noticed that a file was not available because someone else had it open. Those details will help to characterize the attack and to provide clues that may lead to additional valuable data. However, the CSIRT investigator should also ask why the contact was accessing the rarely used account; it takes only a minute, but getting a wider picture may give the analyst another perspective that can also lead to new clues. In the scenario just sketched, one could imagine that a system administrator had become curious about some unexpected resource utilization in a supposedly dormant account. This simple fact might lead to additional exploration of system log files and questions about whether any other dormant accounts had sparked curiosity. So, in general, it is worthwhile to explore the situation more broadly at first, rather than driving down the very first avenue that presents itself in the initial questions.

For more information on using log files for analysis of problems, see Chapter 53 in this Handbook.

56.4.5.8 Distinguish Observation from Assumption.

As the CSIRT member listens to the observations of other staff members, it is critically important to distinguish facts—that is, personal observations—from assumptions. Assumptions are ideas taken for granted or statements that are accepted without proof. For example, imagine the serious consequences of hearing someone say, “And so then they must have exploited a flaw in the firewall and then they…” and simply writing that assumption down as if it were a tested and validated explanation of the events. Such an assumption could profoundly distort the investigation, putting people's efforts onto the wrong track and diverting their attention from a more fruitful line of inquiry. Hearing such a statement, one should write down, “And so perhaps they exploited a flaw in the firewall and…” or “Bob thinks that they exploited a flaw in the firewall and…”

56.4.5.9 Distinguish Observation from Hearsay.

Everyone has played the child's game of whispering a sentence to another person and then hearing the distorted version that comes out the other end of a long chain of transmission without error correction. CSIRT staff must always distinguish between first-person observations (“I read the log file and found…”) and hearsay (“Shalama read the log file and she found…”). Team members should not trust hearsay: They must check it out themselves by tracking down the source of the information. Even when someone is reporting a personal observation to a CSIRT member, it is important to weigh the cost of verifying the observation (when possible) against the consequences of branching off into the wrong part of the solution space.

56.4.5.10 Distinguish Observation from Hypothesis.

Sometimes when people are careless or untrained, they do not distinguish between what they saw and an idea that might explain what they saw. In the previous example about a supposed flaw in a firewall, the person speaking seemed to take the flaw for granted; that was an assumption. A similar problem can occur when someone thinks that maybe there is a flaw in the firewall and then proceeds as if that were true without testing the hypothesis. “And so maybe they exploited a flaw in the firewall, so we should patch all the holes right away.” Putting aside for the moment the advisability of patching holes in firewalls, merely hypothesizing an exploit does not make it true. Maybe it is a good thing to patch the firewall, but it does not follow that it is the top priority right now simply from having thought of the idea. CSIRT staff should be careful to think about what they are hearing and should note explicitly when people are proposing explanations rather than reporting facts.
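One way to make these distinctions explicit in an incident record is to label every statement as it is written down. The Python sketch below is an illustrative assumption about how such labels might look; the class names, fields, and the follow-up rule are hypothetical, and the example statements are adapted from the text above.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    OBSERVATION = "observation"  # first-person: "I read the log file and found..."
    HEARSAY = "hearsay"          # reported at second hand
    ASSUMPTION = "assumption"    # taken for granted without proof
    HYPOTHESIS = "hypothesis"    # a proposed explanation, not yet tested


@dataclass
class Note:
    source: str            # who made the statement
    text: str              # what was said, recorded as said
    kind: Kind
    verified: bool = False


def needs_follow_up(note: Note) -> bool:
    """Anything other than a first-person observation should be checked
    or tested before the team relies on it."""
    return note.kind is not Kind.OBSERVATION and not note.verified


notes = [
    Note("Shalama", "I read the log file and found repeated failed logins",
         Kind.OBSERVATION),
    Note("Bob", "They must have exploited a flaw in the firewall",
         Kind.ASSUMPTION),
    Note("Help desk", "Shalama read the log file and she found...",
         Kind.HEARSAY),
]

for note in notes:
    if needs_follow_up(note):
        print(f"Check before relying on it: {note.source}: {note.text}")
```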

56.4.5.11 Challenge Hypotheses.

When CSIRT members develop hypotheses about what is happening in a breach of security, they have to do two things: see if the ideas are consistent with observation, and also test those ideas to see if they are flawed.

Trying to show that an idea is correct is a natural response when solving problems. Especially when the clock is ticking and a critical process is stopped, the pressing need is to get the system running immediately, no matter what it takes. Unfortunately, doing something and having the system work afterward does not automatically mean that the solution being proposed actually fixed the problem; that is the fallacy known as post hoc, ergo propter hoc (after this, therefore because of this). It is possible that what we think fixed the problem simply preceded a change of state related to some other factor. The supposed fix may have nothing to do with the solution or may be only part of the solution. Even under pressure, technical support teams with experience go beyond the immediate fix to see if there are other factors that need to be controlled for long-term stability of the system.

In analyzing the behavior of a compromised system, the CSIRT is usually less concerned with restart than with forensic analysis. Who did what to which parts of the system? What do the log files tell us about the incident? How could the attacker have gotten in? What might (s)he have changed?

For that kind of analysis, it is especially important to find ways of testing our ideas before we go down a long chain of reasoning that may be flawed at its very start. Thus, just as in quality assurance, try to come up with ways of showing that our explanation is wrong. If we fail to disprove a hypothesis using genuine, thoughtful, intelligent tests of our ideas, maybe we have got something useful after all. In practice, the principle teaches us to go a step further when solving problems. Instead of stopping the testing as soon as we find supportive evidence, we can make it a habit to ask “Yes, but what if…?”

56.4.6 Telephone Hotline.

Users should be trained and encouraged to call the telephone hotline—usually the help desk line—to report anomalies or suspected breaches of security. The help desk operator can route the call to the appropriate persons, including the CSIRT monitor.

Returning for a moment to the advice on staffing the CSIRT as discussed in Section 56.3.1, there are some additional requirements for the people involved in the CSIRT concerning their interpersonal relations. Not only should managers look for, and ensure, adequate technical knowledge, they should also enhance interpersonal skills and disciplined work habits.

CSIRT members inevitably work with some users who are stressed by the problems they are facing. It is no help to have a technical wizard who so offends the users that they stop cooperating with the problem-resolution team. Sometimes CSIRT staff members forget that their job includes not only resolving a technical issue but also keeping the clients as happy as possible under the circumstances—and the use of the word “clients” is deliberate here.

Here are some of the most irritating responses to users that we have run across in several decades of technical support and client support, followed by our comments in brackets:

  • “No one has ever complained about this before.” [So what? If the problem is real, we should thank the user for reporting it, not make veiled criticisms that imply that the problem cannot be real.]
  • “I don't have time for this now.” [That is a time management problem for the CSIRT, not for the client. Take responsibility for getting the right person to take charge of the problem in real time.]
  • “Why don't you try calling …?” [Same comment as last one.]
  • “That's not my problem.” [Just plain rude as well as irresponsible.]
  • “Why don't you reload the operating system and call me back if it happens again?” [Significant risk and time cost for the client; often the first-line suggestion of the terminally incompetent technician.]
  • “Just format your hard disk and see if it happens again.” [Even worse than the previous suggestion if it is just a casual suggestion to get the client off the phone for now.]
  • “Don't get mad at me—I just work here.” [A professional will understand that there is a difference between criticism directed at the organization or its procedures versus a direct ad hominem attack. The former should be taken seriously and passed on to people who can evaluate the seriousness of the criticism; the latter can be unacceptable and should be passed on to a manager who can explain the need for civility even under stress.]

56.5 RESPONDING TO COMPUTER EMERGENCIES.

This section offers specific recommendations and comments on a structured response plan to help develop a systematic approach to CSIRTs.

56.5.1 Observe and Evaluate.

A response team leader must assess the situation as quickly as possible, based on available information. The leader should make a preliminary estimate of the type of incident, its scope, the people involved, and the data or systems affected, and then begin formulating first responses. This is the point at which the team leader or other responsible person orders a move from a state of standby monitoring to one of active monitoring, focused on the particular event or events. It is important to maintain standby and baseline monitoring activities during an actual incident, because the obvious event might well be a ruse designed to divert attention from a more serious attack.

If proper planning has taken place, the team leader usually will be able to direct a specific course of action in response to a particular incident. The leader can choose from a menu of planned responses while drawing on only those resources necessary to execute that particular response. Doing this minimizes the impact on staff at all levels and allows the incident to be dealt with efficiently and effectively. However, the more unique or complex the situation, the more likely it is that a complete team response may be required.

Responses, the players involved, and the audience are obviously different when considering a data center–type situation as opposed to a user-reported situation. Often, formal handling procedures are preestablished for data centers that generally are staffed by more technical personnel. The CSIRT can expect a higher level of response from data center personnel and will likely be able to communicate instructions more concisely and with more assured compliance.

Dealing with individual users, however, requires a greater degree of sensitivity and understanding. In most cases, the CSIRT will be moving quickly and enthusiastically when handling an incident, since this is what the team has trained so long and hard to do. Individual users often are stressed or bewildered when confronted by a computer emergency incident serious enough to warrant a response team. In these situations, users tend to be nervous rather than excited or confident. This can cause communication problems, especially when a team member converses with users over a phone. Instructions become garbled, or may not be carried out exactly as desired.

Team members must be trained to communicate clearly and calmly when dealing with individual users who may not have exceptionally well-developed technical skills. This is especially important when giving end users instructions over the phone or via some remote means. A calm, careful conversation will lessen the amount of stress the individual feels while helping to ensure that the CSIRT member's instructions are carried out properly. Proper compliance with instructions can make or break an incident investigation, especially when forensic issues are to be considered, as in the case of known or suspected criminal activity. Failure to maintain the state of an attacked system properly can thwart any subsequent attempt at a successful prosecution.

56.5.2 Begin Notification.

Once the team leader establishes that an incident is in fact in progress, notification of the appropriate individuals within the organization must begin, consistent with the type of situation. Notification and actions should be carried out, whenever possible, according to existing plans.

In some cases, the CSIRT leader might be able to identify the incident as one calling for a prearranged response. The leader, having the authority and confidence to carry out such a preapproved response, will notify those appropriate to the incident and carry out the contemplated actions. In other cases, the situation might not be so clear, and the notification process might include additional personnel with authority to decide on various courses of action.

56.5.3 Set Up Communications.

Team members, especially when dealing with remote or multiple sites, must be able to communicate easily and securely with one another as well as with management representatives. Team members need to be able to communicate data, status updates, actions, responses, and similar events. Communications should flow securely to the designated CSIRT leader for coordination. The team leader must be able to direct and advise other team members, but the potentially sensitive nature of an incident may require that these communications be handled out of band and through secure means. “Out of band” in this case refers to communication methods that are neither part of nor connected to the system believed to be under attack. For example, in communications regarding an attack, the use of unencrypted e-mail that might be intercepted by an attacker, or by other unauthorized parties, should be avoided whenever possible.

56.5.4 Contain.

The CSIRT's next course of action is to contain the incident. The goal is to limit the scope of any compromise as much as possible, by isolating the system under attack from other systems in order to prevent the problem, attack, or intrusion from spreading. Containment might involve steps such as disconnecting systems from the Internet. However, doing so might limit the organization's ability to catch an intruder who is currently active on the system. The priority level assigned to intruder identification and prosecution is a part of the mission and charter of the team, modified by the specific action plans in use for a particular incident.

56.5.5 Identify.

Once the team has taken steps to contain the incident as much as possible, it should focus on identifying exactly what happened, why it happened, and how it happened, and then identify steps that can be taken to prevent a recurrence. This effort also might involve identifying who, if anyone, was or still is involved in the incident or attack.

56.5.6 Record.

All CSIRTs should be trained to document everything during an incident. No event or detail is too small to record when responding to computer emergencies. Always try to answer “Who? What? Where? How? When? Why?” This is especially true when dealing with criminal activity, when there is an expectation that the intruder will be prosecuted. Keeping accurate records of what happened and the team's actions can prove pivotal in the organization's ability to identify positively the cause or source of an incident and to prevent similar incidents in the future.
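The six questions above can be turned into a fixed record layout so that no field is skipped under pressure. The following is only a minimal sketch: the names `LogEntry` and `IncidentLog`, the field comments, and the choice of UTC timestamps are illustrative assumptions, not part of any CSIRT standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: an entry cannot be altered after it is written
class LogEntry:
    """One observation or action recorded during an incident (hypothetical layout)."""
    who: str    # person who observed or acted
    what: str   # what was seen or done
    where: str  # system, site, or location involved
    how: str    # method or tool used, or how the event was noticed
    why: str    # reason for the action, or the suspected cause
    # "When?" defaults to the moment of recording, in UTC (an assumption).
    when: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class IncidentLog:
    """Append-only log: entries can be added and read, but not edited."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        self._entries: list[LogEntry] = []

    def record(self, **details: str) -> LogEntry:
        entry = LogEntry(**details)
        self._entries.append(entry)
        return entry

    def entries(self) -> tuple[LogEntry, ...]:
        # Return an immutable copy so callers cannot reorder or drop entries.
        return tuple(self._entries)
```

A team member might call `log.record(who=..., what=..., where=..., how=..., why=...)` for each action taken. Making entries immutable is one simple way to keep the record closer to what a later review or forensic examination would expect, although it is no substitute for proper evidence handling.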

In the case of criminal activity, with an expectation of prosecution, a legal representative should be kept informed so that appropriate forensic measures may be ordered at appropriate times.

56.5.7 Return to Operations.

For most business managers and executives, restoring operations is of paramount importance. Frequently they will pressure the EDP people and the CSIRT to put off all other activities and to direct all resources to that end. Except in extreme cases, that pressure should be resisted, and the orderly carrying out of all preceding steps must be assured. As soon as possible, the CSIRT should assist operations personnel with bringing systems back online and returning them to full operating capacity. In some cases, hard drives, logs, and even entire systems may need to remain off-line until detailed forensics examinations can be completed. In these situations, backup systems should be used to bring systems and operating capabilities back online.

56.5.8 Document and Review.

While all CSIRT members should keep careful notes at all times, formal procedures for documenting incidents and the resulting actions are vital to the overall success of the incident response effort. This documentation can form the basis for new approaches, procedures, policies, awareness programs, and similar changes. Documenting successes and failures can provide the organization with a realistic view of its security posture and of its capability to respond to emergencies, and in some cases can justify the expenditure of additional funds on training or technology. This effort also ensures that the data captured can be used by the CSIRT to learn and to sharpen skills.

56.5.9 Involving Law Enforcement.

The decision to involve law enforcement, or even when to involve law enforcement, in an incident response is one that must be given careful consideration. While most organizations recognize the benefit of a close relationship with law enforcement, involving such agencies when responding to a computer emergency can have consequences beyond those that are immediately evident.

Clearly, local and national law enforcement agencies have a great deal to offer when establishing CSIRTs and developing incident handling capabilities. Many law enforcement agencies now have specialized units dedicated to computer crimes and issues. They can be a valuable resource, likely to have a wealth of threat data on hand. For this reason, it is important to partner with appropriate agencies to take advantage of their experience and to establish relationships. Knowing whom to contact in an emergency not only will save time and frustration, but may mean the difference between merely repelling an attack and catching and successfully prosecuting the perpetrator, which could help prevent future attacks.

Local laws and statutes may dictate specific notification requirements that an organization is obliged to follow in the event an actual or suspected incident occurs. Careful review of local laws, statutes, and ordinances should be undertaken to ensure that the organization complies with notification requirements and other legal requirements.

When there is a choice to be made, the organization must weigh carefully the decision to involve law enforcement, and especially the question of when to do so. In most cases, formally involving law enforcement means that the organization may have to turn control of the incident and subsequent investigation over to the agency whose jurisdiction it is to investigate the crime.

While most professional law enforcement agencies will work with an organization to minimize any adverse impact on normal operations, this may not always be feasible. Because the goals of law enforcement often are different from those of others, especially of commercial enterprises, law enforcement agencies may not consider the impact of their response on the organization under attack.

Since law enforcement's mission is to investigate criminal activity, its focus will naturally be on identifying, tracking, and locating the intruder. This can, in some cases, result in seizure and removal for forensic purposes of systems and data, even systems that may be critical to the continued operation of the organization. In the case of a business, this might well mean the loss of necessary servers or workstations while an investigation is under way, with a possibly devastating effect.

Indeed, decisions about a preferred response may be taken out of the hands of managers and executives when law enforcement enters into an incident response situation. A commercial business might focus on identifying the vulnerability that made the attack possible, protecting against that vulnerability, and restoring systems to full operating capability. If, during the course of these efforts, the perpetrator can be identified, law enforcement will be informed, but such identification is rarely the overriding objective of the business. For law enforcement, however, identification and prosecution of the perpetrator is the primary objective. Establishing contact with appropriate law enforcement agencies before the organization is forced to respond to an incident will help the CSIRT plan when to notify law enforcement and how most effectively to align both sets of objectives when dealing with an incident.

For more details of working with law enforcement agencies and personnel, see Chapter 61 in this Handbook. For additional discussion of data collection for forensic applications, see Section 56.6.9 in this chapter.

56.5.10 Need to Know.

Protecting information about an incident in progress is essential, not only to a successful response but because it can have serious legal, privacy, and other ramifications as well. Those charged with handling an incident must use out-of-band communications, such as cellular telephones, pagers, and encrypted e-mail systems not connected to the system under attack, to ensure that knowledge of the incident is restricted to those who need to know about it. Attackers could intercept team communications if passed through in-band or normal channels and use that information to cover their tracks or even to prolong an incident.

Responding to incidents always involves gathering information about systems, users, activities, and events. In most cases, sensitive system and even personal information may be collected. During the course of their response and investigation, members of the team frequently make assumptions about the identities of those responsible for the incident. These assumptions are based on data that are continually being collected, refined, modified, and frequently changed during the course of an emergency response. Should an unproven or interim assumption that a particular individual was involved in the incident be made public, that individual's reputation might become needlessly tarnished, and the organization might well find itself facing legal proceedings as a result.

It is therefore essential that the CSIRT disseminate information about the incident according to a strict need-to-know policy. Limiting knowledge about an incident will help ensure that sensitive information remains in the hands of those who need it to perform their duties.

56.6 MANAGING THE CSIRT.

All the work that goes into creating a CSIRT can be wasted if managers fail to lead. Sloppy management can result in degraded performance, alienation of the client base, staff frustration, sabotage, and employee turnover. Management plays a key role in the formation, operation, and support of a CSIRT. Ideally, teams should be composed not only of technical personnel, but also of managers with sufficient authority to assist the team in taking actions that contain an incident and that protect data and systems from further compromise. Outside of incident handling, management support for planning, establishing and enforcing policies, and preauthorizing responses is essential. Most important is management support of the CSIRT. Without solid backing from the highest levels of management, the CSIRT will be frustrated in its attempts to carry out its mission.

56.6.1 Professionalism.

The DISA course wisely emphasizes the importance of professional behavior by all members of the CSIRT. The authors write:

The survival of your CIRT may well depend upon using a Code of Conduct, which will earn the trust and respect of the commands you support. The conduct of any single team member reflects upon the entire CIRT organization. If the commands don't trust your CIRT, they won't report to you. It is important, therefore, not only to have a Code of Conduct, but also to shake it out and dust it off every once in a while. Remind team members what it is and why it is important… and use it.10

Here are some of the practical recommendations from that course (although we have put them in our own words for the most part):

  • Write down the rules—a code of conduct—that represent your ideals of courteous, professional service to your clients.
  • Train the team to understand and apply the code.
  • Review the code periodically with the team.
  • Speak clearly and avoid technobabble.
  • Tell people exactly what you intend to do.
  • Never hesitate to say “I don't know—but I'll find out”.
  • Do not criticize other people in your interactions with clients.
  • Respect the confidentiality of your clients.
  • Be respectful of your callers; do not belittle them or make them feel bad.

Notice how consistently we refer to clients; this usage emphasizes that technical support teams and CSIRTs alike perceive users as people to whom we owe service. There is no benefit to allowing an adversarial relationship between the technical support team or a CSIRT and the client base. Managers must not allow a gulf to develop between the CSIRT and the client community; leaders should clamp down on disparaging terms and derogatory comments about users. Team members must understand why such language is harmful.

Managers should identify CSIRT members with a chip on their shoulders; team members must not adopt defensive, arrogant, or aggressive attitudes toward the users. If a computer-security incident can be traced to procedural errors (i.e., the procedures themselves rather than user error are causing problems), the person reporting the problem should be thanked for the information, not criticized for having experienced or identified the problem.

No one in a CSIRT has ever regretted being professional.

56.6.2 Setting the Rules for Triage.

As we mentioned in Section 56.4.2, “triage” in French means “sorting.” In emergency medicine, the term was applied to the process of prioritizing treatment for patients arriving at trauma hospitals near combat zones in World War I. The same concept has been applied to help desks. For example, the “Help desk triage policy” from Courtesy Computers illustrates how a help desk team can categorize problems to ensure that important issues receive faster service than less important problems.11 Importance is defined in terms of the number of users affected, the effects on mission-critical functions, and the costs of downtime or of less-than-optimal functions. The five priority levels suggested in the document are typical of the kind of triage categories established in many help desk departments (adapted from a table in the Courtesy Computers document):

Priority 1

  • Issues of the highest importance; mission-critical systems with a direct impact on the organization (Examples: widespread network outage, payroll system, sales system, telecom system, etc.)
  • Contact: Immediate to 5 minutes
  • Resolution: 30 minutes

Priority 2

  • Single user or group outage that is preventing the affected user(s) from working (Examples: failed hard drive, broken monitor, continuous OS lockups, etc.)
  • Contact: 15 minutes
  • Resolution: 1 hour

Priority 3

  • Single user or group outage that can be permanently or temporarily solved with a workaround (Examples: malfunctioning printer, PDA synchronization problem, PC sound problem, etc.)
  • Contact: 30 minutes
  • Resolution: Same day

Priority 4

  • Scheduled work (Examples: new workstation installation, new equipment/software order, new hardware/software installation)
  • Contact: 1 hour
  • Resolution: 1–4 days

Priority 5

  • Nonessential scheduled work (Examples: office moves, telephone moves, equipment loaners, scheduled events)
  • Contact: Same day
  • Resolution: 5 days

The particular structure and specific timelines are merely examples, not blanket recommendations. Every organization must determine its own version of such a table.
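As an illustration only, the example table can be encoded as data so that contact and resolution deadlines are computed rather than remembered. This sketch makes several assumptions that are not in the source: ranges are collapsed to their upper bound ("Immediate to 5 minutes" becomes 5 minutes, "1–4 days" becomes 4 days), "Same day" is read as the end of the calendar day on which the report arrives, and the names `TARGETS` and `deadlines` are hypothetical.

```python
from datetime import datetime, timedelta

SAME_DAY = "same_day"  # marker for targets expressed as "Same day"

# priority level -> (contact target, resolution target), upper bounds only
TARGETS = {
    1: (timedelta(minutes=5), timedelta(minutes=30)),
    2: (timedelta(minutes=15), timedelta(hours=1)),
    3: (timedelta(minutes=30), SAME_DAY),
    4: (timedelta(hours=1), timedelta(days=4)),
    5: (SAME_DAY, timedelta(days=5)),
}


def _deadline(reported: datetime, target) -> datetime:
    if isinstance(target, timedelta):
        return reported + target
    # "Same day": assumed to mean the last second of the reporting day.
    return reported.replace(hour=23, minute=59, second=59, microsecond=0)


def deadlines(priority: int, reported: datetime) -> tuple[datetime, datetime]:
    """Return (contact_by, resolve_by) for a problem reported at `reported`."""
    contact, resolution = TARGETS[priority]
    return _deadline(reported, contact), _deadline(reported, resolution)
```

Keeping the targets in one table rather than scattering them through the code makes it straightforward for an organization to substitute its own levels and timelines.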

In his helpful overview, “CIRT—Framework and Models,” Ajoy Kumar summarizes the functions of triage in this way:

Triage: The actions taken to categorize, prioritize, and assign incidents and events.12

It includes the following sub-processes:

  • Categorize events.
  • Correlate various events. Personnel involved in such teams typically also belong to Forensic teams.
  • Prioritize events.
  • Assign events for handling and response.
  • Communicate information to “Respond” process for further handling.
  • Re-assign (and close) events not belonging to CIRT.

The DISA training materials suggest three broader categories of interactions with help desks and CSIRTs: “incidents, vulnerabilities, and information requests”.13

  1. Incidents involve breaches of security.
  2. Vulnerabilities include reports of security weaknesses (and may be reported as part of an incident).
  3. Information requests often are managed using lists of frequently asked questions (FAQs).

The DISA instructors go on to define factors that can help CSIRTs prioritize incidents:

  • The sensitivity and/or criticality of the data affected
  • The amount of data affected
  • Which host machines are involved
  • Where and under what conditions the incident occurred
  • Effects of the incident on mission accomplishment
  • Whether the incident is likely to result in media coverage
  • Number of users affected
  • Possible relationships to other incidents currently being investigated
  • The nature of the attack
  • Economic impact and time lost
  • Number of times the problem has recurred; and even
  • Who reports the incident.14

On this last point, the DISA writers point out that the organizational rank of someone calling in an incident may bear on its priority—but that it may be wise to cross-check the report with a security expert who can speak to whether the report is sound. Sometimes a high-level manager's sense of urgency may be rooted more in his sense of self-importance than in operational requirements.
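One way to make such factors operational is a simple additive score used to order the incident queue. The sketch below is purely illustrative: it covers only a subset of the factors listed above, and the field names, weights, and caps are invented for the example rather than taken from the DISA materials. Any real scheme would need weights agreed on by the organization.

```python
from dataclasses import dataclass


@dataclass
class IncidentReport:
    """Hypothetical subset of the prioritization factors."""
    sensitive_data: bool          # sensitivity or criticality of the data affected
    mission_impact: bool          # effect on mission accomplishment
    media_coverage_likely: bool   # likelihood of media coverage
    users_affected: int           # number of users affected
    recurrences: int              # number of times the problem has recurred
    related_open_incidents: int   # possible links to incidents under investigation


def triage_score(report: IncidentReport) -> int:
    """Higher score means handle sooner. All weights are assumptions."""
    score = 0
    score += 3 if report.sensitive_data else 0
    score += 3 if report.mission_impact else 0
    score += 2 if report.media_coverage_likely else 0
    score += min(report.users_affected // 10, 3)  # one point per 10 users, capped at 3
    score += min(report.recurrences, 2)           # capped so repeats cannot dominate
    score += 1 if report.related_open_incidents > 0 else 0
    return score


def prioritize(reports: list[IncidentReport]) -> list[IncidentReport]:
    """Return reports ordered from most to least urgent."""
    return sorted(reports, key=triage_score, reverse=True)
```

A score like this is an aid to consistency, not a replacement for judgment; staff can compare the computed ordering with their own assessments during the practice exercises.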

In summary, it is important to establish a sound basis for staff members of the CSIRT to carry out triage effectively. Once the rules for evaluating incidents have been clarified, staff members should practice analyzing a number of cases to train themselves in applying the rules consistently. Role-playing exercises, based on historical records or on made-up examples, can provide an excellent and enjoyable mechanism for staff members to establish a common standard for this difficult and sensitive task.

For additional discussion of applying lessons from social psychology to the management of security personnel, see Chapter 50 in this Handbook.

56.6.3 Triage, Process, and Social Engineering.

Sometimes staff (or even managers) question the value of strict adherence to policy. Policy is sometimes seen as the expression of unnecessary rigidity—an inability to respond quickly to changing or unexpected circumstances. However, in CSIRT management, knowing and adhering to well-thought-out policies and following a reliable process are particularly valuable, not only for information gathering, data recording, and analysis, but also to maintain strict security.

One of the well-known tricks used by criminal hackers and spies is to simulate urgency that supports demands for violations of normal security restrictions. For example, criminals will call a relatively low-status employee such as a secretary and pressure him into violating standard protocols to obtain the password of his boss by claiming extreme circumstances of great urgency. The criminal may escalate the pressure to outright bullying by threatening the employee with punishment.

A criminal determined to penetrate security barriers can manufacture an incident that leads to involvement of the CSIRT. Allowing such a person to apply pressure for violations of protocol is an invitation to compromise. Worse, such deviations from well-tried and well-justified procedures can add to the embarrassment caused by the compromise; it is bad enough to have someone breaking through our security without having to admit that we helped.

For additional information on social engineering, see Chapters 19 and 20 in this Handbook.

56.6.4 Avoiding Burnout.

Much of the discussion that follows applies equally to CSIRTs and to help desks; in a sense, one can view the CSIRT as a specialized help desk. Many CSIRTs are specialized subsets of the help desk team.

Any organization, even one with a relatively small CSIRT or a small help desk, can suffer spikes in demand. Ordinary business cycles can influence network usage; for example, universities often see perfectly normal but large increases in call volumes at registration times as new students forget their passwords, try to connect unverified laptops to the university network, or get blocked for violating appropriate-use policies. At any site, a denial-of-service attack, a plague of computer virus infections, or an infestation of computer worms can cause a flood of calls.

Ironically, the better a CSIRT (or help desk team, although the focus here remains on CSIRTs) becomes at handling problems, the more readily members of its community will turn to it to report problems or ask for help. Thus the better the CSIRT does its job, the heavier its workload can become, at least for a while. According to the DISA course:

As a new CSIRT grows and the workload increases, and especially on those teams that provide 24-hour emergency response, burnout becomes quite common. By studying the issue, one national CSIRT determined that a full-time team member could comfortably handle one new incident per day, with 20 incidents still open and actively being investigated.15
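The figures quoted above suggest a rough staffing check. The sketch below assumes that both limits apply at once (no more than one new incident per member per day and no more than 20 open incidents per member) and that the larger of the two resulting head counts governs. That reading, and the function name `members_needed`, are assumptions for illustration; the figures come from one national CSIRT and may not transfer to other teams.

```python
import math

NEW_PER_DAY_PER_MEMBER = 1   # new incidents one member can take on per day
OPEN_PER_MEMBER = 20         # open incidents one member can keep investigating


def members_needed(new_incidents_per_day: float, open_incidents: int) -> int:
    """Estimate full-time members needed to stay within both limits."""
    by_intake = math.ceil(new_incidents_per_day / NEW_PER_DAY_PER_MEMBER)
    by_backlog = math.ceil(open_incidents / OPEN_PER_MEMBER)
    # At least one member is assumed even when the queue is empty.
    return max(by_intake, by_backlog, 1)
```

For example, a team seeing three new incidents a day with 45 still open would need three members by either measure, while a backlog of 90 open incidents would call for five regardless of intake.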

Staff members who face increasing workloads may become stressed. Working long periods of overtime, missing time with family and friends, perhaps even missing regular exercise and food—these factors may lead to increased errors and turnover if people are forced to accept increasingly demanding conditions for long periods.

One of the most valuable organizational approaches to preventing burnout is to rotate staff from the IT group through the CSIRT function on a predictable schedule. For example, one can assign people to the CSIRT for three- or six-month rotations.

Such rotations require especially good training programs and particularly good documentation, to maintain efficiency as new people come on duty; in addition, the assignments must be staggered so that the CSIRT does not have to cope with large numbers of newcomers all at once. Ideally, there would not be more than one switch of personnel a week.
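The staggering rule can be checked mechanically. The following sketch spaces rotation start dates one week apart and then counts how many people start in any one ISO calendar week. The 13-week default stands in for a three-month rotation, and the function names and tuple layout are hypothetical.

```python
from datetime import date, timedelta


def staggered_rotations(staff, first_start, rotation_weeks=13, gap_weeks=1):
    """Return (name, start, end) tuples with starts spaced gap_weeks apart."""
    schedule = []
    for i, name in enumerate(staff):
        start = first_start + timedelta(weeks=i * gap_weeks)
        end = start + timedelta(weeks=rotation_weeks)
        schedule.append((name, start, end))
    return schedule


def max_switches_per_week(schedule):
    """Largest number of rotation starts falling in a single ISO week."""
    counts = {}
    for _name, start, _end in schedule:
        week = tuple(start.isocalendar()[:2])  # (ISO year, ISO week number)
        counts[week] = counts.get(week, 0) + 1
    return max(counts.values(), default=0)
```

A manager could generate a draft schedule with `staggered_rotations` and reject any plan for which `max_switches_per_week` returns more than one.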

How should existing assignments be transferred within the CSIRT? Difficult existing cases should be transferred to staff members who have been on duty for a few weeks, not to the incoming staff member (even one with experience on the CSIRT). The incoming CSIRT member should be given a chance to get into (or get back into) the rhythm of the job before being hit with the most intractable problem or the most ornery client.

Every incident must have a case coordinator—the person who monitors the problem, aggregates information from varied resources, and serves as the voice of the CSIRT for that incident. When transferring responsibility for a case from one case coordinator to another, managers should ensure that the previous coordinator prepares clients for the transition and introduces the new coordinator to the key client contacts to ensure a smooth transition of control. Clients often come to depend on the person they have been working with to resolve an incident; an unexpected change can be unsettling and disturbing.

56.6.5 Many Types of Productive Work.

The DISA course writers suggest:

Allow team members to allocate time away from high stress incident response assignments and pursue broader interests in areas such as tool development, public education and presentations, research, and other professional opportunities.16

CSIRT members, by the nature of their work, will have a great deal to contribute to the awareness, training, and education of their colleagues.

In the technical support group for Hewlett-Packard Canada in the 1980s, managers exercised great care in preventing consistent overwork. In emergencies, employees all pitched in—including managers—to resolve the problem for clients; however, the policy on allocation of time was strictly enforced under normal circumstances. Everyone kept careful records of time worked—a habit everyone can usefully follow—so that managers could analyze where the burden was distributed, and to provide statistical information for load balancing and personnel planning. Employees who violated the policy that no more than 70 percent of their time in the system engineering group should be spent on billable hours were warned to ease off; the rationale was that it was necessary to maintain constant training and time for administration and just for thinking, to ensure long-term productivity of their specialists.
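A time-allocation policy of this kind is easy to monitor once hours are recorded. The sketch below flags anyone whose billable share exceeds 70 percent; the timesheet layout (a mapping from name to billable and total hours) and the function names are assumptions made for the example, not a description of the system Hewlett-Packard used.

```python
BILLABLE_LIMIT = 0.70  # maximum share of time that should be billable


def billable_share(billable_hours: float, total_hours: float) -> float:
    """Fraction of recorded time that was billable."""
    if total_hours <= 0:
        raise ValueError("total_hours must be positive")
    return billable_hours / total_hours


def over_limit(timesheets: dict[str, tuple[float, float]]) -> list[str]:
    """Names, in alphabetical order, whose billable share exceeds the limit.

    `timesheets` maps a name to (billable_hours, total_hours) for the period.
    """
    return sorted(
        name
        for name, (billable, total) in timesheets.items()
        if billable_share(billable, total) > BILLABLE_LIMIT
    )
```

The same records that feed this check can supply the load-balancing and personnel-planning statistics mentioned above.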

56.6.6 Setting an Example.

The behavior of managers can greatly influence morale, motivation, and dedication among team members. For example, supervisors and upper managers can greatly motivate staff by pitching in to support them during emergencies or extraordinary demands, even if only by their presence. Making the CSIRT a stimulating and enjoyable duty that people want to be on is one of the best approaches to avoiding burnout and ensuring reliable response to computer-related problems.

56.6.7 Notes on Shiftwork.

As discussed in Section 56.6.4, rotating assignments among CSIRT members can be an excellent idea. However, frequent changes in work schedules that involve changes in sleep cycles are not a good idea; for example, weekly changes in shift from day to night schedules can seriously disrupt the natural circadian wake/sleep cycle and have been shown to increase the rate of errors and accidents.17 One authoritative resource states that there are “adverse health and safety effects to working shifts[:]”

A shiftworker, particularly one who works nights, must function on a schedule that is not natural. Constantly changing schedules can:

  • upset one's circadian rhythm (24-hour body cycle),
  • cause sleep deprivation and disorders of the gastrointestinal and cardiovascular systems,
  • make existing disorders worse, and
  • disrupt family and social life.18

Scientific studies throughout the world have long shown that shiftwork, by its very nature, is a major factor in the health and safety of workers; LaDou writes in his abstract:

Daily physiologic variations termed circadian rhythms are interactive and require a high degree of phase relationship to produce subjective feelings of wellbeing. Disturbance of these activities, circadian desynchronization, whether from passage over time zones or from shift rotation, results in health effects such as disturbance of the quantity and quality of sleep, disturbance of gastrointestinal and other organ system activities, and aggravation of diseases such as diabetes mellitus, epilepsy and thyrotoxicosis.19

The U.S. National Institute for Occupational Safety and Health has published a monograph about shiftwork that contains this advice for improving shiftwork schedules:

  • Avoid permanent (fixed or non-rotating) night shift.
  • Keep consecutive night shifts to a minimum.
  • Avoid frequent shift changes—provide enough stability to let employees adapt to their schedule.
  • Plan some free weekends.
  • Avoid several long days of work followed by four- to seven-day “mini-vacations”—such schedules are stressful because of the radical shifts of diurnal cycles in the two phases.
  • Keep long work shifts and overtime to a minimum.
  • Consider different lengths for shifts.
  • Examine start-end times to fit in better with life in the external world; for example, making it possible for parents on night shift to see their children before they leave for school.
  • Keep the schedule regular and predictable.
  • Examine rest breaks.20

56.6.8 Role of Public Affairs.

The nature of interconnected systems today all but guarantees that any incident will become obvious to partners, customers, clients, and others. In many cases, the organization will be compelled to advise its constituents continuously of the status of any outage or degradation of services resulting from an incident, and the causes behind it. Therefore, it is crucial that information released for general consumption be properly screened and cleared prior to release. It is equally important that such information be released through a single source, such as the public affairs office. Restricting release of incident-related information through the public affairs office, or other designated point, will help ensure that frequent, straightforward communications with stakeholders can take place, while at the same time controlling rumors and misinformation. This simple step can do much to lessen anxiety about an incident and to reassure members, partners, and customers that the situation is well in hand and will be resolved.

56.6.9 Importance of Forensic Awareness.

As implied in Section 56.5.9, depending on the specific incident, the organization may desire not only to control the incident, but also to trace and prosecute the perpetrators in the case of known or suspected criminal activity. It is therefore highly advisable that members of the CSIRT receive thorough training in procedures for collecting and preserving evidence. Mishandling of evidence can result in an inability to take successful legal action against an attacker or to recover damages following an incident. Computer forensics and evidence handling should be high on the CSIRT's list of training topics. Chapters 2 and 34 of this Handbook contain additional material on computer forensics and working with law enforcement.

56.7 POSTINCIDENT ACTIVITIES.

One of the most important principles of management in general, and operations management in particular, is that fixing a problem has two aspects: the short term and the long term. One must be able to solve problems quickly enough to be effective; that is, the speed of solution must be appropriate to the consequential costs of delay. However, we should not figuratively wipe our hands in satisfaction and walk away from a problem resolution without thinking about why it happened, how we fixed it, and whether we can do better to avoid repeats and to improve our response.21

The CSIRT's efforts do not end once the incident is resolved. Instead, the team should take a reasonable period to rest and recover. Then, while the details are still fresh in the team members' minds, they should examine the incident from start to finish, both formally and informally, asking questions such as “What happened? What went right? What went wrong?” This way, the team will learn from each incident and become more efficient and confident when handling new incidents in the future.

At the conclusion of each incident, the team should be assembled and a formal debriefing and review of the incident should be carried out. This debriefing should include a complete review of the team and its handling of the incident, including its adherence to policy and its technical performance. Each team member should be individually debriefed following the incident. Their recollections, thoughts, ideas, and reactions as to how the incident was handled, and how the team performed, should be documented and preserved. A management team might debrief members, and team members might debrief each other, or they might even debrief themselves using a checklist or form. Regardless of the method, the CSIRT members themselves are the best source of data about the weaknesses and strengths of the team, and that data must be captured if the team is to improve and grow in skills and confidence.

Once individual impressions are captured, it is often effective to assemble the team as a group for an incident postmortem session. Starting from the beginning of the incident, the team should examine whether it had adequate, workable policies on which to base its actions and decisions. The group should jointly evaluate each aspect of the team, its composition, skills, authority, and step-by-step handling of the incident. A list of lessons learned and action items for improvements should result from this review.

Data collected during this review process should form the basis for improving the team. This information provides input to what should be a continuous cycle involving planning, preparation, training, responding, and evaluating. Shortfalls in training, skills, equipment, access, policies, and authority will become evident through this process. These shortfalls can be corrected to improve the team's ability to respond effectively to incidents in the future.

The next sections provide additional insights into how to learn from the CSIRT's experiences.

56.7.1 Postmortem.

As a matter of standard operating procedure, every technical support person and the CSIRT must schedule time to analyze the underlying factors that led to the problem they have just resolved. This analysis will likely involve operational staff outside the CSIRT; these are the people with line expertise who will be able to contribute their intimate knowledge of technical details that contributed to this security breach. These discussions can often lead to practical recommendations for improvement of the security architecture, such as its topology or firewall placement, operational procedures such as monitoring standards or vulnerability patching, and technical details such as configurations or parameter settings.

Similarly, it is a commonplace in discussions of disaster recovery and business continuity planning that every practice run, or real-life incident, should be analyzed to see where we have made errors or achieved less than our goals in performance. Managers must ensure that these analyses are not perceived as (or worse, really are) finger-pointing exercises for apportioning blame. In a column for Network World, M. E. Kabay has explained the concepts of “egoless work”; the postmortem analysis of an incident must be ego-free.22 Managers can set the tone by responding positively to what might otherwise be perceived as criticism; “That's a good point” and “Very good observation” are examples of positive, encouraging responses to observations such as “We were too slow in getting back to the initial caller given that she clearly stated that the entire department was off-line.” The meeting should focus on ways to improve the response, given the insights resulting from detailed analysis of successes and failures during an incident.

The other aspect that sometimes gets lost in such postmortems is exploring the reasons for the problems. If we do not pay attention to underlying causes, we may fix specific problems, and we may improve particular procedures, but we will likely encounter different consequences of the same fundamental errors that caused those particular problems. We must pursue the analysis deeply in order to identify structural flaws in our processes, so that we can correct those problems and thus reduce the likelihood of entire classes of problems.

The U.S. National Institute of Standards and Technology Computer Security Incident Handling Guide specifically recommends a postincident analysis in section 3.4. We quote the authors' list of suggested questions verbatim:

  • “Exactly what happened, and at what times?
  • How well did staff and management perform in dealing with the incident? Were the documented procedures followed? Were they adequate?
  • What information was needed sooner?
  • Were any steps or actions taken that might have inhibited the recovery?
  • What would the staff and management do differently the next time a similar incident occurs?
  • What corrective actions can prevent similar incidents in the future?
  • What additional tools or resources are needed to detect, analyze, and mitigate future incidents?”23

The authors also recommend these actions (paraphrasing and summarizing):

  • Invite people to the postmortem with an eye to increasing cooperation throughout the organization.
  • Plan the agenda by polling participants before the meeting.
  • Use experienced moderators.
  • Be sure the meeting rules are clear to everyone to avoid confusion and conflict.
  • Keep a written record of the discussions, conclusions, and action items.

On this last point, we add that all action items should indicate clearly who intends to deliver precisely what operational result, to whom, in which form, and by when.
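
The requirement above can be made concrete by recording each action item in a fixed structure, so that an incomplete item is easy to spot. The following is a minimal sketch in Python, not a prescribed format; the class name, the field names, and the helper functions are all hypothetical and chosen only to mirror the who, what, to whom, in which form, and by when of the preceding sentence.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass(frozen=True)
class ActionItem:
    """One postmortem action item (all field names are illustrative)."""
    owner: str        # who intends to deliver
    deliverable: str  # precisely what operational result
    recipient: str    # to whom
    form: str         # in which form (report, patch, revised procedure, ...)
    due: date         # by when

    def missing_fields(self) -> list[str]:
        """Return the names of text fields that were left blank."""
        return [
            f.name
            for f in fields(self)
            if isinstance(getattr(self, f.name), str)
            and not getattr(self, f.name).strip()
        ]


def overdue(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Return the action items whose due date has passed, earliest first."""
    return sorted((i for i in items if i.due < today), key=lambda i: i.due)
```

A moderator might call missing_fields() on each item before closing the meeting and refuse to record any item for which the list is not empty; overdue() could then drive follow-up at the next review.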

56.7.2 Continuous Process Improvement: Sharing Knowledge within the Organization.

On page 3-23 of the Computer Security Incident Handling Guide, the authors make a series of recommendations on how to capitalize on the knowledge gained through systematic analysis of incidents.24 We comment briefly on each of their suggestions (which are shown in quotation marks).

  • “Reports from these meetings are good material for training new team members by showing them how more experienced team members respond to incidents.” The incident reports that were used for discussion in the analytic meetings should be made available, perhaps as appendices, in a single report document so that all of the information about a specific incident or series of incidents can be accessed at one time. In what follows, such a dossier is referred to as the follow-up report.
  • “Another important post-incident activity is creating a follow-up report for each incident, which can be quite valuable for future use.” The general principle is that without documentation, we lose the opportunity for increasing institutional knowledge. If we do not record what we have learned, transmission depends on luck: the haphazard contacts of people who need to know something with those who can help. Without documentation and efficient indexing, information transfer becomes an inefficient, random process of querying and guesswork. Informal knowledge sometimes remains limited to a few people or even a single individual; without these key resources, the information is unavailable. If the holders of undocumented information leave the organization, their knowledge is usually lost to the group.
  • “First, the report provides a reference that can be used to assist in handling similar incidents.” Why waste time reinventing solutions that have already been found? Why make the same errors and cause the same problems that have already been located and that could be avoided?
  • “Creating a formal chronology of events (including time-stamped information such as log data from systems) is important for legal reasons, as is creating a monetary estimate of the amount of damage the incident caused in terms of any loss of software and files, hardware damage, and staffing costs (including restoring services).” One of the most important kinds of information for managing security is the cost estimate. Rational allocation of resources depends on knowing how often problems occur, and how much they cost, so that we can spend appropriate amounts of money for equipment, and for the time of our employees and consultants to prevent such problems.
  • “This estimate may become the basis for subsequent prosecution activity by entities such as the U.S. Attorney General's office.” Estimates of monetary consequences are also essential for civil torts in the calculation of restitution.
  • “Follow-up reports should be kept for a period of time as specified in record retention policies.” As the Guide's authors discuss in their section 3.4.2, historical records become increasingly useful as they provide a statistical base for analyzing and predicting phenomena. The costs of saving such data (which have relatively small volumes) have dropped to virtually nothing given the huge digital storage capacities of today's archival media, and their extremely low cost.
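
The chronology and the monetary estimate mentioned in the list above lend themselves to a small worked example. The sketch below is illustrative only: the Event record, the chronology() and damage_estimate() functions, and the cost categories as parameters are assumptions made for this example, not anything defined in the Guide. It also assumes that all timestamps have already been normalized to a single time zone, and that staffing cost can be approximated as hours multiplied by a single hourly rate.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One time-stamped entry taken from a system log (hypothetical shape)."""
    when: datetime
    source: str
    note: str


def chronology(*logs):
    """Merge per-system event lists into one time-ordered chronology."""
    return sorted((e for log in logs for e in log), key=lambda e: e.when)


def damage_estimate(staff_hours, hourly_rate,
                    hardware_cost=0.0, software_and_files_cost=0.0):
    """Rough monetary estimate: staffing plus hardware plus software/file loss."""
    return staff_hours * hourly_rate + hardware_cost + software_and_files_cost
```

For instance, chronology(firewall_events, ids_events) would interleave two hypothetical log extracts by timestamp, and damage_estimate(10, 50.0, hardware_cost=200.0) adds 500.0 in staffing cost to 200.0 in hardware for a total of 700.0. A real estimate would need cost categories and rates agreed on with finance and legal staff.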

56.7.3 Sharing Knowledge with the Security Community.

One of the most valuable contributions we can make to each other is information sharing. The Computer Emergency Response Team Coordination Center (CERT/CC) offers an overview of why and how to report security incidents in its “Incident Reporting Guidelines”.25 The CERT/CC experts summarize the types of activity on which they would appreciate receiving reports; reasons for reporting security incidents; the variety of people and agencies who can benefit from such reports; extensive guidelines on what to include in the reports; and how to reach the CERT/CC securely.

The section “Why should I report an incident?” has these headers (and a paragraph or so of explanation of each point):

  • You may receive technical assistance.
  • We may be able to associate activity with other incidents.
  • Your report will allow us to provide better incident statistics.
  • Contacting others raises security awareness.
  • Your report helps us to provide you with better documents.
  • Your organization's policies may require you to report the activity.
  • Reporting incidents is part of being a responsible site on the Internet.

Another way of contributing to the field is to speak at conferences. For example, the Forum of Incident Response and Security Teams (FIRST) organizes conferences, technical colloquia, and workshops.26 The 19th Annual FIRST Conference on Computer Security Incident Handling was held in Seville, Spain, on June 17-22, 2007.27 That year the focus was “Private Lives and Corporate Risk: Digital Privacy—Hazards and Responsibilities,” and the conference included sessions on a wide range of topics suitable for technical, managerial, and legal staff at all levels. The conferences are open to all, not just members of FIRST, and organizers want participants to:

  • Learn the latest security strategies in incident management
  • Increase your knowledge and technical insight about security problems and their solutions
  • Keep up-to-date with the latest incident response and prevention techniques
  • Gain insight on analysing network vulnerabilities
  • Hear how the industry experts manage their security issues
  • Interact and network with colleagues from around the world to exchange ideas and advice on incident management best practices.

Readers should think about contributing papers to such conferences. Anyone who has spoken at technical conferences will confirm that there is no better way to solidify one's expertise than marshaling information into a clear presentation and speaking before one's peers. Feedback from interested participants can improve not only the current presentation but also the process being described. Intelligent, enthusiastic interchange among practitioners of goodwill with varied experiences, and from different environments, is not only productive of new ideas, it is immense fun.

The FIRST event includes “Lightning Talks,” which are described as “short presentations or speeches by any attendee on any topic, which can be scheduled into conference proceedings with the approval of the organisers.” Participants with hot news can thus present their findings or their ideas without necessarily having to prepare a long lecture or submit their work many months in advance.

Other conferences, such as those organized by the Computer Security Institute (CSI),28 MIS Training Institute (MISTI),29 and RSA Security,30 among many others, usually offer opportunities for discussions of CSIRT management. Readers, if they can, should take advantage of these opportunities by registering for the calls for participation (CFPs) and responding to one or two a year.

56.8 CONCLUDING REMARKS.

CSIRTs are an effective organizational tool for responding to computer emergencies. However, to be effective, these teams must be carefully planned, built, trained, and supported. Proper planning and the establishment of a clear set of organizational objectives for the CSIRT are key to ensuring success. Teams that are well planned, well trained, confident, and that possess the authority and training to execute their stated mission, ultimately can provide a real return on investment for an organization. This return often can be measured in terms of limiting the impact and cost, both tangible and intangible, of a computer emergency.

56.9 FURTHER READING.

Brownlee, N., and E. Guttman. “Expectations for Computer Security Incident Response,” RFC 2350. IETF (June 1998), www.ietf.org/rfc/rfc2350.txt.

CERT/CC. “Resources for Creating, Managing and Improving Your CSIRT,” 2001, www.CSIRT.org/csirts.

Cerberus Helpdesk: http://cerberusweb.com/.

Donald A-M. “Good, but There's More…” Commentary on J. Ward, “Evaluate Help Desk Call-Tracking Software with These Criteria,” TechRepublic, April 15, 2003; http://tinyurl.com/4bcve.

Help Desk Institute: www.thinkhdi.com/.

HelpMaster Pro Suite: www.prd-software.com.au/prd/help-desk-products/.

Open Source Ticket Request System (OTRS): http://otrs.org/.

Request Tracker (RT): www.bestpractical.com/rt/.

TrackIt!: www.itsolutions.intuit.com/Track-It.asp.

Ward, J. “Evaluate Help Desk Call-Tracking Software with These Criteria,” TechRepublic, April 10, 2003, http://articles.techrepublic.com.com/5100-1035_11-5030618.html.

Ward, J. “Product Review: HEAT PowerDesk, Call Center Tracking Software,” TechRepublic, May 20, 2003, http://articles.techrepublic.com.com/5100-1035_11-5034947.html.

Ward, J. “Product Review: HelpMaster, Call Center Tracking Software,” TechRepublic, April 24, 2003, http://techrepublic.com.com/5100-6270-5034721.html.

56.10 NOTES.

1. Parts of this chapter, which is a major revision of the corresponding chapter in the Computer Security Handbook, Fourth Edition, are based on a long series of articles published over several years by M. E. Kabay in his Network World Security Strategies newsletter. To avoid cluttering the text with unnecessary endnotes and quotation marks, there are no further specific references to particular sources in the Network World series on CSIRT management.

2. www.cert.org.

3. M. J. West-Brown, D. Stikvoort, K.-P. Kossakowski, G. Killcrece, R. Ruefle, and M. Zajicek, Handbook for Computer Security Incident Response Teams (CSIRTs), 2nd ed. (Pittsburgh, PA: Computer Emergency Response Team Coordination Center [CERT/CC], Carnegie Mellon University Software Engineering Institute, 2003); www.cert.org/archive/pdf/csirt-handbook.pdf.

4. DISA, “Introduction to Computer Incident Response Team (CSIRT) Management, v1.0” (CD-ROM), Defense Information Systems Agency, 2001. Available free for download as ZIP file with permission of the DISA Information Assurance Support Environment at www2.norwich.edu/mkabay/infosecmgmt/disa_cirtm_cdrom.zip.

5. Microsoft® Encarta® Reference Library 2008.

6. DISA, “Introduction to Computer Incident Response Team (CSIRT) Management”.

7. B. Czegel, Running an Effective Help Desk, 2nd ed. (New York: John Wiley & Sons, 1998).

8. J. D. Howard, “An Analysis of Security Incidents on the Internet, 1989–1995,” PhD diss., Department of Engineering and Public Policy, Carnegie Mellon University, Pittsburgh, PA, April 1997; www.cert.org/research/JHThesis/Start.html.

9. For an extensive list of articles on this topic, use the Network World search function: http://search.networkworld.com/query.html?qt=help+desk&

10. DISA, “Introduction to Computer Incident Response Team (CSIRT) Management, v1.0”.

11. Help Desk Triage Policy, Courtesy Computers, www.courtesycomputers.com/Best%20Practices/help%20desk%20triage.doc.

12. A. Kumar, “CIRT—Framework and Models.” SecurityDocs.com, January 31, 2005; www.securitydocs.com/library/2964.

13. DISA, “Introduction to Computer Incident Response Team (CSIRT) Management”.

14. DISA, “Introduction to Computer Incident Response Team (CSIRT) Management”.

15. DISA, “Introduction to Computer Incident Response Team (CSIRT) Management”.

16. DISA, “Introduction to Computer Incident Response Team (CSIRT) Management”.

17. T. Dawson and A. Aguirre, “How Work Schedules Impact the Costs, Risks and Liabilities of Extended Hours Operations; Recommendations for Improvement,” Circadian Technologies Inc. white paper, 2005. Available free (registration required) from www.circadian.com/contactforms/workfactorsform.php.

18. Canadian Centre for Occupational Health and Safety, “Rotational Shiftwork,” 1998, www.ccohs.ca/oshanswers/work_schedules/shiftwrk.html.

19. J. LaDou, “Health Effects of Shift Work,” Western Journal of Medicine 137, No. 6 (December 1982); www.pubmedcentral.nih.gov/articlerender.fcgi?artid=1274227.

20. R. R. Rosa and M. J. Colligan, “Plain Language about Shiftwork,” U.S. Department of Health and Human Services, Public Health Service, Centers for Disease Control and Prevention, National Institute for Occupational Safety and Health, 1997, www.cdc.gov/niosh/pdfs/97-145.pdf.

21. M. E. Kabay, “On Not Knowing,” 2004, www2.norwich.edu/mkabay/opinion/index.htm.

22. M. E. Kabay, “Egoless Work: Take Your Ego out of the Equation,” Network World Security Strategies Newsletter, February 2, 2006, www.networkworld.com/newsletters/sec/2006/0130sec2.html.

23. T. Grance, K. Kent, and B. Kim, Computer Security Incident Handling Guide, NIST Special Publication SP800-61, 2004, http://csrc.nist.gov/publications/nistpubs/800-61/sp800-61.pdf.

24. Grance, Kent, and Kim, Computer Security Incident Handling Guide.

25. www.cert.org/tech_tips/incident_reporting.html.

26. www.first.org/.

27. www.first.org/conference/2007/.

28. www.gocsi.com/netsec/.

29. www.misti.com/default.asp?Page=70.

30. www.rsaconference.com/2008/US/.
