CHAPTER 19

Secure Software Operations and Maintenance

In this chapter you will

•   Explore the two elements of the sustainment function

•   Learn basic terminology

•   Discover how secure software sustainment is carried out

•   Examine the details of sustainment management

•   Explore the basics of standard sustainment activity

The software development profession uses the term sustainment to describe the traditional post-coding processes associated with software operations and software maintenance. Security plays a key role in sustainment operations. Sustainment characterizes the actions that happen in that part of the lifecycle normally referred to as “use.” Since the period of use can amount to well over 90 percent of the entire lifecycle of a product, operations and maintenance are extremely influential in ensuring the continuing security and quality of any given piece of software.

Although the development phase of the lifecycle has received a great deal of attention over the years, the formal methods and techniques that can be used to ensure good operations and maintenance practice have traditionally not been given the emphasis they deserve. That lack of attention was addressed in the original publication of the ISO 12207 standard in 1995. That standard specified a set of activities and tasks for five primary processes. These processes were designed to characterize best practices for the software lifecycle, and two of these five processes are operations and maintenance.

The current revision of 12207, which was published in 2017 in order to harmonize the software lifecycle model with the system lifecycle model, continues to include operations and maintenance processes within the Technical Process group. At the same time, the Department of Homeland Security’s Common Body of Knowledge to Produce, Sustain, and Acquire Secure Software (2006) places both operations and maintenance into the single category of “sustainment.” That grouping does not alter the activity and task specifications of the 12207 standard, which continue to be maintained. What it does provide is a single seamless process for oversight and control of the secure evolution of the product, which tends to give a better practical, end-to-end view of the security assurance aspects of the work than would be possible if operations and maintenance were viewed as two separate processes.

Given the existence of the commonly recognized and accepted activity and task specifications for the operations and maintenance processes, it is possible to talk about how to execute those processes in specific, best-practice terms. Therefore, the aim of this chapter is to convey an understanding of how the essentially interrelated operations and maintenance processes work together to ensure a secure product throughout its useful life. Because operations acts as the interface between the user community and the maintenance function, we will discuss that process first.

Secure Software Operations

As the name implies, the purpose of the software operations process is to “operate” the software product in its intended environment. Essentially, this implies a focus on the assurance of product effectiveness, as well as product support for the user community. Moreover, since the operations process almost always involves attention to the hardware, the operational activities it applies to software assurance should be applied to hardware as well. Because the operations process is an ongoing organizational function, it requires an operational strategy to guide its everyday execution. That strategy centers on getting a standard definition of the terms, conditions, and criteria that will be used to judge the quality of the execution of the product within its intended environment. In addition to overall operational planning, there has to be a standard assurance plan that will guide the routine tests and reviews of the product that have to be done in order to ensure its continuing secure operation within its intended environment. Finally, there has to be an organization-wide policy and procedure that are designed to ensure that customers are given adequate advice and consultation about product operation.

Because of its focus on ensuring the everyday operation of the organization’s software assets, operations work can tend to be viewed as “routine” and therefore lacking in the more interesting creative aspects that characterize development work. Yet, since operations maintains oversight over the day-to-day functioning of the software and supports the organization’s users, it is also the one process that most directly impacts the everyday work of the corporation. Moreover, since it requires adequate numbers of staff to perform all of these everyday functions, operations is arguably one of the most expensive items in the general budget of an IT organization.

The operations process involves a diverse set of tasks, which are performed across the entire organization. Therefore, an appropriate set of standard guidelines and work instructions should be developed and publicized to guide the execution of the everyday process. The overall goal is to define and establish a stable, routine set of activities and tasks to ensure the proper operation of each system, software product, or software service within the organization.

From the point where the system is placed into operational use, the activities in the operations process manage and assure the continuing execution of the system. That responsibility will usually include a diverse range of conventional user support activities as well as interfacing with the separate change management and problem resolution functions. In most instances, the actual management control of the everyday operations will rest with managers at the system or project level. As a result, arrangements have to be made to integrate the work of the operations process into the routine monitoring and control activities of the specific project management process for that system or project.

Operations Process Implementation

In the mainframe days, operators were low-level support people whose duties were primarily aimed at routine user and machine support activities, as well as everyday hardware and system software maintenance that was aimed at ensuring reliable operation. Since all of the operation was in-house and all reporting was through local lines, the operation function did not have to be tied to a formal configuration management process. Problem reports could be handled directly by the maintenance manager.

With the advent of distributed computing, and particularly the use of commercial off-the-shelf software products, the lifecycle could logically be broken into two separate phases. The first phase entailed all of the activities that are needed to either develop or purchase and install the product. The second phase included all of the activities that are carried out to sustain the product as built throughout its useful lifecycle. The operations process is a key player in achieving the goals of this second phase.

In essence, the aim of the operations process is to perform a standard set of activities and their constituent tasks on a day-to-day basis. This everyday work is meant to systematically: 1) monitor and assure the effective everyday operation of the software or system and call any deviations from anticipated execution to the attention of management, and 2) document and track every user-generated problem report or request for change. In that respect, then, the operations process now involves the routine execution of a specific set of procedures that are meant to recurrently test, monitor, and control the operation of each of the products within the organization’s overall software portfolio for the purpose of ensuring each product’s responsiveness to any problems and changes as they arise.

Connecting Operation to Change

In practice, the operator establishes the routine schedules and procedures for the testing and assurance of the product, as well as the reception, recording, and tracking of problem reports and modification requests. In addition, the operator provides feedback about routine performance to the product stakeholders. Whenever problems are encountered in the performance of the routine monitoring activities, they are recorded and entered into the software change management system, which is part of configuration management.

Because of the importance of configuration management in the maintenance of each software asset’s operational integrity, the operator always has to implement, or establish an organizational interface with, the configuration management function. The purpose of this is to ensure that all of the problem reports and change requests that the organization receives are acted on by configuration management and that proper feedback regarding changes is passed back to the originators of the request. In that respect, operations acts as the necessary “front end” to configuration management.

Planning for Secure Operation

Like most processes, the operations process starts with strategic planning. The planners develop and document a plan that aligns the process with existing organizational policies for the secure sustainment of a given software product within that particular environment. The plan describes the specific activities and tasks that fulfill the routine goals and purposes of those policies. At the same time, the organization develops a strategy and an associated set of operational standards to carry out the common generic activities and tasks of the operations process.

The plan is normally documented and then deployed as a day-to-day set of practices. As a result, the operational plan ought to communicate a clear set of real-world practices, which will most effectively implement as well as support the specific operations process within its intended environment. In addition, the operational plan should specify a mechanism for providing direct feedback to management about any problem or change request that might have been submitted by the user community. In regard to the latter requirement, the operational plan needs to specify a concrete set of procedures for receiving, recording, resolving, and tracking problems.

For the purpose of efficient communication, relevant information about any proposed changes, such as costs and impacts, should be routed both to the stakeholder who submitted the change request in the first place and to the organization’s decision makers who will be responsible for approving any ultimate action. A mechanism should also exist for documenting and recording any unresolved problems that might exist and for entering those into the problem resolution process. Finally, in order to assess whether the change has been made correctly, there must be procedures for verifying and validating the product.

Because the maintenance process will be required to maintain a change once it has occurred, there must be an organizational mechanism for moving all of the documentation that is generated about the proposed change into the maintenance process. In order to facilitate that exchange, the operator should establish procedures for obtaining operational information about the functioning of the software product in its operational environment and then passing that information along to the software maintenance process. The change is then made within maintenance, and the resultant new version and its documentation are released for operational use.

Operational Monitoring and Control

In the division of labor within the overall sustainment process, it is generally the operations function that is responsible for performing the routine procedural testing on every released version of the software. Once a new release satisfies whatever criteria are specified for it, the operator is empowered to promote the product to full operational status. In order to make this decision, the operator must confirm that all meaningful operational aspects of the new release, including the status of the code and the functioning of the database, conform to the specifications of the change or modification plan. Upon demonstrating satisfaction of the specified criteria, the appropriate decision maker then authorizes the product for operational use. The standard assessments that are used by the operator to support that decision include confirmation that the software code and database initialize, execute, and terminate as described in the plan.
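The "initialize, execute, and terminate" confirmation can be partly automated. The following is a minimal sketch, not a prescribed procedure: it assumes the release can be started from a command line, that a zero exit code signals clean termination, and that a timeout stands in for "failed to terminate." The names `SmokeResult` and `smoke_check` are hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SmokeResult:
    started: bool              # the process could be launched at all
    exit_code: Optional[int]   # None if it never started or never finished
    timed_out: bool            # it started but did not terminate in time

    @property
    def passed(self) -> bool:
        # Assumption: exit code 0 means the release terminated cleanly.
        return self.started and not self.timed_out and self.exit_code == 0


def smoke_check(command: List[str], timeout_s: float = 30.0) -> SmokeResult:
    """Launch a release candidate and record whether it starts, runs, and exits."""
    try:
        completed = subprocess.run(command, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return SmokeResult(started=True, exit_code=None, timed_out=True)
    except OSError:
        # Covers a missing or non-executable binary.
        return SmokeResult(started=False, exit_code=None, timed_out=False)
    return SmokeResult(started=True, exit_code=completed.returncode, timed_out=False)


# Illustrative stand-in for a real release: a one-line Python program.
result = smoke_check([sys.executable, "-c", "print('service ready')"])
print("promote to operational status:", result.passed)
```

A check like this only supplies evidence for the promotion decision. The authorization itself still rests with the designated decision maker.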

Finally, once an item has been approved for release, there has to be a formal scheme to recognize that the product has actually been delivered back for operational use in the organization. In addition, that recognition has to be communicated back to the entity that requested the change. There might be instances where there is an agreement to assure that a new system that is replacing an existing system maintains a similar degree of capacity and quality. In those instances, the developer might also be obligated to operate the new product during a specified period of concurrent operation in order to ensure that the product continues to conform to the requesting stakeholder’s requirements.

Customer Support

Given its status as the designated interface with end users, the operations process is responsible for providing all necessary advice and product support as needed. Each user request for support and the subsequent actions that are taken to resolve it must be carefully recorded and maintained as part of the configuration management process. The operations manager, who in many small companies may also fulfill the role of configuration manager, distributes the routine user requests that the company receives to the appropriate parties for resolution. These requests are normally termed “change requests” or CRs.

Because part of customer support involves receiving change requests from the user community, the operator also has to establish a reliable process to forward user requests to the software maintenance process for resolution. Normally, the steps that are taken to respond to any formal change request that involves the development of new code, or even new versions, are actually taken by the staff of the maintenance process. All new requests must be analyzed and recommendations for addressing them must be planned and authorized for subsequent action. However, once a request has been analyzed and approved and corrective actions are taken, it is the responsibility of the operations process to report the outcome and any potential impacts of the change to the originators of the request.

All resolutions to reported problems are ensured by the customer support process. It should go without saying that the operations process is very dependent on configuration management to enforce the necessary control during the problem resolution process itself. Because of that dependency, there is an explicit requirement that any change requests submitted to operations must be monitored to their conclusion and a sign-off from configuration management obtained once completion conditions have been satisfied. Sometimes, because of the realities of business operations, temporary bridges may have to be created to address a problem. These are usually intended to furnish short-term relief to the business stakeholder while a long-term solution is being prepared. If it is necessary to adopt a work-around, the originator of the affected change request must be given the opportunity to choose whether to employ the proposed short-term solution while the problem is being corrected.
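The requirement that a change request be monitored to its conclusion, and closed only with a configuration management sign-off, can be pictured as a small state machine. This is an illustrative sketch only. The status names, the `ChangeRequest` class, and the transition table are assumptions made for the example rather than terms drawn from any standard.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Status(Enum):
    SUBMITTED = "submitted"
    ANALYZED = "analyzed"
    APPROVED = "approved"
    REJECTED = "rejected"
    IMPLEMENTED = "implemented"
    CLOSED = "closed"


# Hypothetical lifecycle: each status maps to the statuses it may move to.
ALLOWED = {
    Status.SUBMITTED: {Status.ANALYZED},
    Status.ANALYZED: {Status.APPROVED, Status.REJECTED},
    Status.APPROVED: {Status.IMPLEMENTED},
    Status.IMPLEMENTED: {Status.CLOSED},
    Status.REJECTED: set(),
    Status.CLOSED: set(),
}


@dataclass
class ChangeRequest:
    cr_id: str
    originator: str
    summary: str
    status: Status = Status.SUBMITTED
    cm_signed_off: bool = False
    history: List[Tuple[Status, str]] = field(default_factory=list)

    def advance(self, new_status: Status, note: str = "") -> None:
        """Move the CR forward, keeping an audit trail of every prior status."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"{self.status.value} -> {new_status.value} is not allowed")
        if new_status is Status.CLOSED and not self.cm_signed_off:
            raise PermissionError("configuration management sign-off is required before closing")
        self.history.append((self.status, note))
        self.status = new_status


cr = ChangeRequest("CR-001", originator="payroll team", summary="rounding error in report")
cr.advance(Status.ANALYZED, "impact limited to one module")
cr.advance(Status.APPROVED)
cr.advance(Status.IMPLEMENTED, "fix released")
cr.cm_signed_off = True          # recorded once completion conditions are met
cr.advance(Status.CLOSED, "outcome reported to originator")
print(cr.status.value, len(cr.history))
```

Recording the originator on the request is what allows operations to report the outcome back to the right party once the request closes.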

As part of good practice, the operator always provides assistance and consultation to the users. Assistance and consultation include the provision of training, documentation, and other support services necessary to ensure the effective use of the product. Support is generally provided as requested by the users in the field, but it can come as a result of planning—for example, a scheduled training. If the request for support comes from the user community, those requests and subsequent actions should be recorded and monitored for their proper resolution.

Ensuring the Service Operation

Ideally, all software assets that fall under the authority of the operations process are continuously ensured to be functioning as built within their intended environment. However, while this routine product assurance is taking place, the operations process also monitors any routine service activity that might be part of the operational portfolio of the organization.

Where appropriate, this monitoring takes place against defined criteria for service or a service level agreement (SLA). Judging whether a service operation has been done correctly requires the organization to develop a set of service criteria for operational use. These criteria are necessary so that compliance with contractual requirements can be documented, and the performance of operational testing will then be able to demonstrate satisfactory results against those criteria. In addition to operational service criteria, it is necessary to identify and monitor risks that might arise in the subsequent performance of the service operation.
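To make the idea of monitoring against service criteria concrete, consider the sketch below. It assumes two hypothetical criteria, a minimum availability and a maximum 95th-percentile response time, and a list of periodic probe results. Real SLAs define their own measures, so the thresholds and the names `ServiceCriteria`, `Probe`, and `evaluate` are illustrative only.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class ServiceCriteria:
    min_availability: float      # fraction of probes that must succeed
    max_p95_latency_ms: float    # ceiling on 95th-percentile response time


@dataclass(frozen=True)
class Probe:
    ok: bool
    latency_ms: float


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile; infinite if there is nothing to measure."""
    if not values:
        return math.inf
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100   # integer ceiling of 0.95 * n
    return ordered[rank - 1]


def evaluate(probes: List[Probe], criteria: ServiceCriteria) -> Dict[str, Union[float, bool]]:
    """Compare observed service behavior with the agreed criteria."""
    if not probes:
        raise ValueError("no probes recorded for this period")
    availability = sum(1 for p in probes if p.ok) / len(probes)
    latency = p95([p.latency_ms for p in probes if p.ok])
    compliant = (availability >= criteria.min_availability
                 and latency <= criteria.max_p95_latency_ms)
    return {"availability": availability, "p95_latency_ms": latency, "compliant": compliant}


criteria = ServiceCriteria(min_availability=0.99, max_p95_latency_ms=500.0)
period = [Probe(True, 120.0)] * 98 + [Probe(False, 0.0)] * 2
print(evaluate(period, criteria))
```

Keeping the criteria in a single documented structure gives operational testing a fixed target to demonstrate compliance against.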

The Software Maintenance Process

The specific purpose of the software maintenance process is to provide cost-effective modifications and operational support for each of the software artifacts in the organizational portfolio. In addition, maintenance provides ad hoc services for the organization’s stakeholders, including such activities as training or operating a help desk. In real-world practice, maintenance goes hand in hand with the operations process. Normally, the maintenance process involves those activities and tasks that are typically undertaken by the organizational entity that has been designated to fulfill this role. However, since maintenance is also likely to entail other organizational processes, specifically development, it must be understood that the maintenance operation should never function as a stand-alone element of the organization.

A maintenance strategy is the first requirement for the successful establishment of the software maintenance process. Generally, that strategy centers on understanding the impacts of changes to the existing information system or organizational process. All operations or interfaces affected by any proposed change are identified, and the system and software documentation that is involved is updated as specified by the plan. Maintenance oversees all proposed modifications and conducts any tests that are necessary to demonstrate that overall system requirements have not been compromised.

In addition, maintenance is responsible for migrating all product upgrades into the customer’s operational environment. Maintenance communicates to all affected parties the extent and impacts of any modifications that have been performed. The goal of this aspect of the maintenance process is to ensure that the integrity of all of the organization’s software products and systems is preserved while undergoing change.

Because of its focus on preserving integrity, the maintenance process is built around the rational management of change. This particular aspect of maintenance is an important function in a software organization. That is because, if properly carried out, maintenance will curtail the normal degradation of control over the logic and understandability of the organization’s software assets—typically termed “spaghetti code.” The maintenance process does this by establishing a rational framework within which the natural consequences of technical evolution can be effectively and efficiently managed.

The maintenance process normally comes into play when it is necessary to modify the code or associated documentation for a project. This modification is usually a consequence of a reported problem or a request for a change or refinement. As we saw in the prior section, in practice, a problem report is passed to the maintenance process through the actions of the operations process. However, we will also see that besides modification, maintenance’s responsibilities include tasks that support the migration of the software product as well as the retirement of obsolete software products.

In general, maintenance is composed of planning, control, assurance, and communication activities. In effect, the maintenance process originates from a user-generated request to change, modify, or enhance an existing system, software product, or service. The goal of maintenance is to control those changes in such a way that the integrity and quality of the product are preserved. In practice, the execution of the maintenance process itself is concerned primarily with the consistent documentation and tracking of information about the artifact. In conjunction with the routine practices of everyday housekeeping, the maintainer may also be required to perform activities that would normally be associated with development.


EXAM TIP    A CSSLP should be familiar with the security aspects of the following ongoing activities associated with operations/maintenance of software: monitoring, incident management, problem management, and change management (patching/updating).

The maintainer executes the maintenance process activities at the project level, and the process itself is administered through activity instantiated by the project management functions of the organization. The maintainer establishes a standard infrastructure for executing maintenance activities and tailors individual maintenance work by following a standard process. In addition, the maintainer enhances the process at the overall organizational level following the recommendations of the training process. When the maintainer performs general maintenance service, the maintainer typically does that by contract.

Any formally documented problems (e.g., those problems for which a change request [CR] has been filed) must progress through a problem resolution procedure. Accordingly, the maintenance plan has to describe a mechanism for interfacing between maintenance and problem resolution. Furthermore, since configuration management is the normal mechanism employed by the organization to authorize and control change to the software portfolio, the maintenance plan also has to explicitly describe the steps that will link the maintenance process with configuration management.

Monitoring

The monitoring portion of the operations and maintenance phase is characterized by a single key task, which is to monitor the state of assurance associated with the system within its target environment, as well as respond to any incidents as they occur. In essence, the goal of this type of everyday assurance is to maintain an ongoing state of trust among all of the organization’s stakeholders that the system is secure and performing its functions according to plan. It is impossible to ensure every artifact in the organization’s inventory, so a carefully prioritized review process is usually the mechanism that is adopted to underwrite operational assurance.

Several givens need to be recognized in order for an operational assurance process to be effective. First, the routine, day-to-day monitoring and reporting responsibility has to be connected to, but not directly a part of, the overall assurance effort. Second, the operational assurance activity has to have an explicit enforcement mechanism built into it that is designed to ensure that any and all requisite monitoring and reporting practices are being followed. Finally, a formal conflict resolution process, which is conventionally termed “problem resolution,” is required to ensure that all open items and nonconcurrences are resolved.

Organizationally, there are a few simple rules to follow when establishing a continuing operational review and reporting activity. First, logically, the organizational entity that is responsible for doing the reviews should not report directly to the manager of the project that is under review. However, the review team should always report to a designated agent of local management, not some distant decision maker. Finally, from an organizational placement standpoint, there should be no more than one position between the manager of the review team and a senior site manager.

Human factors also have to be considered when creating and staffing a review team. First, review professionals perform different functions than the people in development, so it is important to utilize professionals who have a sustainment rather than a development focus. In that respect, the responsibilities fall under the operational review role. People who do operational reviews should be allowed to inspect existing development and maintenance plans for alignment with organizational goals. They should have the technical capabilities to be able to participate in design and code inspections, as well as be able to review all unit test plans for adherence to contractual standards and criteria. And finally, operational review professionals must be able to interpret existing test results in order to determine whether the tests themselves have adhered to testing plans. Then, if a deviation is encountered, review professionals must be empowered to register and act on nonconcurrences.

When examining the monitoring requirements of the output of a software development lifecycle (SDLC)—that is, a piece of software—these requirements should be specified and delineated during the requirements process. What is to be logged, how often, and how it is to be used during the operations phase of the software are important questions that need to be fully developed in the design of the software. The implications of this design provide necessary information during the operations phase through elements that can then be monitored. A guide to what needs to be monitored can be summed up from the axiom “to manage something, one must measure it.”


NOTE    Connecting the operational dots between what needs to be monitored to ensure proper and secure operation relies upon the necessary information being exposed by the software in some form, typically either alerts or logs.
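As an illustration of software exposing information through logs, the sketch below emits security-relevant events as one JSON object per line using Python's standard logging module. The event name, the field names, and the `build_security_logger` helper are assumptions chosen for the example. What actually gets logged should come out of the requirements and design work described above.

```python
import io
import json
import logging
from datetime import datetime, timezone
from typing import TextIO


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON line that tools can parse."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured details are passed by the caller via extra={"fields": {...}}.
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry, sort_keys=True)


def build_security_logger(stream: TextIO) -> logging.Logger:
    """Hypothetical helper: a dedicated logger for security-relevant events."""
    logger = logging.getLogger("security_events")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()      # avoid duplicate handlers if called twice
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# Illustrative use: an in-memory stream stands in for a log file or collector.
buffer = io.StringIO()
log = build_security_logger(buffer)
log.warning("login_failed", extra={"fields": {"user": "alice", "attempts": 3}})
print(buffer.getvalue().strip())
```

Structured entries of this kind are easier for operations staff to filter and alert on than free-form text messages.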

Incident Management

An incident is any event that disrupts normal operation of the software or system. Incidents can be caused by everything from user errors to hacking exploits and other malicious activity. The role of incident management is to maintain an incident response capability for the software organization over time. Generic incident response involves a set of logical monitoring, analysis, and response actions. The incident response management function deploys those actions as a substantive and appropriate response to each adverse event as it happens. Incident response management ensures that any potentially harmful occurrence is first identified and then reported and the response coordinated and managed.

The incident response management process applies whether the organization is reacting to a foreseen event or is responding to an incident that was not anticipated. The only difference in the actual performance of the process is in whether the substantive steps that are taken to mitigate any given incident have been planned in advance. For example, most organizations have a set of specific procedures in place to respond to the identification of a new vulnerability in the product’s code. The actual organizational response to that identification can be prescribed in advance because the presence of vulnerabilities is a given in software. Therefore, appropriate analysis, design and coding, and distribution of patches can take place along standard procedural lines. In that respect then, the presence of a standard operating procedure to address a newly identified vulnerability will ensure a timely and generally appropriate response to the problem of chronic defects in code.

However, many types of incidents are unforeseen. If a problem with the software is unforeseen, the aim of incident response management is to ensure that the nature of the incident is quickly understood and that the best possible response is deployed to address it. The key to ensuring effective response is a well-defined and efficient incident reporting and handling (aka response) process. For the sake of effective coordination, the actual report of any new occurrence should be submitted to a single central entity for management. Central coordination is an important aspect of responding to unforeseen incidents because novel threats do not usually present themselves in a single neat package. Instead, undesirable occurrences can pop up that are characteristic of an impending failure, or even an outright attack. In order to assemble a meaningful picture, it is important to have a single entity that is responsible for receiving, analyzing, and responding to diverse data coming in from disparate sources. Then, once the nature of the incident is understood, the coordinating entity can ensure that the proper information is sent to the right authorities who can make a decision about how to respond.
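The value of a single coordinating entity can be sketched in code. In the example below, observations from different sources are collected centrally and grouped by affected asset. An asset is flagged when at least two distinct sources report on it within a short time window. The grouping rule, the 30-minute window, and the names `Observation` and `correlate` are all assumptions made for illustration, not a recommended detection method.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Observation:
    source: str    # who reported it, e.g. a monitoring tool or the help desk
    asset: str     # the system or product the report concerns
    minute: int    # minutes since the start of the monitoring period
    detail: str


def correlate(observations: List[Observation],
              window_minutes: int = 30,
              min_sources: int = 2) -> Dict[str, List[Observation]]:
    """Return, per asset, the first cluster of reports that spans enough distinct sources."""
    by_asset: Dict[str, List[Observation]] = defaultdict(list)
    for obs in observations:
        by_asset[obs.asset].append(obs)

    flagged: Dict[str, List[Observation]] = {}
    for asset, items in by_asset.items():
        items.sort(key=lambda o: o.minute)
        for i, first in enumerate(items):
            # Everything reported within the window that starts at this observation.
            cluster = [o for o in items[i:] if o.minute - first.minute <= window_minutes]
            if len({o.source for o in cluster}) >= min_sources:
                flagged[asset] = cluster
                break
    return flagged


reports = [
    Observation("intrusion-sensor", "billing-db", 0, "unusual connection pattern"),
    Observation("application-log", "billing-db", 12, "repeated authentication failures"),
    Observation("help-desk", "web-portal", 5, "user reports slow pages"),
]
for asset, cluster in correlate(reports).items():
    print(asset, [o.source for o in cluster])
```

Neither report on the billing database would necessarily stand out on its own. Only an entity that sees both can assemble them into a meaningful picture.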

Monitoring and Incident Identification

Incident response is initiated when a potentially harmful event occurs. The incident response is then set in motion through a formal incident reporting process. Incident reporting ensures that every possibly damaging event gets an organizationally sanctioned response. Consequently, effective incident reporting is founded on a monitoring function. The monitoring must gather objective data that decision makers can understand and act on. In addition, the monitoring has to provide the most timely incident analysis possible. The goal of effective incident identification is to be able to tell whether an event reflects a potential vulnerability in the code, an attempt to exploit the software in some fashion, or simply an unintentional user error.

The aim of the incident monitoring process is to identify a potentially harmful event in as timely a fashion as possible. The monitoring techniques that are used to ensure such timeliness can range from tests, reviews, and audits of system logs all the way up to automated incident management systems, dynamic testing tools, or code scanners.

Incident Reporting and Management Control

For the purpose of effective management control, incident reports have to reach the right decision makers in as timely a manner as possible. Once the occurrence of an incident can be confirmed and its nature understood, an incident report is filed with the manager who is responsible for coordinating the response. That individual is normally called an “incident manager.” The incident report will document both the type and assessed impact of the event. The incidents that are reported should not just be limited to major events. Incidents that might be reported would include everything from routine defects that are found in the code, such as incorrect or missing parameters, all the way up through the discovery of intentional objects embedded in a piece of software, such as trapdoors and Trojan horses. If the incident has been foreseen, the response would typically follow the agreed-upon procedure for its resolution. That procedure is normally specified in an incident response plan that would be utilized by a formally designated incident response team.
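The distinction between foreseen and unforeseen incidents can be expressed as a simple lookup against the incident response plan. The sketch below is illustrative only. The incident types, the impact labels, the plan entries, and the `dispatch` function are hypothetical, and a real plan would carry far more detail.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class IncidentReport:
    incident_type: str     # what kind of event was observed
    assessed_impact: str   # assumed scale for this example: "low", "medium", "high"
    description: str


# Hypothetical response plan: foreseen incident types and their agreed procedures.
RESPONSE_PLAN: Dict[str, str] = {
    "code_defect": "file a change request and schedule a fix",
    "new_vulnerability": "analyze, develop a patch, and distribute it",
}


def dispatch(report: IncidentReport, plan: Dict[str, str] = RESPONSE_PLAN) -> str:
    """Foreseen incidents follow the agreed procedure; others go to the incident manager."""
    procedure = plan.get(report.incident_type)
    if procedure is not None:
        return f"planned response ({report.assessed_impact} impact): {procedure}"
    return f"unforeseen ({report.assessed_impact} impact): escalate to incident manager for analysis"


print(dispatch(IncidentReport("code_defect", "low", "missing parameter in report module")))
print(dispatch(IncidentReport("embedded_trapdoor", "high", "undocumented login path found")))
```

Minor defects and major discoveries pass through the same reporting path. Only the response that follows differs.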

Whatever the nature of the incident, the incident reporting process has to lead directly to a reliable and appropriate response. Executed properly, that response should be handled by a formally designated and fittingly capable incident response team. Incident response teams work like the fire department. They are specialists who are called upon as soon as the event happens (or as soon as reasonably possible after the event happens), and they follow a process that is drilled into them to ensure the best possible response. In most instances, the incident response team should be given specialized training and equipment that are designed to ensure the best possible solution to the problem. The fact that the incident response team has been given that training and equipment to achieve that purpose also justifies why a designated set of responders should deal with the event rather than the local people who might have reported it in the first place.

Software can be designed to assist in incident response efforts through the proactive logging of information. How can the team decide what to monitor? Threat modeling can provide valuable information that can be used to proactively monitor the system at the points where the information is most easily exposed for monitoring.
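As a minimal sketch of proactive logging, the following Python example writes each security-relevant event as a single JSON line so that responders can filter it later. The field names threat_id and component, the logger name, and the identifier T-003 are hypothetical; they stand in for whatever entries the team's own threat model defines.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record):
        entry = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
            # Hypothetical fields that tie the event to a threat model entry.
            "threat_id": getattr(record, "threat_id", None),
            "component": getattr(record, "component", None),
        }
        return json.dumps(entry)


def build_security_logger(stream):
    """Create a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger("security_events")
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


# An in-memory stream stands in for a log file or collector.
buffer = io.StringIO()
log = build_security_logger(buffer)
log.warning(
    "repeated login failure",
    extra={"threat_id": "T-003", "component": "auth"},
)
print(buffer.getvalue().strip())
```

Recording the threat model identifier with each event is one possible design choice; it lets the incident response team search the logs by the threat they are investigating rather than by free text.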

NOTE    When developing software, it is important to take a holistic view toward security. There are many opportunities to learn crucial pieces of information that can be used at other points of the development process. Properly communicating them across team and development boundaries can result in stronger system performance and security. Many items that are useful in the secure operation of software as part of an enterprise need to be developed in earlier portions of the SDLC process. Beginning with the end in mind can assist in the creation of a securable system.

Anticipating Potential Incidents

In practice, incidents are either potential or active. Potential incidents include things like defects that are identified but that don’t appear to have a present threat associated with them, or previously unforeseen flaws that are identified in the organization’s inspection and testing capabilities. One other source could be the formal notification of the existence of a potential security vulnerability by an outside organization, such as Microsoft or the United States Computer Emergency Readiness Team (US-CERT). Potential incident reports can also result from internal analyses done by the actual users of the software. For instance, IT management can get a notice from the user community that a fault or vulnerability has been identified during the use of the product.

The obvious advantage that potential incidents have over active incidents is that it is possible to craft a proper response or patch for the danger before it happens. The process of thinking through that response is typically supported by a comprehensive analysis of all of the technical and business factors that might be associated with any potentially harmful outcomes. Once the likelihood and impact of a known flaw are understood, a patch or change in procedure can be developed and put in place that most appropriately addresses the problem.

Responding to Active Incidents

With active incidents, the organization does not have the luxury of time to develop a proper response. If it is possible to confirm that an active incident, such as an exploitation of a previously unknown flaw, is taking place in the software, then the appropriate corrective action must be undertaken immediately. Corrective actions are dictated by the circumstances. They can range from immediately developing and applying a patch or reconfiguration that addresses the defect, all the way to a change in procedure or the implementation of a new kind of enforcement approach during the review and testing phases.

Because they have the explicit expertise, the incident response team that was discussed earlier should also be responsible for responding to an active incident. The goal of the incident response team with regard to an active incident is to work to limit any additional damage and to study the circumstances of the occurrence in order to prevent it from happening again. Where the incident originated from problems with the software, the incident response function supervises the implementation of the patch or change to the target system. Where a change to overall organizational policy or procedure is necessary, the incident response team facilitates any necessary coordination and training to prevent any recurrences.

Establishing a Structured Response

As we have seen, it is a given that problems will be identified in software. Therefore, it is essential to be able to respond to any identified problem in as effective a fashion as possible. The level of that effectiveness is determined by how well the organization has prepared itself to deal with incidents. The role of the incident management function is to ensure that adequate preparation has taken place to respond to incidents as they occur. Given that adequate preparation is a key requirement, it is essential to develop specifications for procedures that will dictate the precise steps to be taken in response to both potential and active incidents. Accordingly, a detailed set of best practices should be identified and documented to establish the organization’s incident response capability.

The structure of the overall incident response process will vary depending on the organization and its business requirements. Nevertheless, there are some standard issues that should be considered when putting together an incident response package. Practical management considerations, such as who is authorized to initiate the incident response and how much specific authority is required to direct a response to a given circumstance, have to be specified in order to ensure the right response to a specific event. The set of itemized practices for a given situation is normally formalized in an organizational procedure manual that guides the incident response process. That procedure manual will typically describe the ideal set of actions that is required to address each common type of incident.

Along with a prescription of organizationally standard practices, the incident response manual provides a definition of terms and concepts for the process. This is needed in order to prevent misinterpretations and misunderstandings in the incident response team. In that respect, the procedure manual should also specify the precise role of the members of the incident response team. A clear definition of what those specific roles and responsibilities are makes certain that every member of the team will be on the same page during the execution of the incident management process.

Ensure Enough Resources

One of the first tasks of incident response management is to make certain that the right resources are available to address every foreseen incident. In that regard then, a proper balance has to be struck between deploying an appropriate response and overreacting to the incident. The designated incident response manager is the person who gathers the facts about the incident and analyzes and deploys the initial response. Then, once resources have been deployed, the manager responsible for doing the actual incident response work can decide whether enough resources have been deployed or whether to escalate the incident to a higher management level in the organization.

The decision to escalate is based on the analysis of the particular incident in question and any appropriate guidance that is provided by incident management policies. Numerous factors go into any incident response decision. Not responding can result in losses and interruptions to business. Improper or delayed responses can also lead to losses and interruptions. Delays in responding to incidents increase the risk exposure time to the enterprise. Errors in the response approach can impede any subsequent legal actions against malicious parties. A set of well-designed policies can assist incident response managers in their decision making, as well as improve the odds of a successful response.

NOTE    Metrics can provide management with information about the effectiveness of, and trends associated with, security processes. Measuring the time between each of the following pairs of events can provide insight into the effectiveness of the response function: incident occurrence and detection; detection and response; response and containment; and containment and resumption.

While tracking these factors as a whole helps improve overall response time, tracking them individually enables the identification of “targets of opportunity” to reduce overall response time.
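The interval metrics described in the note can be sketched in a few lines of Python. The stage names follow the note; the timestamps are invented sample data, and the function name stage_intervals is illustrative only.

```python
from datetime import datetime

# Stages in the order given in the note above.
STAGES = ["occurrence", "detection", "response", "containment", "resumption"]


def stage_intervals(timestamps):
    """Return the minutes elapsed between each consecutive pair of stages."""
    intervals = {}
    for earlier, later in zip(STAGES, STAGES[1:]):
        delta = timestamps[later] - timestamps[earlier]
        intervals[f"{earlier}->{later}"] = delta.total_seconds() / 60
    return intervals


# Invented timestamps for a single incident.
incident = {
    "occurrence": datetime(2024, 1, 10, 8, 0),
    "detection": datetime(2024, 1, 10, 9, 30),
    "response": datetime(2024, 1, 10, 9, 45),
    "containment": datetime(2024, 1, 10, 11, 45),
    "resumption": datetime(2024, 1, 10, 12, 15),
}

intervals = stage_intervals(incident)
# The longest interval is a candidate "target of opportunity".
slowest = max(intervals, key=intervals.get)
print(intervals)
print("slowest stage:", slowest)
```

In this sample the longest gap lies between response and containment, so that stage would be the first place to look for ways to reduce overall response time.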

Managing the Incident Response Team

The strategy and goals for the incident response process will dictate the composition and actions of the incident response team. Selecting the right mix of staff is an important part of creating the team. Usually, the incident response team is composed of an experienced team manager, expert software analysts and programmers, cybersecurity and computer crime specialists, and sometimes even legal and governmental or public affairs experts.

With each incident, the team manager examines all available information in order to determine which members of the team are immediately needed for the investigation. Factors that the team manager might consider include the number and type of applications and operating systems affected by the reported problem, the system assets that were attacked and the sophistication of the attack, business impact, adverse publicity, internal political issues, and any corporate liability. The aim is to provide the optimum response to every known aspect of the incident without deploying unnecessary or unneeded personnel.

Problem Management

Because the operations and maintenance processes interact constantly with the user community, one of their most important tasks is to forward any reported problem to management for resolution. As we said in the prior section, if a reported problem has a temporary work-around before a permanent solution can be released, the originator of the problem report might be given the option to use it. However, permanent corrections, releases that include previously omitted functions or features, and system improvements should be applied to the operational software product using the software configuration management process.

Since software is complex, all of the effects of the problem may not be readily apparent at the time it is reported. Therefore, once a formal problem report or software change request has been submitted, the organization is obliged to offer a resolution. That resolution begins with a thorough analysis of the potential impact of any reported problem or proposed change. The maintenance team first analyzes the problem report or modification request in order to determine its impact on the organization, the affected systems, and any interfaces with those systems in order to determine the type, scope, and criticality of the proposed action. Type simply refers to the corrective, improvement, preventive, or adaptive action that might be required. Scope is a measure of the size of the modification, as well as all of the costs involved, and the time it will take to perform the modification. Criticality assesses the impact of the requested change on the overall performance, safety, or security of the organization.
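One way to capture the type, scope, and criticality of a request is a small record type, sketched below in Python. The effort estimate in hours and the criticality scale of 1 to 5 are assumptions made for illustration; an organization would substitute its own measures.

```python
from dataclasses import dataclass
from enum import Enum


class ChangeType(Enum):
    CORRECTIVE = "corrective"
    IMPROVEMENT = "improvement"
    PREVENTIVE = "preventive"
    ADAPTIVE = "adaptive"


@dataclass
class ImpactAssessment:
    change_type: ChangeType
    scope_hours: float  # assumed proxy for the size, cost, and time of the change
    criticality: int    # assumed scale: 1 (minor) to 5 (safety or security impact)


def review_order(assessments):
    """Most critical first; among equally critical items, the smaller change first."""
    return sorted(assessments, key=lambda a: (-a.criticality, a.scope_hours))


requests = [
    ImpactAssessment(ChangeType.IMPROVEMENT, scope_hours=40, criticality=2),
    ImpactAssessment(ChangeType.CORRECTIVE, scope_hours=8, criticality=5),
    ImpactAssessment(ChangeType.PREVENTIVE, scope_hours=4, criticality=5),
]

for item in review_order(requests):
    print(item.change_type.value, item.criticality, item.scope_hours)
```

Sorting by criticality before scope is only one possible policy; the point of the sketch is that recording all three attributes makes the prioritization explicit and repeatable.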

The analyst must first determine whether a problem actually exists. That typically entails attempting to either replicate or verify the problem’s presence. Then, based on the results of the analysis, options for implementing the modification are examined. At a minimum, the analysis must provide a map of all interfaces between the items that will be changed and an analysis of the anticipated impacts on affected items, along with an estimation of the resources required to execute the change. Following that analysis, the maintainer develops options for implementing the modification. Once these options are characterized, the maintainer prepares a report detailing the elements of the request and the results of the analysis, as well as the various implementation options that might be selected. This report is presented to the appropriate authorization agent, which is often the designated configuration control board. The decision makers then authorize or approve the selected modifications.

NOTE    The use of management tools and techniques such as root cause analysis can assist in secure operations. Bug and vulnerability tracking and end-user support are inextricably linked, and proper data logging can improve the ability to use this information in correcting issues in software across releases.

Once the authorizing agent documents organizational approval, a precise statement of the work that will be required is developed, documented, and communicated to the appropriate body to implement the change. The role that coordinates the performance of the actual change request activity is often called a change manager. A person fulfilling the change management role performs all of the monitoring, management control, and reporting functions to ensure that the change is done correctly. Whatever the title of this person, it is essential that an impact analysis is done in order to determine whether any residual effects are present. This analysis includes identifying any areas of continuing trouble with any of the systems that have undergone change, as well as any other systems that might interact with those systems.

Modification Implementation

Every software product involves a number of different performance issues. So in order to determine the steps necessary to implement a modification after it has been approved, the change manager has to first do an analysis to determine which software unit and/or version is to be included in that modification. Once the affected items are identified, all aspects of their structure and interfaces are documented, and the maintainer employs the appropriate technical processes that are necessary to implement the modifications.

First, the change agent, which in most cases is actually the development function, performs a thorough analysis to determine exactly which software items require change. That includes examining all of the documentation for all of the versions that have been identified in the authorized statement of work (SOW). The outcome of this analysis is usually a formal software requirements specification (SRS) for the change. Then the change agent implements that modification as specified.

As we just said, this is usually done by development. The requirements of the technical processes are supplemented by any test and evaluation criteria that might be used to evaluate the outcome of the resulting modifications. Since those modifications are likely to involve both affected and unaffected components of the same system, it is necessary to record the actions that have taken place only on the affected components. The implementation of the new and modified requirements also has to be subsequently ensured for completeness and correctness. More importantly, steps need to be taken to ensure that any unmodified requirements of the system were not affected.

Maintenance Review/Acceptance

Once the modification has been made, the change manager has to conduct one or more reviews with the requesting organization in order to determine that the integrity of the modified system has been preserved. The change agent has to ensure the satisfactory completion of the modification as specified in the statement of work. Therefore, the maintainer conducts a review with the individual or organizational entity that wrote that SOW. This review is undertaken as soon as the change agent, usually development, confirms that the change has been made.

The purpose of this latter step is to attest that the change is correct and complete. The documentation artifact that correctness is verified against is the specification that was to guide the change. Once the change has been verified to be correct, the organization must ensure that the change is properly integrated back into the system. The approval for reintegration is provided by the authorizing agent, which is usually the appropriate configuration control board. Approval is based on the ability to reliably map the modified components back into the changed system.

Finally, once the change has been verified and the integration plan approved, some sort of final sign-off is required. This acceptance can range from something as simple as a note in a system log that the change was successfully made to a formal change audit procedure. In every respect, however, the sign-off must be obtained in order to close the loop on each change process.
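The sequence of analysis, approval, implementation, review, and sign-off described in this section can be sketched as a small state machine. The state names and the ChangeRequest class below are illustrative assumptions, not terms taken from a standard; the sketch only shows how a tool might refuse to skip a step.

```python
# Allowed transitions; the state names are assumptions for illustration.
ALLOWED = {
    "submitted": {"analyzed"},
    "analyzed": {"approved", "rejected"},
    "approved": {"implemented"},
    "implemented": {"verified"},
    "verified": {"signed_off"},
    "signed_off": set(),
    "rejected": set(),
}


class ChangeRequest:
    """Track one change request and keep a history of its states."""

    def __init__(self, identifier):
        self.identifier = identifier
        self.state = "submitted"
        self.history = ["submitted"]

    def advance(self, new_state):
        """Move to new_state, or raise ValueError if a step would be skipped."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)


request = ChangeRequest("CR-17")  # hypothetical identifier
for step in ["analyzed", "approved", "implemented", "verified", "signed_off"]:
    request.advance(step)
print(request.history)
```

Because the final state can be reached only through verification, the history of a closed request doubles as evidence that the loop was closed in the required order.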

Change Management

Software is an element of the enterprise that frequently undergoes an upgrade process. Whether the change is to fix a bug or a vulnerability (referred to as a patch), or the change is to introduce new functionality, as in an upgrade, it is important that these changes be performed under a change management process. The primary purpose of a change management process is to protect the enterprise from the risk associated with changing functioning systems.

Patching

Changes to software to fix vulnerabilities or bugs occur as the result of the application of patches from the vendor. There are numerous methods of delivery and packaging of patches from a vendor. Patches can be labeled as patches, hot-fixes, or quick fix engineering (QFE). The issue isn’t in the naming, but in what the patch changes in the software. Because patches are issued as a form of repair, the questions that need to be understood before blindly applying them in production are, “What does the patch repair?” and “Is it necessary to do so in production?”

Patch release schedules vary, from immediate to regularly scheduled events per a previously determined calendar date—for example, Patch Tuesday. Frequently, patches are packaged together upon release, making the operational implementation easier for system administrators. Periodically, large groups of patches are bundled together into larger delivery vehicles called service packs. Service packs primarily exist to simplify new installations, bringing them up to date with current release levels with less effort than applying the myriad of individual patches.

One of the challenges associated with the patching of software is in the regression testing of the patch against all of the different configurations of software to ensure that a fix for one problem does not create other problems. Although most software users rely upon the vendor to perform regression tests to ensure that the patch they receive provides the value needed, software vendors are handicapped in their ability to completely perform this duty. Only the end users can completely model the software in the enterprise as deployed, making the final level of regression testing one that should be done prior to introducing the patch into the production environment.

EXAM TIP    Patch management is a crucial element of a secure production environment. Integrating the patching process in a structured way within the change management process is important to ensure stability and completeness.

Backup, Recovery, and Archiving

Backups are one of the cornerstones of a security program. The process of backing up software as well as the data can be a significant issue if there is ever a need to restore a system to an earlier operational point. Maintaining archives of earlier releases and the associated datasets is an important consideration as production environments are upgraded to newer versions. It is also important to remember the security implications of storing archive sets. Employing encryption to protect the sensitive contents of these archives is a basic, but often overlooked, step. In the event it is legally necessary to produce data from an earlier version of the system, a restore of not just the data but the software version as well may be necessary.

A retention cycle is defined as a complete set of backups needed to restore data. This can be a full backup set or a full backup plus incremental sets. A cycle is stored for a retention period as a group, for the entire set will be needed to restore. Managing the retention cycles of data and programs with the retention periods determined as appropriate is one of the many considerations that operations personnel need to keep in mind when creating archives and storing them for future use. Whether this process is done via the change management process or part of the system lifecycle process is less important than ensuring that it is actually performed in the enterprise.
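The idea that a retention cycle must be kept together can be illustrated with a short Python sketch. It assumes a simplified catalog in which each backup is a (day, kind) pair and each incremental depends on every backup since the most recent full backup; the function name restore_set is illustrative only.

```python
def restore_set(backups, target_day):
    """Return the backups needed to restore the system as of target_day.

    backups is a list of (day, kind) tuples, where kind is "full" or
    "incremental". The result is the most recent full backup on or before
    target_day plus every incremental taken after it, up to target_day.
    """
    eligible = sorted(b for b in backups if b[0] <= target_day)
    full_positions = [i for i, b in enumerate(eligible) if b[1] == "full"]
    if not full_positions:
        raise ValueError("no full backup on or before the target day")
    return eligible[full_positions[-1]:]


# Invented catalog: two cycles, each starting with a full backup.
catalog = [
    (1, "full"),
    (2, "incremental"),
    (3, "incremental"),
    (8, "full"),
    (9, "incremental"),
]

print(restore_set(catalog, 3))
print(restore_set(catalog, 9))
```

Restoring to day 3 needs all three members of the first cycle, which is why discarding any one of them before the retention period ends would make the rest of the cycle unusable.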

Secure DevOps

DevOps is a form of software maintenance where changes are introduced virtually directly into production. On its face, this seems crazy and fraught with risk, but the processes involved in DevOps include highly refined and automated processes to reduce risk. Risk can come from large software changes that create significant changes in a program, as opposed to smaller, more focused changes. Rather than wait until a significant number of changes are collected, DevOps works on the principle of applying changes as they are created. DevOps was developed in response to the slow change that quarterly, semiannual, or annual update cycles force on the marketplace. With these larger changes, the level of testing required to ensure against different element interactions became one of the major time drivers. Many small changes were faster to implement, faster to test, and faster to correct if issues arose. One of the major safeguards that is used in DevOps is a high level of automation to implement changes and back out changes if they fail.

DevOps has become the principal method of advancing large technological platforms with multitudes of redundant servers. For Gmail, Google has deployed tens of thousands of servers worldwide to manage the vast email network. Changes are rolled out to this network incrementally, so that only portions of the system are changed at a given time. Automated tasks manage the actual change process, including the implementation of the change, post-change operational testing, and automated rollback (including subsequent halting of the change process across other servers) when issues arise. This enables developers to incrementally fix and evolve code. DevOps does not skip testing; testing still occurs before code is moved to production. The magic behind DevOps is the reduction of lengthy test cycles that accrue from the large-scale changes in major releases, where multitudes of potential interactions must be tested.
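The incremental rollout with automated rollback described above can be sketched as follows. This is a simplified illustration, not a description of how any particular provider operates; the server names, the batch size, and the deploy, health_check, and rollback callables are all hypothetical stand-ins for real automation.

```python
def staged_rollout(servers, deploy, health_check, rollback, batch_size=2):
    """Deploy batch by batch; on a failed check, roll back and halt."""
    updated = []
    for start in range(0, len(servers), batch_size):
        batch = servers[start:start + batch_size]
        for server in batch:
            deploy(server)
            updated.append(server)
        # Post-change operational testing of the batch just updated.
        if not all(health_check(server) for server in batch):
            for server in reversed(updated):
                rollback(server)
            # Halting here leaves the remaining servers untouched.
            return {"status": "rolled_back", "touched": updated}
    return {"status": "complete", "touched": updated}


# Simulated fleet: every server starts on version v1.
versions = {name: "v1" for name in ["s1", "s2", "s3", "s4", "s5"]}


def deploy(server):
    versions[server] = "v2"


def rollback(server):
    versions[server] = "v1"


def health_check(server):
    return server != "s4"  # simulate a failure on one server


result = staged_rollout(list(versions), deploy, health_check, rollback)
print(result["status"], versions)
```

This sketch rolls back every server updated so far when a batch fails; rolling back only the failing batch is an equally plausible policy, and the choice depends on whether mixed versions can safely coexist.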

Secure DevOps is a process where the development team is responsible for all aspects of code, including security all the way to production. Gone are the hand-offs to another team that may or may not have the knowledge needed to handle code-related issues if errors arise. Through the use of high degrees of automation to manage the error-prone processes of deployment, post-deployment testing, and rollback, DevOps has brought the advantages and ideas of agile methods into software maintenance and deployment. Just as adding security to agile methods does not change the nature of agile methods, security can be added to DevOps methods to manage the risk associated with maintenance efforts.

Continuous Integration/Continuous Delivery (CI/CD)

DevOps is closely associated with Continuous Integration/Continuous Delivery (CI/CD), in which organizations that have embraced the DevOps philosophy make use of automation to manage many maintenance functions. Scripts are used to deploy updates to production without human intervention, rather than through a manual review and approval process. Security gates are coded rather than performed as manual reviews or tests, with the eventual desired outcome being infrastructure and security as code, not as documentation to be reviewed or approved. This has led to many advantages, as changes can be invoked in much smaller increments, making issues easier to detect and correct.
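A coded security gate can be as simple as a function that a pipeline calls before deployment. The sketch below is a minimal illustration; the severity labels, the default limits, and the shape of the findings list are assumptions rather than the output format of any specific scanner.

```python
def security_gate(findings, max_allowed=None):
    """Return (passed, reasons) for a list of scanner findings.

    Each finding is a dict with a "severity" key. max_allowed maps a
    severity to the highest count that still passes the gate.
    """
    # Assumed default policy: no critical or high findings, up to 5 medium.
    limits = max_allowed or {"critical": 0, "high": 0, "medium": 5}
    counts = {}
    for finding in findings:
        severity = finding["severity"]
        counts[severity] = counts.get(severity, 0) + 1
    reasons = [
        f"{severity}: {counts[severity]} found, {limit} allowed"
        for severity, limit in limits.items()
        if counts.get(severity, 0) > limit
    ]
    return (not reasons, reasons)


# Invented findings for one build.
scan = [{"severity": "high"}, {"severity": "medium"}]
passed, reasons = security_gate(scan)
print(passed, reasons)
```

Because the policy lives in code, a change to the limits goes through the same version control and review as any other change, which is the practical meaning of treating security as code.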

Secure Software Disposal

The purpose of the software disposal process is to safely terminate the existence of a system or a software entity. Disposal is an important adjunct to security because of magnetic remanence. In simple terms, old systems retain their information even if they have been put on the shelf. So it is necessary to dispose of all software and hardware products in a way that ensures that all of the information that is contained therein has been secured. Also, the decision to perform a disposal process will, by definition, cease the active support of the product by the organization. So at the same time the information is being secured, disposal will guide the safe deactivation, disassembly, and removal of all elements of the affected product. Disposal then transitions the functions performed by the retired system to a final condition of deactivation and leaves the remaining elements of the environment in an acceptable state of operation. The disposal process will typically delete or store the system and software elements and related product deliverables in a sound manner, in accordance with any legal agreements, organizational constraints, and stakeholder requirements. Where required, the disposal process also maintains a record of the disposals that may be audited.

Common sense dictates that the affected software or system can only be retired at the request of the owner. Just as in every other aspect of IT work, a formally documented disposal plan is drawn up to carry out the disposal process. That plan will detail the approach that will be adopted by the operation and maintenance organization to drop support of the system, software product, or service that has been designated for retirement.

In the case of both disposal and retirement, the affected users have to be kept fully aware of the execution of retirement plans. Best practice stipulates that every notification should include a description of the replacement or upgrade of the retiring product with a date when the new product will be made available. The notification should also include a statement of why the software product is no longer being supported and a description of other support options that might be available once support has been dropped. Since a system is usually retired in favor of another system, migration requirements will likely also be involved in the disposal process. These requirements often include conducting parallel operations and training activities for the new product.

When it is time to switch over to the new system, the users have to be formally notified that the change has taken place. Also, for the sake of preserving future integrity, all of the artifacts of the old system have to be securely archived. Finally, in order to maintain a sense of organizational history, all of the data and associated documentation that were part of the old system have to be made readily available to any interested users. The procedures for accessing this information are usually spelled out in the plan that provides the guidance for the specific retirement activity.

A disposal strategy has to be provided and disposal constraints have to be defined in order to ensure the successful implementation of the disposal process. Once the plan is drawn up and approved, the system’s software elements are deleted or stored and the subsequent environment is left in an agreed-upon state. Any records that provide knowledge of disposal actions and any analysis of long-term impacts are archived and kept available.
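An auditable record of disposal actions might be sketched as follows. The record layout, the system name legacy-billing, and the use of SHA-256 digests to identify each artifact are assumptions made for illustration; the text above requires only that records of disposal actions be archived and kept available.

```python
import hashlib
import json


def disposal_record(system_name, artifacts, action):
    """Build a record of what was archived or deleted.

    artifacts maps an artifact name to its content as bytes. Only a
    digest of each artifact is stored, not the content itself.
    """
    return {
        "system": system_name,
        "action": action,  # "archived" or "deleted" (assumed vocabulary)
        "artifacts": {
            name: hashlib.sha256(content).hexdigest()
            for name, content in sorted(artifacts.items())
        },
    }


# Invented artifact from a hypothetical retired system.
record = disposal_record(
    "legacy-billing",
    {"schema.sql": b"CREATE TABLE t (id INT);"},
    "archived",
)
print(json.dumps(record, indent=2))
```

Storing a digest rather than the content lets an auditor later confirm that an archived artifact is unchanged without the record itself exposing sensitive information.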

EXAM TIP    End-of-life (EOL) policies should include sunsetting criteria, a notice of all the hardware and software that are being discontinued or replaced, and the duration of support for technical issues from the date of sale and how long that would be valid after the notice of end of life has been published. This allows customers to align their operational timelines appropriately with the developer’s product timeline.

Software Disposal Planning

Like all formal IT processes, disposal is conducted according to a plan. The plan defines schedules, actions, and resources that: 1) terminate the delivery of software services; 2) transform the system into, or retain it in, a socially and physically acceptable state; and 3) take account of the health, safety, security, and privacy applicable to disposal actions and to the long-term condition of resulting physical material and information. Disposal constraints are defined as the basis for carrying out the planned disposal activities. Therefore, a disposal strategy is defined and documented as a first step in the process. This plan stipulates the steps that will be taken by the operations and maintenance organizations to remove active support.

The key element in this strategy is ensuring a smooth transition from the retiring system, so any planning activities have to include input from the users. The software disposal plan defines for those users when and in what manner active support will be withdrawn and the timeframe for doing that, as well as how the product and its associated elements, including documentation, will be archived. Responsibilities for any residual support issues are defined by the strategy, as well as how the organization will transition to a new product replacement, if that is the eventual goal of the organization. Finally, the method of archiving and ensuring accessibility to relevant records is defined and publicized to all affected stakeholders.

Software Disposal Execution

Once planning is complete, the software disposal plan is executed. The key aim of this plan is to ensure an efficient transition into retirement. Therefore, users have to be given sufficient and timely notification of the plans and activities for the retirement of the affected product. These notifications ought to include such things as a description of any replacement or upgrade to the product, with its date of availability, as well as a statement of why the product will no longer be supported. Alternatively, a description of other support options, once support has been dropped, can also be provided.

It is good practice to conduct parallel operations of the retiring product and any replacement product in order to ensure a smooth transition to the new system. During this period, user training ought to be provided as specified in the contract. Then, when the scheduled retirement point is reached, notifications are sent to all concerned stakeholders. Also, all associated development documentation, logs, and code are placed in archives, when appropriate. In conjunction with this, data used by or associated with the retired software product needs to be made accessible in accordance with any contract requirements for audits.

Chapter Review

This chapter opened with an examination of the issues associated with secure software operations and maintenance. Issues pertaining to monitoring software activity and performance, including metrics, audits, and service level agreements, were covered. The integration of software and logging artifacts in support of incident management as well as problem management was covered. The value of change management processes with regard to patching was presented as a structured methodology to manage any associated risk.

The role of backups and archives—not just for data, but also for software versions—was presented as a means to mitigate risk in the event that restores are needed to previous versions of software. The chapter concluded with a discussion of the elements associated with software disposal activities at the end of the lifecycle.

Quick Tips

•   The operations process is not, strictly speaking, part of the development lifecycle. Instead, operation starts when the product is released from development.

•   Customer support requests need to be coordinated by a single entity. This entity is normally part of the operations process.

•   Operations activities are often written into the contract to provide post-release sustainment services.

•   The interface between the user community and the system operation is a critical point of failure, so the establishment of a problem reporting process is a critical task.

•   Operations’ day-to-day function is to perform routine tests and reviews to ensure that the system continues to function as built and meets stakeholder requirements.

•   Maintenance and operations work together to ensure that any problem or request for modification is responded to appropriately by the organization’s management.

•   All changes have to be verified prior to releasing the changed software for operational use.

•   Incident management is a critical function that has to be planned. Planning creates responses to incidents that can be foreseen, as well as procedures for incidents that are not foreseen.

•   Because incidents can impact the entire organization, incident response planning is strategic. As a result, upper-level management has to be involved in the development of the incident response manual.

•   Besides being planned, incident response should also be executed as a specialized function of the organization, with particular staffing and equipment requirements.

•   Secure disposal is an often-overlooked process because it involves a post-lifecycle period. However, because data can persist on storage media after deletion (magnetic remanence), it is essential to ensure that all retired products have been disposed of properly; otherwise, their information can be stolen.

•   Secure disposal ensures an effective retirement of the product. All stakeholders have to be kept in the loop with the change, and the continuation of all user functions must be assured.

Questions

To further help you prepare for the CSSLP exam, and to provide you with a feel for your level of preparedness, answer the following questions and then check your answers against the list of correct answers found at the end of the chapter.

1.   Operations involve:

A.   Writing code

B.   Performing tests and reviews

C.   Developing user specifications

D.   Ensuring proper design

2.   It is important to test the changed product:

A.   Prior to acceptance

B.   After use

C.   Prior to reintegration

D.   Within its environment

3.   Part of customer service is:

A.   Receiving change requests

B.   Developing new code

C.   Assuring personnel

D.   Ensuring adequate resources

4.   Configuration management exercises:

A.   Rational control over the testing of the code

B.   Rational control over the design

C.   Rational control over the change and reintegration process

D.   Enforcement of the procedures for customer support

5.   Incident response processes should be:

A.   Routinely executed

B.   Operationally complex

C.   Strategically planned

D.   Totally constrained

6.   Maintenance functions are independent of:

A.   Operations

B.   Development

C.   Testing

D.   Acceptance

7.   Disposal involves deployment of processes to:

A.   Ensure against magnetic remanence

B.   Ensure magnetic remanence

C.   Create magnetic remanence

D.   Ensure effective remanence

8.   Maintenance handles the:

A.   Analysis of the design and code

B.   Analysis of the problem report

C.   Submission of the problem report

D.   Routine testing and reviews

9.   Incidents can be:

A.   Positive or negative

B.   Active or potential

C.   Passive

D.   Major changes

10.   Decisions about requested changes are made based on their potential:

A.   Impact and likelihood

B.   Efficiency and effectiveness

C.   Validation and verification requirements

D.   Project management features

11.   The useful lifecycle describes:

A.   The time that it takes to develop a useful system

B.   The process of deciding whether a system is useful

C.   The lifecycle of the product after release

D.   The actual timing requirements for release

12.   The monitoring process is:

A.   Difficult

B.   Resource intensive

C.   Rational control of change

D.   Continuous after release

13.   Disposal must always closely coordinate with:

A.   System development

B.   The system stakeholders

C.   The documentation process

D.   Managers of the testing and review processes

14.   The incident management team is:

A.   A strictly technical operation

B.   Composed of the best programmers

C.   Strictly composed of managers

D.   Usually a diverse bunch of people representing all relevant disciplines

15.   The operations and maintenance processes are lumped together into sustainment because:

A.   They are at the end of the lifecycle.

B.   They are the major activities during the software use lifecycle period.

C.   They are neither development nor acquisition.

D.   They are strictly control processes for sustaining assurance.

Answers

1.   B. Operations is the primary testing and review facilitator during use.

2.   C. The product has to be proved correct prior to reintegration.

3.   A. Receiving requests for change from the customer is a critical function.

4.   C. Configuration management ensures rational control over change and reintegration.

5.   C. Incident response is strategic and must be planned.

6.   B. Maintenance functions independently of development. Development writes the code.

7.   A. Disposal processes must ensure against magnetic remanence, the residual data that remains on storage media.

8.   B. Maintenance handles analysis of the problem report.

9.   B. Incidents are active or potential.

10.   A. Decisions about changes are based on their potential impact and likelihood.

11.   C. The useful lifecycle is the lifecycle of the product after release.

12.   D. Monitoring is continuous after release.

13.   B. Stakeholders have to be constantly kept in the loop.

14.   D. The incident management team is diverse and staffed by the people who are best qualified to address any incident.

15.   B. Sustainment is a primary-use activity composed of operations and maintenance.
