CHAPTER 47

OPERATIONS SECURITY AND PRODUCTION CONTROLS

M. E. Kabay, Don Holden, and Myles Walsh

47.1 INTRODUCTION

47.1.1 What Are Production Systems?

47.1.2 What Are Operations?

47.1.3 What Are Computer Programs?

47.1.4 What Are Procedures?

47.1.5 What Are Data Files?

47.2 OPERATIONS MANAGEMENT

47.2.1 Separation of Duties

47.2.2 Security Officer or Security Administrator

47.2.3 Limit Access to Operations Center

47.2.4 Change-Control Procedures from the Operations Perspective

47.2.5 Using Externally Supplied Software

47.2.6 Quality Control versus Quality Assurance

47.3 PROVIDING A TRUSTED OPERATING SYSTEM

47.3.1 Creating Known-Good Boot Medium

47.3.2 Installing a New Version of the Operating System

47.3.3 Patching the Operating System

47.4 PROTECTION OF DATA

47.4.1 Access to Production Programs and Control Data

47.4.2 Separating Production, Development, and Test Data

47.4.3 Controlling User Access to Files and Databases

47.5 DATA VALIDATION

47.5.1 Edit Checks

47.5.2 Check Digits and Log Files

47.5.3 Handling External Data

47.6 CONCLUDING REMARKS

47.7 FURTHER READING

47.8 NOTES

47.1 INTRODUCTION.

Despite the enormous increase in individual computing on personal computers and workstations in the years since the first edition of this Handbook was published in 1975, many mainframe computers and their networks are still used for enterprise computing in applications devoted to the core business of the enterprise. This chapter focuses on how to run vital computers and networks safely and effectively.

Readers with a military background will note that operations security in the civilian sector differs from OPSEC (the military acronym for “operations security”) as the term is used in military discourse. As defined by the Joint Chiefs of Staff of the United States military, “OPSEC seeks to deny real information to an adversary, and prevent correct deduction of friendly plans.”1

In determining what operations security and production controls are required for any system, a thorough risk analysis should be conducted; references to risk analysis appear throughout this Handbook. For a quick analysis, it may be helpful to borrow a few other common military acronyms and to consider the threats from those perspectives.

EMPCOA—enemy's most probable course of action. What is the most likely course of action an attacker will take against your systems?

EMDCOA—enemy's most dangerous course of action. What is the worst possible thing an attacker could accomplish?

Weighing these as part of a risk analysis can help tremendously in deciding how to employ limited resources. The acronym “METT-TC,” which is essentially a larger version of the engineering triad of cost, time, and quality, may help. There are various versions of the METT-TC acronym, but perhaps the most useful stands for mission (what we need to do), equipment (what we have), time (by when we need to do it), troops (whom we have to do it), terrain (where we are doing it), and culture (the possible cultural considerations). Each part of a METT-TC analysis should be self-evident, except perhaps for culture. Ignoring the management, political, and community cultures of the locations where security controls are installed is a frequent mistake, usually with dire consequences for both security and the careers of security professionals.

Before engaging in a discourse about operations security and production controls, it is critical to define the associated terms precisely.

47.1.1 What Are Production Systems?

A production system is one upon which an enterprise depends for critically important functions. Examples include systems for handling accounts receivable, accounts payable, payroll, inventory, manufacturing systems, real-time process control, data entry systems, Web-based client interfaces for e-commerce, critical information systems, portable data handling systems, and management information systems.

47.1.2 What Are Operations?

Operations consist of the requirements for control, maintenance, and support of production systems. Operations staff are responsible for such functions as:

  • Integrating new software systems into an existing configuration
  • Running programs and batch jobs to update databases and create reports
  • Installing new versions of production programs
  • Maintaining production databases for maximal efficiency
  • Managing backups (creation, labeling, storage, and disposal)
  • Responding to emergencies and recovering functionality
  • Mounting storage volumes of tapes, cartridges, or disks in response to user or program requests
  • Handling special forms for particular printouts (e.g., check blanks)
  • Managing all aspects of production networks, such as configuring routers, bridges, gateways, wireless propagation, and firewalls

47.1.3 What Are Computer Programs?

A computer program is a set of instructions that tells a computer what to do to perform a task. Computer programs may be acquired, or they may be internally developed. Internally developed programs are stored in computer systems in two basic forms. Source programs are in the form in which they were written (coded) by computer programmers. The statements in source programs are in languages such as COBOL, Visual BASIC, C++, or Java. Source language programs are kept in files and stored on disk in folders called source libraries or program libraries. Executable programs have been converted from source code, by compilation or interpretation, into a form that the computer can execute. Executable programs may be maintained in two separate forms: object and load. An object program is a partially executable module that must be linked to other executable modules, such as input/output modules, to become a load module. As load modules, the programs are said to be in executable form. Executable programs are kept in production libraries, from which they are called when needed. Acquired programs are generally provided in object and load form only; the source code is proprietary to the organization that developed it and is rarely given to the acquiring enterprise.

When internally developed programs have to be changed, programmers work with copies of the source programs. The copies are stored in another type of library, referred to as programmer libraries. The programmers make changes to their copy of the source programs, and go through a process of recompiling and testing until the modified program is working properly. When acquired programs require changes, often a contract to make the modifications is issued to the organization from which the programs were acquired. In some situations, internal programmers generate new programs and interfaces to the original acquired programs. The same libraries are used: source libraries for source code, production libraries for executable modules, and programmer libraries for work in progress. These libraries need to be protected. Loss or damage can entail huge costs and considerable inconvenience; recovering them can require a long time and great expense.

47.1.4 What Are Procedures?

Procedures are sets of statements that tell a computer what to do in certain situations. They are unlike programs in that they are not compiled. Stored in files or databases, they are invoked as needed. Procedural statements are made up of operational commands and parameters. The operational commands tell the computer what to do, and the parameters tell the computer which entity to act upon. Job Control Language (JCL) is an example of a procedural language. Procedural language statements often are used in database management systems and in security software products.

47.1.5 What Are Data Files?

Everything stored and maintained in a computer system takes the form of a file. Programs, procedures, and information are all stored in files, using the concept of a file in its broadest sense; that is, a collection of related items. This usage has become most apparent with the ubiquitous personal computer (PC). Data files, as distinguished from program files and other types, are those that store information. In a PC environment, data files may be called documents. Documents are created by word processors, spreadsheet programs, graphics generators, and other application programs. In mainframe and midsize computer environments, data files are those created and maintained by applications such as payroll, accounting, inventory, order entry, and sales.

Some data files are transient; that is, they are created, used, and deleted within a short period of time. If lost or damaged, they can be reconstructed quickly, with little difficulty. There is usually no need to protect transient files. Other files, such as master files or organizational databases (groups of files that are linked to one another), contain information that is vital, confidential, or virtually irreplaceable. These files, generated by PCs, mainframes, and midsize computer systems, must be protected by security software and backup procedures, to ensure against loss, destruction, theft, and unauthorized disclosure.

47.2 OPERATIONS MANAGEMENT.

The processes for effective and efficient management of operations have direct benefits on information assurance. In particular, these aspects of operations management are of special value for improving and maintaining security:

  • Separation of duties
  • Defining the role of the security officer or security administrator
  • Limiting access to the operations center
  • Defining secure change-control processes
  • Careful controls over externally supplied software
  • Managing quality assurance and quality control

47.2.1 Separation of Duties.

Separation of duties is a key control that should be applied to development and modification of programs. In enterprises where there are systems and programming departments that create and maintain custom programs, each individual programmer is assigned a user ID and a password. In these enterprises, where programs are developed and maintained internally, changes are constantly made to programs in order to meet changing business requirements. Modified executable programs, after recompilation and testing by programmers, are moved from their libraries into production libraries. Modified source programs are moved into source libraries. The programmers are responsible for keeping the source libraries current, while computer operations, or some other functional group separated from programming, may be responsible for maintaining the production libraries. When an updated executable program is transferred from the programmer's library to the production library, a transmittal is included, signed off by a manager in the programming department.

A particular consequence of the separation of duties is that a member of the operations staff should always be involved in the functional analysis and requirements definition phases for changes to production programs. The operations perspective is not always clear to programmers, and such issues as logging, backout, and recovery, as discussed later in this chapter, need to be brought to their attention early in the development and maintenance cycles.

47.2.2 Security Officer or Security Administrator.

Contemporary enterprises typically include a mix of mainframes, midsize computers, and local area networks (LANs) composed of hundreds or thousands of workstations, PCs, terminals, and other devices, all interconnected with one another, and with the same mix in other enterprises throughout the world via the Internet. A department, or an individual in smaller enterprises, has the responsibility for providing and maintaining the security of files, databases, and programs. The title that is often associated with this function is information security officer. This individual or department has the mandate to carry out the security policy as set down by the senior management of the enterprise. The security officer is empowered to allow or to disallow access to files, databases, and programs. In the security officer's terms, procedures are set up and maintained that establish relationships among individuals, programs, and files. Users, programmers, and technicians are granted privileges for full access, update only, or even read only. The security officer has the power to change or to revoke privileges.

47.2.3 Limit Access to Operations Center.

Physical access to the operations center grants a person enormous power to disrupt production systems. Such access must be tightly controlled.

47.2.3.1 Need, Not Status, Determines Access.

A fundamental principle for effective security is that access to restricted areas is granted on the basis of roles. Employees whose roles do not justify access should be excluded from autonomous access to production systems. In particular, high-placed executives, such as the president, chief executive officer, chief financial officer, chief operating officer, chief technical officer, and all vice presidents, should examine their own roles and determine if they should be able to enter the operations center unaccompanied; in most cases, such access is unjustified. Limiting their own access sets an important model for other aspects of security policy and demonstrates that need, not social status or position within the corporate hierarchy, determines access to restricted areas.

47.2.3.2 Basic Methods of Access Control.

As explained in Chapter 28 in this Handbook, access control depends on identification and authentication (I&A). I&A can be based on:

  • What one has (tokens such as physical keys or smart cards)
  • What one knows (user IDs and passwords or passphrases)
  • What one is (static biometric attributes such as fingerprints, iris patterns, retinal patterns, and facial features)
  • What one does (dynamic biometrics such as voice patterns, typing patterns, and signature dynamics)

For more information on biometric authentication, see Chapter 29.

A typical arrangement for secure access to an operations center may involve keypads for entry of a particular code or card readers programmed to admit the holders of specific magnetic-stripe or smart cards. If the operations center is a 24-hour operation with full-time staffing, the presence of operators provides an additional layer of security to preclude unauthorized access. Remote monitoring of sensitive areas increases security by discouraging unauthorized access or unauthorized behavior and speeds up the response to possible sabotage. For extensive discussion of physical and facilities security, see Chapters 22 and 23.

47.2.3.3 Log In and Badge Visitors.

Visitors to a facility that houses sensitive systems should be logged in at a controlled entrance and provided with visitor badges. In high-security applications, an additional login may be required when entering the operations center itself. To encourage return of visitor badges, some security policies require the visitor to deposit a valuable document, such as a driver's license, with the security guards at the main entrance. The effectiveness of visitor badges depends entirely on the use of badges by all personnel at all times; if not wearing a badge is acceptable and common, a malicious visitor could simply hide a visitor badge to pass as an authorized employee.

The time of login and of logout can be valuable forensic evidence if malfeasance is detected. However, such records can be shown to be reliable only if the guards responsible for keeping the logs consistently verify the completeness and correctness of all information written into the logs.

47.2.3.4 Accompany Visitors.

No unaccompanied visitors should be permitted to circulate in the operations center or in the facility housing such a center. In high-security facilities, someone must even accompany the visitor to the washroom and wait for the visitor outside the door.

If a consultant or temporary employee is to work on a project for longer than a day, it may be acceptable to grant that person a restricted pass for low-security areas; however, high-security areas, such as the operations center, would still require such a person to be accompanied.

47.2.4 Change-Control Procedures from the Operations Perspective.

When programmers have made changes to production programs and all documentation and testing procedures are complete, the new versions are formally turned over to the operations staff for integration into production.

47.2.4.1 Moving New Versions of Software into Production.

Operations managers and staff must meet these demands when moving new versions of software into production:

  • Identification—tracking which software is in use
  • Authorization—controlling changes
  • Scheduling—minimizing disruptions to production
  • Backups—ensuring that all requisite information is available to restore a prior state
  • Logging—keeping track of data input for recovery, and of errors for diagnosis of problems
  • Backout—returning to a prior production version in case of catastrophic errors

47.2.4.1.1 Identification.

Knowing precisely which versions of all production software are in use is the basis of production controls. Every module must have a unique identification that allows immediate tracking between executable code and source code; all changes to a particular module must be fully documented by the programming group. Unique identifiers allow the quality assurance process to ensure that the only modules that go into production are those that have been properly tested.

Most production shops use a three-level numbering scheme to track versions. Typically, version a.b.c (e.g., 7.13.201) is defined in this way:

  • c changes every time anything at all—even a spelling mistake—is changed.
  • b changes when program managers decide to group a number of fixes to errors into a new version for release to production.
  • a changes when significant new functions are added; often the source code is completely renumbered if the changes are great enough.

The version number of object code must match the number of its source code. All object code should include internal documentation of its version number so that the version can be ascertained instantly, without having to consult possibly inaccurate external documentation.
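
Identification can be partially automated. What follows is a minimal Python sketch, not taken from any specific production shop, of checking that the version string embedded in each executable matches the version recorded in a release manifest; the "@(#)VERSION a.b.c" marker convention, the file names, and the manifest contents are illustrative assumptions.

import re
from pathlib import Path

# Assumed convention: each executable embeds a marker such as b"@(#)VERSION 7.13.201".
VERSION_MARKER = re.compile(rb"@\(#\)VERSION (\d+\.\d+\.\d+)")

def embedded_version(executable):
    """Return the a.b.c version string embedded in the executable, or None."""
    match = VERSION_MARKER.search(Path(executable).read_bytes())
    return match.group(1).decode() if match else None

def check_against_manifest(prod_dir, manifest):
    """Report modules whose embedded version disagrees with the release manifest."""
    problems = []
    for name, expected in manifest.items():
        found = embedded_version(Path(prod_dir) / name)
        if found != expected:
            problems.append(f"{name}: expected {expected}, found {found}")
    return problems

# Hypothetical manifest maintained by the programming group:
# for line in check_against_manifest("/prod/bin", {"payroll": "7.13.201"}):
#     print("VERSION MISMATCH:", line)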

47.2.4.1.2 Authorization.

Strict procedures must be in place to preclude rogue programmers from introducing modified code into the production suite. In addition to the dangers of introducing untested or undocumented changes, allowing any individual to modify production processes without verification and authorization by an appropriate chain of responsibility can allow hostile code, such as Trojan horses and backdoors, to be introduced into the systems.

47.2.4.1.3 Scheduling.

Implementing any new version of a production system requires careful planning and scheduling. Operations staff must prepare for changes in all aspects of production that depend on the system in question; for example, there may be requirements for new printer forms, additional magnetic tapes, and other supplies that must be ordered in advance. New requests for operator intervention, or changes in status and error messages during production, necessitate appropriate documentation and training. Effects on other programs may require special preparations that must take into account the scheduling requirements of the other systems that are affected. In addition, new versions of software often are implemented immediately after major production jobs, such as end-of-year or quarterly processing, to maintain consistency within an accounting period. For all these reasons, scheduling is critically important for trouble-free operations.

47.2.4.1.4 Backups.

When modifying production systems, operations staffs usually take one or more complete backups of the software and data to be modified. This procedure is essential to allow complete restoration of the previous working environment should there be catastrophic failure of the new software and data structures.

47.2.4.1.5 Logging.

To allow recovery or backout without losing the new data and changes to existing data that may have been carried out using new software and data structures, all production programs should include a logging facility. Logging keeps a journal of all information required to track changes in data and to regenerate a valid version of the data by applying all changes to an initial starting condition. Logging requires synchronization with backups to avoid data loss or data corruption. Special requirements may exist when a new version of the production system involves changes to data structures; in such cases, applying the information about changes to the older data structures may require special-purpose application programs. Since programmers sometimes forget about such possibilities, operations staff should be prepared to remind the programming staff about such requirements during the design phases for all changes.

47.2.4.1.6 Backout and Recovery.

Sometimes a new version of production software is unacceptable and must be removed from the production environment. This decision may be made immediately, or it may occur after a significant amount of data entry and data manipulation has taken place. In either case, operations should be able to return to the previous version of a production system without data loss. This process involves restoring the earlier complete operating environment, with software and data in synchrony, and then using log files to repeat the data input and data modifications from the moment of changeover to the moment of fallback.

Not all of the changes that were made using a new version will necessarily be applicable to the previous data; for example, if new fields were added to a database, the data stored in those fields would not be usable for an older, simpler data structure. Similarly, if fields were removed in the newer database, recovery will involve providing values for those fields in the older database. All of these functions must be available in the recovery programs that should accompany any version change in production systems.
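
To make the backout-and-recovery idea concrete, here is a minimal Python sketch, under the assumption that the application journal stores per-record "after images" as dictionaries keyed by record identifier; the journal format, field names, and sample values are invented for illustration. Fields that exist only in the newer data structure are simply dropped when replaying against the older layout.

def roll_forward(restored_db, journal, known_fields):
    """Reapply journaled changes to a restored prior-version database,
    discarding fields that the older record layout does not define."""
    for entry in journal:
        key = entry["key"]
        after_image = {f: v for f, v in entry["after"].items() if f in known_fields}
        restored_db[key] = {**restored_db.get(key, {}), **after_image}

# Example: a field ("loyalty_tier") added in the new version is discarded
# when falling back to the older structure.
db = {"C001": {"name": "Acme", "balance": 100.0}}
journal = [{"key": "C001", "after": {"balance": 250.0, "loyalty_tier": "gold"}}]
roll_forward(db, journal, known_fields={"name", "balance"})
print(db)   # {'C001': {'name': 'Acme', 'balance': 250.0}}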

47.2.4.2 Using Digital Signatures to Validate Production Programs.

If an unauthorized intruder or a disgruntled employee were discovered to have gained access to the production libraries, it would be necessary to determine if there had been unauthorized modifications to the production programs.

Date and time stamps on programs can record the timing of changes, but many operating environments allow such information to be modified using system utilities that read and write directly to disk without passing through normal system calls. In those instances, there would be no time stamp or log entry.

One approach that has been used successfully is to apply checksums to all production components. Checksum software applies computations to programs as if the codes were simply numbers; the results can be sensitive to changes as small as a single bit. However, if the checksums are computed the same way for all programs, access to the checksum utility could allow a malefactor to change a module and then run the checksum utility to create the appropriate new checksum, thus concealing the evidence of change. To make such subterfuge harder, the checksums can be stored in a database. Naturally, this database of checksums itself must be protected against unauthorized changes. Storing checksums may make unauthorized changes more difficult to disguise, but it also extends the chain of vulnerabilities.
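
As a concrete illustration, here is a minimal Python sketch of a checksum baseline built with SHA-256 from the standard library; the file names and the JSON baseline format are illustrative assumptions, and, as noted above, the baseline file itself must be protected against unauthorized changes.

import hashlib
import json
from pathlib import Path

def sha256_of(path):
    """Compute the SHA-256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()

def build_baseline(modules, baseline_file):
    """Record the current checksum of every production module."""
    baseline = {str(m): sha256_of(m) for m in modules}
    Path(baseline_file).write_text(json.dumps(baseline, indent=2))

def verify_baseline(baseline_file):
    """Return the modules whose current checksum no longer matches the baseline."""
    baseline = json.loads(Path(baseline_file).read_text())
    return [m for m, expected in baseline.items() if sha256_of(m) != expected]

# changed = verify_baseline("/secure/prod_checksums.json")   # hypothetical path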

A better way of determining whether object code or source code has been modified is to use digital signatures. Digital signatures are similar to checksums, but they require input of a private key that can, and must, be protected against disclosure. Verifying the digital signature may be done using a corresponding public key that can be made available without compromising the secrecy of the private key. For more information on public and private keys, see PKI and certificate authorities in Chapter 37 of this Handbook.

When digital signatures are used to authenticate code, it may be possible to validate production systems routinely, provided that the process is not too arduous to be accomplished as part of the normal production process. For example, it should be possible to validate all digital signatures in no more than a few minutes, before allowing the daily production cycle to start.
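
A routine signature sweep might look like the following minimal Python sketch. It assumes the third-party cryptography package, RSA signatures over SHA-256, and a detached <module>.sig file stored beside each executable; those conventions, the directory layout, and the key file name are assumptions for illustration, not requirements of any particular product.

from pathlib import Path

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def verify_all(prod_dir, public_key_pem):
    """Return the names of production modules whose detached signature fails to verify."""
    public_key = serialization.load_pem_public_key(Path(public_key_pem).read_bytes())
    failures = []
    for module in Path(prod_dir).glob("*.exe"):            # assumed naming convention
        signature = (module.parent / (module.name + ".sig")).read_bytes()
        try:
            public_key.verify(signature, module.read_bytes(),
                              padding.PKCS1v15(), hashes.SHA256())
        except InvalidSignature:
            failures.append(module.name)
    return failures

# Block the daily cycle if anything fails, e.g.:
# assert not verify_all("/prod/bin", "/secure/prod_signing_pub.pem")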

47.2.5 Using Externally Supplied Software.

Production often uses software from outside the organization; such software may be commercial off-the-shelf (COTS) programs or it may consist of programs modified for, or written especially for, the organization by a software supplier. In any case, external software poses special problems of trust for the production team. There have been documented cases in which production versions of software from reputable software houses have contained viruses or Easter eggs (undocumented features, such as the well-known Flight Simulator in MS-Excel versions, which pops up a graphic landscape that includes a monitor showing the names of the Excel development team). In addition, some consultants have publicly admitted that they deliberately include Trojan horse code (undocumented malicious programming) that allows them to damage data or inactivate the programs they have installed at client sites if their fees are not paid.

In large data centers, it may be possible to run quality assurance procedures on externally supplied code. Such tests should include coverage monitoring, in which a test suite exercises all the compiled code corresponding to every line of source code. However, it is rare that an operations group has the resources necessary for such testing.

The trustworthiness of proprietary external software written or adapted especially for a client ultimately may depend on the legal contracts between supplier and user. Such legal constraints may not prevent a disgruntled or dishonest employee in the supplier organization from including harmful code, but at least they may offer a basis for compensation should there be trouble.

47.2.5.1 Verify Digital Signatures on Source Code If Possible.

If externally supplied code is provided with its source library as well as with compiled modules, operations should try to have the supplier provide digital signatures for all such programs. Digital signatures will permit authentication of the code's origins and may make it harder for malefactors to supply modified code to the user. In addition, the digital signatures can support nonrepudiation of the code (i.e., the supplier will be unable credibly to claim that it did not supply the code) and therefore the signatures may be useful in legal action, if necessary.

47.2.5.2 Compile from Source When Possible.

Wherever possible, it is highly desirable to be able to compile executables from source code on the target machine. Compiling from source allows quality assurance processes to check the source for undocumented features that might be security violations and to couple the executables tightly to the verified source. In addition, compiling on the local system ensures that all calls to system routines will be satisfied by linking to executables such as dynamic link libraries (DLLs) supplied in the current version of the operating system.

However, compilation on a local system has additional implications that complicate implementation of new applications: Because the system routines being linked to the compiled code may not be identical to those used during the manufacturer's quality assurance tests, the customer organization must plan for its own quality assurance testing.

Operations staff should express this preference for source code clearly to the person or group controlling acquisition of external software. For more information about writing secure code, see Chapter 38 in this Handbook; for information about secure software development and quality assurance, see Chapter 39.

47.2.6 Quality Control versus Quality Assurance.

Throughout this chapter, quality assurance has been mentioned as an essential underpinning for operations security. Quality assurance refers to the processes designed to ensure and to verify the validity of production programs. However, another aspect of quality concerns the operations group: the quality of output. The process of verifying and ensuring the quality of output is known as quality control.

47.2.6.1 Service-Level Agreements.

Unlike mathematical truth, quality in computing operations has no absolute standard. Every organization must define the level of quality that is suitable for a particular application. A commonly quoted principle in programming and operations is that there is a complex relationship among quality, cost, and development time: Increasing quality increases both cost and development time; shortening development time increases cost, if quality is to be maintained. It follows that every system should include a definition of acceptable performance; such definitions are known as service-level agreements (SLAs).

SLAs typically include minimum and maximum limits for performance, resource utilization, and output quality. The limits should be expressed in statistical terms; for example, “The response time measured as the time between pressing ENTER and seeing the completed response appear on screen shall be less than three seconds in 95 percent of all transactions and shall not exceed four seconds at any time.” SLAs may define different standards for different types of transactions if the business needs of the users so dictate.
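
As a minimal illustration of testing measured response times against an SLA like the one quoted above, the following Python sketch computes the 95th percentile and the worst case from a list of measurements; how the measurements are collected is a separate question, addressed by the log-file techniques discussed in the next section.

import statistics

def sla_report(response_times, p95_limit=3.0, hard_limit=4.0):
    """Summarize measured response times (seconds) against the SLA limits."""
    p95 = statistics.quantiles(response_times, n=100, method="inclusive")[94]
    worst = max(response_times)
    return {
        "p95_seconds": round(p95, 3),
        "worst_seconds": worst,
        "meets_p95": p95 <= p95_limit,
        "meets_hard_limit": worst <= hard_limit,
    }

print(sla_report([1.2, 0.8, 2.9, 3.1, 1.7, 2.2, 0.9, 1.1, 2.8, 1.5]))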

47.2.6.2 Monitoring Performance.

Computer-system performance depends on four elements:

  1. Access time and speed of the central processing unit(s) (CPU)
  2. Access time and speed of mass storage (disks)
  3. Access time and speed of fast memory (RAM)
  4. Application design

In addition, network performance depends on communications-channel bandwidth and traffic.

Operations groups should monitor performance to ensure that the requirements of the SLAs are met. There are two approaches to such monitoring: (1) analysis of log files and (2) real-time data capture and analysis.

Log files that are designed with performance analysis in mind can capture the precise times of any events of interest; for example, one might have a record in the log file to show when a particular user initiated a read request for specific data and another record to show when the data were displayed on the user's screen. Such a level of detail is invaluable for performance analysis because the data permit analysts to look at any kind of transaction and compute statistics about the distribution of response times. In turn, these data may be used for trend analysis that sometimes can highlight problems in program or data-structure design or maintenance. For example, an excessively long lookup time in a data table may indicate that the system is using serial data access because it had been designed without an appropriate index that would permit rapid random access to the needed records.
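
For example, a minimal Python sketch of deriving response times from such a log might pair START and END records by transaction identifier; the log format shown is an invented illustration, not a standard.

from datetime import datetime

def response_times(log_lines):
    """Yield (transaction_id, seconds) for every START/END pair found in the log."""
    starts = {}
    for line in log_lines:
        timestamp, event, txn_id = line.strip().split(" ", 2)
        when = datetime.fromisoformat(timestamp)
        if event == "START":
            starts[txn_id] = when
        elif event == "END" and txn_id in starts:
            yield txn_id, (when - starts.pop(txn_id)).total_seconds()

log = ["2008-05-01T09:00:00 START T1001",
       "2008-05-01T09:00:02 END T1001"]
print(list(response_times(log)))   # [('T1001', 2.0)]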

Another approach to performance monitoring and analysis is to use real-time monitors that can alert operations staff to abnormal performance. For example, an application program may be designed to calculate response times on the fly; the results may be displayed numerically or graphically on a dashboard for the operations staff. Values exceeding a specified threshold may signal an abnormal condition, with color or sound used to alert the operators to the drop in performance. Such integrated performance metrics allow the fastest possible response to performance problems.

Even if the application programs lack integrated performance metrics, it is sometimes possible to use system-level online performance tools to analyze system activity. In one instance, for example, a software supplier had promised a response time of 10 seconds or less for all transactions, but one particular operation was taking 43 minutes. Using the online performance tool, it quickly became obvious that the transaction in question was generating an enormous amount of disk I/O (read and write operations). Investigation revealed that the program design was forcing 80,000 random-access reads in a particular data set to locate a few target records. Installing and using an appropriate index and compacting the data set to provide rapid access to blocks of related records reduced response time to 6 seconds.

47.2.6.3 Monitoring Resources.

Consistent monitoring of resource utilization is one of the most valuable roles of the operations staff. Data center operations should include regular analysis of system log files to track changes in the number of files, amount of disk free space available, number of CPU cycles consumed, number of virtual memory swap operations, and less esoteric resource demands, such as numbers of lines or pages printed, number of tape mounts requested, number of backup tapes in use, and so on. These data should be graphed and subjected to trend analysis to project when particular resources will be saturated if the trend continues. Operations can then reduce demand either by improving aspects of production (e.g., optimizing programs to require fewer resources) or by increasing available resources (e.g., installing a memory upgrade).
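
A minimal Python sketch of such a projection fits an ordinary least-squares line to periodic usage samples and estimates when capacity would be reached if the trend continued; the disk-space figures are invented for illustration.

def days_until_exhausted(samples, capacity):
    """samples: list of (day_number, units_used); return the projected day of saturation."""
    n = len(samples)
    sum_x = sum(x for x, _ in samples)
    sum_y = sum(y for _, y in samples)
    sum_xy = sum(x * y for x, y in samples)
    sum_xx = sum(x * x for x, _ in samples)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    intercept = (sum_y - slope * sum_x) / n
    if slope <= 0:
        return None                      # usage is flat or shrinking
    return (capacity - intercept) / slope

# Weekly disk usage in GB (days 0, 7, 14, 21) against a 500 GB volume:
print(days_until_exhausted([(0, 300), (7, 320), (14, 345), (21, 370)], 500))  # ~ day 60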

Another level of analysis focuses on specific users and groups of users. Each functional group using the production systems should be analyzed separately to see if there are discontinuities in their trends of resource utilization. For example, a specific department might show a relatively slow and stable rise in CPU cycles consumed per month—until the rate of increase suddenly jumps tenfold. If such a rate of increase were to continue, it could surpass all the rest of the system demands combined; operations therefore would investigate the situation before it caused problems. The cause of the discontinuity might be a programming error; for example, there might be a logical loop in one of the programs or a repeated computation that ought to have its result stored for reuse. However, the change in slope in CPU utilization might be due to introduction of new programs with a rapidly growing database; in such cases, operations would have to act to meet heavy new demands.

Disk space is often a key resource that can cause problems. If users fail to clean up unwanted files, disk space can disappear at an astounding rate. This problem is exacerbated by poor programming practices that allow temporary work files to remain permanently in place. Systems have been designed with tens of thousands of sequentially numbered “temporary” work files that had no function whatsoever after a production run was completed but that accumulated over several years.

One of the methods widely used to reduce resource waste is chargeback. Using system and application log files, system administration charges the users of particular systems a fee based on their use of various resources. Sometimes these chargebacks are viewed as “funny money” because they are an accounting fiction—no money changes hands. However, requiring managers to budget carefully for computing resources can greatly improve attention to mundane housekeeping matters such as cleaning up useless files. If the chargeback system extends to aspects of program performance such as number of disk I/Os, it can even influence programmers to optimize their design and their code appropriately. Optimization is appropriate when the total costs of optimization are less than the savings in resources and the increases in productivity that result from optimization efforts, measured over the lifetime of the application.
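
A chargeback calculation can be as simple as applying a rate table to per-department usage totals extracted from the log files, as in the following minimal Python sketch; the rates and usage figures are invented for illustration, since real chargeback schemes are set by local accounting policy.

RATES = {"cpu_seconds": 0.002, "disk_gb_days": 0.05, "pages_printed": 0.01}  # assumed rates

def chargeback(usage_by_dept):
    """Return the charge for each department from its resource usage totals."""
    return {dept: round(sum(RATES[resource] * quantity
                            for resource, quantity in usage.items()), 2)
            for dept, usage in usage_by_dept.items()}

print(chargeback({
    "finance": {"cpu_seconds": 120000, "disk_gb_days": 900, "pages_printed": 2500},
    "hr":      {"cpu_seconds": 8000,   "disk_gb_days": 150, "pages_printed": 400},
}))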

47.2.6.4 Monitoring Output Quality.

The final component of quality control is the meticulous monitoring of everything that is produced in the data center and sent to users or clients. Although much printing is now performed on local printers controlled by users, in many situations the operations group is responsible for documents such as payroll checks, invoices, and account status reports. Every operations group must explicitly assign responsibility for verifying the quality of such output before it leaves the data center. Operators should keep careful logs that record various types of error (e.g., torn paper, misaligned forms, or poor print quality) so that management can identify areas requiring explicit attention to improve quality.

47.3 PROVIDING A TRUSTED OPERATING SYSTEM.

The operating system (OS) is usually the single biggest and most important example of externally supplied software in a data center. Because the OS affects everything that is done in production, it is essential to know that the software is trustworthy. To this end, operations staff use procedures to ensure that known-good software is always available to reinstall on the system.

47.3.1 Creating Known-Good Boot Medium.

The simple principle that underlies known-good operating software is that there shall be an unbroken chain of copies of the OS that have never run any other software. That is, operations will create a boot medium (tape, cartridge, CD-ROM) immediately after installing known-good software.

For example, if boot-medium V1B0 is defined as version 1 of the OS as it is delivered from the manufacturer, its installation would require specific settings and parameters for the particular configuration of the system. Immediately after installing V1B0, but before running any other software, operations would create medium V1B1 and set it aside for later use if V1B0 had to be replaced.

47.3.2 Installing a New Version of the Operating System.

Continuing this example of how to maintain known-good operating software, it might become necessary to install a new version of the operating system—say, version 2 on medium V2B0. Before using V2B0, operations would reinstall the current known-good OS, say from V1B1. Only then would V2B0 be installed, and the new boot medium V2B1 would be created immediately.

47.3.3 Patching the Operating System.

Often, when it is necessary to modify a small part of the OS, rather than installing a whole new version, manufacturers ask users to patch the OS. The patch programs modify the compiled code in place. If checksums or digital signatures are in use to maintain OS integrity, these codes will have to be regenerated after the patch is applied. However, to maintain a known-good status, applying a patch should follow a rigid sequence:

  1. Load the current known-good software from the appropriate medium (e.g., V2B1).
  2. Install the patch.
  3. Immediately create a known-good boot medium before running any other software (in our example, this medium would be V2B2).

For extensive discussion of managing patches for production systems, see Chapter 40 in this Handbook.

47.4 PROTECTION OF DATA.

47.4.1 Access to Production Programs and Control Data.

Just as the operations center needs restricted access, so do production programs and data. From a functional point of view, there are three categories of people who might be allowed access to programs and data on which the enterprise depends: users, programmers, and operations staff.

47.4.1.1 Users.

The only people who should have read and write access to production data are those users assigned to the particular systems who have been granted specific access privileges. For example, normally only the human resources staff would have access to personnel records; only the finance department staff would have full access to all accounts payable and accounts receivable records. Managers and other executive users would have access to particular subsets of data, such as productivity records or budget figures. Of course, no user should ever have write access to production programs.

47.4.1.2 Programming Staff.

Programmers create and maintain production programs; they naturally have to be able to access the versions of those programs on which they currently are working. However, programmers must not be able to modify the programs currently used in production. All changes to production programs must be documented, tested, and integrated into the production environment with the supervision of quality assurance, operations, and security personnel.

Programmers need to be able to use realistic data in their development, maintenance, and testing functions; however, programmers should not have privileged access to restricted data. For example, programmers should not be allowed to read confidential files from personnel records or to modify production data in the accounts payable system. Programmers can use extracts from the production databases, but particular fields may have to be randomized to prevent breaches of confidentiality. Programmers generally resent such constraints, but usually an effective process of education can convince them that maintaining barriers between production systems and systems under development is a wise policy.

47.4.1.3 Operations Staff.

Much as the programmers are responsible for developing and maintaining systems, so the operations staff are responsible for using and controlling these systems. Operations staff perform tasks such as scheduling, error handling, quality control, backups, recovery, and version management. However, operations staff should not be able to modify production programs or to access sensitive data in production databases.

47.4.2 Separating Production, Development, and Test Data.

For obvious reasons, testing with production data and production programs is an unacceptable practice, except in emergency situations. Therefore, programmers who develop new programs or modify existing programs must perform tests using their own libraries.

These are frequently referred to as test libraries. Experienced programmers keep copies of the source programs for which they are responsible as well as copies of some of the files that are used by the programs and subsets of others in their own test libraries. To avoid security violations, such copies and subsets should be anonymized to the degree necessary to protect confidentiality. For example, a system using personnel data might substitute random numbers and strings for the employee identifiers, names, and addresses.
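
As a minimal Python sketch of such anonymization, the following replaces employee identifiers consistently with random surrogates and blanks out names and addresses, so that record relationships survive in the test extract while confidentiality is preserved; the field names and surrogate format are illustrative assumptions.

import random
import string

def _surrogate_id(mapping, real_id):
    """Return a stable random surrogate for a real identifier (sketch; collisions not handled)."""
    if real_id not in mapping:
        mapping[real_id] = "E" + "".join(random.choices(string.digits, k=6))
    return mapping[real_id]

def anonymize(records):
    """Produce an anonymized copy of a personnel extract for test libraries."""
    id_map = {}
    anonymized = []
    for rec in records:
        surrogate = _surrogate_id(id_map, rec["employee_id"])
        anonymized.append({**rec,
                           "employee_id": surrogate,
                           "name": "TEST EMPLOYEE " + surrogate,
                           "address": "1 Test Street"})
    return anonymized

print(anonymize([{"employee_id": "00421", "name": "J. Smith",
                  "address": "12 Elm St", "salary_grade": 7}]))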

It is important to include time stamps on all files and programs, including both production and test versions. This practice helps resolve problems that arise when programs malfunction. If all programs and files carry time stamps, it is easier to determine whether the most current version of the load program is in the production library and whether test files and production files have been synchronized.

Final testing prior to production release may entail more formal review by an independent quality assurance section or department; the quality assurance group may also control transfers of programs to the production library.

47.4.3 Controlling User Access to Files and Databases.

Access to files has to be controlled for two reasons:

  1. There is confidential information in files that is not to be made accessible to everyone.
  2. There are other files that are considered auditable.

The information in these files may be confidential, but that is not the reason for controlling access to them. The information in these files must be controlled because changing it is illegal. An example would be an enterprise's general ledger file once the books have been closed. Changing the information in these files gave birth to the pejorative phrase “cooking the books.” The original copies of these files are developed on the computer that handles the day-to-day transactions of an enterprise. Some time after the month-end closing of the books, copies of these files are archived. In this form they are the recorded history of the enterprise and cannot be changed. Storing this chiseled-in-stone historical information, with details and summaries combined in database format so that they remain accessible for analysis, is known as data warehousing.

In most large enterprises, these files are created on mainframe and midsize computers, although as the speed and storage capacity of microcomputers continue to increase, these files are beginning to appear on microcomputer database servers in LANs. In any event, controlling user access to files is performed in several ways depending on what types of access are allowed. Remote access to online production databases and files is often done over leased lines—dedicated communication facilities paid for on a monthly basis as opposed to dial-up or switched lines paid for on a when-used basis. Access control may be readily accomplished through the use of front-end security software modules, which in turn feed into database and file handling software and finally into the application software. For example, in an environment using a software product for handling queries and updates of online databases and files, a number of different security software products could be installed. Such products use what are called rules or schemas to validate user IDs and passwords, to authorize types of transactions, and to allow access to files and databases.
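
The rules or schemas evaluated by such front-end security modules amount to a table relating users or groups, targets, and permitted operations. The following minimal Python sketch shows the idea; the group names, file names, and rule structure are invented for illustration and do not reflect any particular product's syntax.

# (group, file or database, allowed operations) -- invented examples
RULES = [
    ("FINANCE_GRP", "ACCOUNTS_PAYABLE", {"read", "update"}),
    ("HR_GRP",      "PERSONNEL",        {"read", "update"}),
    ("EXEC_GRP",    "BUDGET_SUMMARY",   {"read"}),
]

def is_authorized(group, target, operation):
    """Return True if some rule grants this group the operation on the target."""
    return any(g == group and t == target and operation in ops
               for g, t, ops in RULES)

print(is_authorized("EXEC_GRP", "BUDGET_SUMMARY", "read"))    # True
print(is_authorized("EXEC_GRP", "BUDGET_SUMMARY", "update"))  # False: read-only access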

Many information system installations allow remote communications for many kinds of transactions. Various individuals, including sales representatives entering order information and traveling executives wishing to access current information from databases or files, use public networks, such as wired or wireless Internet service providers at hotels, airports, and coffee shops. This type of access increases the potential for security breaches. The usual practice today is to use virtual private networks (VPNs), which consist of encrypted channels between the portable equipment and the enterprise networks. Even so, some inadequate implementations of VPNs allow cleartext transmission of the initial logon information, allowing the identification and authentication data to be captured and used for unauthorized access. Poorly secured network access also allows man-in-the-middle attacks on the user's traffic.

For more information about encryption, see Chapters 7 and 37 in this Handbook; for more about network and communications security, see Chapters 5, 25, 32, 33, and 34.

47.5 DATA VALIDATION.

Just as it is essential to have trusted operating systems and application software for production, the operations group must be able to demonstrate that data used for production are valid.

Validation controls normally are carried out dynamically throughout data entry and other processing tasks. Some validity checks are carried out automatically by database software; for example, inconsistencies between header records and details may be reported as errors by the database subsystems. Bad pointers are usually flagged immediately as errors by the database software; examples include:

  • Pointers from a particular master record to a detail record with the wrong key value or to a nonexistent location
  • Forward or backward pointers from a detail record to records that have the wrong key value for the chain or that do not exist at all

However, many errors cannot be caught by database subsystems because they involve specific constraints particular to the application rather than errors in the database itself. For example, it may be improper to allow two chemical substances to be mixed in a processing vat, yet there is nothing in the data themselves that the database software would recognize as precluding those two values from being recorded in the input variables. The programmers must include such restrictions in edit checks; often these relations among variables can be coded in a data dictionary. If the programming environment does not allow such dependencies, the programmers must incorporate the restrictions in lookup tables or in initialization of variables.

From a production point of view, operations staff must run appropriate validation programs created by the database suppliers and by the application programmers to assure the quality of all production data. The next sections review in more detail what is involved in such validation programs.

47.5.1 Edit Checks.

Operations should have access to diagnostic programs that scan entire databases looking for violations of edit criteria. For example, if a field is designated as requiring only alphanumeric characters but not special characters such as “#” and “@,” then part of the diagnostic sweep should be checking every occurrence of the field for compliance with those rules. Similarly, range checks (greater than, less than, greater than or equal, equal, less than or equal, between) are a normal part of such scans. Lookup tables listing allowed and forbidden data and combinations of data provide further sophistication for more complex relations and restrictions. In any case, the role of operations staff is to run the diagnostics and identify errors; correction of the errors should fall to authorized personnel, such as the database administrators.

Diagnostic programs should provide detailed information about every error located in the production files. Such details include:

  • Configurable view of the record or records constituting an error, showing some or all of the fields
  • Unique identification of such records by file name or number, record number, and optionally by physical location (cylinder, sector) on disk
  • Error code and optional full-text descriptions of the error, including exactly which constraints have been violated

A diagnostic program should, ideally, also allow for repair of the error. Such repair could be automatic, as, for example, insertion of the correct total in an order-header, or manual, by providing for the database administrator to correct a detail record known to be wrong.
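
A minimal Python sketch of such a diagnostic sweep follows: it applies format, range, and lookup-table checks to every record and reports the record number, field, and offending value, leaving correction to the database administrators. The field rules and sample data are illustrative assumptions.

import re

FIELD_RULES = {
    "part_code": lambda v: bool(re.fullmatch(r"[A-Za-z0-9]+", str(v))),  # alphanumeric only
    "quantity":  lambda v: 0 <= v <= 10000,                              # range check
    "warehouse": lambda v: v in {"EAST", "WEST", "CENTRAL"},             # lookup table
}

def sweep(records):
    """Yield (record_number, field, offending_value) for every edit violation."""
    for number, record in enumerate(records, start=1):
        for field, check in FIELD_RULES.items():
            if field in record and not check(record[field]):
                yield number, field, record[field]

data = [{"part_code": "AB#12", "quantity": 50,    "warehouse": "EAST"},
        {"part_code": "CD34",  "quantity": 99999, "warehouse": "NORTH"}]
for violation in sweep(data):
    print("EDIT VIOLATION:", violation)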

47.5.2 Check Digits and Log Files.

Another form of verification relies on check digits. Programs can add the numerical or alphanumeric results of data manipulations to each record or to groups of records when transactions are completed properly. Finding records with the wrong check digits will signal inconsistencies and potential errors in the processing. Check digits are particularly useful to identify changes in production databases and other files that have been accomplished through utilities that bypass the constraints of application programs. For example, most databases come with a relatively simple ad hoc query tool that permits lookups, serial searches, views, and simple reporting. However, such tools often include the power to modify records in compliance with database subsystem constraints, but completely free of application program constraints. An even more powerful type of utility bypasses the file system entirely, and works by issuing commands directly to the low-level drivers or to the firmware responsible for memory and disk I/O. In the hands of the wrong people, both database and system utilities can damage data integrity. However, it is usually difficult for the users of these utilities to compute the correct checksums to hide evidence of their modifications. A diagnostic routine that recomputes checksums and compares the new values with the stored values can spot such unauthorized data manipulations immediately.
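
The following minimal Python sketch illustrates the principle with a record-level check value computed as a weighted sum of the record's bytes modulo 97 and stored with the record when a transaction completes; the scheme, field names, and sample values are invented for illustration, and a real implementation would choose its own algorithm and protect the convention itself.

def check_value(record, fields):
    """Compute a simple weighted-sum check value over the named fields of a record."""
    payload = "|".join(str(record[f]) for f in fields).encode()
    return sum((position + 1) * byte for position, byte in enumerate(payload)) % 97

FIELDS = ("account", "amount", "date")
rec = {"account": "A-1001", "amount": "250.00", "date": "2008-05-01"}
rec["check"] = check_value(rec, FIELDS)

# A later diagnostic pass recomputes and compares:
tampered = {**rec, "amount": "2500.00"}                      # modified outside the application
print(check_value(rec, FIELDS) == rec["check"])              # True
print(check_value(tampered, FIELDS) == tampered["check"])    # False (with high probability)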

A similar technique for validating and repairing data uses database and application log files to record information such as before and after images of modified records. Such log files also can include special marker records to flag different steps in complex transactions; these flags can allow diagnostic programs to identify precisely when transactions have been interrupted or subjected to other forms of failure. Using these log files, it is often possible to identify which defective transactions need to be removed, repeated, or completed.

Commercial data-integrity software is available for a wide range of platforms and databases. Searching on “data integrity software” in the Google search engine (www.google.com) locates many references to such products.2

For more information on integrating security into application design, see Chapter 52 in this Handbook. For more about log files and other monitoring and control systems, see Chapter 53.

47.5.3 Handling External Data.

Before using data provided by external organizations, operations should routinely check for data purity. Diagnostic routines from the programming group should be available to check on all data before they are used in batch processing to update a production database. The same principles of data validation used in checking production databases should apply to all data received from clients, suppliers, governments, and any other organization. Special validation programs can and should be written or obtained to test the data received on any medium, including tapes, cartridges, removable discs, CD-ROMs, DVDs and data communications channels.

47.6 CONCLUDING REMARKS.

Until the mid-1980s, the world of mainframes and minicomputers differed from that of PCs. Since the mid-1980s, however, these worlds have merged. LANs have proliferated, the Internet has changed the way business is conducted, and bridges, routers, and gateways make it possible for information to move among computer platforms, regardless of type. Security requirements are now universal in scope. Numerous layers of software and hardware separate the user and the technician from the information to which they require access. These layers themselves contribute to security because they require some technical skill to get at the information. As this is being written in 2008, computer literacy is increasing rapidly. Children are taught to use computers in grade school, and tens of millions of workers routinely use PCs or workstations daily, so the security provided by the technology is not as significant as it once was.

Security is also an economic issue. If an individual with the requisite skill is determined to gain access to online files or databases, it is extremely expensive to prevent such access. Even with high expenditures, success in achieving complete security is never guaranteed. Nevertheless, if the value of the information and its confidentiality justifies additional expense, there are software products available that employ complex schemes to support security. When necessary, information security can be extended down to the field level in records of online files and databases.

Other related security measures, such as physical protection, communication security, encryption of data, auditing techniques, system application controls, and other topics, are covered in other chapters of this Handbook. No one measure can stand alone or provide the ultimate protection for security, but with a proper balance of measures, the exposures can be contained and managed.

47.7 FURTHER READING

Baumann, W. J., J.T. Fritsch, and K. J. Dooley. Network Maturity Model: An Integrated Process Framework for the Management, Development and Operation of High Quality Computer Networks. Parker, CO: Outskirts Press, 2007.

Benyon, R., and R. Johnston. Service Agreements: A Management Guide. San Antonio, TX: Van Haren Publishing, 2006.

Blanding, S. Enterprise Operations Management Handbook, 2nd ed. Boca Raton, FL: Auerbach, 1999.

Cisco. “Cisco Data Center Management.” White paper, 2004. Available: www.cisco.com/web/about/ciscoitatwork/downloads/ciscoitatwork/pdf/Cisco_IT_Operational_Overview_Data_Center_Management.pdf or http://tinyurl.com/5vl2n6.

McCrie, R. Security Operations Management, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 2006.

47.8 NOTES

1. United States Joint Chiefs of Staff, “Information Operations,” Joint Publication 3–13, 2006, p. II-2. Available: www.dtic.mil/doctrine/jel/new_pubs/jp3_13.pdf.

2. The site http://dmoz.org/Computers/Software/Databases/Data_Warehousing/Data_Integrity_and_Cleansing_Tools listed 45 such tools at the time of writing (May 2008).
