Lester E. Nichols, M. E. Kabay, and Timothy Braithwaite
38.2 POLICY AND MANAGEMENT ISSUES
38.2.1 Software Total Quality Management
38.2.3 Regulatory and Compliance Considerations
38.3 TECHNICAL AND PROCEDURAL ISSUES
38.3.4 Best Practices and Guidelines
38.4.1 Internal Design or Implementation Errors
38.5 ASSURANCE TOOLS AND TECHNIQUES
38.5.2 Code Examination and Application Penetration Testing
38.5.3 Standards and Best Practices
The topic of secure coding cannot be adequately addressed in a single chapter. Unfortunately, programs are inherently difficult to secure because of the large number of ways that execution can traverse the code as a result of different input sequences and data values.
This chapter provides a starting point and additional resources for security professionals, system architects, and developers to build a successful and secure development methodology. Writing secure code takes coordination and cooperation of various functional areas within an organization, and may require fundamental changes in the way software currently is designed, written, tested, and implemented.
There are countless security hurdles facing those writing code and developing software. Today, dependence on the reliability and security of automated systems is nearly total. For an increasing number of organizations, distributed information processes, implemented via networked environments, have become the critical operating element of their business. Not only must the processing system work when needed, but the information processed must retain its integrity so that it can be trusted in use. Because of a general lack of basic IT organizational discipline, and of adherence to fundamental software and systems development and maintenance principles, most software development efforts have fallen significantly short. If these inadequate practices are left unchanged in the face of increasing cyberthreats, they will continue contributing to the insecure systems of tomorrow, just as they have contributed to the insecure systems of today.
The fundamental problem underlying this kind of assertion is the difficulty of justifying what are often perceived as non-revenue-generating activities. The difficulty also stems from the infeasibility of proving a negative. For example, one cannot prove to the uninvolved or unimpressed observer that the time, money, and human resources spent on prevention are well spent simply because a disaster was averted. Such an observer will take the fact that a disaster did not occur as evidence that the possibility of disaster never really existed, or that the problem was exaggerated. In the same way, a skeptical observer may take the absence of security breaches as evidence that no such threat existed, or that it was never as bad as claimed. The problem is exacerbated after spending time and effort on establishing security policies, security controls, and enforcement mechanisms.
In both cases, the money, time, and effort expended to prevent adverse computing consequences from harming an organization are viewed with suspicion. However, such suspicions are not new in the world of information processing. Throughout the history of IT, this attitude has existed regarding problem-prevention activities such as quality assurance, quality control testing, documentation, configuration management, change management, and other system-management controls designed to improve quality, reliability, and system maintainability.
Problems with security are management problems. Management dictates how much time and money can be spent on security activities such as risk management and software testing. How much time and energy are invested in designing, building, testing, and fixing software is in the end a management call. Thus, the primary concern of those striving to improve software security must be the correction of these same basic and fundamental IT management defects; otherwise effective security can never be attained or maintained.
The nature of the computer security challenge is that of dynamically changing vulnerabilities and threats, based on rapidly changing technologies, used in increasingly complex business applications. Organizations need to build software around the idea of a continuous improvement model, as popularized by the total quality management (TQM) approach articulated in the ISO 9000 family of standards. For purposes of quality improvement, this model was viewed as a continuing cycle of plan-do-check-act. For software security, it can be thought of as a cycle of plan-fix-monitor-assess. A security management process to execute this model must be established and integrated into the day-to-day operations of the business. Integration means that the security initiative does not stand alone as a watchdog or act merely as an audit function, and it must move away from a policing function. Security must be considered an essential part of all other systems development and management activities, including requirements definition, systems design, programming, package integration, testing, systems documentation, training, configuration management, operations, and maintenance. Security must be viewed as good architecture.
A security management process must be designed in such a way that the activities of planning, fixing, monitoring, and assessing are accomplished in an iterative fashion for systems being defined, designed, and implemented and for systems that are currently executing the day-to-day business operations.
For many organizations, it has not been uncommon for software to be built with little or no attention to security concerns. Consequently, many business-critical applications have made their way into full production with inadequate or nonexistent security controls in place. For example, a quickly built application to resolve an initial need for tracking information within a database through a Web interface may not get all the access control configurations designed into the application that are needed for proper security. It serves a quick purpose but becomes a long-term tool because of its success. Because of the nature of business (moving from one task to the next), nothing is really done to develop it into the function it now fills. As a result, users or hackers may be unintentionally or intentionally capable of accessing and modifying critical information within the database or within the system itself. Fortunately, this trend has started to change, but it is still in need of significant attention. Because such systems are already operating, the only way realistically to identify and assess their security risks is to begin aggressively monitoring for intrusions and then to design and implement corrective security policies and controls based on what is discovered. The effective sequence of security activities for systems already in operational status will be to monitor current operations, assess suspicious activity, plan a corrective policy and technology control, implement the control on the system, and then monitor for continuing effectiveness.
To address software and code security effectively, it must become an issue for management, just as other security concerns have begun to demand management attention. Business executives and IT managers are now all too familiar with the concept of due diligence as applied to the uses of information technology. Because of the potential for legal fallout, it has become necessary to view IT-related actions through the definitions of due diligence and reasonable care.
The significance of the concepts of due diligence and reasonable care is that they allow for an evolving metric against which an organization's application security deliberations, decisions, and actions can be compared. The comparison is usually against a similar organization, in like circumstances of vulnerability and threat, and with similar predictable adverse impacts on customers, partners, shareholders, employees, and the public.
For example, if one company employs a security control and does not experience any security breaches that the technique was supposed to prevent, that fact could be used to establish a baseline against which other similar organizations could be compared. If enough similar organizations employ the same technique, the security control may become categorized as a best practice for that industry.
If, however, another company in the same industry did not employ the security control and did experience security breaches of the type the technique was supposed to prevent, it might clearly indicate a lack of due diligence or reasonable care. Software security decisions and implementations are not a one-time event, but need to be under a continuous process of risk evaluation, management, and improvement.
It is therefore imperative, in order to demonstrate continual due diligence, to establish a documented computer security risk management program and to integrate it into the overall management of the software development process. Nothing less will demonstrate that a company, and its board, is capable of assessing computer security threats and of acting in a reasonable manner.
For software developers working on commercial software for sale to a specific client under contract, or even to a wide range of customers governed by end-user license agreements, the exercise of due diligence and improved security may reduce the risk of lawsuits claiming damages for negligence. Whether successful or not, such lawsuits are never positive publicity for makers of software products.
If an organization is subject to specific regulations (e.g., those dictated by Sarbanes-Oxley legislation), then documentation of the errors encountered, and the resulting internal reporting and remediation efforts, is critical. The documentation should clearly indicate how each error was identified, who identified it, how it was reported to management, what the remediation will be, and when it is anticipated to be completed. Without these details, management may encounter significant difficulties in confirming that adequate internal control mechanisms exist, together with the appropriate and adequate involvement of management. In addition, depending on its nature and severity, an error could result in the external auditors identifying a control weakness or, even worse, a material weakness. (A material weakness would be, for example, the use of a programming language with a known inherent flaw rather than a more appropriate language without the flaw, much as using aluminum instead of steel for a bridge.)
Additionally, when errors are noted that affect multilocation systems or applications, the significance and materiality—that is to say the extent to which the errors impact or may impact the system or any other system to which the application interfaces—must be considered from the aspect of both the subsidiary and the headquarters locales. Since laws differ among states and countries, what might be considered legally acceptable standards of privacy or accounting in one locale might not be acceptable elsewhere.
Security should be integrated into every stage of the application life cycle: in requirements analysis, in software design, in the operating system security kernel, in corporate policy development, and in human awareness, training, and education programs.
Writing secure code can be a daunting and technically challenging undertaking. That is not to say that it cannot be done, but it requires the developer to work in conjunction with the rest of the project team to meet the challenges that must be defined, reconciled, and overcome prior to the release of the software or system. The additional push to market that can drive the rapid development process can also have a negative impact on the security of the application. The problem must be dealt with in two ways, technical and procedural. Technically, the programmers must be aware of the pitfalls associated with application development and avoid them. Procedurally, the development team and organization need to adhere to a consistent methodology of development. This consistency also needs to include the identification of security risks at every stage of the process, including the requirements analysis.
Security must be a part of the requirements analysis within any development project. Most application developers know that adding security after the fact increases cost and time, and becomes more complicated. Requirements can be derived from different sources:
The purpose of analysis is to determine what information and processes are needed to support the desired objectives and functions of the software and the business model.
The informational data gathered during analysis go into the software design as requirements. What comes out of the analysis are the data, logic, and procedural designs.
Developers take the informational model data and transform them into the data structures that will be required to implement the software. The logic design defines the relationships between the major structures and components of the application. The procedural design transforms structural components into descriptive procedures. Access control mechanisms are chosen, rights and permissions are defined, any other security specifications are appraised, and solutions are determined. A work breakdown structure includes the development and implementation stages. The structure includes a timeline and detailed activities for testing, development, staging, integration testing, and product delivery.
The decisions made during the design phase are pivotal to application development. The design is the only way the requirements are translated into software components. It is in this way that software design is the foundation of the development process, and it greatly affects software quality and maintenance. If good product design is not put in place in the beginning, the rest of the development process will be that much more challenging.
The operating system security kernel is responsible for enforcing the security policy within the operating system and the application. As such, the architecture of the operating system is typically layered, with the kernel at the most privileged layer. The kernel is a small portion of the operating system, but all references to information and all changes to authorization pass through it.
To be secure the kernel must meet three basic criteria:
As mentioned, the kernel runs in the most privileged layer. Most operating systems have two processor access modes, user and kernel. General application code runs in user mode, while the operating system runs in kernel mode. Kernel mode allows the processor full access to all system memory and CPU instructions. When applications are not written securely, this separation of modes can become compromised, enabling exploitation of vulnerabilities through arbitrary code, buffer overflows, and other techniques.
“Best practices” is a term that can cause vigorous debate, especially with regard to security. Best practices in general, and particularly with regard to security, often fall prey to a dogmatic, and sometimes blind, devotion to nonsensical practices that have little to do with security and more to do with faith or tradition. Regardless, best practices provide a set of guidelines that lend structure, and they can be adapted to meet the needs of a particular situation. An example is the National Institute of Standards and Technology (NIST) Special Publication 800 series, which provides best practices and recommendations that can be adapted or integrated into other practices, helping to eliminate adherence to a practice that may no longer have a legitimate use. Considering this, most general security textbooks contain recommendations on security-related aspects of programming—see, for example, Stallings—that do in fact have a very real benefit in creating more secure software.
In addition to designing security into a system from the start, there are also some obvious guidelines that will help in developing secure software:
Mike Gerdes, former manager at AtomicTangerine, contributed these suggestions in the course of a discussion:
To date, no common computer languages have security-specific features built in. Java, however, does include provisions for limiting access to resources outside the “sandbox” reserved for a process, as described in the books by Felten and McGraw. Pascal uses strong typing and requires full definition of data structures, thus making it harder to access data and code outside the virtual machine defined for a given process. In contrast, C and C++ allow programmers to access any region of memory whenever the operating system permits it.
Computer languages merely allow developers to write code as well as they are able. Strongly typed languages may impose better constraints on programmers, but the essential requirement is that programmers continue to think about security as they design and build code.
There are several sets of security utilities and resources available for programmers; for example, RSA has a number of cryptographic toolkits. Some textbooks (e.g., Schneier's Applied Cryptography) include CD-ROMs with sample code. In addition, the Computer Emergency Response Team Coordination Center (CERT/CC) was started in December 1988 by the Defense Advanced Research Projects Agency, which was part of the U.S. Department of Defense. CERT/CC is located at the Software Engineering Institute, a federally funded research center operated by Carnegie Mellon University.
CERT/CC studies Internet security vulnerabilities, provides services to Web sites that have been attacked, and publishes security alerts. CERT/CC's research activities include wide area network (WAN) computing and the development of improved Internet security.
New programmers should review the range of root causes for software errors. Such review is particularly useful for students who have completed training that did not include discussions of systematic quality assurance methodology. The next sections can serve as a basis for creating effective sets of test data and test procedures for unit tests of new or modified routines.
A general definition of a software error is a mismatch between a program and its specifications; a more specific definition is the failure of a program to do what the end user reasonably expects. There are many types of software errors. Some of the most important include:
Initialization errors are insidious and difficult to find. The most insidious programs save initialization information to disk and fail only the first time used—that is, before they create the initialization file. The second time a given user activates the program, there are no further initialization errors. Thus, the bugs appear only to employees and customers when they activate a fresh copy of the defective program. Other programs with initialization errors may show odd calculations or other flaws the first time they are used or initialized; because they do not store their initialization values, these initialization errors will continue to reappear each time the program is used.
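The first-run failure mode can be sketched in Python; the settings file name and default values here are hypothetical, invented for the illustration:

```python
import json
import os

DEFAULTS = {"threshold": 10, "verbose": False}  # assumed defaults for the sketch

def load_settings_buggy(path):
    # First-run bug: assumes the initialization file already exists, so a
    # fresh installation fails before the file has ever been created.
    with open(path) as fh:
        return json.load(fh)

def load_settings(path):
    # Corrected version: on first use, write explicit defaults to disk so
    # that the first run and every later run start from a known state.
    if not os.path.exists(path):
        with open(path, "w") as fh:
            json.dump(DEFAULTS, fh)
        return dict(DEFAULTS)
    with open(path) as fh:
        return json.load(fh)
```

Note that the buggy version passes every test run after the first, which is exactly why such defects tend to reach fresh installations at customer sites.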
Modules pass control to each other or to other programs. If execution passes to the wrong module, a logic-flow error has occurred. Examples include calling the wrong function, or branching to a subroutine that lacks a RETURN instruction, so that execution falls through the logical end of a module and begins executing some other code module.
When a program misinterprets complicated formulas and loses precision as it calculates, it is likely that a calculation error has occurred; for example, an intermediate value may be stored in an array with 16 bits of precision when it needs 32 bits. This category of errors also includes computational errors due to incorrect algorithms.
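The cost of storing intermediates at too low a precision can be simulated in Python by round-tripping each partial sum through IEEE 754 half precision, standing in for the undersized 16-bit field described above:

```python
import struct

def to_half(x):
    # Round-trip a double through IEEE 754 binary16 (half precision), as
    # if the intermediate result were stored in a 16-bit field.
    return struct.unpack('<e', struct.pack('<e', x))[0]

exact = sum(0.1 for _ in range(1000))   # accumulated in double precision
lossy = 0.0
for _ in range(1000):
    lossy = to_half(lossy + 0.1)        # intermediate truncated each step

# 'exact' is essentially 100.0; 'lossy' drifts visibly away from it,
# because each step rounds to the coarse half-precision grid.
```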
Boundaries refer to the largest and smallest values with which a program can cope; for example, an array may be dimensioned with 365 values to account for days of the year, and then fail in a leap year when the program increments the day-counter to 366 and thereby attempts to store a value in an illegal address. Programs that set variable ranges and memory allocation may work correctly within the boundaries but, if incorrectly designed, may crash at or outside the boundaries. The first use of a program also can be considered a boundary condition.
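The leap-year example can be sketched in Python, with the 365-entry table playing the role of the under-dimensioned array:

```python
def day_value_unchecked(table, day_of_year):
    # Works for days 1..365 but fails on day 366 of a leap year, because
    # the table was dimensioned for only 365 entries.
    return table[day_of_year - 1]

def day_value(table, day_of_year):
    # Boundary-checked version: reject out-of-range days explicitly
    # instead of letting the access run past the end of the table.
    if not 1 <= day_of_year <= len(table):
        raise ValueError(f"day {day_of_year} outside 1..{len(table)}")
    return table[day_of_year - 1]
```

In Python the unchecked access raises an IndexError rather than corrupting memory as it might in C, but note the subtler boundary error: day 0 silently returns the last entry via negative indexing, which the checked version also rejects.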
One of the most important types of boundary violations is the buffer overflow. In this error, data placed into storage exceed the defined maximum size and overflow into a section of memory identified as belonging to one or more different variables. The consequences can include data corruption (e.g., if the overflow overwrites data that are interpreted as numerical or literal values) or changes in the flow of execution (e.g., if the altered data include logical flags that are tested in branch instructions).
Buffer overflows have been exploited by writers of malicious code who insert data that overflows into memory areas of interpreted programs and thus become executed as code.
All programs should verify that the data being stored in an array or in any memory location do not exceed the expected size of the input. Data exceeding the expected size should be rejected or, at least, truncated to prevent buffer overflow.
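Although Python manages memory for its own objects, the validate-before-store discipline can be illustrated with a fixed-width record field; the field size and function are assumptions of the sketch:

```python
MAX_FIELD = 32  # fixed size of the destination field, assumed for illustration

def store_field(record: bytearray, offset: int, data: bytes,
                truncate: bool = False) -> None:
    # Verify the input fits the field before copying; reject it by
    # default, or truncate it when the caller explicitly allows that,
    # rather than writing past the end of the field.
    if len(data) > MAX_FIELD:
        if not truncate:
            raise ValueError(f"{len(data)} bytes exceeds field size {MAX_FIELD}")
        data = data[:MAX_FIELD]
    record[offset:offset + len(data)] = data
```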
Sometimes there are errors in passing data back and forth among modules. For instance, a call to a function accidentally might pass the wrong variable name so that the function acts on the wrong values. When these parameter-passing errors occur, data may be corrupted, and the execution path may be affected because of incorrect results of calculations or comparisons. As a result, the latest changes to the data might be lost, or execution might fall into error-handling routines even though the intended data were correct.
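One inexpensive defense, sketched here in Python with a hypothetical transfer function, is to force critical parameters to be passed by name so that a transposed call fails loudly instead of silently corrupting data:

```python
def transfer_positional(balances, src, dst, amount):
    # A caller that transposes src and dst still "succeeds": the amount
    # simply moves the wrong way, silently corrupting the data.
    balances[src] -= amount
    balances[dst] += amount

def transfer(balances, *, src, dst, amount):
    # Keyword-only parameters (everything after the bare *) make each
    # argument's role explicit at the call site; a positional call
    # raises TypeError instead of acting on the wrong values.
    balances[src] -= amount
    balances[dst] += amount
```

A call such as `transfer(accts, src="checking", dst="savings", amount=50)` documents itself, while `transfer(accts, "checking", "savings", 50)` is rejected outright.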
When a race occurs between event A and event B, a specific sequence of events is required for correct operation, but the program does not ensure this sequence. For example, if process A locks resource 1 and waits for resource 2 to be unlocked while process B locks resource 2 and waits for resource 1 to be unlocked, there may be a deadly embrace that freezes the operations if the processes overlap in execution. If they do not happen to overlap, there is no problem at that time.
Race conditions can be expected in multiprocessing systems and interactive systems, but they can be difficult to replicate; for example, the deadly embrace just described might happen only once in 1,000 transactions if the average transaction time is very short. Consequently, race conditions are among the most difficult to detect during quality assurance testing and are best identified in code reviews. Programmers should establish and comply with standards on sequential operations that require exclusive access to more than one resource. For example, if all processes exclusive-lock resources in a given sequence and unlock them in the reverse order, there can be no deadly embrace.
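The lock-ordering rule above can be sketched with Python threads; the two resources and the global ordering are assumptions of the sketch:

```python
import threading

# A single global ordering for all shared resources; every thread must
# acquire locks in this sequence and release them in reverse.
resource_a = threading.Lock()
resource_b = threading.Lock()
LOCK_ORDER = (resource_a, resource_b)

def with_resources(action):
    # Because no thread ever waits on an earlier lock while holding a
    # later one, the deadly embrace described above cannot occur.
    held = []
    try:
        for lock in LOCK_ORDER:
            lock.acquire()
            held.append(lock)
        return action()
    finally:
        for lock in reversed(held):
            lock.release()
```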
All programs and systems have limits to storage capacity, numbers of users, transactions, and throughput. Load errors, caused by exceeding the volume limitations of storage, transactions, users, or networks, can occur under high volume (a great deal of work over a long period) or high stress (the maximum load all at one time). For example, if the total theoretical number of transactions causes a demand on the disk I/O system that exceeds the throughput of the disk controller, processes will necessarily begin piling up in a queue waiting for completion of disk I/Os. Although theoretical calculations can help to identify where possible bottlenecks can occur in CPU, memory, disk, and network resources, a useful adjunct is automated testing that permits simulation of maximum loads defined by service-level agreements.
A program that runs out of high-speed memory (RAM), mass storage (disk), central processing unit (CPU) cycles, operating system table entries, semaphores, network bandwidth, or other resources can fail. For example, inadequate main memory may cause excessive swapping of data to disk (thrashing), typically causing drastic reductions in throughput, because disk I/O is typically 1,000 times slower than memory access.
With operating systems (OS) as complex as they are, OS manufacturers routinely distribute the code requirements and certain parameters to the application software manufacturers, so that the likelihood of program conflicts or unexpected stoppages are minimized. While this certainly helps reduce the number of problems and improves the forward and backward compatibility with previous OS versions, even the OS vendors on occasion experience or cause difficulties when they do not conform to the parameters established for their own programs.
It is not unusual for errors to occur where programs send bad data to devices, ignore error codes coming back, and even try to use devices that are busy or missing. The hardware might well be broken, but the software also is considered to be in error when it does not recover from such hardware conditions.
Additional errors can occur through improper builds of the executable; for example, if an old version of a module is linked to the latest version of the rest of the program, the wrong sign-on screens may pop up, the wrong copyright messages may be displayed, the wrong version numbers may appear, and various other inaccuracies may occur.
Generally speaking, the term “user interface” denotes all aspects of a system that are relevant to a user. It can be broadly described as the user virtual machine (UVM). This would include all screens, the mouse and keyboard, printed outputs, and all other elements with which the user interacts. A major problem arises when system designers cannot put themselves in the user's place and cannot foresee the problems that technologically challenged users will have with an interface designed by a technologically knowledgeable person.
Documentation is a crucial part of every system. Each phase of development—requirements, analysis, development, coding, testing, errors, error solutions and modifications, implementation, and maintenance—needs to be documented. All documents and their various versions need to be retained for both future reference and auditing purposes. Additionally, it is important to document the correct use of the system and to provide adequate instructional and reference materials to the user. Security policies and related enforcement and penalties also need to be documented. Ideally, the documentation should enable any technically qualified person to repair or modify any element, as long as the system remains operational.
A program has a functionality error if performance that can reasonably be expected is confusing, awkward, difficult, or impossible. Functionality errors often involve key features or functions that have never been implemented. Additional functionality errors exist when:
Control structure errors can cause serious problems because they can result in:
Some common errors include:
Speed is important in interactive software. If a user feels that the program is working slowly, that can be an immediate problem. Performance errors include slow response, unannounced case sensitivity, uncontrollable and excessively frequent automatic saves, inability to save, and limited scrolling speed. Slow operation can depend on (but is not limited to) the OS, the other applications running, memory saturation and thrashing (excessive swapping of memory contents to virtual memory on disk or on other, relatively slow, memory resources such as flash drives), memory leakage (the failure to deallocate memory that is no longer needed), disk I/O inefficiencies (e.g., reading single records from very large blocks), and program conflicts (e.g., locking errors). For more information on program performance, see Chapter 52 in this Handbook.
At another level, performance suffers when program designs make it difficult to change their functionality in response to changing requirements. In a database design, defining a primary index field that determines the sequence in which records are stored on disk can greatly speed access to records during sequential reads on key values for that index—but it can be counterproductive if the predominant method for accessing the records is sequential reads on a completely different index.
Output format errors can be frustrating and time consuming. An error is considered to have occurred when the user cannot change the fonts, underlining, boldface, and spacing that influence the final look of the output, or when delays or errors occur while printing or saving a document. Errors also occur when the user cannot control the content, scaling, and look of tables, figures, and graphs. Additionally, there are output errors that involve expression of the data to an inappropriate level of precision.
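The precision point can be sketched in a few lines of Python: reporting more digits than the underlying data supports implies an accuracy the measurement never had.

```python
reading = 7 / 3  # suppose the inputs were accurate to two decimal places

overstated = f"{reading:.15f}"   # implies far more accuracy than the data had
appropriate = f"{reading:.2f}"   # matches the precision of the inputs
```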
Buffer overflow vulnerabilities, as an example, have been known to the security community for the past 25 or 30 years, yet the vulnerabilities are still increasing. Clearly greater efforts are necessary to teach and to enforce awareness and implementation of source code requirements.
Build Security In (BSI) is a project of the Software Assurance program of the Strategic Initiatives Branch of the National Cyber Security Division (NCSD) of the U.S. Department of Homeland Security. BSI (http://buildsecurityin.us-cert.gov/portal) contains and links to best practices, tools, guidelines, rules, principles, and other resources that software developers, architects, and security practitioners can use to build security into software in every phase of its development. BSI content is based on the principle that software security is fundamentally a software engineering problem and must be addressed in a systematic way throughout the software development life cycle.
In addition to BSI, and out of the multitudes of sites on the Web that address some facet of secure coding, some sites are listed next to assist in secure application development:
Scanning once is not enough; ongoing application assessment is essential to implementing effective secure development practices and, in turn, a secure application.
The white box testing strategy deals with the internal logic and structure of the code. White box testing is also known as glass, structural, open box, or clear box testing. To implement white box testing, the tester must work with the code and therefore needs knowledge of coding and logic (i.e., the internal working of the code). White box testing also requires the tester to look into the code to find out which unit, statement, or chunk of the code is malfunctioning.
Advantages of white box testing are:
Disadvantages of white box testing are:
Black box testing is testing without knowledge of the internal workings of the item being tested. Black box testing is also known as behavioral, functional, opaque box, or closed box testing. For this reason, the tester and the programmer can be independent of one another, avoiding programmer bias toward their own work. Test groups are often used. Because of the nature of black box testing, test planning can begin as soon as the specifications are written.
Advantages of black box testing are:
Disadvantages of black box testing are:
Gray box testing is a software testing technique that uses a combination of black box testing and white box testing. Gray box testing is not black box testing, because the tester does know some of the internal workings of the software under test. In gray box testing, the tester applies a limited number of test cases to the internal workings of the software under test. In the remaining part of the gray box testing, the tester takes a black box approach in applying inputs to the software under test and observing the outputs.
Testing is a complete software engineering discipline in its own right. Volumes have been written about the various techniques that software engineers use to test and validate their applications. Some basic best practices can be employed regardless of the testing methodology selected. These are:
In addition to these best practices, use of these standards can provide additional resources:
Computer security practitioners and top-level managers must understand that IT management deficiencies, left unchanged, will sabotage efforts to establish and sustain effective computer security programs. The costs of neglecting security at the start will continue to be a serious issue until software security is fully integrated into every aspect of the development life cycle. Security must be built in, and building security into systems starts with the software development practices and techniques used to build those systems.
Campanella, J., ed. Principles of Quality Costs: Implementation and Use, 3rd ed. Milwaukee, WI: ASQ Quality Press, 1999.
Felten, E., and G. McGraw. Securing Java: Getting Down to Business with Mobile Code. New York: John Wiley & Sons, 1999. Also available with free and unlimited Web access from www.securingjava.com.
Fox, C., and P. Zonneveld. IT Control Objectives for Sarbanes-Oxley: The Role of IT in the Design and Implementation of Internal Control Over Financial Reporting, 2nd ed. IT Governance Institute, 2006.
Institute of Electrical and Electronics Engineers. IEEE Standard for Software Test Documentation. ANSI/IEEE Std. 829-1983, 1983.
Institute of Electrical and Electronics Engineers. IEEE Standard for Software Quality Assurance Plans. ANSI/IEEE Std. 730-1981, 1981.
McGraw, G., and E. W. Felten. Java Security: Hostile Applets, Holes and Antidotes—What Every Netscape and Internet Explorer User Needs to Know. New York: John Wiley & Sons, 1997.
McGraw, G., and E. W. Felten. “Understanding the Keys to Java Security—the Sandbox and Authentication,” 1997; www.javaworld.com/javaworld/jw-05-1997/jw-05-security.html.
NASA Software Assurance Technology Center, http://satc.gsfc.nasa.gov/.
Ould, M. A. Strategies for Software Engineering: The Management of Risk and Quality. New York: John Wiley & Sons, 1990.
RSA Data Security, www.rsasecurity.com/products/.
Schneier, B. Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed. New York: John Wiley & Sons, 1995.
Stallings, W. Network and Internetwork Security: Principles and Practice. Englewood Cliffs, NJ: Prentice Hall, 1995.
Tipton, H. F., and K. Henry. Official (ISC)2 Guide to the CISSP CBK. Boca Raton, FL: Auerbach Publications, 2006.