Chapter 11

Software Design and Development

Geoff Patch,    CEA Technologies Pty. Ltd.

Software development for embedded systems is not like software development for desktop systems. This chapter provides detailed descriptions of the key differences between the two domains. It examines operating system support, real-time requirements, resource constraints, and safety. It then provides advice about tools and techniques to develop embedded system software, with the primary emphasis on processes that help the embedded system developer to reliably and repeatedly produce high-quality embedded systems. The advice is based upon techniques that have been developed over many years and have been proven to work effectively in highly complex real-world military and industrial applications.

Keywords

Software; process; design; tools; techniques; quality; reliability

We are surrounded in our daily lives by embedded systems. Modern telephones, automobiles, domestic appliances, and entertainment devices all have one thing in common—large amounts of complex software running on a processor, or processors, embedded in the device. That software implements much of the functionality of the device.

The design and development of software for these systems is a specialized discipline requiring knowledge of constraints and techniques that are not relevant to more conventional areas of software. While the rapid proliferation of embedded systems is being reflected in the gradual introduction of relevant courses into university degrees, in many ways, the efficient production of high-quality embedded software is still an art rather than a science.

This chapter will examine software design and development in the embedded system domain. It explains the processes, tools, and techniques that may be used to overcome the difficulties of development in this environment.

Please remember

All examples and case studies in this chapter serve only to indicate what you might encounter. Hopefully these examples and case studies will energize your thought processes; your estimates and calculations may vary from what you find here.

Case Study: Loading ammunition

Embedded systems are important. I am a competitive target shooter. To achieve the best possible accuracy during matches I manufacture my own pistol and rifle ammunition. Part of this process involves measuring out the charges of propellant powder that drive the projectiles down the range. These powder charges generate pressures of tens of thousands of pounds per square inch during firing, and propel the projectiles at velocities of thousands of feet per second.

For historical reasons, the charges are measured in an old-fashioned unit known as the grain; there are 437.5 grains in an ounce. Typical powder charges are surprisingly small, and they must be measured very accurately to achieve consistently high performance from the ammunition. For example, a typical powder charge for a pistol round might consist of 3.5 grains of powder, measured to within one-tenth of a grain.

What makes this interesting is that while a charge of 3.5 grains might be perfectly safe, a charge of 4.5 grains might be enough to cause dangerously high pressures during firing that could damage the firearm and perhaps injure me.

To ensure that I measure my powder correctly and safely, I use a high precision digital electronic powder scale specifically designed for this purpose. It features an embedded control system that performs the measurements and drives a user interface consisting of several buttons and a digital display with a variety of readouts. I absolutely rely on the correct operation of this little embedded system, because if it fails me at best I may damage my firearm and at worst I could be seriously injured or killed (Figure 11.1).

Figure 11.1 A little less than one-hundredth of an ounce of propellant powder resting on an electronic powder scale. This is a safe amount in the intended bullet, but two-hundredths of an ounce of the same powder would be highly dangerous. (copyright 2014 by Geoff Patch. All rights reserved. Used with permission.)

Embedded systems are really important.

Distinguishing characteristics

Before contemplating how to approach the development of embedded software, you need to understand what actually distinguishes embedded software from software developed for either the desktop or the server. Understanding the difference between embedded and other domains is extremely important. The distinguishing factors are not trivial; in many cases, they are not obvious. Whether obvious or not, these factors are critical drivers in embedded software development programs; appreciating their impact can differentiate eventual success from failure of a development program.

Not every embedded system has all of these characteristics, of course, and some nonembedded systems will share these characteristics. It would be surprising, however, to find an embedded system that doesn’t embody some of the features from the following list.

Minimal operating system support

A full-featured desktop operating system (OS), such as Linux, provides an enormously powerful virtual machine for the desktop software developer to develop against. In most cases, the physical hardware underlying the OS platform is almost entirely abstracted away by large Application Programming Interfaces (APIs) that hide the complexities of dealing with the hardware on which the application is running. Beyond hardware abstraction, desktop OSs provide other abstractions such as virtual memory, multiprocessing, and threading that provide the illusion of multiple independent tasks operating concurrently. In fact, these software processes sequentially share a single processor or a small number of cores on a multicore CPU.

The features of a desktop OS provide the desktop developer with enormous power and flexibility, but they come at a cost, which includes the amount of storage required to hold the OS image and the amount of RAM required to actually run the system. At the time of writing, large desktop OSs may require gigabytes of storage and RAM to operate successfully. As well as their storage requirements, such OSs also consume substantial amounts of processing power to provide their services.

In many cases, it is simply not possible to use one of these large-scale OSs in an embedded system, because its storage, memory, and processing demands would make the device under development technically or commercially infeasible. In these circumstances, the developer is faced with two options.

For small systems, the best choice may be to have no OS at all. The embedded software may consist of a program that runs a control loop that polls devices for activity, or responds to flags being set by interrupt handlers. This solution is simple, inexpensive, and is often the best choice for small high-volume applications where minimizing cost is a key driver.
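A minimal sketch of such a system is shown below; the device flags and handler functions are hypothetical, but the structure is the classic "super loop":

    #include <stdbool.h>

    /* Flags set by interrupt service routines. The 'volatile' qualifier is
       essential: it forces the compiler to re-read the flags on every pass. */
    static volatile bool uart_rx_ready = false;
    static volatile bool timer_tick    = false;

    /* Hypothetical device and handler functions. */
    extern bool adc_conversion_done(void);
    extern void process_adc_sample(void);
    extern void handle_uart_byte(void);
    extern void run_periodic_control(void);

    int main(void)
    {
        for (;;) {                        /* The super loop never exits. */
            if (uart_rx_ready) {
                uart_rx_ready = false;
                handle_uart_byte();
            }
            if (timer_tick) {
                timer_tick = false;
                run_periodic_control();
            }
            if (adc_conversion_done()) {  /* A polled device, no interrupt. */
                process_adc_sample();
            }
        }
    }

Everything is visible at a glance, nothing is hidden behind an OS, and the memory footprint is tiny.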

For more complex applications, particularly those where multiple activities must be conducted in parallel, the best choice is a Real Time Operating System (RTOS). An RTOS is a small, efficient, and fast OS specifically targeted at the embedded system domain and designed to meet performance deadlines. RTOSs provide many of the useful capabilities of the desktop OS such as multithreading, while avoiding the plethora of nice-to-have features that consume resources without adding core capabilities to the system.
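To illustrate, here is how two concurrent activities might be expressed under FreeRTOS, one widely used open source RTOS; the two task bodies are hypothetical application functions:

    #include "FreeRTOS.h"
    #include "task.h"

    /* Hypothetical application functions. */
    extern void read_sensors(void);
    extern void update_display(void);

    static void sensor_task(void *params)
    {
        (void)params;
        for (;;) {
            read_sensors();
            vTaskDelay(pdMS_TO_TICKS(10));   /* Run every 10 ms. */
        }
    }

    static void display_task(void *params)
    {
        (void)params;
        for (;;) {
            update_display();
            vTaskDelay(pdMS_TO_TICKS(100));  /* Run every 100 ms. */
        }
    }

    int main(void)
    {
        /* In FreeRTOS, a higher number means a higher priority, so the
           sensor task will preempt the display task when both are ready. */
        xTaskCreate(sensor_task,  "sensor",  256, NULL, 2, NULL);
        xTaskCreate(display_task, "display", 256, NULL, 1, NULL);
        vTaskStartScheduler();               /* Does not return on success. */
        for (;;) { }
    }

The scheduler, not the application, now decides which activity runs at any instant, which is exactly the capability the super loop cannot provide.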

RTOSs will be discussed in more detail later in this chapter.

Real-time requirements

A real-time software application must meet all of its processing deadlines. Ganssle and Barr define real time as “having timeliness requirements, typically in the form of deadlines that can’t be missed” [1]. Such a system fails if it does not satisfy both the logical correctness and the timing correctness properties required of it. While not universal, timing constraints in embedded systems are extremely common.

A common mistake is to believe that a system with real-time requirements must be “fast.” A system with real-time requirements simply has to deliver numerically correct computational results while meeting its temporal constraints. A correct numerical result delivered late is a failure.
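The point can be made concrete with a small sketch. The timer function below is a stand-in for whatever free-running hardware clock the platform actually provides:

    #include <stdint.h>

    #define DEADLINE_US 500u  /* An illustrative 500-microsecond deadline. */

    extern uint32_t platform_time_us(void);  /* Hypothetical hardware clock. */
    extern void compute_result(void);
    extern void log_deadline_miss(uint32_t elapsed_us);

    void timed_job(void)
    {
        uint32_t start = platform_time_us();

        compute_result();

        /* Unsigned subtraction remains correct across counter wraparound. */
        uint32_t elapsed = platform_time_us() - start;
        if (elapsed > DEADLINE_US) {
            /* The answer may be numerically correct, but the job has failed. */
            log_deadline_miss(elapsed);
        }
    }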

The real-time requirements of an embedded system depend entirely on the nature of the system. In radar systems, timing constraints are frequently specified in units of nanoseconds. In an automobile engine management system, timing constraints may be specified in microseconds or milliseconds. While it may be stretching the spirit of the definition, it’s entirely possible that a system may have timing constraints specified in seconds, minutes, or even days.

However tight the constraints may be, the addition of timing requirements to a system is frequently a key discriminator between embedded systems and those in other domains. Spreadsheets, word processors, and web browsers work just as fast as they can; the result is delivered when it’s available. Users may wish for faster performance from such systems, but a complaint to a spreadsheet vendor that the calculations are performed slowly would most likely fall on deaf ears.

Real world sensor and actuator interfaces

Real-world interactions drive real-time requirements in an embedded system. This interaction senses some aspect of the external environment around the system; makes decisions based on the information received from the sensors or inputs; then drives change in the external environment through actuators or other control mechanisms.
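As a minimal illustration of that sense-decide-actuate cycle, consider a trivial thermostat; the sensor and actuator functions here are hypothetical wrappers around real hardware:

    #include <stdbool.h>

    #define SETPOINT_C   65.0f  /* Illustrative target temperature. */
    #define HYSTERESIS_C  2.0f  /* Dead band to prevent rapid switching. */

    extern float read_temperature_c(void);  /* Sense: hypothetical ADC wrapper. */
    extern void  set_heater(bool on);       /* Actuate: hypothetical driver. */

    void thermostat_step(void)
    {
        float t = read_temperature_c();       /* 1. Sense the environment. */

        if (t < SETPOINT_C - HYSTERESIS_C) {  /* 2. Decide. */
            set_heater(true);                 /* 3. Drive a change. */
        } else if (t > SETPOINT_C + HYSTERESIS_C) {
            set_heater(false);
        }
        /* Inside the dead band the heater is left in its current state. */
    }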

The pattern of real-world sensing and control, subject to real-time constraints, is extremely common in the world of embedded systems. Moreover, it is one of the most difficult areas to understand fully and implement correctly. The detailed mathematical study of this topic falls under the subject of control systems found in Chapter 8.

Resource constrained

The computational capacity of most embedded systems is severely constrained. The processor environment is typically much less powerful than the current state of the art in the desktop environment.

This isn’t arbitrary. Nothing is free. Heavy duty computational power typically requires large processors in large physical packages, which require large amounts of electrical power. Thus begins a cascade effect on design: large amounts of electrical power require the installation of large power supplies; large amounts of power result in the generation of large amounts of heat, which necessitate heat sinks, fans, and other infrastructure to move the heat away from the processor and peripherals. All these items add complexity, weight, and volume to the embedded system.

As well as limiting the raw computational power of the processor, the physical environment will constrain many other aspects of the software. The amount of nonvolatile memory available to store the executable image often limits the size of the program. Both the amount of RAM available and the speed at which it operates will limit data structures. The availability of interfaces with the appropriate performance constraints will limit the speed of communication with other systems.

Factors that need to be considered in this area are size, weight, power, heat dissipation, and resistance to shock and vibration. (Chapters 8–10 go into the trade-offs for these factors and parameters.) In general, these factors will significantly influence the amount of processing power available to an embedded system software developer.

Case Study: The constrained environment

The constrained hardware environment provided by most embedded systems must be considered during software design and development. Embedded software is frequently developed in parallel with the hardware, so algorithms are often first developed and tested in isolation on desktop systems, simply because the desktop offers a convenient, feature-rich development environment.

In a number of cases, I have seen systems fail at the integration stage due to algorithms working perfectly in high-performance desktop workstations, but failing to meet their timing requirements when embedded into their target hardware.

The fault in all these cases was not the limitations of the embedded environment; those limitations were a known quantity throughout development. The fault was that the engineers ignored the realities of the resource constraints of their target systems, resulting in the need to perform substantial rework at great cost. These lessons are both expensive and unnecessary.

Single purpose

Embedded systems often have fixed and single purposes. An engine control module only manages the engine; it has no other function. An industrial process control system will be designed to control a well-defined set of industrial processes. It won’t have to download and play MP3 files off the Internet while maintaining the membership database of the operator’s basketball club.

Contrast the embedded system’s single-mindedness with a desktop system, which is inherently general and multipurpose. Users may install software developed by any third party and, at least in theory, they may run any number of programs simultaneously in any combination.

Educational institutions universally teach the principles involved in developing such general-purpose software as being the correct paradigm to follow for successful application development. This is true outside the embedded systems arena, but following this paradigm can have unexpected and even harmful consequences when used in the development of embedded systems.

For example, dynamic memory allocation techniques allow developers to allocate and deallocate blocks of memory as the load on their application changes. This is advantageous in a general-purpose system performing many different tasks simultaneously because it allows an application to use only the amount of memory it needs, which frees unused memory so that the OS can allocate it efficiently to other tasks. Unfortunately, general-purpose algorithms for memory allocation that work well on the desktop suffer from problems such as temporal nondeterminism and memory fragmentation that can make them totally unsuited for use in an embedded environment. If you are adding rows of cells to the bottom of a spreadsheet, you are unlikely to notice a delay of 10, 20, 30, or even 100 milliseconds every now and then when the spreadsheet has to allocate a block of memory to accommodate the new data. On the other hand, if the fuel injection control computer in your vehicle paused for 50 milliseconds to allocate some new memory as you accelerated and failed to supply fuel to the engine for around five engine revolutions, then that is something you would almost certainly notice.
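A common embedded alternative is a statically allocated pool of fixed-size blocks, in which allocation and release take constant time and fragmentation cannot occur. A minimal sketch follows; the block size and count are illustrative, and interrupt safety is omitted for brevity:

    #include <stddef.h>

    #define BLOCK_SIZE  64u  /* Illustrative fixed block size in bytes. */
    #define BLOCK_COUNT 32u

    typedef union block {
        union block  *next;                /* Valid only while the block is free. */
        unsigned char payload[BLOCK_SIZE];
    } block_t;

    static block_t  pool[BLOCK_COUNT];     /* Sized at build time, not run time. */
    static block_t *free_list = NULL;

    void pool_init(void)
    {
        for (size_t i = 0; i < BLOCK_COUNT; i++) {
            pool[i].next = free_list;      /* Thread every block onto the list. */
            free_list = &pool[i];
        }
    }

    void *pool_alloc(void)                 /* O(1) and fully deterministic. */
    {
        block_t *b = free_list;
        if (b != NULL) {
            free_list = b->next;
        }
        return b;                          /* NULL when the pool is exhausted. */
    }

    void pool_free(void *p)                /* p must come from pool_alloc(). */
    {
        block_t *b = (block_t *)p;
        b->next = free_list;
        free_list = b;
    }

Because the pool is sized at compile time, exhaustion becomes a design-time question rather than a run-time surprise.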

Long life cycle

Embedded systems, particularly military and large-scale industrial control systems, have life cycles that far exceed those of desktop systems. Some may run as long as 30 or even 50 years! The impact of this long life cycle is that long-term maintainability of embedded system software is extremely important.

If an application such as a game has a life cycle of a year or two, it would be unprofitable to allocate extensive resources to ensuring the long-term maintainability of the software. On the other hand, if an embedded system has a life cycle of 20 years, then it’s likely that multiple generations of developers will support and maintain it over that extended time frame.

This extended product life cycle has major implications for the nature of embedded software. From the very start of the project, extensive effort must be put into documentation such as code comments, system interface specifications, and system operational models.

If the system is not carefully designed, developed, and documented so that it can be successfully maintained throughout its working life, then it will fail over the long term.

Reliability and design correctness

In many cases, the failure of a software system will result in nothing more than some degree of inconvenience to the operator. This is true even with embedded systems such as mobile phones and entertainment devices.

Many embedded systems, however, operate in mission-critical and safety-critical environments. The lives of hundreds of sailors may one day depend on the correct operation of their fire control radar as it guides a missile in the defense of their ship. The continuing heartbeat of a person wearing a heart pacemaker depends entirely on the correct operation of the software and electronics embedded into that device.

Under these circumstances, instead of mere inconvenience, the failure of a system or a design flaw may result in death, injury, economic loss, or environmental damage. These adverse consequences may affect either individuals or entire nations. Therefore, the emphasis in the design and development of such systems must be quality, correctness, and reliability above all else.

Safety

Operator health and safety in the world of desktop software development is typically limited to consideration of aspects such as choosing screen layouts and character fonts that minimize eyestrain. This is in sharp contrast to the world of embedded system development, where embedded devices frequently control powerful machines and processes at incredible speeds.

Embedded software controlling powerful machinery has the capability to injure and kill people, and in notable cases it has done exactly that. Producing software that is reliable is a common design goal in all domains, but producing safe software is critical in the area of embedded systems.

Case Study: Therac-25

When desktop systems go wrong, the usual result is some inconvenience. When embedded systems go wrong, the result may be inconvenience if you’re lucky, or death if you’re not.

One of the most well-known and well-studied cases involving loss of life caused by the failure of an embedded system was a device called the Therac-25. The Therac-25 was a radiation therapy machine intended to treat cancer by directing beams of radiation at the tumor site. The device contained a control system running on a PDP-11 minicomputer with an operator interface connected to a VT-100 terminal.

The embedded control software was poorly implemented and contained a number of defects. As well as the software defects, there were a number of system level defects in the device such as the absence of mechanical safety interlocks in the components responsible for the formation and deployment of the radiation beam.

The combination of software and hardware design flaws and defects meant that under certain rare circumstances, the patient could be subjected to beams of radiation around 100 times as strong as the intended dose. Despite the complex nature of the interactions required to trigger the defect, the circumstances occurred on six occasions, each resulting in the patient receiving a massive overdose of radiation. All of the patients suffered severe injuries involving radiation burns and radiation poisoning, and three of the patients died as a result of the injuries they received.

A detailed study of the Therac-25 case may be found in Ref. [2].

Standards and certification

Most embedded systems must comply with domain specific standards and achieve certification by relevant authorities. This is particularly true in areas where the incorrect operation of the system may result in harm to people or the environment. Conversely, most desktop software is not developed against particular standards or subject to certification by any regulatory body.

Compliance with external standards has many costs. For example, developers must learn the required standards and be trained to produce software that conforms to them. Work products, such as code and documentation, must then be externally audited to ensure compliance.

Cost

Embedded systems are expensive to build and maintain! Implantable medical devices take 5–7 years and between US$12 million and US$50 million to develop. Industrial and manufacturing test stations may cost millions, and support can run into hundreds of thousands of US dollars each year. An automobile recall simply to reprogram an engine control module may take only 15 minutes per car, but at a rate of US$50 per hour across 1 million cars that amounts to US$12.5 million; and that does not include the nonrecurring engineering cost to develop and test the new software!

Companies building embedded systems must employ, train, and then retain highly skilled staff with esoteric expertise. High-quality development tools for embedded software, such as compilers and debuggers, can cost orders of magnitude more than their desktop equivalents. The development of the system may require expensive electronic support tools, such as logic analyzers and spectrum analyzers. The development processes used, and the resulting work products, may be subject to standards compliance audits that increase the time and cost required to perform the development.

Product volume

High-volume embedded products usually minimize the cost of the processor environment in order to maximize profit and keep the product commercially competitive. Minimizing the processor environment directly affects the software that runs on it. The costs of packaging and delivering desktop software, by contrast, are generally small and have no impact on the development of the software.

This cost reduction in embedded systems usually results in the use of less capable devices in the processor environment. You must carefully consider this constraint in the design and implementation of the software.

Specialized knowledge

Developing software in any environment requires a combination of software development expertise and domain-specific knowledge. If you are going to produce an online shopping web site, you will need web programming and database skills, combined with some knowledge of retail activities, financial transaction processing, tax regulations, and various concerns specific to the business domain.

The same thing applies in the development of embedded systems, except that the domain knowledge required is usually much more specialized and complex than that found in most desktop systems. Esoteric mathematical techniques are commonplace in the world of embedded systems. For example, the fast Fourier transform (FFT) is a digital implementation of a method that extracts frequency information from time-sampled data. It is seldom seen in desktop applications, but may be found in numerous embedded systems, ranging from consumer audio to advanced radar systems.
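To give a flavor of the underlying mathematics, the sketch below computes the magnitude of a single frequency bin using the direct O(N^2) discrete Fourier transform; a real FFT produces all N bins in O(N log N), but the arithmetic being performed is the same:

    #include <math.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Magnitude of frequency bin k of an n-point real-valued signal x[]. */
    double dft_bin_magnitude(const double *x, int n, int k)
    {
        double re = 0.0;
        double im = 0.0;

        for (int i = 0; i < n; i++) {
            double angle = 2.0 * M_PI * (double)k * (double)i / (double)n;
            re += x[i] * cos(angle);  /* Correlate with a cosine at bin k... */
            im -= x[i] * sin(angle);  /* ...and with a sine at bin k. */
        }
        return sqrt(re * re + im * im);
    }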

Similarly, the Kalman filter is a complex algorithm originally devised for tracking satellites. While rare on the desktop, it has become commonplace in embedded systems. It is the basic algorithm for a vast variety of applications such as the Global Positioning System (GPS), Air Traffic Control, and advanced weapon control.
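In its simplest one-dimensional form the filter reduces to a few lines. The sketch below estimates a slowly varying scalar from noisy measurements; the noise variances are illustrative tuning parameters:

    typedef struct {
        double x;  /* Current state estimate. */
        double p;  /* Variance of the estimate. */
        double q;  /* Process noise variance (how fast the truth drifts). */
        double r;  /* Measurement noise variance (how noisy the sensor is). */
    } kalman1d_t;

    /* One predict/update cycle of a scalar Kalman filter. */
    double kalman1d_step(kalman1d_t *kf, double measurement)
    {
        /* Predict: the state is modeled as unchanged, so only the
           uncertainty grows. */
        kf->p += kf->q;

        /* Update: blend the prediction and the measurement, weighted
           by the Kalman gain. */
        double k = kf->p / (kf->p + kf->r);
        kf->x += k * (measurement - kf->x);
        kf->p *= (1.0 - k);

        return kf->x;
    }

Real applications such as GPS use multidimensional states and matrix arithmetic, but the predict-then-update structure is identical.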

These techniques, along with others such as control system design, require years of advanced study to master. The people who have such advanced skills are in high demand.

Security

The issue of system security has historically received little attention in the embedded system space. Surprisingly, this was actually a reasonable policy to follow in most cases because embedded systems were typically isolated from potential sources of attack. An engine management system, isolated in the engine bay of a motor vehicle and connected only to the engine, provides few opportunities for compromise by a hostile actor.

Increases in processor power, however, have been accompanied by increases in connectivity throughout the world of computing, and embedded systems have become a part of this trend. As a consequence, previously isolated subsystems came to be designed so that they could be connected to each other within a larger system. In the automotive world, the widespread adoption of the Controller Area Network (CAN) bus provided a reliable and inexpensive means of linking together the disparate control and monitoring systems within a motor vehicle [3]. This development provided many benefits, but it also meant that the engine management system no longer remained isolated.

The trend toward increasing connectivity has continued. Many embedded systems now have Internet connectivity, which is often implemented to provide remote system diagnostic and upgrade capabilities. Unfortunately, these enhanced capabilities carry with them the risks associated with exposing the components of the system to unauthorized external assaults, i.e., hacking.

The security of embedded systems was once an afterthought, if it was thought of at all. However, embedded systems play a massive part in the operation of our modern world. Many of these devices now have their presence exposed on the Internet or through other means. This means that security must be designed into the systems from the very start if they are to operate robustly in this unpredictable environment. Chapter 12 provides more insight into security issues.

The framework for developing embedded software

Human beings are inherently orderly creatures. From our earliest years, we are happiest and most prosperous when provided with clear rules and boundaries that are presented in a consistent and coherent manner and that provide tangible benefits as we grow. This desire for order and regularity is reflected in the laws and regulations that we implement to govern our lives. While restricting our actions in many areas, an orderly framework of laws allows a group of disparate individuals to come together as a society and work harmoniously together for the good of all.

The benefits of defining and then working within a commonly understood framework seem obvious, regardless of whether we are referring to an entire society or to a small business unit within an engineering organization. And yet the common experience of software developers in general, whether embedded developers or otherwise, is that their development activities are conducted in an environment that is remarkably unconstrained by rules and regulations.

This lack of structure and constraint is all the more remarkable when software development is contrasted with professional activities in other fields such as law, medicine, and accounting. For example, an accountant working for a corporation will work according to a very clearly defined set of corporate accounting procedures. These procedures will have been derived from the common corporate governance legislation that all companies are subject to within a country, with the result that there is a high degree of standardization within the profession and general agreement on how accounting should be done.

Similar comments could be made about professions such as law and medicine, but the situation with software development is still in a state of flux. Despite activities such as the Software Engineering Body of Knowledge (SWEBOK), there is no common understanding within the profession about how to do good software development [4]. Every organization produces software in a manner that is different to every other organization. Even different departments within the same company may work in completely different ways to produce similar products.

This situation is attributable in large part to the newness of the profession. People have been building bridges for thousands of years. Apart from the occasional mishap such as the Tacoma Narrows collapse, bridge building is a well-understood problem with well-understood and reliable solutions [5]. Large-scale software development, on the other hand, has only been widely conducted over a period of a little more than 40 years, so the common understanding of how to develop good software is still maturing. The growth of this understanding is made more difficult by the incredibly broad variety of applications software and the resistance of many practitioners to the very concepts of regulation and standardization.

In most cases, this resistance springs from a misguided belief that working within such a framework will crush creativity, eliminate elegance, and turn satisfying creative work into soul-destroying production line drudgery. The fact is that nothing could be further from the truth. Written English, for example, is produced within the context of very tight constraints regarding spelling, grammar, and punctuation. Yet within those constraints for written language, you can produce anything from a sonnet to a software user manual. In fact, the widely understood rules of written language do not hinder creativity or elegance, they encourage it.

This section aims to provide guidance for developing and implementing a high-level framework for software within embedded systems. This framework is not a complex or an unproven theoretical construct with burdensome overheads. It is a simple set of ideas that have been refined over many years of engineering practice. It has reliably and repeatedly produced high-quality embedded systems with minimum overhead.

Processes and standards

Many organizations engaged in the development of embedded systems don’t know what they’re doing. What I mean by this is that they don’t understand the mechanisms by which products are conceived and then successfully delivered. The result of this lack of understanding is that work is performed on a “cottage industry” basis and difficulties abound. Each project is approached differently. If more than one engineer is involved in a project, they will use different tools and different techniques. Communication between staff may be difficult, components may not integrate well together, and heroic efforts are often required of individuals to achieve deadlines.

Given a fair amount of luck a project that is run like this will succeed, but nobody will be quite sure why it was a success. Without the addition of luck, there will be just as much doubt as to why the effort was a failure.

A number of steps are required to minimize the requirement for large helpings of luck in successful development programs. First, you need to develop an understanding of operations. This involves studying and reflecting upon how development is performed within your organization, determining what works and what doesn’t work, and then documenting these observations in a set of operational processes. Second, once these documents have been developed, they should be promulgated within the organization and adopted as the corporate standard to be used by all engineers.

Case Study: Corporate coding style

While it is easy to talk about developing and implementing processes, it’s not so easy to do. People are not machines and people who are set in their ways are not as easily reprogrammed as computers are.

Developing, implementing, and enforcing consistent processes within a group requires a high degree of technical skill to distinguish good techniques from bad and to be able to document those techniques successfully. Along with the technical skill, though, is a requirement for substantial management skill involving sensitivity, awareness of human nature, and a willingness to compromise combined with determination to ensure that necessary changes are pushed through.

Most engineers working in unconstrained development environments will have strongly held views about how things should be done and it’s highly unlikely that they will all have the same views on every subject. A classic example of this is the placement of braces within source code written in the C programming language and its derivatives. There are many different ways to place the braces within the code; most engineers quickly adopt a style with which they are comfortable. For many years I wrote C code using the formatting convention described by Kernighan and Ritchie in their classic book “The C Programming Language.” When I attempted to introduce a code formatting standard within my team, I assumed that all of my staff would happily adopt that formatting convention. To my surprise, I met passionate resistance to this suggestion. To my further surprise, I met passionate resistance from at least one person to every other formatting style that I suggested.
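For readers unfamiliar with the argument, here is identical logic in two of the most common C brace styles (the variables and function are placeholders):

    extern int  temperature, limit;
    extern void shut_down_heater(void);

    /* K&R style: the opening brace shares a line with its statement. */
    void check_kr(void) {
        if (temperature > limit) {
            shut_down_heater();
        }
    }

    /* Allman style: every opening brace stands on its own line. */
    void check_allman(void)
    {
        if (temperature > limit)
        {
            shut_down_heater();
        }
    }

Neither form is objectively superior, which is precisely why the argument never ends on technical grounds.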

It didn’t take me long to realize that everybody on the team viewed their particular formatting style as the best way to lay out code and that they were very unwilling to adopt any other style. This was a difficult situation, as the only options were to abandon the effort to standardize the code layout or enforce a code layout style that would leave at least some members of the team unhappy.

I was convinced of the benefits of adopting a standard corporate coding style, so I chose the latter course. I consulted extensively with each engineer, so that everybody felt that their voice was being heard (this is supported in Chapter 2—Ed.), and produced a corporate formatting style that looked good and pleased most staff. As expected, there was some unhappiness within the team, but after a few months everybody came to see the benefits of consistent code layout across projects and the arguments were forgotten.

One size doesn’t fit all

Programs for developing embedded systems come in all shapes and sizes. It’s very unusual for one project to closely resemble the next in dimensions such as scope, complexity, schedule, and staff requirements. For this reason, in most cases it’s more important for process documents to describe what should be achieved (the ultimate goals and objectives) rather than how it should be achieved (the exact procedures).

For example, the level of system review conducted on a small project that occupies one person for two months will be different to that conducted on a project that occupies six people for three years. Applying a process that is suitable for one of those projects to the other could be wildly inappropriate. What is appropriate, though, is to have a review process that specifies, in part, that regardless of the size of a project, the preliminary design of the system should be documented and externally reviewed prior to the commencement of coding.

Specifying the process in this manner means that the review of the small project could be conducted by a couple of people in a morning, while the large project review might be conducted by ten people over a period of a week. In both cases, the desired business outcome occurs in accordance with the process, with a level of complexity and overhead incurred that is suited to the nature of the project.

While this advice is true in general, there may be cases where you want to be prescriptive, where one size really does have to fit everybody. For example, while some organizations allow projects to use various coding standards, the usual case is for a single standard to be applied at the corporate level. In this case, all code produced within the organization has to comply with the standard, regardless of the nature of the project that it is produced for.

It’s also worthwhile considering the nature of the projects that are undertaken by your organization. While there may be substantial differences between individual projects, it’s quite likely that most of them will fit into categories such as “small,” “medium,” and “large,” or perhaps “safety-critical” (or mission-critical) versus “nonsafety-critical” (or nonmission-critical). If this is true, then you should generate specific process templates applicable to each of these different categories of project. The result is that at the start of a project, the engineers involved can simply copy the template appropriate to their project, perform some minor tailoring, and have their project process ready to go with minimum effort.

Process improvement

We live in a rapidly changing world. Change applies to development environments just as much as to any other aspect of life, so you must ensure that corporate processes and their related documentation keep up with evolutionary change. If this does not happen, then either the enforcement of obsolete processes will inhibit necessary and valuable change or changes will occur anyway without the documentation being upgraded so that it eventually becomes useless.

You should examine each recommendation for change to ensure that it is beneficial. In many cases, proposals for process change arise naturally with the introduction of new technology or techniques. Recognizing and embracing such positive change enhances productivity and enhances staff involvement as they feel that they own the processes. (Again, confirmed and supported in Chapter 2—Ed.)

On the other hand, not all change is good. In particular, you should be aware of the concept of “normalization of deviance,” coined by sociologist Diane Vaughan in her book on the Challenger space shuttle disaster [6]. The concept describes a process-oriented environment in which a small deviation is made from the methods described by a process, with no ill effect. Each time such a deviation occurs without a negative result, people become more tolerant of the unofficial standard, and actual working methods gradually diverge from the documented processes. It is very likely that at some point, as with the Challenger disaster, the normalization of deviance will produce results that are no longer harmless but catastrophic.

Regardless of how it eventuates, work practices will change. You should welcome and incorporate beneficial work practices into the overall process framework while recognizing harmful practices and taking active measures to reject them.

Process overhead

Complying with corporate processes takes time and effort. In a business environment, this equates to money. Process compliance should cost less than the benefits gained from it; if this is not the case, then you’re not only spinning your wheels, you’re also going backward! Therefore, an important aspect of process improvement efforts is studying how processes work and how to modify them, to minimize both the number of processes required for successful operations and the amount of effort required to comply with them.

Case Study: Nothing in this life is free

As well as delivering high-quality products, my team wants to ensure the continued quality of our systems as they undergo maintenance and improvement after being fielded. Therefore, my engineering team performs a “last line of defense” review on changes that are applied to software in delivered systems. This review process is known as the Change Control Board (CCB), and its purpose is to ensure that all due diligence has been performed during the design, implementation, and testing of each change. In some cases, we may inspect the actual code itself, but primarily we look for evidence that we have complied with our processes, and that they have been executed professionally. We look for comprehensive evidence of high-quality design, review, and testing activities, with the expectation that these activities will produce high-quality products.

The CCB consists of myself and the four most senior engineers in the group. Depending on the workload, the CCB may meet several times each week. Five staff members meeting for around two hours per week at around $100 per hour for each person means that the CCB is costing the company around $1000 per week.

That’s a lot of money, but nothing in this life is free. Is the money well spent? You bet it is! Occasionally the CCB will detect a problem and require more effort to be applied to a work package to fix a defect or improve quality in some way. Each of these problems represents pain that we didn’t inflict on our customers, corporate reputation that was not lost, and product recalls or field updates that we didn’t have to apply. And ultimately, all of those things represent a lot of money saved for the company.

Ensuring that we maintain a consistently high standard of product quality throughout the life of every delivered product is worth every penny that we spend on it.

Unnecessary processes that don’t contribute business value, or overly bureaucratic processes with high compliance costs, can actually impede rather than assist progress by wasting staff resources in an unproductive manner.

Case Study: A peer review process needed review

I strongly believe in the benefits of peer review of work products. One of my first process improvement efforts involved the implementation of a process for source code review. The original version of this process involved a great deal of overhead. It described in minute detail how responsibilities were to be allocated during a review, how initial planning meetings should occur, followed by review meetings, followed by washup meetings, how all of these meetings should be scheduled, the nature of the outputs of the review, and much more.

I put a lot of effort into developing this process and I thought it was an excellent piece of work. It was only after I introduced the process and tried to make it operate that I realized it was a complete failure. The process that I had specified was so detailed and onerous that my team would have had to spend far more time reviewing code than writing code to comply with the process. After some initial enthusiasm, the team struggled with the overheads involved and very quickly let me know that we had a problem.

I scrapped the original effort and started from scratch. We obtained a web-based collaboration tool that allows a group of engineers to annotate code listings with comments that are e-mailed to all of the members of the group. The amount of effort required to perform a review using this tool is minimal and the engineers in the team accepted it enthusiastically. Source code reviews using the tool now occur as a matter of course, code quality has improved, and the staff can visibly see the benefits of engaging with the process.

Process compliance

I have established that developing and implementing processes requires considerable management skill. Maintaining process compliance requires a similar, if not greater, level of management involvement.

No matter how you look at it, process compliance involves extra work and generally it is not nearly as interesting or exciting as writing a cool new piece of code. Furthermore, many of the long-term advantages of working within a process framework are not evident during the early stages of building a process-oriented environment. As a result, people tend to ignore processes, or put process-related work at the “bottom of the pile” so that it never gets done. There is hope! You can take a number of steps to address this problem:

1. Persuade staff members of the benefits of process compliance. This is generally a matter of explaining and demonstrating how a small amount of short-term pain can result in a large amount of long-term gain.

2. Actively involve staff members in process development. Workers will resist irrelevant or burdensome processes that are imposed by management without consultation with the people in the trenches. (Confirmed in Chapter 2—Ed.)

3. Encourage and enforce useful and effective processes. You might as well not have the process if it is ignored. If you honestly believe that a process is useful and beneficial for your business, then deviations from that process, no matter how small, must be corrected whenever they occur.

Case Study: White space

I believe that a consistent “look and feel” is very important when code is produced by multiple developers, so our coding standard is quite prescriptive about where space characters and blank lines are to be inserted in a source code file. There are arguments to be made both for and against this level of detail in a coding standard, but we think it has value so that’s how we work.

New staff coming into the group often find it difficult to come to grips with the coding standard for “white space.” They tend to drift from it, inserting white space characters as they see fit rather than in accordance with the standard, because they see it as a trivial issue.

In response to this noncompliance we could say “Oh, it’s only white space, it doesn’t really matter,” but we don’t. Our standard describes how we work, and we comply with that standard in all regards, both large and small. Our review process catches and corrects these smaller defects, just as it catches the bigger issues.

Automation is also a big help in this area. We now use an open source code formatting tool called “Uncrustify” and a database of formatting rules to mostly eliminate source code formatting problems.

A word of caution—always keep in mind that the aim of business is business, not process compliance. Processes are there to support business goals, not as an end in themselves. Do not mistakenly and rigidly follow processes at all times, regardless of the circumstances.

Your processes will be most beneficial for the predictable and routine within the work environment because they provide standard solutions to standard problems. That said, life will always throw you curve balls; unusual situations will arise that aren’t covered by your processes, or occasionally following your processes will produce undesirable results. Sometimes, for very good reasons, business demands may dictate that work be performed outside the process framework. Such cases should be atypical; you should violate processes only after due consideration and with full awareness of what you are doing. BUT…when it’s necessary, don’t be afraid to do what has to be done to finish the job.

Under these circumstances, once the emergency has been resolved, it’s very tempting to put the unpleasantness behind you and carry on with getting the job done. While this is the easiest thing to do, it’s also a mistake, because you will lose the opportunity to improve your processes by learning from what happened.

As soon as possible after the event, you should debrief all those involved. Get the players together and talk to them to get their story about what happened. For a variety of reasons, it’s likely that everybody will have a different story. Take written notes, work out what actually happened, analyze the event, and learn from it.

Some questions to ask are:

• Was this a one-off event, or is it likely to happen again?

• Could the situation have been handled differently and more effectively?

• Do you (or we) need to adapt your (or our) processes to cater for this happening again?

ISO 12207 reference process

The International Organization for Standardization (ISO) has produced ISO 12207, Software Life Cycle Processes [7]. It is a reference standard that covers the entire software life cycle: acquisition, development and implementation, support and maintenance, and system retirement.

ISO 12207 details activities that you should conduct during each stage of software development within a project. This standard describes things that should be done during development, but not how to do them. You can best use the standard as a checklist of useful activities that can be tailored to provide a level of process overhead that is suitable for any given project.

Recommended process documents

The size and scope of your process documentation should depend on the size and nature of your organization. Your organization may only be you, sitting at a computer in your home office, or it may comprise you with a dozen other people working on a ground-breaking piece of new technology in a high-tech start-up.

You should have a documented set of processes that describe the standard approach that you take to standard issues or problems, which lets you concentrate on the unusual or interesting issues that arise. Suppose you start out alone, and then one day someone starts working with you; documented procedures will save you time and money if you can say, “Read this…it’s how I’ve been doing things.” Suppose you are already working in a group, and then one day somebody leaves; documented procedures will save you time and money if you can say to the replacement hire, “Read this…it’s how we do things around here.”

In every project, you should tailor the abstract guidance provided by each of these high-level documents into a set of concrete steps appropriate for the project. These procedural documents related to a specific project translate the general, “What do we want to achieve?” into, “This is how we are going to work on this project.”

Table 11.1 lists a minimal set of process documents. Developing and implementing this small set of processes will immediately benefit your development activities.

Case Study: Embedded systems can last a long, long time

In 1991, I commenced work on the development of the embedded target tracking software for a new radar system. The signal processing hardware environment was very complex, consisting of a custom multi-CPU pipelined parallel processor assisted by a cluster of DSPs which did most of the computational heavy lifting. The software was equally complex, if not more so.

During the next 15 years, this general-purpose tracking system was applied to multiple different problem domains, none of which we anticipated at the time of the original development. Over the course of those years the software was modified, enhanced, and rehosted to different hardware and operating system platforms. I estimate that the maintenance and enhancement of the software over that period of time occupied about 5 times the amount of effort that I spent writing it in the first place.

At the time of writing, that software is still in operational service in a number of locations, and is being supported by people who had only just been born when it was first written.

Case Study: How not to develop code and how to recover (by Kim R. Fowler)

This case study continues from Chapter 1. It is a story with a happy ending but not before we experienced considerable pain getting there. To refresh your memory, my company was the customer and we were buying an instrument to put on an aerospace vehicle. The vendor claimed to have a product that was “Commercial-Off-the-Shelf.” My company wanted to modify the design and add a number of sophisticated sensors to the COTS product. (Chapter 1 describes the details, which don’t need repeating here.)

For this chapter on software development, you should know that the product had two significant portions to it:

1. An embedded system resided on an aerospace vehicle. The system comprised a processing unit driven by fairly complex sensors. Data streamed from the sensors to the processing unit for data compression and multiplexing. From there the vehicle received the data and transmitted it to the ground in a high-speed serial stream. The embedded system had DSP chips running C code.

2. A ground-based terminal received the high-speed serial stream from the aerospace vehicle. The terminal demultiplexed and then decompressed the data. It filtered and manipulated, displayed, and then stored the data. The terminal was a high-end desktop computer running custom C code.

The first copy of the embedded system was to be delivered at the end of August (the terminal was to follow a bit later). In early August, after multiple requests to the vendor, a colleague of mine bluntly asked, “Is it working?!! Demonstrate its current operation to us!” The vendor couldn’t.

Several colleagues and I packed up and flew across country to meet with the vendor. We found the following problems:

• Spaghetti code—code that wound through illogical and convoluted paths.

• Nested interrupt routines; in fact, the entire program operated out of these nested routines. (Never—ever—do this!!!)

• No common event handlers; each different input had its own unique event handler.

• Orphan code segments; entire sections of code were commented out with no clue as to their original purpose.

• No consistent style; in fact there was no software style guide.

• Legacy code that did not fit our particular project; the vendor was trying to reuse old code and “shoehorn” it into the new application.

• The ground-based terminal was a custom program that was in as bad shape as the embedded system.

So what caused all these problems? First, the vendor relied on a single senior software developer with no accountability. He was an original member of the company with some small ownership in company shares; the company assumed that he knew what he was doing, since he was 20 years into his career writing software for aerospace applications. Consequently, the company had no processes, no code reviews, and no planned demonstrations of the product in operation. Actually, they had no development plans—design, schedule, or test—of any sort. On top of all this, they were using an unfamiliar DSP processor, with caches, serial ports, and internal memory new to them, and a new development system. It was chaos. The vendor had bitten off more than they could reasonably chew.

We quickly assessed the situation and explained to the vendor how bad it was—for us and for them. We also explained what we expected them to do to deliver a working product to us. We prescribed the following steps and stages to the vendor’s director of engineering:

1. Establish a schedule and delivery milestones.

2. Prepare staged releases of the software for the embedded system.

3. Hold monthly reviews of the software development with us in person, eyeball-to-eyeball.

4. Institute new software processes that would be defined and followed with code reviews and metrics for production and anomalies.

5. Prepare a complete set of documents.

6. Buy and integrate a commercial software package for the ground-based terminal.

As you know by now, the story turned out well. The vendor, under the tutelage of the chastised director of engineering, knuckled under and worked very hard for six months, with each employee putting in 70 to 80 hours per week. They instituted clear procedures and styles for developing software. The original developer just never got it; he hung around for several weeks only to resist the changes. Very soon there was a parting of the ways, which was good for the vendor’s staff.

We, the customer, rolled up our sleeves and jumped in with the vendor. We helped with the development system and worked to understand the DSP processors with them. We bought and integrated the software for the ground-based terminal. We reviewed their progress and then provided the test facilities and personnel for the environmental testing toward the end of the delivery schedule.

The final product was delivered a bit late, but it worked as advertised.

So what should the vendor, or any software development organization, do in the first place? Here are some recommended steps:

• Plan, execute, review, report, and iterate each step of the process.

• Prepare and follow development plans.

• Manage the configuration.

• Use a style guide.

• Maintain records of production metrics, bug rates and severity, and status of fixes.

• Perform regular and careful code reviews.

• Perform regular and careful project reviews.

• Deliver incremental releases, if possible, and demonstrate the growing system capability.

• No one—absolutely no one—runs open loop with no accountability.

Table 11.1

A minimal set of suggested process documents

Design: Describe at a high level what you want your design process to achieve and the artifacts that should result from the design, such as system block diagrams, system use cases, interface specifications, and test procedures. Remember that every project will be different, so don’t go into too much detail here.

Implementation: This is the place to provide an overall description of how your development activities are to operate. Describe the general development life cycle and the tools, such as version control systems (VCSs), that are used to support development. Also provide references to the other supporting standards, such as your code formatting standard.

Release: This is a key activity that often doesn’t receive as much attention as it deserves. Describe the process to transfer your newly developed products from your organization to your customers in a controlled and repeatable manner.

Review: Rigorous peer review of work products throughout the development life cycle is a hallmark of a good software engineering environment. The description of the review process should mandate that these review activities take place. It should also include a requirement that all review activities be documented in some manner, both to provide proof that the reviews are taking place and to describe the action items that arise. (See the details of review in Chapter 13.)

Code formatting: Code formatting standards can be very contentious. This is one of those cases where you should consider being prescriptive, rather than providing general advice. No particular formatting standard is significantly better than any other. The important thing is simply to choose something and then stick to it. The long-term maintenance benefits of doing this are dramatic.

Code commenting: Some people suggest that code should be self-documenting. In an ideal world this would be true, but in the complex and difficult world of embedded system development it is almost never the case. Remember that embedded systems typically have long life spans and that a large proportion of the cost of the system may be spent on maintenance over the life of the product. Anything that can reduce the effort, and hence the cost, involved in software maintenance will have substantial long-term benefits. As with the code formatting standard, it is best to be prescriptive in this document.

Language subsetting: The language subset document is specific to the programming language. If your development work involves the use of multiple languages, then each language requires its own standard. The purpose of this document is to codify best practice in the use of each language, recommending the use of certain techniques or features of the programming language while forbidding the use of others.


The embedded system world has displayed considerable interest in language subsetting over the last few years, primarily driven by the widespread use of C and C++ in embedded environments. Two notable efforts in this area come from the Motor Industry Software Reliability Association (MISRA) and Lockheed-Martin.

MISRA is a consortium of motor vehicle manufacturers. In 1998, they produced a standard called “Guidelines for the use of the C language in vehicle based software,” otherwise known as MISRA-C:1998. This document contains a set of rules for constraining the use of various aspects of the C language in mission-critical systems. While originally targeted at the automotive industry, a wide variety of other industries have adopted this standard. It has undergone a number of revisions, and the current version is MISRA-C:2012. As well as the C language standard, in 2008 MISRA also produced a C++ standard called “Guidelines for the use of the C++ language in critical systems.” This document is known as MISRA-C++:2008.

The work by Lockheed-Martin relates to their involvement in the Joint Strike Fighter Program. As part of developing the avionic software for this aircraft, Lockheed-Martin developed a language subset known as the Joint Strike Fighter Air Vehicle C++ Coding Standards, or JSF++ for short.

Both the MISRA and JSF++ standards provide a good basis for study and for the development of a good language subset. In particular, the MISRA standards have become so popular that many common compilers and static code analysis tools provide inbuilt support for them that you can easily enable.
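
To give a flavor of what a subset rule looks like, here is a small C fragment in the spirit of the MISRA rule requiring every switch statement to end with a default clause (the enum and function names are hypothetical; consult the published standards for the actual rule text):

    enum mode { MODE_IDLE, MODE_ACTIVE };

    void dispatch(enum mode m)
    {
        switch (m) {
        case MODE_IDLE:
            /* enter the idle state */
            break;
        case MODE_ACTIVE:
            /* enter the active state */
            break;
        default:
            /* Required by the subset: an out-of-range value is a defect,
               so it is trapped here rather than silently ignored. */
            break;
        }
    }

Rules like this cost almost nothing to follow, and they convert a whole class of silent failures into explicit, reviewable code paths.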

Requirements engineering

If someone were to produce an old-fashioned map showing the embedded systems software development process, then the area entitled “Requirements” would surely also be labeled “Here Be Dragons.” There is no part of the process that is more prone to difficulty and no part of the process that is more critical to the development of a system than getting the requirements right. (Chapter 6 goes into great detail on generating and maintaining requirements.)

It’s as simple as this. At the start of a project, someone will have a set of ideas in their head about what they want the system to do. Those ideas represent the system requirements. Your job is to construct a coherent list of those requirements and then to build a system that meets the requirements. If you get the requirements wrong at the start, then even if you do everything else perfectly you will end up building the wrong thing. You will build something that your customer doesn’t want and your project will be a failure. It’s simple to describe, but it sure isn’t easy to do.

Requirements engineering in embedded system software development has the added difficulty that rather than being the end product, the software is a component that is embedded in a larger system. This means that the system requirements are usually expressed in terms of what the system must achieve, instead of what the software must achieve, with the result that the detailed software requirements are buried and lost inside the system requirements. It can be very difficult for an engineer to translate a requirement from an application domain, such as radar tracking or engine management, into a meaningful software requirement.

If you are aware of the importance and difficulty of good requirements engineering and if you allocate the necessary resources to perform this part of the process effectively, then you’ve taken a large step toward mitigating the risks associated with having bad requirements. You also need to keep in mind that even with your best efforts, the inherent difficulty of gathering requirements means that it’s unlikely that you’ll get it right first time. New requirements will emerge during the development of your product, and you must be prepared to accept these changes and to be flexible enough to incorporate them into your product during the course of the development.

So what do you actually need to perform good requirements engineering? My view is that the critical activities involved in requirements engineering are:

1. Collection. This is the process of talking with all stakeholders and building an initial list of all of the things that the product is supposed to do.

2. Analysis. After collecting the initial requirements, you need to study and analyze them for completeness and consistency. Even if only one person specifies the requirements, it’s likely that the requirements will contain internal conflicts and inconsistencies that must be resolved at this point.

3. Specification. Once you have collected all of the requirements, you need to formalize them into a System Requirements Specification. This document will define each requirement in the most precise language possible and will allocate a unique identifier to each requirement for consistency of reference. The identifiers enable other derived documents, such as test specifications, to remain synchronized with the requirements specification (see the illustrative fragment after this list).

4. Validation. This is a final check of the completely defined and identified set of initial requirements contained in the System Requirements Specification, confirming that they meet all of the needs, and the overall intent, of the customer.

5. Management. Once you have specified and validated the initial requirements, you may use them to commence the design activities. From this point, and throughout the project, you will discover that despite your best efforts, your requirements will still contain flaws that need fixing. Some requirements will disappear, new requirements will emerge, and others will change shape. Requirements management involves staying on top of this fluid situation and ensuring at all times that you have a solid specification that precisely defines what you are attempting to build.
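
As an illustration of the specification step, the formalized requirements might look like the following fragment (the identifiers and wording are entirely hypothetical):

    SRS-0041  The system shall sample the coolant temperature sensor at 100 Hz.
    SRS-0042  The system shall report a sensor fault within 50 ms of detecting it.

A test specification can then reference SRS-0042 directly, and any later change to that requirement immediately identifies the test cases that must be revisited.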

Example: Early models solidify requirements inappropriately

There are a number of variations on this theme. For example, I have worked with teams who liked to perform graphical system modeling during requirements engineering using the Unified Modeling Language (UML) and other tools. My experience has been that this tends to bring the design process into the requirements phase too early. The effort involved in producing and updating the models can cause people to be unwilling to pursue alternative design options, with the result that a suboptimal design can become fixed too early.

There is a wide variety of software products available to assist in all of these areas of the requirements engineering process. For smaller projects, a spreadsheet and a word processor are probably all that you need. For larger projects with hundreds or thousands of requirements, it can be very difficult to manage the process without the support of dedicated requirements engineering tools.

A good example of such a product is the DOORS requirements tracking tool originally developed by Swedish firm Telelogic, and now supported and marketed by IBM. This consists of a suite of different tools providing various capabilities, but the key functionality supports managing the development of sets of complex requirements for large systems.

Version control

Automated VCSs are a key tool in any modern software engineering environment. Developing software without the assistance of such a tool is like running a race with a rock in your shoe. Yes, you will reach the finish line eventually, but the race will be slower and more painful than necessary.

VCSs apply to all software development projects and are not specifically related to embedded systems. Regardless, VCSs still deserve some discussion here because their use represents a best practice in modern software development.

The simplest view of VCSs is that they are document management systems specifically targeted at maintaining libraries of text files where the text files typically contain software source code. The systems may be used to store other types of files, but they are less effective when used to manage binary files such as word processor documents.

The conventional method of operation involves storing a master copy of each file in a central repository. A database management engine of some kind typically manages the repository of files. When an engineer needs to make a change to a file, he or she checks the file out of the repository and copies it to a local computer for editing. When the engineer has made the required changes, he or she checks the updated file back into the repository, which simply involves copying the file from the local computer back into the repository.

There are many variations on this basic theme. For example, rather than simply storing entire new copies of changed documents in the repository, some systems encode and store the sequence of changes between one version and the next. When many small changes are made to large files, as is typical in software development environments, this technique can dramatically reduce the storage requirements of the repository and the time taken to access the files. Some VCSs lock the repository copy of a file when it is checked out, which enforces the rule that only one person can be working on a file at any time. While this technique is simple to understand and implement, it can be extremely inconvenient in a team environment as it forces serial access to the file. More advanced systems allow parallel development where multiple users are able to check out files simultaneously for update, with the system assisting to resolve any editing conflicts that might occur at check-in time.

Now let’s consider the primary benefit of a VCS which, as the name implies, is the control and management of different versions of a software product. If the world were a perfect place, VCSs would be far less useful. Unfortunately, the world is not a perfect place, and Version 1.0 of most software systems is followed by Version 1.1, then Version 1.2…you get the picture.

Management of this in the trivial case where the entire system is built from one source file doesn’t present much of a challenge, as there’s a one-to-one mapping between each version of the source file and the resulting executable. Unfortunately, few real-world systems are this trivial. I am aware of a major avionics embedded system that consists of over 35,000 source files. Systems of this magnitude are not unusual, and control of the development process in this environment is an extremely complex task that would be virtually impossible without the assistance of a VCS.

The first benefit provided by a VCS is a history of the changes that have been made to each source file, including the date and time of the change, the name of the engineer who made the change, and useful comments about the purpose of the change. This feature is very beneficial in understanding how a system has evolved and who was responsible for the change process, but more importantly, it allows changes to be reverted when things go wrong, as they inevitably will. This ability to quickly back out of and recover from defective changes can be an absolute lifesaver.

A further complication in a large-scale development environment is that there is very little correlation between the changes that are made to each file, because changes are made to different files for different reasons at different times. A new version of the system might require changes to 100 different source files. The next version might require further changes to 35 of those files, along with changes to 100 other files, and each succeeding version will require some unpredictable combination of changes. The result is that a snapshot of the state of all the files in the system on one day will most likely be very different to a snapshot taken a week later. Manually reverting to a previous version of the system under these conditions would be an extremely difficult task, as it would require reverting each individual file to the correct state, but a VCS provides mechanisms that make this a trivial process. The usual approach is to tag all of the files in a project with the version number or some other unique identifier. Once this tag is in place, it is possible to extract the entire system as it stood at that tag in a single operation.

The next advantage provided by a VCS involves the coordination of distributed development activities by multiple engineers. As the physical separation between developers increases, the difficulty of coordinating their updates to shared source files increases. The members of a development team these days may be located in different offices within a building, different buildings on a corporate campus, different buildings in different cities, or even in different cities on different continents in different time zones.

Managing the activities of such teams, particularly the very widely distributed teams, is nearly impossible without the use of a VCS. It’s still hard even with the use of a VCS, but the difficulties are substantially reduced. Consider two engineers attempting to work on a source file over a period of several weeks when one person is located in Los Angeles and the other is located in Sydney. Thousands of miles and a 7-hour time difference separate these two people. Without a VCS, coordinating their work would require a substantial amount of time and effort to be invested in phone calls at inconvenient times, transfer of files back and forth, and recovery from errors. Automating this process via a VCS will increase productivity by eliminating almost all of that overhead.

Finally, a VCS provides a single convenient target for system backup and recovery processes. Without the use of a VCS, an organization’s code base may be spread over multiple folders on a server or over multiple servers. In the worst case, the code may be spread over the individual workstations belonging to the members of the development team. In this sort of environment, the failure of the hard drive in a machine containing the only copies of the source code for an important application can result in a disastrous waste of time and effort.

With the use of a VCS, the source code backup process can be greatly simplified. This simplification means that the process will be inherently more reliable because all that is required is that the source code repository be backed up on a periodic basis to ensure coverage across the entire code base.

The design of a good backup process is beyond the scope of this discussion, but it’s worthwhile pointing out that if all of your backup media are stored in the same physical location as the server containing your VCS repository, then you could be heading for trouble. A disaster such as a fire or flood might result in you ending up with no server and no backups either, so always incorporate offsite storage of media into your backup regime.

A final point worth noting is that a VCS is just a tool. Used correctly, it will substantially reduce the costs of developing large-scale software systems. Even when used incorrectly, it will still help, but perhaps not as much as it should. For example, if an engineer checks a source file out of the repository and then works on it intensively for a month without checking it back in, then many of the benefits of change logging, collaboration, and easy backup are lost. As with most other areas, the development of simple and well-understood VCS management processes is important in ensuring that the benefits of using the tool are fully realized.

Effort estimation and progress tracking

Software development is an opaque activity. There is no other form of organized human activity in which it is so difficult to understand what the people involved in a project are doing or achieving.

Consider conventional real-world construction projects that involve lots of metal and concrete. Things built out of concrete can be measured in terms that humans can intuitively understand. The completed length of a freeway, the completed height of a dam, and the number of stories completed on a skyscraper are all things that you can easily express and convey numerically and pictorially. If a freeway construction project is due for completion in three weeks and there are still three miles of unfinished road under construction, then it will be obvious to the project team that the project is in trouble. Very importantly, the problem will also be obvious to outsiders, even those who are completely unfamiliar with any aspect of freeway construction.

Software, as the name implies, is the exact opposite of concrete. The construction material used in software development is human thought, which is an infinitely light, malleable, and flexible material. Software engineers are capable of building large, complex, and incredibly intricate mechanisms out of this material.

Unfortunately, software doesn’t come with the intuitions that are associated with conventional construction materials; consequently, the conventional paradigms for human-oriented measurement don’t apply to it. You can’t pick up a piece of software and heft it in your hand. You can’t weigh it or pull out a tape measure to measure its length. As a result, it is very difficult to represent and measure progress within a software development project. A software project may be in a position very similar to that of the freeway-building project, with three miles of road to complete in three weeks, without the problem being obvious to interested outsiders.

There are no magical solutions to this problem, but there are techniques and tools that can mitigate some of the risk associated with tracking the progress of a software project.

The intuitive metric for measuring progress in a software project is the number of lines of code that have been written. If a project requires 10,000 lines of code, and the developers have produced 8000 lines of code, then the project is 80% complete, right? Well, maybe. Or maybe not, because not all lines of code are created equal. The developers may have concentrated on the infrastructure aspects of the project, producing lots of straightforward code to accomplish simple tasks. The remaining 2000 lines of code may represent the core functionality of the application, containing complex algorithms and tricky performance issues that require detailed study, analysis, and testing. In this case it wouldn’t be at all unusual to discover that the last 2000 lines of code take as much time to complete as the first 8000. (This is the old rubric that 20% of the work takes 80% of the time.)

In some cases, lines of code may even be a perverse metric that indicates the opposite of what is actually correct. If one engineer can solve a problem in 1000 lines of code, and another can solve the same problem in 100 lines of code that execute 10 times as fast, then it would probably be a mistake to consider the first engineer more productive than the second. Or perhaps the engineers are equally skilled and productive, but one is working in assembler language while the other is working in a higher-level language such as C++, in which case the latter will be able to implement similar functionality in far fewer lines of code.

The key word here is “functionality.” The best technique for approaching this difficult issue lies in assessing progress in terms of the amount of functionality completed, not the lines of code completed. In simple terms, this involves deriving defined points of functionality from the system requirements and allocating a measurement of complexity to each function point. The result is a single metric that represents the total functionality of the system, with more complex function points contributing more heavily to the total.

This technique is known as Function Point Analysis, and it was defined by Allan Albrecht of IBM in the 1970s [8]. Much work has been done on this topic since then, and there are now a number of recognized standards for performing this analysis [9]. Even if you choose to perform your analysis in a less formal way, it is still an invaluable technique for providing insight into the complexity of a project.
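
As a purely hypothetical illustration of the arithmetic (the counts and weights are invented for this example, not drawn from any published standard), suppose a system has $n_i$ function points in each complexity class $i$, each with weight $w_i$:

$$\text{Size} = \sum_i n_i w_i = (12 \times 3) + (6 \times 5) + (4 \times 8) = 98 \text{ weighted function points}$$

Progress can then be tracked as the fraction of this weighted total that has been completed, rather than as raw lines of code.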

Let’s assume that you’ve done a Function Point Analysis on your new project, along with other aspects of your design, and you’re ready to set to work. Unfortunately, your problems are not over. In fact, they’ve only just begun. Setting to work on implementing your function points sounds great, but who is going to do the work? Which engineers will complete which functions? How do you keep track of which function points are completed, which ones are in progress, and which ones haven’t been touched yet? Keep in mind also that your iterative testing program will reveal design and implementation defects that will require correction by somebody at some time during the project.

The difficulties of managing this resource allocation and scheduling problem are substantial, as are the risks to your project of getting things wrong in this area. Happily, there are tools available to help you mitigate this risk. They go by a variety of names, including “bug tracking systems,” “issue tracking systems,” and “ticketing systems.” I prefer the term “ticketing system” because it’s a neutral term that avoids the misconceptions associated with terms such as “bug tracker.”

The name “ticketing system” comes from the use of cards known as tickets, which were originally used to allocate work to staff members in call center and help desk organizations. When a customer called in a problem, the phone operator would fill out a paper ticket with details of the problem and then place it in a queue that was serviced by the responsible engineers and technicians. During the process of working on the task, the engineer would annotate the ticket with details of the job as required. When the task was completed, the ticket would be marked as completed and filed with all of the other completed jobs.

A modern ticketing system is a software implementation of that workflow, with the tickets replaced by database entries, and a software application providing a front-end to the database that allows for queries and updates to the entries. There are a number of ticketing systems that are specifically designed for use in a software development environment. These usually provide specific support for aspects of the software development life cycle, along with features, such as integration into VCSs and report generation systems.
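
Conceptually, each ticket is just a small record moving through a defined workflow. A minimal sketch in C of such a record (the fields and states are illustrative assumptions, not the schema of any particular product):

    enum ticket_state {
        TICKET_OPEN,           /* work not yet started */
        TICKET_IN_PROGRESS,    /* allocated and under way */
        TICKET_RESOLVED,       /* work complete, awaiting verification */
        TICKET_CLOSED          /* verified and filed */
    };

    struct ticket {
        unsigned int      id;          /* unique identifier */
        const char       *summary;     /* one-line description of the task */
        const char       *assignee;    /* engineer responsible */
        enum ticket_state state;       /* current position in the workflow */
    };

Queries over the state field are what provide the click-of-a-button lists of completed, active, and unstarted work described below.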

The benefits of using a ticketing system (or whatever you prefer to call it) during a software development project can’t be overstated. If you use tickets to allocate all of the work at the beginning of a project, then the list of tasks resides with the tickets. If you want to know what you have completed so far, the list of completed tickets is available at the click of a button. The same applies to tickets that are the subject of work in progress, and work that is yet to be started.

Using a high-quality ticketing system to allocate tasks to engineers and to track the progress of those tasks through to completion is a hallmark of good software engineering. If you are developing systems without one of these tools, then you may as well work with one hand tied behind your back.

Life cycle

A current direction in the life cycles of software development is toward Agile development techniques as espoused in The Agile Manifesto. Agile development techniques are human centered [10]. They recognize the psychology, strengths, and weaknesses of the humans involved as both customers and developers in the software development process.

Agile techniques are particularly well suited to the sort of exploratory software development characteristic of applications that require heavy human interaction, such as web-based transaction processing systems. The software development effort, in such systems, often focuses on the implementation of complex graphical user interfaces (GUIs), which aim to resolve the issues associated with human cognitive strengths and weaknesses. In particular, people generally cannot assess the strengths and weaknesses of a GUI without operating the interface.

One of the particular strengths of this development methodology is that it recognizes that humans are visually oriented creatures. It concentrates, therefore, on prototyping and the rapid experimental development of user facing components of application software. Short cycles of rapid application development combined with intense user feedback and code refactoring can produce very high velocities of development when this technique is used by skilled Agile developers.

Agile development techniques can also be applied to the development of embedded software. In this domain, however, Agile development must proceed with great caution, since embedded systems fundamentally differ from desktop software, as described in the first part of this chapter.

Agile techniques are diametrically opposed to the “Design Everything First” waterfall methodologies exemplified by the life cycle methodology described in DOD-STD-2167A. This was a large, highly prescriptive, documentation-centric methodology promulgated by the US Department of Defense in the 1980s, which mandated that development be conducted as a series of discrete steps, with each step, such as system design, fully completed before beginning the next step [11].

While attractive on the face of it, this methodology fails to recognize the fundamental fact of human fallibility. People make mistakes. Designs will contain errors, and completing the entire design before commencing development commits the eventual development process to the implementation of an error-prone design. This means that design errors will be propagated through the rest of the development cycle and may not be found until system integration or test time. Fixing these problems at that point will be much more expensive than doing it earlier in the project. Also, being closer to the delivery deadline means that the corrections will be developed under much more stressful circumstances, and the quality of the work is likely to be lower due to schedule pressure.

So, the embedded system developer is faced with a conundrum. Highly interactive and highly iterative Agile development methodologies with minimal up-front design work well in the development of human-centered applications with complex GUIs where user requirements are very uncertain. Unfortunately, most embedded systems don’t fit this description, and the application of Agile techniques to the development of embedded software can be very problematic.

On the other hand, the heavyweight sequential methodologies that require that everything be designed up-front can also be problematic because the development process chokes on a flurry of design and implementation defects that are only discovered very late in the development cycle.

Unfortunately, there is no magical technique or methodology that provides an answer to this problem; the solution lies in compromise and good judgment developed through experience. You can apply some fundamental ideas though. Don’t attempt to design every aspect of your system up-front before commencing development. Conversely, resist the temptation to begin coding before you have thought about your system and produced at least a basic design.

I am trying to caution against coding without thought or design and based on inadequate development of requirements. Even now I still hear of teams (not at my company!) who start work on systems by sitting at their computers and writing code without thought for requirement development or design.

Like Goldilocks and the Three Bears, you need to perform an amount of up-front design that is neither too much, nor too little, but just right. Your experience and good judgment determine what is “just right” for your project and your environment. In some situations, there’s no working around it and you will have to do very large amounts of highly detailed design work up-front, while in other cases a more exploratory approach may be entirely suitable.

Whatever the case may be, once the initial up-front design process is completed, commence a series of iterative development cycles. The phrase “design a little, code a little, test a little” best captures the essence of what is involved. The feedback process from the test phases is critical. You must recognize and correct design or implementation errors as early as possible, because such errors take more time and money to correct as the system gradually takes shape. This might be considered a hybrid approach between spiral and Agile development.

Finally, when you complete the development, test your entire system to ensure correct operation; this is validation. Given the testing phases embedded into the development cycles, this testing should actually be a confirmation of correctness as much as a search for errors.

Chapter 1 provides further information on life cycle models, including the waterfall model, the spiral model, and Agile development.

Tools and techniques

Real-time operating systems

Fairness is an admirable characteristic in most situations. People undoubtedly wish to be treated fairly in their dealings with other people, and they carry that desire over into their dealings with machines. General-purpose OSs are designed to allocate scarce resources between a variety of user and system level processes. It’s unsurprising that one of the primary goals of those systems is achieving a fair and equitable allocation of those resources between all of the competing processes that are running on the computer. Of course, “fairness” may be defined in a variety of ways. For example, an OS may be designed to allocate CPU resources fairly between computationally intensive background processes to achieve high throughput, at the expense of interactive users.

The primary driver for OS design is not a love of fairness on the part of the OS developers, but simply that OSs are general purpose in nature. The desktop operating environment is inherently chaotic and unpredictable. It’s highly unlikely that any two general-purpose computers have exactly the same characteristics in terms of applications and data files installed. This in turn means that the computational demands placed on the processors are almost totally unpredictable.

This unpredictability forces the OS designers to make very generic decisions about how their schedulers will allocate system resources between a multitude of competing processes. Fairness is a reasonable starting point when deciding how to slice the pie.

Embedded systems, however, represent exactly the opposite situation from that encountered on the desktop. Embedded systems are focused devices dedicated to the solution of a single problem. Their inputs and outputs are highly constrained, and their processing is thoroughly defined. Techniques appropriate to the desktop environment are prone to fail in the embedded world. Fairness as a resource allocation philosophy fails miserably in embedded systems.

Enter the RTOS. The primary distinguishing feature between a general-purpose OS and an RTOS is that an RTOS is not concerned with the fair allocation of system resources. An RTOS allocates each process a priority. At any given time, the process with the highest priority will be executing. As long as the highest priority process requires the use of the CPU, it will get it to the exclusion of other processes. This “highest priority first” scheduling philosophy is an extraordinarily powerful tool for the embedded system software developer. Used well, an RTOS can extract the maximum processing power out of a system with minimal overhead.
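
As a minimal sketch of what this looks like in practice, here are two tasks created under the FreeRTOS API (the task names, stack sizes, priorities, and periods are illustrative assumptions, and a correctly configured port is assumed; in FreeRTOS, a higher number means a higher priority). Whenever the control task becomes ready, it preempts the logging task:

    #include "FreeRTOS.h"
    #include "task.h"

    /* Time-critical control loop: highest priority, runs whenever ready. */
    static void vControlTask(void *pvParameters)
    {
        (void)pvParameters;
        for (;;) {
            /* ... read sensors, update actuators ... */
            vTaskDelay(pdMS_TO_TICKS(10));    /* run every 10 ms */
        }
    }

    /* Background diagnostics: runs only when the control task is idle. */
    static void vLoggingTask(void *pvParameters)
    {
        (void)pvParameters;
        for (;;) {
            /* ... write diagnostic records ... */
            vTaskDelay(pdMS_TO_TICKS(1000));
        }
    }

    int main(void)
    {
        xTaskCreate(vControlTask, "ctrl", configMINIMAL_STACK_SIZE, NULL, 3, NULL);
        xTaskCreate(vLoggingTask, "log", configMINIMAL_STACK_SIZE, NULL, 1, NULL);
        vTaskStartScheduler();    /* does not return if startup succeeds */
        for (;;) {}
    }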

There are a large number of commercial and open source RTOS packages available to choose from, with a wide variety of features. The most compelling argument for adopting one of these systems is that they allow your developers to focus on their application domain, rather than on developing infrastructure that, while necessary, doesn’t contribute to your business goals.

For example, apart from their key task creation and scheduling functions, RTOSs also provide abstracted software layers and device drivers for processor bootstrap, interprocess communication, onboard timers, external communications ports, and peripherals of all varieties. Getting your expensive engineers to custom build this infrastructure rather than concentrating on your core business applications is almost certainly going to be more expensive and less efficient than acquiring a suitable product from people who are in the business of producing that infrastructure.

Design by Contract

Design by Contract (DBC) is a powerful software development technique developed by Prof. Bertrand Meyer. He espoused the concept in his book Object-Oriented Software Construction [12] and implemented it as a first-class component of the Eiffel programming language. Facilities for using DBC are now available in a wide variety of programming languages, either built into the language or implemented as libraries, preprocessors, and other tools.

DBC uses the concept of a contract from the world of business as a powerful metaphor to extend the notion of assertions that have been a common programming technique for many years. As in a business contract, the central idea of the DBC metaphor involves a binding arrangement between a customer and a supplier about a service that is to be provided.

In terms of software development, a supplier is a software module that provides a service, such as supplying data through a public interface, while a customer is a separate software module that makes calls on that interface to obtain the data. A contract between the two modules represents a mutually agreed upon set of obligations that must be met by both the customer and supplier for the transaction to complete successfully. These obligations cover things such as valid system state upon entry to and exit from the call to the interface, valid input values, valid return types, and aspects of system state that will be maintained by the supplier.

There is a substantial difference between the conventional use of assertions and DBC. The old school use of assertions involves writing code and then adding assertions at various points throughout the code. As the name implies, Design by Contract involves incorporating assertions about the correctness of the program into the design process so that the assertions are written before the code is written.

This is a powerful idea. Creating contracts during system design forces the designer to think very carefully about program correctness right from the beginning. DBC provides convenient tools for guaranteeing that correctness is established and maintained during program operation.

Two standard elements for establishing DBC contracts are the REQUIRES() and ENSURES() clauses. These may be viewed as function calls that take a Boolean expression as an argument. If the Boolean expression evaluates to true, then program execution continues as expected. If the expression evaluates to false, then a contract has been violated and the contract violation mechanism will be invoked. You place one or more REQUIRES() clauses at the start of a function to establish the contract to which the function agrees prior to commencing work. You place one or more ENSURES() clauses at the end of the function to establish the state that the function guarantees to have reached after it has done its work.
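
Here is a minimal sketch in C of how these clauses can be used, assuming REQUIRES() and ENSURES() are implemented as simple assertion macros (Eiffel, and DBC libraries for other languages, provide much richer facilities; the queue type shown is hypothetical):

    #include <assert.h>
    #include <stddef.h>

    /* Contract macros; a production implementation would log the
       violation and invoke a system-specific failure handler. */
    #define REQUIRES(cond) assert(cond)
    #define ENSURES(cond)  assert(cond)

    struct queue {
        double *items;     /* circular buffer of readings */
        size_t  head;      /* index of the oldest entry */
        size_t  count;     /* number of entries currently stored */
        size_t  capacity;  /* size of the items array */
    };

    /* Remove and return the oldest reading in the queue. */
    double queue_pop_oldest(struct queue *q)
    {
        REQUIRES(q != NULL);       /* caller must supply a valid queue */
        REQUIRES(q->count > 0);    /* popping an empty queue is a defect */

        size_t before = q->count;
        double value = q->items[q->head];
        q->head = (q->head + 1) % q->capacity;
        q->count--;

        ENSURES(q->count == before - 1);   /* exactly one entry consumed */
        return value;
    }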

The key thing for you to understand is that DBC is intended to locate defects in your code. This might sound like a statement of the obvious, but it’s more subtle than it appears at first glance. The contracts that you embed throughout your code should establish the system state required for correct operation at that point of execution. They represent “The Truth” about the correct operation of the system. If a contract is violated, then a catastrophe has occurred and it is not possible for the system to continue to operate successfully. At that point, it is better for the system to signal a failure and terminate, rather than to propagate the catastrophe to some other point of the system where it may fail quietly and mysteriously, resulting in days or weeks of debugging to track backward from the point of failure.

DBC should not be used for validation of data received over interfaces, either from an operator or from another system. Normal defensive programming techniques should be used at the interface to reject bad data such as operator typographical errors. The difference here is between something bad that might actually happen under unusual circumstances versus something that must never happen at all. You should never invoke a DBC contract on something that could reasonably occur.

One way to look at your contracts is that they are testing for impossible conditions. You shouldn’t think, “I won’t test for that, because that will never occur.” You should think “I will test for that, because if that does occur, it indicates that there is a defect in the system and I want to trap it right now.”

So, the REQUIRES() statement says, “I require the system state to satisfy this condition before I can proceed. If the system is not in this state, then it is corrupt and I’ll halt.” Similarly the ENSURES() statement says, “I guarantee that inside this function I have successfully transformed the system state from where we were at the beginning, to this state. If this transformation has not been successful, then something has gone very wrong, the system is corrupt, and I’ll halt.”

You might look at an ENSURES() clause at the end of a function and think “Why should I test that? That pointer is never going to be NULL.” But the point is that the pointer will never be NULL only if the system is working correctly. What we’re catching with the ENSURES() clause is the case where a maintainer (either you next week or somebody else in 10 years’ time) comes along, changes the system erroneously, and introduces a subtle bug that results in the pointer being NULL once every month, which causes the system to mysteriously crash.

To repeat: Ask yourself exactly what the system state should be at the start of a function for subsequent processing to succeed and then REQUIRE that to be so. At the end of the function, ask yourself to what state you’ve transitioned and ENSURE that you’ve successfully made that transition.

This process forces you to think more clearly about what you are doing and ensures that you have a much crisper mental model of your system inside your head during design and development. This leads to better design and implementation so that you avoid introducing defects in the first place.

An analogy I like is that DBC statements are the software equivalent of fuses in electrical systems. A blown fuse indicates that something is defective (or failed) while protecting your valuable equipment from an out-of-range value (i.e., too much current). A triggered DBC condition indicates that something is defective while protecting the rest of your system from that corruption.

Some final words of caution:

1. Nothing comes for free. Using DBC incurs an overhead caused by the computation necessary to evaluate each of the DBC clauses. In systems with limited amounts of spare processing power, it may be the case that the evaluation of the DBC clauses involves a prohibitive amount of overhead. In such cases, it may be necessary to disable DBC evaluation for release after system development and testing has been completed.

2. The expressions provided in DBC clauses must not have any side effects. If the expressions do have side effects and DBC evaluation is disabled for release, then the released version will operate differently to the debug version at runtime, which is obviously highly undesirable (see the fragment after this list).

3. It can be very difficult to decide what action to take in an operational system if a DBC clause evaluates to false. When a DBC clause is triggered, it presumably indicates the detection of a defect within the system. The simplest option when this happens is to display or log appropriate error messages and then terminate execution of the program. While this is a simple solution, it may not be feasible to follow this course of action in a working embedded system. For example, if the system under consideration is the engine management system of a motor vehicle, then terminating the program while the vehicle is traveling down a freeway at high speed is probably not appropriate. On the other hand, doing nothing may be an equally bad choice given the detection of a potentially serious defect within the system.

4. Realistically, the action to be taken by the DBC processing upon detecting a defect is a choice that can only be made by the developers based on the information they have about the nature of their system.
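
To illustrate the side-effect trap described in point 2, consider this fragment, again using the hypothetical assertion-style macros from the earlier sketch, which are compiled away in a release build:

    /* Bad: the contract itself modifies program state. If contract
       checking is compiled out for release, 'count' is never
       decremented and the release build behaves differently. */
    REQUIRES(--count >= 0);

    /* Good: the state change stands alone, and the contract is a pure
       expression that can be removed without changing behavior. */
    count--;
    REQUIRES(count >= 0);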

Drawings

I want you to visualize a three-dimensional solid object bounded by six square faces, or sides, with three meeting at each vertex. Take as much time as you need to form an image of that object clearly in your head before you go on.

I’m pretty sure that after reading that sentence (perhaps a couple of times) and thinking about it a little you would realize that it described a cube. I’m also pretty sure that the process of reading that sentence and converting it into a mental image in your head took more than a few seconds. I’m 100% sure that if I’d replaced the sentence with a picture of a cube, you would have looked at the picture, and the concept of “cube” would have entered your thoughts in a fraction of a second—much faster than it would take to describe in words. Furthermore, the image of the cube may have had substantial information content that would be difficult if not impossible to describe in words. For example, the image of the cube may have been a precise shade of pastel blue, and the sides of the cube may have been polished smooth, or textured.

This little exercise conveys a profound insight, which is that the human visual processing system excels at image processing and pattern matching. We are astoundingly good with words, but pictures allow us to communicate some things far more effectively than words can. The result is that the information content and the speed of delivery of an image may be orders of magnitude greater than a written attempt to communicate the same information.

What’s this got to do with software development? Well, in the words of the old joke, a picture is worth 1024 words. When attempting to represent your systems, to model them and to explain them to others, you will undoubtedly need to use lots of words, but you may also use drawings to supplement the words.

This means that effective drawing tools are an important part of every developer’s toolbox. For many types of engineering drawings, such as block diagrams (also known as “boxes with lines and arrows”), a general-purpose tool such as Microsoft Visio is more than adequate. For more complicated drawings, such as UML diagrams and state machines, it is more appropriate to use drawing tools that are dedicated to that particular task.

The state of the art in software system visualization may be found in Model-Based Design (MBD) products such as IBM’s Rhapsody. This is much more than a drawing tool, as it allows for graphical system modeling using UML, followed by generation of source code in a variety of languages directly from the model.

Static source code analysis

Software tools called static source code analyzers perform detailed analysis of source code to detect common programming mistakes. The term derives from the fact that the tool analyzes the source files without actually executing the code, in contrast to dynamic analysis that requires code execution. One of the earliest static analyzers was the Unix “lint” program, which was reportedly named because of its role in picking “lint,” or potential defects, out of C language source code.

Many of the error detection features of the early analyzers are now implemented as standard components of modern optimizing compilers. A good example of this is the use of an uninitialized variable, which is a defect that most compilers now detect and flag. Modern high-performance analyzers, however, go far beyond the simple statement-by-statement analysis performed by lint and similar tools. These newer tools can analyze across the entire code base of an application and detect errors that compilers alone do not currently locate. As with most things, you get what you pay for. Licences for these tools may cost as little as a few hundred dollars per user, or they may cost hundreds of thousands of dollars.
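
For example, the following fragment (hypothetical code, for illustration) contains exactly the kind of defect described above, and lint-style analyzers, as well as most modern compilers with warnings enabled, will flag it:

    int scale_reading(int raw)
    {
        int offset;               /* declared but never initialized */

        if (raw > 100) {
            offset = 10;
        }
        /* Defect: when raw <= 100, 'offset' is used uninitialized. */
        return raw + offset;
    }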

Example: Tool costs versus engineering time (by Kim R. Fowler)

I, and others in the industry, have seen a strange and illogical reticence by management to buy software tools. The reticence seems myopic because simple estimates and calculations can show how much tools can save a project in time and money!

Consider a US$5000 tool that analyzes software within a few seconds. An engineer might spend upward of a week in mind-numbing manual analysis of the same software. If an engineer has a loaded cost of US$100 per hour for salary and benefits, that week of manual analysis just cost US$4000. The breakeven point is less than nine days of engineering time. Furthermore, the project suffers a delay of a week for analysis that the tool could have done in seconds.

Like many automated systems, static analyzers have positive and negative aspects. The positive side consists of effectively having a tireless and super-precise language expert on your team who can rapidly and repeatedly examine your code and locate a wide variety of standard defects. Code reviews using these tools may be performed any time, at the click of a mouse button. The downside to the use of these systems is the lack of flexibility caused by removing human intelligence from the analysis process. The result is that the systems may be prone to the detection of “errors” that aren’t really errors, otherwise known as false positives (also called false alarms), and the failure to detect actual errors, which are known as false negatives.

False positives make it difficult to separate the wheat from the chaff. If there are too many false positives, the user may be overloaded to the point that actual errors are not discerned among a flood of unnecessary warnings. False negatives, on the other hand, are at least as dangerous as false positives because they can provide the developer with a false sense of security about the correctness of the code that has been analyzed.

Analyzer performance therefore lies somewhere along a spectrum, and you can usually adjust the sensitivity of the analyzer from one end of the spectrum to the other. One end involves maximum analytical sensitivity, in which all possible errors are detected, along with a potentially very large number of warnings about harmless cases. The other end involves minimum analytical sensitivity, in which few if any false positives are reported, but a large number of actual defects may also go unreported.

In most cases, my experience with static analyzers has been that it’s best to reduce the sensitivity and accept the occasional false negative. The reason is that an excessive number of false positives induces the “boy who cried wolf” syndrome in developers, with the result that they may miss actual errors, if the tool is even used at all. In simple terms, excessive false alarms can result in a tool being used poorly or not at all. A tool that you regularly use and that detects 75% of errors is better than a tool that detects 80% of errors but sits unused because of the workload caused by excessive false positives.

Note that this advice applies to most cases, but not every case. Deciding how to set the analytical sensitivity of a static analyzer must take account of the nature of the system under analysis. For a complex, high value, and high profile system with safety- or mission-critical aspects, maximum sensitivity may be the best choice; the cost of working through a large number of false positives is likely to be much less than the cost of an operational failure that results in a high profile mission failure or loss of life. For less critical systems, you may reduce the sensitivity of the analyzer as described, both to reduce the costs of using the tool and to ensure that it is actually used.

Review

Human beings are fallible creatures. Even under the best of circumstances people will make mistakes, and software development is an operating environment that is so far from “the best of circumstances” that it sometimes appears to have been designed to humble us by providing constant reminders of our imperfections.

The problem is caused by the level of abstraction involved in producing a piece of software. People perform at their best when they are doing, rather than thinking. If I tossed you a pen, you could catch it without thinking. In fact, thinking about what you need to do to catch the pen would probably cause you to fail to catch it; the same applies to most physical actions that we perform on a daily basis such as walking, talking, and driving a car.

On the other hand, deliberate, extended, careful cognition related to complex concepts is something that humans can do amazingly well, but it isn’t natural and it isn’t easy. When that intense cognitive effort is devoted to an artificial activity that is almost entirely divorced from real-world physical concepts, then we find ourselves in a position where we are pushing the boundaries of what our minds are capable of achieving. So, we make mistakes. Lots and lots and lots of them. Egregious, colossal blunders, delicate errors of almost gossamer subtlety and complexity, and everything in between.

It seems like an impossible situation. How can we possibly achieve our desired goal of producing reliable software? We are inherently imperfect and prone to errors and we are doing something that is unnatural and difficult.

The first step involves lessons in humility and acceptance. Be humble. Accept that both you and your colleagues will make mistakes. Promote a “mistake friendly” environment, because after all, “experience” is just another name for “I recognize that mistake, because I made it before.”

The second step involves, you guessed it, developing a process. In this case, the process required to mitigate the risks associated with the introduction of errors into our systems involves the review of work products by other, qualified people.

I strongly believe that the single most effective tool for locating defects during all stages of the software development process is peer review.

The reason I believe this so strongly is that my experience over many years has been that people are their own worst enemies because they don’t like to find their own mistakes…so they often don’t. This goes back to the lesson in humility that I mentioned earlier. Most engineers are highly intelligent, highly educated, and capable people. We invest a lot of ourselves, our pride and our ego, into the things we make, so at a deep level it is psychologically very difficult for us to acknowledge that something we have worked hard to produce is imperfect, flawed, or broken in some way. The result of this is that we are frequently blind to our own mistakes because deep in our hearts we aren’t humble enough and we really don’t want to find those mistakes.

The people sitting next to us, on the other hand, have none of their ego or personal identity involved. Your colleagues will, in a general sense, want you to succeed in your efforts so that your team and your organization as a whole will prosper, but their pride and sense of self-worth come from their involvement with their own work products, not yours. This means that your colleagues are able to view your work dispassionately and without any of the psychological baggage that you bring to it. They will be able to quickly find flaws and suggest improvements that you would struggle to locate, if you could locate them at all.

Ultimately it comes down to a very simple choice. You can get your colleagues to find the flaws in your work or you can get your customers to find them. The first option is simple, effective, and very rewarding. The second option is often disastrous.

In contrast to all this talk of pride in workmanship, there is also another deeply human characteristic that is mitigated by the prospect of peer review. That characteristic is, to put it bluntly, laziness.

Even the most diligent of engineers is subject to this temptation. It would be a very rare individual who hasn’t at some time thought, “It’s Friday afternoon and I’m tired so I’ll just hack something up here. Nobody will ever know.” The last sentence in that little trail of thought is the critical one. If it was indeed true that nobody would ever know about the quick and dirty hack, then it would be very easy to submit to this temptation. However, in an environment of high-quality peer review, it makes no sense to do so. The train of thought becomes, “It’s Friday afternoon and I’m tired, but if I just hack something up here, the team will pick it up at the review next week and I’ll have to do it again, as well as being embarrassed.” This train of thought has a better outcome.

There’s no doubt that reviews need to be performed, but what should be reviewed? In my view, and in an ideal world, all work performed during the development process should be subject to peer review. Requirements, design, implementation, testing, release, and installation. Everything! Unfortunately, few of us live in an ideal world. It generally takes longer to create something than it does to review it, but having every work product independently reviewed in detail by one other person would almost double the size of the team required to do the work. Shrinking budgets and tighter schedules mean that very few organizations are able to devote resources of this magnitude to their review activities.

The best bang for your buck is gained by performing review at all stages of the development cycle. High-quality review of a design is almost pointless if the subsequent implementation is not also subject to review; the result could be a great design that is poorly implemented. Keep in mind also that work doesn’t stop on most nontrivial applications once Version 1.0 has been delivered. The processes applied to producing a great product should also be applied to maintaining the quality of the product throughout its life span.

I mention this because of cases I have seen where development organizations have completed projects and then focused all their attention on the next project, to the detriment of the older products. The new work is more interesting than fixing defects in other people’s code, and the attitude that “It’s only maintenance, so we’ll let the intern handle it” develops, with the result that the quality of subsequent releases diminishes.

Building a reputation for producing high-quality products takes a long time, but that reputation can disappear in a day. You should take care to ensure that this doesn’t happen. It may not be glamorous, but applying good processes, including extensive peer review, is just as necessary after the initial delivery of a product as it is during development.

As well as spreading the effort of performing reviews throughout the development cycle, you should allocate the greatest effort first to the areas of greatest risk and complexity within the design. This doesn’t mean, though, that less critical areas of the code shouldn’t be subject to scrutiny if there are resources to do so, as it’s not unusual to find a high density of defects in such code. This can happen for a variety of reasons, such as less critical code being assigned to less experienced developers, or simply because it was written in haste.

Case Study: Review and tools

There is no doubt that performing effective reviews of other people’s output falls into the category of “hard work.” This is particularly the case when reviewing source code, which is renowned as a difficult and tedious task. There are a variety of techniques for performing code reviews, which historically have been variations on the theme of getting a group of colleagues to read the code and then getting together in a room to work through it line by line, on hardcopy listings or on a screen.

Thankfully, software tools have recently been developed by both commercial organizations and open source teams that can increase the productivity of code review teams by an order of magnitude. These code review collaboration tools typically provide a means, such as a browser interface, for a group of developers to share a common view of a set of source modules. Along with the shared view of the code comes the ability to annotate the code interactively in a similarly shared manner. The result is that widely distributed development teams can rapidly generate shared threads of comments about aspects of the code, much like the threads of conversation on an Internet blog.

My experience with these tools has been that they minimize the overheads and tedium of performing reviews while allowing those involved in the reviews to perform their work with great flexibility and efficiency.

An effective peer review process can be a key component for bringing a group of individual workers together and turning them into a cohesive and supportive team. Unfortunately, the converse can also apply. A poorly implemented review process can be divisive and disruptive, and even result in reduced productivity. Unsurprisingly, human nature is once again the source of the problem. (Again, see Chapter 2 for support and evidence—Ed.)

Everybody likes being praised. Nobody likes having their failings pointed out. It hurts even more when your foul-ups are diligently noted and dissected in front of your peers, who are some of the people in this world you most want to impress with your expertise and professionalism. Reviews are therefore inherently threatening at a fairly deep level, and potentially very uncomfortable for those on the receiving end. From a management perspective, this means that implementing a review process can be very difficult, because the people who will benefit most from the process are those who are most likely to resist it. I know of several cases where managers have attempted to implement review processes and failed due to the resistance put up by employees. In all of those cases, the employees and the organizations are poorer for the lack of review.

Understanding the extreme sensitivity of this issue is critical to a successful review process implementation. It’s unlikely that it will ever be smooth sailing, but there are some things that can be done to increase the prospects of success.

The primary reason for introducing a review process is to increase product quality, which translates to a reduced number of defects in the released code. You need to sell the engineers involved in the process on this. The best selling point is that at the end of the day, the credit for producing a high-quality piece of software goes to the developer, not the reviewers. In other words, the reviewers are there to make the developer look good for free. The engineers who embrace this attitude are most likely to be the ones who are smiling at bonus time.

Nobody is ever going to end up smiling though, if they feel like they’ve been personally attacked in a review. Reviews must be entirely professional and impersonal. Comments have to be about the product, not the person, because there’s a world of difference between a comment like, “Reference to uninitialized variable” and “What sort of jerk forgets to initialize memory before using it?”

Senior engineers must embrace the process and participate to set an example. Allowing senior staff to be exempted undermines the process. People still make mistakes no matter how senior they are and junior staff members will be well aware of this. The senior staff members should also be engaged in ensuring the general quality of the process.

Finally, ensuring that reviews are taking place and that the process is effective and productive requires continuous management oversight. That oversight will ensure that reviews are conducted in a spirit of mutual cooperation, maximizing their benefits and minimizing the possibilities for disruption and ill-feeling between staff members.

Effective peer review of work products really is the silver lining in the cloud of human fallibility that casts its shadow over our attempts to develop high-quality software. I view it as the one great marker of process maturity within a development organization.

Case Study: Ensuring that reviews happen, and that they are effective

When I started introducing more comprehensive reviews into my current team, I had to consider how I could ensure effective management of the process. I wanted to ensure that reviews were actually taking place, I wanted to ensure that they found problems effectively, and I wanted to ensure that they were personally positive rather than destructive for the people involved.

The approach I took was to mandate that I be invited to participate in every review, with the understanding that I would actually only be able to participate in a small number of them.

The result of this is that simply by looking at the invitations, I can see what is being reviewed, and by whom. Occasionally I actually participate in a review. This allows me to contribute at a technical level, but more importantly it allows me to evaluate the ongoing effectiveness of our review process. I can assess whether all staff members involved are participating effectively, and I can ensure that the communication within the review is professional, courteous, and supportive.

This simple technique allows me to perform regular reviews of the review process itself, ensuring its ongoing quality and benefit to the organization.

Chapter 13 gives more detail on implementing reviews and review processes.

Test, verification, and validation

The test regime for your system is the final barrier that stands between your defects and your customers. If you test your products effectively, then those memory leaks, intermittent crashes, communication failures, and other embarrassing flaws that weren’t picked up by your other processes will never reach your customers. If you don’t test your products effectively, then if you are very lucky you will end up with nothing more than some egg on your face. If you are not lucky, which is quite likely, then your organization will suffer financial loss and a loss of reputation among your customers. In the worst case, someone may be injured or killed as a result of your poor product testing.

One of the principles of military defensive operations is that defenses should be deep, consisting of multiple, mutually supporting layers. The idea here is that even if an enemy breaks through one layer of defenses, they will be stopped by one of the subsequent layers.

The same principle applies to software system testing processes. You need to have multiple, mutually supporting layers of testing strategically located throughout your development process so that a defect that slips through one layer of testing is caught by a subsequent layer. Failing to do this was the cause of many problems for projects that used waterfall development life cycle models. Detailed testing didn’t commence until late in the project when most of the development work was completed. If the tests weren’t perfectly comprehensive, and they usually weren’t, then quite serious errors could slip through the tests into the released products.

The first line of defense consists of unit tests. As the name implies, unit tests are performed on low-level functional software units, typically individual functions within a module. These tests ensure the correct operation of a function by passing in a series of known values and verifying that the function correctly transforms the system state and returns the correct result, according to the requirements that the function was built against.

There are very useful unit test frameworks available for most programming languages that allow extensive suites of unit tests to be created, executed, and evaluated automatically. These frameworks increase productivity by reducing the workload associated with creating, running, and checking the results of unit tests.
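As a concrete illustration, here is a minimal sketch of a unit test in C. It deliberately uses nothing more than the standard assert macro rather than any particular framework, and the function under test, clamp_adc, is a hypothetical example invented for this sketch. A real framework would add test registration, reporting, and fixtures on top of the same basic pattern of known inputs and expected outputs.

#include <assert.h>
#include <stdio.h>

/* Hypothetical function under test: clamps a raw ADC reading
   to the valid 12-bit range [0, 4095]. */
static int clamp_adc(int raw)
{
    if (raw < 0)
        return 0;
    if (raw > 4095)
        return 4095;
    return raw;
}

/* The unit test drives the function with known values, including the
   boundary cases, and checks each result against the requirement. */
static void test_clamp_adc(void)
{
    assert(clamp_adc(-17)  == 0);    /* below range: clamped to 0    */
    assert(clamp_adc(0)    == 0);    /* lower boundary               */
    assert(clamp_adc(2048) == 2048); /* nominal value passes through */
    assert(clamp_adc(4095) == 4095); /* upper boundary               */
    assert(clamp_adc(9000) == 4095); /* above range: clamped to 4095 */
}

int main(void)
{
    test_clamp_adc();
    puts("All unit tests passed.");
    return 0;
}

Note that the boundary values receive particular attention; in my experience that is where the defects congregate.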

An interesting paradigm that has evolved from the Agile Development Movement is Test Driven Development (TDD) [13]. The essence of the idea is that you write unit tests before writing the application code; the aim of writing new application code is then to make the failing tests pass. Developing code in this manner has a number of benefits, the primary one being that it forces developers to produce and maintain their unit tests early. The tests are then always available to verify the basic functionality of the system and to guard against the introduction of defects into previously working code.
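To make the TDD rhythm concrete, the following sketch shows the order of events for a small, hypothetical checksum routine: the test function was written first, it failed against an empty stub of checksum8, and the implementation shown is the minimal code that makes it pass. The function name and the 8-bit additive checksum are assumptions for illustration, not a prescribed design.

#include <assert.h>

/* Step 2 (green): this minimal implementation was written only after
   the test below existed and had been seen to fail against a stub. */
static unsigned checksum8(const unsigned char *buf, int len)
{
    unsigned sum = 0;
    for (int i = 0; i < len; i++)
        sum = (sum + buf[i]) & 0xFFu; /* 8-bit additive checksum */
    return sum;
}

/* Step 1 (red): written first, before checksum8 did anything useful. */
static void test_checksum8(void)
{
    const unsigned char msg[] = { 0x01, 0x02, 0x03 };
    const unsigned char big[] = { 0xFF, 0x02 };

    assert(checksum8(msg, 0) == 0x00); /* empty buffer sums to zero  */
    assert(checksum8(msg, 3) == 0x06); /* 1 + 2 + 3 = 6              */
    assert(checksum8(big, 2) == 0x01); /* sum wraps around at 8 bits */
}

int main(void)
{
    test_checksum8(); /* step 3: refactor freely while this still passes */
    return 0;
}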

The next line of defense consists of integration tests, which verify the correct interoperation of independent modules within the system. Integration tests confirm that the modules on either side of an interface specification agree about how the interface should work, and that each implements it correctly.

Integration testing frequently produces surprising results, with the culprit usually being the imprecision of the English language. It’s surprisingly easy for two groups of people to read the same interface specification and come up with completely different interpretations of various areas of it. Different interpretations mean that the components on either side of the interface don’t work together properly on the first try. Because of this, it’s extremely important that integration testing thoroughly exercises all areas of the interface.
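As a small illustration, here is a sketch of an integration test in C for a hypothetical message interface between two modules. The frame layout (a sync byte, a big-endian 16-bit sequence number, and a status byte) and the function names msg_encode and msg_decode are all assumptions invented for this example; the point is simply that the test exercises both sides of the interface together, and would immediately expose a disagreement such as a byte-order mix-up.

#include <assert.h>
#include <stdint.h>

/* Hypothetical interface: a 4-byte status frame consisting of a sync
   byte, a big-endian 16-bit sequence number, and a status byte. */

/* Sender-side module: encodes a frame onto the wire. */
static void msg_encode(uint8_t out[4], uint16_t seq, uint8_t status)
{
    out[0] = 0xA5;                  /* sync byte           */
    out[1] = (uint8_t)(seq >> 8);   /* sequence, high byte */
    out[2] = (uint8_t)(seq & 0xFF); /* sequence, low byte  */
    out[3] = status;
}

/* Receiver-side module: decodes a frame; returns 0 on success. */
static int msg_decode(const uint8_t in[4], uint16_t *seq, uint8_t *status)
{
    if (in[0] != 0xA5)
        return -1;                  /* reject a bad sync byte */
    *seq    = (uint16_t)(((uint16_t)in[1] << 8) | in[2]);
    *status = in[3];
    return 0;
}

/* Integration test: both sides must agree on framing and byte order. */
int main(void)
{
    uint8_t  wire[4];
    uint16_t seq;
    uint8_t  status;

    msg_encode(wire, 0x1234, 7);
    assert(msg_decode(wire, &seq, &status) == 0);
    assert(seq == 0x1234 && status == 7);  /* round trip is intact */

    wire[0] = 0x00;                        /* corrupt the sync byte */
    assert(msg_decode(wire, &seq, &status) == -1);
    return 0;
}

In a real system the encoder and decoder would typically be built by different teams from the same written specification, which is precisely why a shared round-trip test of this kind is so valuable.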

The third line of defense is system level testing. In the case of most embedded systems, this type of testing isn’t specifically software related. The point of system level testing is to forget about how the system functionality is implemented and simply test the system as a whole according to its requirements. At this level of testing, the system could be built from software, programmable logic, discrete components, optics, and mechanics.

System level testing typically revolves around the related concepts of Verification and Validation (V&V). The process of system verification involves ensuring that a system has been constructed according to all necessary regulations, specifications, requirements, and other conditions that may have been imposed on the construction of the product. The process of system validation involves ensuring that a system has been constructed in a manner such that it meets the needs and intent of its customers.

Verification is objective; it quantitatively tests the system against the metrics of its requirements. Validation can be more subjective: “Does the system meet the customer’s intent?”

In simple terms, validation asks, “Are we building the right thing?” and verification asks, “Are we building the thing right?” It’s important to understand the difference between these terms, because to be successful you need to be able to answer yes to both questions. If you build something correctly according to all regulations and specifications then it will pass verification, but if it doesn’t pass validation then it simply isn’t what your customer wants, and you may well end up back at the drawing board.

An important aspect of the V&V process is that it is most effectively performed by an independent group who have not been involved with the development of the product. This ensures that the results of the V&V process are not subject to the natural human biases of those who have labored to produce the product.

Chapter 14 gives more details on implementing test and integration.

Conclusion

The military radar systems that my team and I work on are used to shoot supersonic missiles out of the sky with other supersonic missiles, which is not a trivial task. The systems are incredibly complex and the task they achieve is equivalent in difficulty to using a rifle to shoot a bullet out of the sky with another bullet, with the consequence of failure being the deaths of hundreds of people. But we don’t fail. The software embedded in our radars works reliably and well, and the reason for this is that the code was developed using the simple and effective techniques described in this chapter.

These techniques have worked for me, and whatever your application domain may be, applying these techniques to your embedded software will help you to build high-quality embedded systems that are delivered on time and on budget.

References

1. Ganssle J, Barr M. Embedded Systems Dictionary. San Francisco, CA: CMP Books; 2003. pp. 90–91.

2. Leveson NG. Safeware: System Safety and Computers. Boston, MA: Addison-Wesley; 1995 (Appendix A).

3. ISO 11898: Controller Area Network (CAN).

4. IEEE Software Engineering Body of Knowledge (SWEBOK). <www.swebok.org>.

5. Scott R. In the Wake of Tacoma: Suspension Bridges and the Quest for Aerodynamic Stability. Reston, VA: ASCE Publications; 2001.

6. Vaughan D. The Challenger Launch Decision: Risky Technology, Culture and Deviance at NASA. Chicago, IL: University of Chicago Press; 1996.

7. ISO/IEC 12207: Systems and Software Engineering—Software Life Cycle Processes.

8. Albrecht A. Measuring Application Development Productivity. In: Proceedings of the IBM Application Development Symposium. IBM Press; 1979. pp. 83–92.

9. ISO/IEC 24570:2005 Software Engineering—NESMA Functional Size Measurement Method Version 2.1—Definition and Counting Guidelines for the Application of Function Point Analysis.

10. Beck K, et al. Manifesto for Agile Software Development. <www.agilemanifesto.org>.

11. DOD-STD-2167A. Military Standard: Defense System Software Development. United States Department of Defense; 1988.

12. Meyer B. Object-Oriented Software Construction. Upper Saddle River, NJ: Prentice Hall; 1997.

13. Beck K. Test-Driven Development: By Example. Boston, MA: Addison-Wesley; 2003.
