25 Reverse Engineering

CAST	Certification Authorities Software Team
COTS	commercial off-the-shelf
EASA	European Aviation Safety Agency
FAA	Federal Aviation Administration
LAL	less abstract level
LLR	low-level requirement
MAL	more abstract level
PDS	previously developed software
RE	reverse engineering
SQA	software quality assurance

This chapter defines reverse engineering, identifies some issues related to it, and provides high-level recommendations for how to reverse engineer the life cycle data required to satisfy DO-178C objectives. This chapter is closely related to Chapter 24, since reverse engineering is recognized in DO-178C as an alternative method for generating life cycle data, particularly for previously developed software (PDS). As with other subjects throughout this book, the topic is covered with a focus on safety-critical software in the civil aviation domain. The concepts may also apply to other safety-critical domains.

Note that much of this chapter concentrates on reverse engineering software life cycle data (requirements and design) from source code, since this tends to be the most common application of reverse engineering in the aviation software industry. However, the concepts may be applied to other scenarios, such as starting with object code or documenting missing system requirements.

25.1 What Is Reverse Engineering?

The definitions of reverse engineering vary. DO-178C defines it as: “The process of developing higher level software data from existing software data. Examples include developing Source Code from object code or Executable Object Code, or developing high-level requirements from low-level requirements” [1].

DO-178C section 12.1.4.d explains that reverse engineering may be used as an approach to upgrade a baseline: “Reverse engineering may be used to regenerate software life cycle data that is inadequate or missing in satisfying the objectives of this document [DO-178C]. In addition to producing the software product, additional activities may need to be performed to satisfy the software verification process objectives” [1].^*

Roger Pressman defines reverse engineering for software as: “The process of analyzing a program in an effort to create a representation of the program at a higher level of abstraction than source code. Reverse engineering is a process of design recovery” [2].

The Certification Authorities Software Team (CAST)^† paper CAST-18 explains reverse engineering as follows:

Reverse engineering is an approach to generating software life cycle data that did not originally exist, cannot be found, is inadequate, or is not available in order to satisfy the applicable DO-178B/ED-12B objectives. However, it is not just the generation of the relevant software life cycle data, but a process of assuring that the data is correct, the software functionality is understood and documented, and the software functions (performs) as intended and required by the system. It involves recovery of requirements and design, as well as conducting the relevant verification activities to the appropriate level to ensure the integrity of the software, to ensure all software life cycle data is available and correct, and that an appropriate level of design assurance is achieved [3].^‡

The Federal Aviation Administration (FAA) research report, entitled Reverse Engineering Software and Digital Systems, by George Romanski et al. states:

Reverse Engineering (RE) is a class of development processes that start with detailed representations of an implementation, and apply various techniques to produce more generalized, less detailed representations. The goal is to have more abstract representations that can be used to understand and reason about the structure and the intent of the more detailed representations [4].

25.2 Examples of Reverse Engineering

There are numerous examples where reverse engineering may be or has been applied, including the following:

Commercial off-the-shelf (COTS) software, such as real-time operating systems or vendor-supplied libraries.
Software originally developed to another standard (e.g., military or automotive standard), such as an engine controller, a controller area network bus driver, or a flight control component.
Open source software, such as the runtime libraries for the GNAT open source compiler for the Ada programming language, Linux operating system, or Xen hypervisor.
Existing software, which after years of maintenance has become too fragile to upgrade or fix.

25.3 Issues to Be Addressed When Reverse Engineering

The certification authorities have identified common issues surrounding reverse engineering in CAST-18, which focuses on reverse engineering from source code. These issues have also been noted in project-specific FAA issue papers when reverse engineering is proposed. The common issues are explained here [3,5].^*

Issue 1: Lack of a well-defined process. The process of reverse engineering software artifacts must be organized and well defined. Too often, reverse engineering is used to compensate for poor development practices and does not follow a documented process. If reverse engineering is used, the processes, activities, transition criteria, and strategies for satisfying the DO-178C objectives must be documented in the plans and standards.

Issue 2: Failure to justify how DO-178C objectives are satisfied. Certification authorities have frequently observed that when companies propose reverse engineering as a life cycle model, they do not adequately explain how the DO-178C objectives will be satisfied [3]. Per CAST-18: “Reverse engineering should be used cautiously and only in well-justified cases (i.e., for a project that has been used in a number of applications and has shown itself to be of high integrity). The use of reverse engineering in new software development is strongly discouraged by the certification authorities” [3].

Issue 3: Lack of access to experts and original developers. When life cycle data are missing, it is often necessary to access the original developers in order to understand their thought process. Source code can be difficult to decipher, especially if it has few or no comments and is written for optimized performance. Per CAST-18, the most successful reverse engineering projects are those with access to the original development team, particularly when clarification is needed for ambiguous or difficult areas [3].

Issue 4: Complex or poorly documented source code. Unless strict coding standards are enforced, the source code for many COTS products is difficult to read. Having examined source code for several COTS real-time operating systems, I know firsthand how challenging the source code can be. The source code is often filled with complex data structures and pointers; and to top it off, it contains minimal comments. The source code issues may happen because the code was not developed for a safety-critical environment or because it was a quick prototype that was never intended to be used as the final product. Poorly documented source code makes it difficult to assess the code’s intended function and to ensure that the reverse engineered requirements and design are adequate. CAST-18 summarizes it well: “A thorough understanding of the code is essential to successful reverse engineering. Poorly documented or complex code is not a good candidate for reverse engineering” [3].

Issue 5: Abstraction difficulties. When reverse engineering design and requirements from source code, it is difficult to achieve the appropriate level of abstraction. Pressman explains: “Ideally, the abstraction level should be as high as possible” [2]. It is quite difficult to go from low levels of abstraction (such as code) to higher levels of abstraction (such as requirements). There are several consequences of not achieving the right level of abstraction—two are noted. First, when not performed properly, the design and requirements closely resemble the source code. The testing performed against such requirements provides very little value, since it does not evaluate the intent of the software (the what) but rather its implementation (the how). Essentially such testing just proves that the code is the code—it doesn’t prove that the code does what it is supposed to. Second, without proper abstraction, unwanted functionality may exist in the source code which is not visible at the system level. When there is a large gap in the granularity between the system requirements and the software requirements, it is difficult to confirm the completeness of the system-level requirements.

This is a good time to raise the pseudocode alert. Reverse engineered projects often represent the low-level requirements (LLRs) as pseudocode that looks almost exactly like the code. That is bad enough, but to make it even worse, such projects sometimes attempt to use the testing against the pseudocode to satisfy the DO-178C structural coverage analysis objectives. This makes the structural coverage analysis virtually useless. See Chapters 7 and 8 for more information on pseudocode.
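To make the abstraction problem concrete, the following minimal sketch contrasts a test written against a pseudocode-style LLR with a test written against an intent-level LLR. Everything in it is hypothetical and invented for illustration: the function limit_command, the limit values, and the wording of both requirements. It is not taken from any real project or from DO-178C.

```python
# Hypothetical illustration only: the function, the limits, and both
# "requirements" are invented assumptions, not data from a real project.

def limit_command(cmd_deg: float) -> float:
    """Existing code being reverse engineered (contains a latent defect)."""
    if cmd_deg > 25.0:
        return 25.0
    if cmd_deg < -20.0:  # defect: assumed system intent is a symmetric +/-25 limit
        return -20.0
    return cmd_deg


def test_against_pseudocode_llr() -> bool:
    """LLR copied from the code:
    'If cmd > 25 return 25; if cmd < -20 return -20; otherwise return cmd.'
    """
    return (limit_command(30.0) == 25.0
            and limit_command(-30.0) == -20.0
            and limit_command(5.0) == 5.0)


def test_against_intent_llr() -> bool:
    """LLR written from the assumed intent:
    'The command shall be limited to the range [-25, +25] degrees.'
    """
    return (limit_command(30.0) == 25.0
            and limit_command(-30.0) == -25.0
            and limit_command(5.0) == 5.0)


if __name__ == "__main__":
    print("pseudocode-style LLR test passes:", test_against_pseudocode_llr())
    print("intent-level LLR test passes:", test_against_intent_llr())
```

In this sketch the test derived from the pseudocode-style requirement passes because it restates the implementation, defect included. The test derived from the intent-level requirement fails and exposes the asymmetric limit. This is the sense in which testing against code-like requirements only shows that the code is the code.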

Issue 6: Traceability difficulties. Traceability is closely related to the abstraction issue. If the levels of abstraction are not properly established, there are two potential tracing issues that seem to evolve. Both are symptoms of the developers not really understanding what the software does. First, the tracing can be brute-forced; that is, traces are added because the code or design has to trace to something. Links are added based on similar words rather than a solid understanding of the software. Second, requirements may be identified as derived to avoid the need to trace. Both of these tracing issues can mask unwanted functionality that exists in the code.

Issue 7: Certification liaison problems. The certification liaison process is often not well executed in reverse engineering projects. Sometimes, reverse engineering is not identified in the plans and is not coordinated with the certification authorities [3]. Far too many times, I’ve reviewed a set of plans indicating that the project is using a waterfall life cycle model, only to find when I assess the actual data that it is really reverse engineered, without a plan.

25.4 Recommendations for Reverse Engineering

In order to proactively address the noted issues, the following recommendations are offered. Some of these recommendations are not necessarily unique to reverse engineering.

Recommendation 1: Evaluate and justify appropriateness of reverse engineering. Before launching into a reverse engineering effort, it is important to evaluate the appropriateness of the approach. Mature, well-proven code that has extensive service experience (such as the examples included in Section 25.2 earlier) may be a suitable candidate for reverse engineering. However, it may be difficult to justify reverse engineering for prototyped code used to validate the requirements. Some projects attempt to use their rapid prototype code as is, and write the design and requirements to match. This approach is not recommended, since the maturity and stability of the code and its design are uncertain. As explained in Chapter 6, prototype code may be used as input to the requirements, design, and final code. However, the use of prototype code as an input to other processes and life cycle data should be tempered since this code may not have a clean, robust, or even safe architecture. In my experience, reverse engineering from prototyped code typically costs more and takes longer than if the code were simply rewritten.

Recommendation 2: Be honest when reverse engineering is being utilized. Many projects that I’ve reviewed over the last 10 years have used some form of reverse engineering; however, most never admit it in their plans. If reverse engineering is going to be used effectively, it must be planned and properly implemented. It should be clear why it is being used, how it will be implemented, and how it will satisfy the DO-178C objectives.

Recommendation 3: Perform a complete gap analysis. Before committing to a reverse engineering effort, it is important to do a complete gap analysis on the existing data to determine which DO-178C objectives are satisfied and which are not. Often, a quick survey is performed and the schedule is built on the results of that survey. But, a few months into the project, it is discovered that there are many more holes than were originally identified. To avoid this risk, assemble a team of qualified engineers (a Tiger Team or an A-Team) to thoroughly analyze and identify both the data and process gaps. The team should include talented and experienced engineers who possess technical experience through the entire life cycle, domain knowledge, and DO-178C experience.

Recommendation 4: Document the life cycle, processes, and transition criteria. Reverse engineering is a life cycle and it should be documented in the plans as any other life cycle is. The phases of the development and verification effort should be defined, including the inputs, entry criteria, activities to be performed, outputs, and exit criteria for each phase. For example, Figure 25.1 shows a generic process for a reverse engineering phase. The figure shows a less abstract level (LAL) that is used as input to develop a more abstract level (MAL). It illustrates that the LAL is reviewed prior to developing the MAL (to ensure the quality of the LAL and compliance to a set of standards, such as the coding standards). Once the MAL is created, it is also reviewed. Then, the MAL and LAL are reviewed together. It should also be noted that the generic process shows the existence of change requests. The process does not assume that the LAL is perfect, hence avoiding the “code is king” phenomenon (i.e., assuming the code is perfect).

Figure 25.1 Generic life cycle for different abstraction layers. (From G. Romanski et al., Reverse engineering software and digital systems, Draft report to be published by FAA/DOT Office of Aviation Research, Washington, DC, October 2011. Used with permission of the author.)
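The phase described in Recommendation 4 can be sketched as a small state machine with explicit transition criteria. This is only an illustration of the idea, not the process from the FAA research report. The step names, the Phase class, and the rule that open change requests block a transition are assumptions made for the sketch.

```python
# Minimal sketch of one reverse engineering phase (LAL -> MAL).
# Assumption: a step may only be exited when no change requests are open.
# The class, step names, and that rule are hypothetical, for illustration.
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List


class Step(Enum):
    REVIEW_LAL = auto()              # review the less abstract level first
    DEVELOP_MAL = auto()             # develop the more abstract level from it
    REVIEW_MAL = auto()              # review the MAL on its own
    REVIEW_MAL_AGAINST_LAL = auto()  # review both levels together
    DONE = auto()


ORDER = [Step.REVIEW_LAL, Step.DEVELOP_MAL, Step.REVIEW_MAL,
         Step.REVIEW_MAL_AGAINST_LAL, Step.DONE]


@dataclass
class Phase:
    lal: str  # e.g., "source code"
    mal: str  # e.g., "low-level requirements"
    step: Step = Step.REVIEW_LAL
    open_change_requests: List[str] = field(default_factory=list)

    def raise_change_request(self, description: str) -> None:
        """Record a problem found in the LAL or MAL; the LAL is not assumed perfect."""
        self.open_change_requests.append(description)

    def close_change_request(self, description: str) -> None:
        self.open_change_requests.remove(description)

    def advance(self) -> Step:
        """Move to the next step if the (assumed) exit criterion is met."""
        if self.step is Step.DONE:
            return self.step
        if self.open_change_requests:
            raise RuntimeError(
                f"cannot leave {self.step.name}: "
                f"{len(self.open_change_requests)} change request(s) open")
        self.step = ORDER[ORDER.index(self.step) + 1]
        return self.step


if __name__ == "__main__":
    phase = Phase(lal="source code", mal="low-level requirements")
    phase.raise_change_request("CR-1: coding standard violation in LAL")
    phase.close_change_request("CR-1: coding standard violation in LAL")
    while phase.advance() is not Step.DONE:
        print("now at", phase.step.name)
```

Writing the entry and exit criteria down in this explicit form, whether in a tool or simply in the plans, is what keeps the LAL from being treated as unquestionable. A real project would define its own criteria for each transition.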

Recommendation 5: Document detailed procedures and standards. Implementing a successful and repeatable reverse engineering process requires detailed procedures, checklists, and standards. For example, if the reverse engineering project is starting with source code, the original coding standards may be inadequate or missing altogether. Therefore, appropriate coding standards will need to be generated and the code reviewed against those standards. There may also be specific procedures developed for managing changes to the code. In particular, changes will need to be coordinated with all appropriate entities and approved prior to implementation.

Recommendation 6: Coordinate plans and get agreement. As with any project seeking approval by the certification authorities, the plans and standards should be reviewed, internally approved, coordinated with the certification authority, and approved by the certification authority. A reverse engineering effort may require additional coordination given the earlier mentioned challenges. Be sure to take that into account. As noted in Chapter 5, the earlier the plans are coordinated and agreed with the certification authority, the better.

Recommendation 7: Coordinate with multiple stakeholders. Depending on the nature of the project, there may be multiple stakeholders, including the systems team, the original developers of the software, the customer (or maybe even multiple customers), the reverse engineering team, etc. It is important to ensure that all stakeholders are identified, informed, and performing their tasks as expected. Projects with multiple stakeholders should do the following [4]:

Clearly identify roles and responsibilities.
Identify and coordinate the processes used.
Describe, coordinate, and verify the configuration management between stakeholders.
Coordinate problem and fault tracking between all stakeholders.
Control and track information flow between the stakeholders.
Ensure that the stakeholders have necessary expertise to carry out their responsibilities.
Ensure that communication between stakeholders is unencumbered.

Recommendation 8: Gather all existing data and apply software configuration management, including change control and problem reporting. The existing artifacts should be baselined and under change control prior to implementing the reverse engineering process. For example, the source code and user’s manuals, requirements documents, or design data used as input to the development effort should be captured, baselined, and controlled. Any changes should be handled through the change control process (using problem reports and/or change requests).

Recommendation 9: Involve software quality assurance (SQA) and certification liaison personnel. As with all projects, it’s important to involve SQA and certification liaison personnel. For a reverse engineering effort, this is particularly important. Because reverse engineering is considered a higher risk solution, the certification authorities often provide project-specific guidance (e.g., FAA issue papers or European Aviation Safety Agency (EASA) certification review items). Early and continual involvement of SQA and certification liaison personnel helps to proactively address these concerns.

Recommendation 10: Use a technically strong team with domain expertise. Reverse engineering is not a job for junior engineers. Because it involves the creation of more abstract data from less abstract data, reverse engineering requires engineers with technical expertise and domain knowledge. It would be difficult (or maybe impossible) to have a flight management system engineer successfully reverse engineer a real-time operating system. Likewise, an otherwise competent engineer who hasn’t experienced the full development life cycle on multiple projects is not a good candidate for such an effort. In my experience, the success of reverse engineering projects is directly proportional to the experience of the engineers implementing them. Good reverse engineering requires a multidisciplined team with strong cognitive ability and good communication skills in order to generate the life cycle data. Tools can help, but the success of the project will depend heavily on the quality of the engineers.

Recommendation 11: Consult with the original developers. If it is possible, it is extremely valuable to coordinate with the original software developers. Some may still be available and their insight can make the difference between success and failure. They may not be able to take an active role in the project, but even a small amount of quality time with them throughout the project is beneficial. This communication significantly improves the understanding of the software and the quality of the artifacts. Since there may be few opportunities to consult with the original developers, use the time wisely. Generate a list of specific questions and ensure the answers are fully understood. It should be noted that the FAA’s research on reverse engineering considers access to the subject matter experts (preferably the original developers) as a necessity [4].

Recommendation 12: Strive to thoroughly understand the software functionality. When reverse engineering, it is important to comprehend the software’s functionality and behavior. In fact, reverse engineering can be seen as a behavioral discovery process [4]. It is imperative to consider the big picture (what the software does), not just the low-level details.

Recommendation 13: Think top-down. Having evaluated dozens of projects, I can typically identify a reverse engineered effort, even if the plans indicate otherwise, because the top-down view is incomplete. That is, when doing the top-down requirements threads (from system requirements to software requirements to design to code), there are disconnects and missing implementation details. The bottom-up view might look good (e.g., all of the code traces to LLRs and all the LLRs trace to high-level requirements), but the top-down view is rough. The main problem I find is that the traces do not show that the requirements are fully implemented. Even when developing the data bottom-up, it is important to keep the top-down view in mind. The FAA’s research report on reverse engineering explains the need to “emphasize that going bottom up can’t guarantee that all of the desired features at the system level will exist” [4]. There needs to be some “means of introducing real system intent” and establishing the “completeness of the reverse engineered perceived system intent” [4].

Recommendation 14: Evaluate robustness. When reverse engineering, particularly from code, it is important to evaluate the robustness of the software. The PDS or prototyped code may not have been developed to be robust in the presence of unexpected or abnormal inputs. Any missing robustness functionality should be identified in a problem report and properly addressed.

Recommendation 15: Develop appropriate levels of abstraction. As previously discussed, reverse engineering involves developing MALs from LALs (e.g., design from code, and high-level requirements from design). Developing a more abstract level is difficult. It is hard in a forward engineering project and even tougher when reverse engineering, because it is challenging to go from more detail to less. “Careful consideration should be given to the difference between abstraction levels, to ensure that there is sufficient intellectual value added to demonstrate a thorough understanding of the two representations being traced and verified” [4]. This is where experienced engineers help. When an inexperienced or unqualified engineer does the reverse engineering, the LLRs look like the code.

The FAA research report states it well:

The difference in the abstraction levels of the MAL and the LAL should strike a balance between the difference between each level of abstraction, and the number of levels of abstraction. The MAL should provide a representation that specifies the intended behavior of the LAL.

A re-statement of the intended behavior at the same or similar level using a different notation is only useful to check that the transformation was correct, not the intended behavior.
If the abstraction level between the MAL and LAL is too large, there will be less confidence that the process is repeatable with the same or equivalent results.

This issue also applies to forward engineering, but there may be a temptation for engineers to develop low-level requirements from code that specify how the code works rather than the intended behavior. This may then result in the difference between the abstraction level of the low-level requirements and the high-level requirements being too large [4].

Recommendation 16: Proactively trace the data. Bidirectional traceability is important for both forward and reverse engineered projects. It is especially important in reverse engineered projects where some higher level artifacts (e.g., system or software requirements) exist, and the intent is to develop consistent lower level artifacts (such as design and code). That is, lower level data is being reverse engineered to be compliant with higher level data. The bidirectional tracing will help to ensure consistency in both the top-down and bottom-up views.
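As a rough illustration of checking both views, the sketch below reports top-down gaps (parents with no children tracing to them), bottom-up gaps (children that trace to nothing and are not marked derived), and the derived items that still need attention. The function name trace_gaps, the identifier format, and the data layout are assumptions made for this example and do not describe any particular traceability tool.

```python
# Minimal sketch of a bidirectional trace check between two adjacent levels
# (e.g., high-level requirements as parents, LLRs as children).
# All identifiers and the data layout are hypothetical.
from typing import Dict, FrozenSet, List, Set, Tuple


def trace_gaps(parents: Set[str],
               children: Set[str],
               links: Set[Tuple[str, str]],
               derived: FrozenSet[str] = frozenset()) -> Dict[str, List]:
    """Each link is a (child, parent) pair."""
    # Links that point at items missing from either baseline.
    dangling = {(c, p) for c, p in links
                if c not in children or p not in parents}
    valid = links - dangling
    traced_children = {c for c, _ in valid}
    covered_parents = {p for _, p in valid}
    return {
        # Top-down view: parent items with nothing implementing them.
        "uncovered_parents": sorted(parents - covered_parents),
        # Bottom-up view: child items that trace to nothing and are not derived.
        "untraced_children": sorted(children - traced_children - derived),
        # Derived items avoid the trace, so they are listed for explicit review.
        "derived_children": sorted(derived & children),
        "dangling_links": sorted(dangling),
    }


if __name__ == "__main__":
    report = trace_gaps(
        parents={"HLR-1", "HLR-2", "HLR-3"},
        children={"LLR-1", "LLR-2", "LLR-3", "LLR-4"},
        links={("LLR-1", "HLR-1"), ("LLR-2", "HLR-1"), ("LLR-3", "HLR-2")},
        derived=frozenset({"LLR-4"}),
    )
    for name, items in report.items():
        print(name, items)
```

A check like this only confirms that links exist. It cannot tell whether a link was added from a real understanding of the software or brute-forced from similar wording, so the engineering review of each trace is still needed.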

Recommendation 17: Look for errors and plan for changes to code. One of the most frequent blunders in reverse engineering is to treat the code as king or as golden (i.e., to consider the code perfect). A good reverse engineering process recognizes that the code may have issues. It may have unneeded functionality, lack robustness, be poorly commented, or be overly complex. It could even have some logic or functional errors. A survey performed as part of the FAA-sponsored reverse engineering research indicated that most of the problems raised during reverse engineering efforts were raised against the source code (71%) [4]. The FAA’s report also noted that the majority of the source code errors are found using manual processes: “The striking observation is that the most prolific error detection method when performing RE [reverse engineering] is manual analysis, which is performed as a development process and producing LLRs [low-level requirements] and establishing the traceability between LLRs [low-level requirements] and source code” [4].^* The FAA research report also warned that “if a working code base is reverse engineered, the fact that the code base is working could lead to a false sense of confidence that the software is correct which might result in a less thorough investigation or care in developing the RE artifacts” [4]. The reverse engineering process should look for errors, weaknesses, complexities, ambiguities, etc. in the code that might impact safety, and then proactively address the issues (e.g., create a safe subset of the code).

Recommendation 18: Look for holes. When putting together all of the various abstraction levels, it’s important to look for and identify missing pieces. That is, search for errors of omission—functionality that should be in the code (and requirements) but is not. Also, look for inconsistencies between the various abstraction levels, including missing or erroneous traces, incomplete system or software requirements, erroneous design, and unused code. It is more challenging to identify what is not there, than it is to review what is there. This is another case where experienced engineers can provide great insight—having seen multiple projects, they know what to expect and what may be missing.

Recommendation 19: Document problems as they are identified. When reverse engineering, it is important to document potential and actual problems when they are observed. Otherwise, they may be forgotten or overlooked. There are several ways to do this. One is to keep an investigation list, which is periodically analyzed. Legitimate issues then get rolled into problem reports.

Another approach is to open a problem report for each issue. If it ends up not being a problem, the issue can easily be cancelled or closed with no action. As noted in Chapters 8 and 10, it is important to identify the person who discovered the problem, as well as the engineer who is most knowledgeable about the topic.

Recommendation 20: Address the issues. Issues that are classified as valid problems must be addressed. Some common issues found are uninitialized variables, pointer issues, inconsistent data definitions, inconsistent use of data (types, units), incorrect algorithms, inadequate integration of modules, immature startup design (warm and cold), failure to address all functional scenarios, incorrect order of events, missing built-in tests, and data passed but not used [6].

Recommendation 21: Validate requirements. DO-178C assumes that system requirements allocated to software are validated. Therefore, DO-178C doesn’t require requirements validation (ensuring the requirements are complete and correct). However, when the system requirements are reverse engineered, they require validation—to ensure they are the right requirements. The system requirements should be validated before reviewing the software requirements, design, and code. Additionally, any software requirements that get classified as derived (do not trace to system requirements) need to be validated.

Recommendation 22: Perform forward verification. In order to satisfy the DO-178C objectives, any reverse engineered development data need to be forward verified. That is, bottom-up development is acceptable; however, top-down verification is still needed. In other words, review high-level software requirements against system requirements, review low-level software requirements against high-level software requirements, review source code against LLRs, etc. Forward verification also confirms the top-down consistency.

Recommendation 23: Forward engineer once a solid requirements baseline is established. Once a solid baseline with supporting life cycle data exists, the project should be forward engineered. Continuing to reverse engineer once requirements are in place is strongly discouraged.

Recommendation 24: Know when to stop. Not all reverse engineering projects will be successful. If the code is not mature or stable at the beginning of the reverse engineering effort, considerable rework to the code may result. This can become expensive and unmanageable. The FAA’s research report states:

If the product is unstable, and many changes to the code base are required then the costs become very high. Developing requirements to software that is being updated uses up a lot of resources as work is repeated and through the spread of impact, the problem becomes larger and unmanageable. If the product is unstable, then the RE processes should be stopped and informal debugging steps should be taken before continuing the reverse engineering processes [4].

I would go a step further to suggest that there may be some situations where the code should be discarded and the development restarted. Sometimes, it is faster to rewrite clean code than to try to salvage broken or fragile code.

Recommendation 25: Be advised that reverse engineering is hard. End-to-end reverse engineering is not easy. It requires considerable effort and due diligence. In some ways, it may even be more difficult than forward engineering, because it can be tougher to abstract up than to decompose down.

Recommendation 26: Document lessons learned. As with all experiences in life, it is important to document the lessons learned in order to not learn them again. Throughout the project, it’s advisable to keep a lessons learned list and to perform ongoing assessments of what does and does not work.

References

1. RTCA DO-178C, Software Considerations in Airborne Systems and Equipment Certification (Washington, DC: RTCA, Inc., December 2011).

2. R. S. Pressman, Software Engineering: A Practitioner’s Approach, 4th edn. (New York: McGraw-Hill, 1997).

3. Certification Authorities Software Team (CAST), Reverse engineering in certification projects, Position Paper CAST-18 (June 2003, Rev. 1).

4. G. Romanski, M. DeWalt, D. Daniels, and M. Bryan, Reverse engineering software and digital systems, Draft report to be published by FAA/DOT Office of Aviation Research (Washington, DC, October 2011).

5. L. Rierson and B. Lingberg, Reverse engineering of software life cycle data in certification projects, IEEE Digital Avionics Systems Conference (Indianapolis, IN, 2003).

6. C. Dorsey, Reverse engineering within a DO-178B framework, Federal Aviation Administration National Software Conference (Danvers, MA, 2001).

*Brackets added for clarification.

†CAST is a team of international certification authorities who strive to harmonize their positions on airborne software and aircraft electronic hardware in CAST papers.

‡ CAST-18 was written before DO-178C was published and hence references DO-178B.

*Appendix C of the FAA’s reverse engineering research report identifies potential mitigations for the most common issues [4].

*Brackets added for clarity.
