21 Software Partitioning

Acronym

AFDX avionics full-duplex switched Ethernet
API application program interface
BSP board support package
CAST Certification Authorities Software Team
COTS commercial off-the-shelf
CPU central processing unit
CRC cyclic redundancy check
DMA direct memory access
FAA Federal Aviation Administration
I/O input/output
IMA integrated modular avionics
LRU line replaceable unit
MMU memory management unit
RAM random access memory
ROM read-only memory
RTOS real-time operating system
SFI software fault isolation
SVA software vulnerability analysis

21.1 Introduction to Partitioning

Partitioning is the Achilles heel of integrated modular avionics (IMA) systems and advanced avionics. Implementing a robustly partitioned system requires extreme care and caution. This chapter provides an overview of the basic concepts related to partitioning and key actions necessary to implement partitioning in a safety-critical system.

21.1.1 Partitioning: A Subset of Protection

Partitioning is a subset of a broader concept called protection. The concepts of protection are described in Certification Authorities Software Team (CAST) paper CAST-2, entitled Guidelines for assessing software partitioning/protection schemes. The protection concepts in CAST-2 are presented as follows [1]:

  • Two-way protection—A component X is protected from component Y, and component Y is protected from component X. An example of two-way protection is two components within an avionics unit with no interactions and no shared resources between them.

  • One-way protection—A component X is protected from component Y, but component Y is not protected from component X. An example of one-way protection is an avionics unit that can receive data from the flight management computer but it cannot send data to the flight management computer. The flight management computer software can impact the avionics unit but not vice versa.

  • Strict protection—A component X is strictly protected from component Y if any behavior of component Y has no effect on the operation of component X. An example of this type of protection is two components within an avionics unit with no interactions and no shared resources between them. Strict protection can be one-way or two-way.

  • Safety protection—A component X can be said to be safely protected from component Y if any behavior of component Y has no effect on the safety properties of component X. An example of this is the use of a cyclic redundancy check (CRC) associated with data passed through a nonassured data link, where the only safety property of importance is the corruption of data. Loss of data is not a safety property of interest in this example. Safety protection requires one to identify the safety properties from the safety or hazard analysis. Safety protection can be one-way or two-way.
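
The CRC example under safety protection can be sketched as follows. This is a minimal illustration, assuming a CRC-32 appended to each message; the function names and the sample payload are hypothetical and do not come from CAST-2. It shows the receiver detecting corruption; as in the example above, loss of data is not addressed.

```python
import zlib
from typing import Optional

def frame_message(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corruption."""
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "big")

def check_message(frame: bytes) -> Optional[bytes]:
    """Return the payload if the CRC matches, otherwise None."""
    if len(frame) < 4:
        return None
    payload = frame[:-4]
    received_crc = int.from_bytes(frame[-4:], "big")
    return payload if zlib.crc32(payload) == received_crc else None

# Hypothetical message sent over a nonassured data link.
frame = frame_message(b"altitude=35000")
# Simulate a single bit flipped in transit.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
```

Here `check_message(frame)` returns the original payload, while `check_message(corrupted)` returns `None`, so the receiving component can discard the corrupted data.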

Protection may be implemented in software, hardware, or a combination of hardware and software. Some examples of how protection has been implemented include encoding/decoding, wrappers, tools, separate hardware resources, and software partitioning. This chapter concentrates on software partitioning, since it is the most common approach implemented in software. It’s worth noting that many software documents (such as DO-178C) use the terms protection and partitioning interchangeably.

This chapter is closely related to Chapter 7 on design and Chapter 20 on real-time operating systems (RTOSs).

21.1.2 DO-178C and Partitioning

DO-178C section 2.4.1 describes partitioning as “a technique for providing isolation between software components to contain and/or isolate faults and potentially reduce the effort of the software verification process” [2]. DO-178C goes on to explain that partitioning between software components can be implemented by allocating software components to different hardware resources, or by running more than one software component on the same hardware. With the speed and capacity of processors today, the latter approach is often selected. DO-178C section 2.4.1 provides the following five guidelines regardless of the partitioning approach [2]:

  1. A partitioned software component should not be allowed to contaminate another partitioned software component’s code, input/output (I/O), or data storage areas.

  2. A partitioned software component should be allowed to consume shared processor resources only during its scheduled period of execution.

  3. Failures of hardware unique to a partitioned software component should not cause adverse effects on other partitioned software components.

  4. Any software providing partitioning should have the same or higher software level as the highest level assigned to any of the partitioned software components.

  5. Any hardware providing partitioning should be assessed by the system safety assessment process to ensure that it does not adversely affect safety.

DO-178C has one objective that explicitly mentions partitioning: Table A-4 objective 13, which states: “Software partitioning integrity is confirmed” [2]. This objective references DO-178C section 6.3.3.f, which explains that during review and/or analysis of the software architecture, it must be ensured that partitioning breaches (violations) are prevented. This objective applies to all software levels, when partitioning is used.

There are five DO-178C objectives that have an indirect link to partitioning: Table A-5 objectives 1–5 all reference (either as an objective or as an activity) DO-178C section 6.4.3.a, which explains that “violations of software partitioning” are a typical error type that should be revealed during requirements-based hardware/software integration testing [2]. This means that partitioning is covered by requirements and that it must be tested. This is difficult as partitioning requirements are often negative, for example: “It shall not be possible for one partition to modify the data memory of another partition unless that memory has been configured as shared.” It would be easy to write a test to show an example or two, but because of various configuration options, showing that the test set is sufficient to verify the requirement can be challenging.
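
One way to reason about the sufficiency of tests for such a negative requirement is to enumerate every forbidden access for a given configuration, rather than picking an example or two. The sketch below assumes a hypothetical configuration format (each memory region has an owner and a set of partitions it is shared with); real configuration data will differ.

```python
def forbidden_writes(config, partitions):
    """List every (partition, region) pair whose write must be rejected."""
    cases = []
    for region, (owner, shared_with) in config.items():
        for partition in partitions:
            if partition != owner and partition not in shared_with:
                cases.append((partition, region))
    return cases

# Hypothetical configuration: region -> (owner, partitions it is shared with)
partitions = ["P1", "P2", "P3"]
config = {
    "P1_data": ("P1", set()),
    "P2_data": ("P2", set()),
    "P3_data": ("P3", set()),
    "shared_buf": ("P1", {"P2"}),
}

# Each entry is one negative test case to run against the target platform.
cases = forbidden_writes(config, partitions)
```

Even this small configuration yields seven negative cases, and the list changes whenever the configuration changes, which is why the test set has to be justified against the configuration options and not only against a sample.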

Partitioning also directly supports safety when it is used to separate and/or isolate software functions. It ensures that less critical software functions do not interfere with more critical software functions and that all functions have the time, memory, and I/O resources needed to perform their intended functionality.

21.1.3 Robust Partitioning

Until the late 1990s, partitioning was used rather sparsely to separate two or more software levels running on the same hardware. However, since the late 1990s, it has become a more common technique used in avionics. With the expansion of computing power and the availability of RTOSs that support partitioning, partitioning has become a key characteristic of many modern avionics systems. Robust partitioning is a cornerstone for IMA. RTCA DO-297 defines IMA as: “A shared set of flexible, reusable, and interoperable hardware and software resources that, when integrated, form a platform that provides services, designed and verified to a defined set of safety and performance requirements, to host applications performing aircraft functions” [3]. Since the IMA platform hosts applications of different software levels, robust partitioning is needed to ensure that each application has the necessary resources and does not interfere with other applications.

Chapter 20 introduced the concept of robust partitioning and mentioned it as related to the RTOS. The concept is further explored in this chapter. DO-297 section 2.3.3 explains the concept of robust partitioning as follows [3]:

Robust partitioning is a means for assuring the intended isolation of independent aircraft functions and applications residing in IMA shared resources in the presence of design errors and hardware failures that are unique to a partition or associated with application-specific hardware. If a (different) failure can lead to the loss of robust partitioning then it should be detectable and appropriate action taken. The objective of robust partitioning is to provide the same level of functional, if not physical, isolation and protection as a federated implementation. This means robust partitioning should support the cooperative coexistence of core software and applications hosted on a processor and using shared resources, while assuring unauthorized or unintended interference is prevented… Robust partitioning is a means for assuring the intended isolation and independence in all circumstances (including hardware failures, hardware and software design errors, or anomalous behavior) of aircraft functions and hosted applications using shared resources. The objective of robust partitioning is to provide an equivalent level of functional isolation and independence as a federated system implementation (i.e., applications individually residing on separate Line Replaceable Units (LRU)). This means robust partitioning supports the cooperative coexistence of applications using shared resources, while assuring that any attempted unauthorized or unintended interaction is detected and mitigated.

The implementation of robust partitioning draws from the computer security domain, which uses the concepts of data and information flow (i.e., access control, noninterference, and separability), integrity policies, timing channels, storage channels, and denial of service. However, while safety partitioning and software security concepts are related, the two models do not completely coincide since they are driven by different objectives [4,5].

DO-297 section 3.5 lists the following characteristics of robust partitioning [3]:

  • The partitioning services should provide adequate separation and isolation of the aircraft functions and hosted applications sharing platform resources. Partitioning services are the services provided by the platform that define and maintain the independence and separation between partitions. These services ensure that the behavior of functions or applications within a partition cannot unacceptably affect the behavior of functions or applications in any other partition. These services should prevent any adverse effects on the aircraft of the simultaneous undetected corruption of all the functions and applications’ partitions sharing the affected resources.

  • The ability to determine in real-time, with an appropriate level of confidence, that the partitioning services are performing as specified consistent with the defined level of safety.

  • Partitioning services should not rely on any required behavior of any aircraft function or hosted application. This implies that all protection mechanisms required to establish and maintain partitioning are provided by the IMA platform.

If robust partitioning is not implemented properly, it can lead to numerous problems, some of which may have safety implications. Examples of partitioning issues are as follows [3,6]:

  • Erroneously writing data into the wrong areas; for example, a faulty partition allowing an application to write to a memory location to which another partition assumes it has exclusive access.

  • Stealing time from another application.

  • Crashing the processor.

  • Corrupting the I/O by falsely sending output data which appear to come from a critical function.

  • Corrupting input data before the critical function uses it.

  • Monopolizing internal communication paths.

  • Corrupting a shared flash memory file system.

  • Introducing timing jitter on context switch to a new partition (altering the performance of the new partition).

Robust partitioning occurs across the three basic subsystems of any computing platform: memory, central processing unit (CPU), and I/O [7]. Communications between partitions may also be considered as a shared resource [8]; however, this typically overlaps with the other shared resources, so it is not discussed further. Each of the shared resources (memory, time, and I/O) should be addressed by the partitioning design, implementation, and verification. Partitioning of each shared resource is discussed next.

21.2 Shared Memory (Spatial Partitioning)

John Rushby writes: “Spatial partitioning must ensure that software in one partition cannot change the software or private data of another partition (either in memory or in transit) nor command the private devices or actuators of other partitions” [4]. Basically, spatial partitioning prevents a function in one partition from corrupting or overwriting the data space of a function in another partition [9]. Justin Littlefield-Lawwill and Larry Kinnan explain that spatial partitioning assures that shared system resources in one partition “are not consumed in a manner that would result in a denial of service for other partitions requiring access to the same resource” [10]. There are two common approaches to memory protection: (1) using a memory management unit (MMU) or (2) using software fault isolation (SFI).

Hardware-based spatial partitioning is the most prevalent form of spatial partitioning. A hardware MMU is usually provided with the CPU. The details of the MMU operation vary from processor to processor. The MMU ensures that policies expressed in MMU tables provide the desired controls over memory access. Since the MMU is a commercial off-the-shelf (COTS) device without supporting life cycle data, the operating system is used to set up the MMU tables that the processor subsequently uses. The proper functionality (accuracy) of the MMU needs to be confirmed during the certification effort. Chapter 20 explained that this is often confirmed during the RTOS testing. The Federal Aviation Administration (FAA) report entitled Commercial Off-The-Shelf Real-Time Operating System and Architectural Considerations [11] provides some additional details for how the MMU and RTOS are used to provide robust partitioning.
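
The idea of access policies expressed in MMU tables can be illustrated with a simplified model. This is only a sketch: the table layout, addresses, and names are assumptions made for illustration, and a real MMU works on pages and hardware-defined table formats that vary by processor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    base: int    # first address of the region
    size: int    # length in bytes
    perms: str   # any combination of "r", "w", "x"

# Hypothetical per-partition tables, as the operating system might set them up.
MMU_TABLES = {
    "P1": [Region(0x1000, 0x1000, "rx"), Region(0x8000, 0x2000, "rw")],
    "P2": [Region(0x2000, 0x1000, "rx"), Region(0xA000, 0x2000, "rw")],
}

def access_ok(partition: str, addr: int, kind: str) -> bool:
    """True if the partition's table grants this kind of access to addr."""
    return any(
        r.base <= addr < r.base + r.size and kind in r.perms
        for r in MMU_TABLES[partition]
    )
```

In this model partition P1 can write to its own data at 0x8004 but not to P2's data at 0xA000, and it cannot write to its own code region, which is mapped read and execute only.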

If the MMU isn’t used, SFI is an alternative [4]. With this approach logical checks are added in the code at each memory access point. The machine code in a partition is examined to determine the destinations of memory references and to check their accuracy.

Indirect memory references cannot be checked statically, so instructions are added to the program to check the contents of the address register at runtime, immediately prior to its use. The SFI technique imposes some overhead cost by adding code to the program. It also requires an additional analysis and certification cost on every project. However, it is possible to automate much of the check procedure and to qualify a tool or toolset that can be used on multiple projects [9].
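
The runtime check that SFI inserts before an indirect store can be sketched as follows. This is a high-level model under stated assumptions, not how an SFI tool rewrites machine code: memory is modeled as a byte array, and `checked_store` and `PartitionFault` are hypothetical names.

```python
class PartitionFault(Exception):
    """Raised when a store targets memory outside the partition's segment."""

def checked_store(memory, seg_base, seg_size, addr, value):
    """Check the target address immediately before the store is performed."""
    if not (seg_base <= addr < seg_base + seg_size):
        raise PartitionFault(f"store to {addr:#x} is outside the segment")
    memory[addr] = value

# Hypothetical 64-byte memory; the partition owns bytes 16 through 31.
memory = bytearray(64)
checked_store(memory, 16, 16, 20, 0xAB)   # inside the segment, so it succeeds
```

A store to address 40 with the same segment bounds raises `PartitionFault` and leaves memory unchanged. The extra comparison on each checked access is the overhead cost described above.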

The CAST-2 position paper identifies several areas to consider when implementing memory partitioning, such as loss, corruption, or delay of input or output data, corruption of internal data, program overlays, buffer sequences, external device interaction, control flow defects which affect memory (e.g., incorrect branching into a partition or protected area), etc. [1].

21.3 Shared Central Processing Unit (Temporal Partitioning)

“Temporal partitioning must ensure that the service received from shared resources by the software in one partition cannot be affected by the software in another partition. This includes the performance of the resource concerned, as well as the rate, latency, jitter, and duration of scheduled access to it” [4]. Partitioning in the time domain is closely related to multitasking schedulability, which was discussed in Chapter 20. ARINC 653 enforces strict round-robin scheduling for partitions (durations and periods are specified in a configuration table). Within the partition, other schedulers are used.
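
A configuration-table-driven partition schedule can be sketched as below. The table contents, the 100 ms frame length, and the function names are assumptions made for illustration; they are not taken from the ARINC 653 specification or from any particular RTOS.

```python
# Hypothetical schedule table: (partition, offset_ms, duration_ms)
MAJOR_FRAME_MS = 100
WINDOWS = [("P1", 0, 40), ("P2", 40, 30), ("P3", 70, 30)]

def windows_valid() -> bool:
    """Windows must be in order, must not overlap, and must fit in the frame."""
    end = 0
    for _, offset, duration in WINDOWS:
        if offset < end or duration <= 0:
            return False
        end = offset + duration
    return end <= MAJOR_FRAME_MS

def partition_at(t_ms: int):
    """Return the partition that owns the CPU at time t_ms, or None if idle."""
    phase = t_ms % MAJOR_FRAME_MS      # the schedule repeats every frame
    for name, offset, duration in WINDOWS:
        if offset <= phase < offset + duration:
            return name
    return None
```

Because the schedule depends only on the table and the time, the partition that runs at any instant is known in advance, regardless of what the other partitions do.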

The goal of temporal partitioning is to ensure that functions in one partition do not disturb the timing of events in other partitions. Concerns include one partition that monopolizes the CPU, crashes the system, or issues a HALT instruction—leading to denial of services for other partitions. “Other scenarios that can cause a partition to fail to relinquish the CPU on time include simple schedule overruns, where particular parameter values cause a computation to take longer than its allotted time, and runaway executions, where a program gets stuck in a loop” [4]. The approach to temporal partitioning should take these scenarios into account.

Processor interrupts are used in real-time systems to identify an event that needs processor access. Interrupts must be handled carefully to avoid undermining the temporal partitioning. Normally such disruptions are prevented by eliminating interrupts altogether, with the exception of the timer tick used to implement the schedule. However, some parties desire to expand ARINC 653 to provide a deterministic means of handling interrupts [7]. Interrupts are discussed more in Section 21.5.

The CAST-2 position paper identifies areas to consider when implementing temporal partitioning, such as interrupts and interrupt inhibits, loops, frame overrun, counter or timer corruption, pipelining and caching, control flow defects, memory or I/O contention, software traps (such as divide by zero), etc. [1].

21.4 Shared Input/Output

Most systems have multiple I/O ports, devices, and channels, for example, a serial bus, an ARINC 664 AFDX (avionics full-duplex switched Ethernet) end system, a field-programmable gate array device, or a CRC device. Some devices are dedicated to a specific partition and others are shared by multiple partitions. As noted in Chapter 20 on RTOSs, a device driver typically serves as the glue code between the device and the RTOS kernel.

As with other shared resources, I/O resources need to be partitioned. Partitioning concerns for shared I/O are closely related to space and time partitioning. For each I/O device, partitioning for both the time and space domains must be considered.

Addressing partitioning in I/O can be one of the most challenging aspects of design for a partitioned system. ARINC 653 provides sampling and queuing port interface and operation definitions for interpartition I/O; however, I/O to physical devices or intermodule I/O is implemented at the discretion of the RTOS developers or other stakeholders. The ARINC 653 port mechanism (which is based on pseudo ports and pseudo partitions) can be used as an interface standard for connectivity; however, it may lead to performance issues for the applications [10]. Steve VanderLeest explains that when partitioning I/O devices, not only the end device must be considered, but one must also consider the communication mechanisms connecting the device to the memory and CPU (such as communication busses, direct memory access engines, and intermediate buffers). “The partitioning environment must manage all the salient features of the I/O subsystem” (including latency, bandwidth, control registers, and buffer space) to prevent partitions from affecting one another by an unauthorized means [7].

The approach to address I/O partitioning varies depending on I/O hardware specifics and the low-level device driver. The FAA research report, entitled Real-Time Operating Systems and Component Integration Considerations in Integrated Modular Avionics Systems Report, notes:

Different approaches are possible, such as kernel-centralized I/O control or partition I/O control. Many implementations incorporate some type of RTOS-governed partition or task permission table that permits only certain partitions or tasks to access specific I/O. A health monitor is then employed to identify any undesignated accesses by other tasks or partitions. I/O considerations differ for different RTOSs [8].

Depending on the scenario, the RTOS may implement some constraints or make some assumptions that require special integration steps by the integrator [8]. As with any other assumptions or constraints, these should be documented by the RTOS vendor and/or platform supplier and communicated to the integrator; oftentimes, the platform data sheet includes this type of information.
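
The permission-table approach described in the FAA report can be sketched as follows. This is a simplified model: the device names, the table format, and the way the health monitor is notified (here, a list of recorded violations) are all assumptions for illustration.

```python
# Hypothetical permission table: device -> partitions allowed to access it
IO_PERMISSIONS = {
    "serial0": {"P1"},
    "afdx0": {"P1", "P2"},
}

# Stand-in for a health monitor: records every undesignated access attempt.
health_log = []

def io_access(partition: str, device: str) -> bool:
    """Permit access only if the table designates this partition for the device."""
    allowed = partition in IO_PERMISSIONS.get(device, set())
    if not allowed:
        health_log.append((partition, device))
    return allowed

io_access("P1", "serial0")   # designated access, permitted
io_access("P3", "afdx0")     # undesignated access, denied and logged
```

After these two calls, `health_log` holds the single entry `("P3", "afdx0")`. What the health monitor does with such an entry (for example, restarting the offending partition) is a platform decision and is not modeled here.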

The use of I/O interrupts should be carefully evaluated in the partitioning scheme. Most CPUs support interrupts from I/O-related events. However, as will be discussed shortly, interrupts can impact partitioning and may not be allowed. Therefore, another approach may need to be established to communicate events that require action.

21.5 Some Partitioning-Related Challenges

Some organizations are tempted to rely on the RTOS alone to implement partitioning. However, partitioning is a system-level characteristic. The processor, board support package (BSP), devices, device drivers, MMU, etc. play a role in the partitioning approach and can disrupt robust time and space partitioning. Some specific areas that pose challenges to partitioning are identified in this section, including direct memory access, cache memory, interrupts, and interpartition communication.

21.5.1 Direct Memory Access

Direct memory access (DMA) is used to transfer a large block of memory in a short period of time, independently of the processor. DMA transfers use a DMA engine that may be given exclusive access to the memory bus to perform the block transfer. The memory bus is a shared resource; therefore, a DMA transfer can deny other applications access to the bus if the bus is not properly shared. A DMA may violate temporal partitioning when a transfer is initiated by a partition with less execution time remaining than the time needed to complete the DMA transfer. The DMA may also induce memory partitioning violations. One way to address the DMA partitioning issue is to create an application program interface (API) to control the access to DMA rather than allowing the application direct access. Performance may be impacted by the API, but it is still faster than other types of memory transfer, provided the size of the memory blocks transferred is large enough [10].
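
A guarding API of this kind might check, before starting a transfer, that the transfer can finish within the time remaining in the caller's partition window. The sketch below assumes a fixed transfer rate and setup cost; both figures and the function name are hypothetical.

```python
def dma_request(nbytes, time_left_us, bytes_per_us=4, setup_us=2):
    """Grant the transfer only if it can finish inside the caller's window.

    Returns (granted, needed_us). The rate and setup cost are assumed values.
    """
    transfer_us = -(-nbytes // bytes_per_us)   # ceiling division
    needed_us = setup_us + transfer_us
    return needed_us <= time_left_us, needed_us
```

With these assumed figures, a 4000-byte transfer needs 1002 microseconds, so it is granted when 1500 microseconds remain in the window and refused when only 500 remain. A refused request could be deferred to the partition's next window.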

21.5.2 Cache Memory

Cache memory is an intermediate, high-speed memory that resides between the main memory and the CPU. It contains a copy of frequently accessed memory for rapid access. It significantly improves performance; however, COTS processors do not provide partitioned cache dedicated to specific partitions (although this is changing with some of the new devices planned for future releases). In a partitioned system, the state of the processor is swapped with each partition switch. The state prior to the switch is held in the CPU registers. Most processors provide functionality to perform this swap quickly. In a nonpartitioned system, the state of the cache memory does not need to be saved because there is a single application which does not interfere with itself. However, in a partitioned system, cache is a shared resource [7].

There are a few options to preserve the partitioning when using cache memory. One option to remedy potential cache-induced partitioning violations is to turn off the cache. However, the performance impact is usually too high for this approach, so a deterministic method for using cache must be implemented. To date, cache flushing is the most common solution. The approach is to flush the cache during the partition switch so that the new partition has a clean cache memory when it starts. As noted earlier, “Flushing means copying all the cache values only present in the cache back to main memory (i.e., they have been updated, and copy-back mode is used). This places the overhead at the start of the partition rather than it being distributed throughout” [11]. Flushing adds to the time for context switching and reduces cache performance at the start of a partition while the cache is loading, but it is still more efficient than no cache at all. Since the time taken to perform the flush operations varies, depending on the amount of data written to memory, the worst-case timing for context switching must be considered to ensure that time needs are met. Because each partition time slot starts with an empty cache, the performance of the application can be impacted if the partition duration is too short (so that the cache never gets a chance to fully populate).
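
The worst-case context switch time with cache flushing can be estimated by assuming that every cache line is dirty. The sketch below works through the arithmetic; all of the figures are hypothetical and are chosen only to make the calculation concrete.

```python
# All figures are assumptions for illustration, not data for any real processor.
CACHE_SIZE_BYTES = 32 * 1024      # assumed data cache size
LINE_SIZE_BYTES = 32              # assumed cache line size
WRITEBACK_NS_PER_LINE = 100       # assumed time to copy one dirty line to memory
BASE_SWITCH_NS = 20_000           # assumed register swap and scheduler cost

def worst_case_switch_ns() -> int:
    """Worst case: every cache line is dirty and must be written back."""
    lines = CACHE_SIZE_BYTES // LINE_SIZE_BYTES
    return BASE_SWITCH_NS + lines * WRITEBACK_NS_PER_LINE

def usable_window_ns(window_ns: int) -> int:
    """Time left for the application after a worst-case switch."""
    return window_ns - worst_case_switch_ns()
```

With these assumptions there are 1024 lines, so the worst-case switch takes 20,000 + 1024 × 100 = 122,400 ns. A 5 ms partition window then leaves 4,877,600 ns for the application, before accounting for the slower execution while the cache refills.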

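Cache write-through mode is another option to address the cache partitioning issues. In this mode every write goes to main memory as well as to the cache, so code execution is slower than in copy-back mode, although it is still more efficient than the no-cache option. The benefit is that main memory is always up to date, so the partition switch needs only a cache invalidate, which is a single fast instruction.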

21.5.3 Interrupts

An interrupt is a signal triggered by an asynchronous event. When interrupts are enabled by the processor, the normal operation is suspended in order to service the interrupt event using an interrupt service routine. To prevent such disruptions, interrupts are sometimes eliminated altogether in robustly partitioned systems, except for the timer tick used to implement the schedule [10]. Partitioned RTOSs that comply with ARINC 653 tend to limit interrupts to the system clock. However, some special hardware and/or software techniques may be employed to avoid temporal partitioning violations during interrupt (e.g., disallowing an application access to interrupt-causing hardware, implementing software to poll data related to the interrupt signal, or implementing an API to check if the activity requested can be completed prior to the end of the current minor frame time) [10]. Such techniques must be carefully analyzed for effectiveness, which may not be a trivial exercise.

21.5.4 Interpartition Communication

If each partition were an island that didn’t need to communicate with other islands, life would be simpler. However, because partitions may need to communicate with each other, interpartition communication must be designed to support robust partitioning.

The challenge is to design a partitioning solution that enables the exchange of information between partitioned functions (i.e., interpartition communication) and controls access to other shared resources (such as I/O devices) while keeping the partitioned functions largely autonomous and unaffected by other functions. Interpartition communication and sharing of I/O devices influence both the space and time aspects of partitioning and protection mechanisms [9].

When addressing interpartition communication, both the memory dimension (primarily focused on transferring data from one partition to another) and the temporal dimension (primarily concerned with synchronization and how one partition invokes services from another) must be considered [4]. Communication between partitions must be restricted to only those that are intended and that are authorized by the system configuration data. The FAA’s research report provides suggestions for implementing interpartition communication without violating time or space partitioning [9].
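
Restricting communication to channels authorized by configuration data can be sketched as follows. The model loosely follows the idea of a sampling port, where a reader sees only the most recent message; the channel table, names, and return conventions are assumptions for illustration and are not the ARINC 653 API.

```python
# Hypothetical configuration data: channel -> (source partition, destination)
CHANNELS = {"nav_to_display": ("P1", "P2")}

_latest = {}   # most recent message per channel (sampling semantics)

def write_sampling(partition: str, channel: str, message: bytes) -> bool:
    """Accept the message only from the channel's configured source."""
    source, _ = CHANNELS.get(channel, (None, None))
    if partition != source:
        return False
    _latest[channel] = message     # overwrite: only the newest value is kept
    return True

def read_sampling(partition: str, channel: str):
    """Return the latest message only to the channel's configured destination."""
    _, destination = CHANNELS.get(channel, (None, None))
    if partition != destination:
        return None
    return _latest.get(channel)

write_sampling("P1", "nav_to_display", b"hdg=270")
write_sampling("P3", "nav_to_display", b"spoofed")   # rejected: not the source
```

Here partition P2 reads `b"hdg=270"`, while the write attempted by P3 is rejected because P3 is not the configured source. The temporal dimension (when the copy happens and how long it takes) is not modeled in this sketch.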

21.6 Recommendations for Partitioning

This section provides some practical recommendations to consider when implementing and verifying partitioning. Each project varies and the technical details must be addressed on a case-by-case basis; however, these recommendations are intended to provide a starting point.

Recommendation 1: Keep in mind that robust partitioning is a system-level concern. As explained in Section 21.5, robust partitioning is not implemented by software alone. It is a system-level concern that requires early and ongoing collaboration between systems, hardware, and software teams. Likewise, robust partitioning isn’t implemented by the RTOS alone. The RTOS can play a significant role in enforcing robust partitioning, but it is not the only component involved.

Recommendation 2: Proactively design for robust partitioning. Robust partitioning requires diligent design—it does not just happen. There are multiple dimensions to consider and numerous technical challenges. However, addressing these challenges proactively during the development phases significantly reduces surprises during the integration and verification phases. The following suggestions are offered:

  1. Consider the common issues noted earlier (such as interrupts, cache memory, I/O challenges) and address them as part of the design solution.

  2. Document partitioning goals, for example:

    1. The system will implement robust memory protection: An application will always receive its allocated memory resources, without interference from other applications.

    2. The system will implement robust time protection: An application will always receive its specified execution time without interference from other applications.

    3. The system will implement robust resource protection: An application will always receive its physical and logical allocated resources without interference from other applications.

  3. Define robust partitioning requirements to meet the goals.

  4. Identify all shared resources so they can be properly utilized to support robust partitioning.

  5. Document partitioning details in the design. In particular, ensure that all components and their interactions are clearly identified.

  6. Identify all requirements (at each hierarchical level) that support or impact partitioning. For example, a requirements attribute may be assigned to distinguish partitioning-related requirements. This also assists with the subsequent partitioning analysis.

  7. Identify multiple means to prevent failure propagation.

  8. Use containment boundaries to limit the effect of failures.

  9. Reference the guidance of DO-297 section 3.5.1 (entitled Design for robust partitioning) [3], which provides guidelines to consider when designing a robustly partitioned IMA platform. The DO-297 concepts are applicable, even if the system is not classified as IMA.

Recommendation 3: During design and design reviews, ensure that vulnerabilities are addressed. As the design is documented and reviewed, consider probing questions that would uncover partitioning violations. Table 21.1 provides example questions to help uncover data-related and control-related vulnerabilities.

Recommendation 4: Consider using an ARINC 653 compliant RTOS. As previously noted, the RTOS alone doesn’t address all of the partitioning issues; however, it is a good start. ARINC 653 provides guidelines to help RTOS and platform developers think through the partitioning challenges.

Table 21.1 Example Questions to Help Identify Partitioning Vulnerabilities

Data-Related Vulnerabilities

  • Can partitioning be violated by data flow?

  • Can shared data be inappropriately used?

  • Can messages be sent or received improperly?

  • Can function parameters be inappropriately used?

  • Can configuration data be invalid?

  • Can data be incorrectly passed?

  • Can data be improperly initialized?

  • Can global data be read or written to improperly?

  • Can global data be erroneously written to by unintended functions?

  • Can global data be uninitialized or improperly reinitialized?

  • Can hardware registers be inappropriately used?

  • Can the linker incorrectly assemble data or code?

  • Can data become stale or invalid?

  • Can data drop out?

  • Can improper responses to data miscompares occur?

  • Can unexpected floating-point values occur?

Control-Related Vulnerabilities

  • Can partitioning be violated by control flow?

  • Can functions be inappropriately invoked within a feature or between features?

  • Can interrupts cause erroneous behavior?

  • Can hardware faults or failures impact data integrity or execution order?

  • Can transitions between modes be improperly implemented?

  • Can resources be inappropriately allocated?

  • Can deactivated code be inadvertently activated?

  • Can initialization sequence be incorrect?

  • Can inappropriate responses to exceptions occur?

  • Can fault handlers behave inappropriately (e.g., miss faults or failures, or handle faults or failures improperly)?

  • Can memory overlaps occur?

  • Can improper hardware addresses be read from or written to?

  • Can inappropriate responses occur during resets?

  • Can synchronization be impacted by miscompares or erroneous waits?

  • Can improper context switching cause erroneous data or timing?

  • Can any unexpected exceptions be generated?

  • Can functions be executed at incorrect rates or times?

Recommendation 5: Utilize other development and verification activities to prevent redundant efforts. Partitioning should be considered throughout the system and software development. Other development or verification activities can be utilized to support the robust partitioning design and verification, including the following:

  • Build upon the data and control coupling activities. The data and control coupling analyses provide a good starting point for robust partitioning. Data and control coupling and software partitioning are related but distinct concepts. Partitioning is a way to provide isolation between independent software components and thus provides protection against unintended coupling between independent partitioned components. Data and control coupling is an intentional interaction between components, including coupling between separately partitioned components. Software partitioning does not guarantee avoidance of data or control coupling problems; conversely, data and control coupling problems do not imply that the partitioning mechanism is flawed. For partitioned systems, data and control coupling analyses, as well as partitioning analysis, are required. There may be some synergy between the two objectives.

  • Use robustness testing to confirm partitioning is not violated. When partitioning is clearly documented in the requirements and design, it drives the testing effort, which provides a good start toward proving the robust partitioning claims. The partitioning analysis may identify the need for additional testing; however, the requirements-based tests provide a foundation to begin building confidence in the partitioning strategy.

  • Utilize the RTOS software vulnerability analysis (SVA). If an RTOS is used, the RTOS vendor may have an SVA available to use as input to the partitioning analysis data. The SVA was discussed in Chapter 20.

Recommendation 6: Test the partitioning mechanism. The main purpose of verifying the partitioning mechanism is to ensure that no erroneous behavior in one partition can contribute to misbehavior or failure of any other partition and to ensure that partitioning of shared resources is not violated. DO-248C discussion paper #14 (entitled Partitioning aspects in DO-178C/DO-278A) provides some suggestions for verifying the partitioning mechanism:

Elements of partitioning integrity can be verified by exercising the partitioning mechanisms using special test scenarios, simulations, and/or analysis techniques. Test scenarios should be written to stimulate the partitioning mechanism by injecting errors or violation attempts to bypass time and space constraints. An example of additional analysis would be the calculation of worst-case execution times to assess temporal performance… A verification test suite (containing normal range test cases and abnormal or out of range test cases for all requirements of the partitioning mechanism) should be established. Robustness of the partitioning mechanism may be demonstrated by use of a requirements-based test suite (satisfying requirements-based test coverage) [12].

Some projects also implement rogue partition testing, where a rogue partition is developed to try to deliberately violate the partitioning.
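The idea of rogue partition testing can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions: the names Partition, PartitionMonitor, and rogue_partition_test are hypothetical, and the monitor is a toy stand-in for the memory and time checks that a real platform would enforce through its MMU and scheduler. It shows the shape of such a test: establish known-good data in a victim partition, have the rogue attempt a space violation and a time violation, and then confirm that the victim is unaffected and that each attempt was detected.

```python
from dataclasses import dataclass, field


class PartitionViolation(Exception):
    """Raised when a partition attempts an access outside its allocation."""


@dataclass
class Partition:
    name: str
    mem_start: int       # first address owned by the partition (inclusive)
    mem_end: int         # end of the partition's region (exclusive)
    time_budget_ms: int  # execution time allotted per window


@dataclass
class PartitionMonitor:
    """Toy stand-in for the space and time checks of a partitioned platform."""
    memory: dict = field(default_factory=dict)
    violations: list = field(default_factory=list)

    def write(self, part: Partition, address: int, value: int) -> None:
        # Space check: reject any write outside the caller's own region.
        if not (part.mem_start <= address < part.mem_end):
            self.violations.append((part.name, "space", address))
            raise PartitionViolation(f"{part.name} wrote outside its region")
        self.memory[address] = value

    def run(self, part: Partition, requested_ms: int) -> int:
        # Time check: an overrun is logged and cut off at the budget.
        if requested_ms > part.time_budget_ms:
            self.violations.append((part.name, "time", requested_ms))
            return part.time_budget_ms
        return requested_ms


def rogue_partition_test(monitor, rogue, victim):
    """Inject space and time violation attempts; report the observed effects."""
    monitor.write(victim, victim.mem_start, 0xA5)     # victim's known-good data
    try:
        monitor.write(rogue, victim.mem_start, 0xFF)  # attempt to corrupt victim
    except PartitionViolation:
        pass
    granted = monitor.run(rogue, rogue.time_budget_ms * 10)  # attempt to overrun
    return {
        "victim_data_intact": monitor.memory[victim.mem_start] == 0xA5,
        "rogue_time_capped": granted == rogue.time_budget_ms,
        "violations_logged": len(monitor.violations),
    }


# Hypothetical partitions and address ranges, for illustration only.
victim = Partition("flight_display", 0x1000, 0x2000, time_budget_ms=20)
rogue = Partition("rogue", 0x2000, 0x3000, time_budget_ms=5)
result = rogue_partition_test(PartitionMonitor(), rogue, victim)
print(result)
```

On a real target the rogue would be an actual partition loaded alongside the others, and the checks would observe the platform's own fault reporting rather than a Python list.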

Recommendation 7: Perform a partitioning analysis. DO-297 promotes an activity called partitioning analysis. The purpose of the analysis is to demonstrate that no application in a partition can affect the behavior of applications in any other partition in an adverse manner. The partitioning analysis is similar to the system safety assessment, where all potential sources of failure are considered and mitigated. All vulnerabilities should be identified, classified, and mitigated [11]. The engineer(s) performing the analysis must have a detailed understanding of the system, hardware, and software architecture. The chief architect is often the ideal person to do this analysis. The following are some common tasks performed as part of the partitioning analysis:

  1. Gather data to support the analysis, including the preliminary system safety assessment, system requirements and design, software requirements and design, hardware architecture, BSP and device driver design data, processor data sheet and/or user manual, device user manuals, interface specifications, configuration tool requirements and design (if applicable), etc.

  2. Identify the robust partitioning claims that will be analyzed, for example,*

    1. The system will implement robust memory protection: An application will always receive its allocated memory resources, without interference from other applications.

    2. The system will implement robust time protection: An application will always receive its specified execution time without interference from other applications.

    3. The system will implement robust resource protection: An application will always receive its physical and logical allocated resources without interference from other applications.

  3. Identify potential vulnerabilities which could violate each claim. All potential sources of error should be systematically identified and considered, including resource limitations, scheduling tasks, I/O, interrupt error sources, etc. A traceability analysis that traces all shared resources to the modules, components, and applications that use those shared resources helps confirm that all potential vulnerabilities have been considered [13]. The potential partitioning violations of shared memory devices (such as read only memory [ROM], random access memory [RAM], cache, queues, and onboard chip registers) should be analyzed. Likewise, the effects of hardware failures on shared and nonshared hardware components should also be analyzed. DO-297 section 3.5.2.5 identifies some common potential sources of design errors that could impact the partitioning [3]:

    1. Interrupts and interrupt inhibits (software and hardware).

    2. Loops (for example, infinite loops or indirect non-terminating call loops).

    3. Real-time correspondence (for example, frame overrun, interference with real-time clock, counter/timer corruption, pipeline and caching, deterministic scheduling).

    4. Control flow (for example, incorrect branching into a partitioned or protected area, corruption of a jump table, corruption of the processor sequence control, corruption of return addresses, unrecoverable hardware state corruption (for example, mask and halt)).

    5. Memory, input, and/or output contention.

    6. Sharing of data flags.

    7. Software traps (for example, divide by zero, unimplemented instruction, specific software interrupt instructions, unrecognized instruction, and recursion termination).

    8. Hold-up commands (i.e., performance hedges).

    9. Loss of input or output data.

    10. Corruption of input or output data.

    11. Corruption of internal data (for example, direct or indirect memory writes, table overrun, incorrect linking, calculations involving time, corrupted cache memory).

    12. Delayed data.

    13. Program overlays.

    14. Buffer sequence.

    15. External device interaction (for example, loss of data, delayed data, incorrect data, protocol halts).

  4. Identify potential vulnerabilities that are mitigated by design. The majority of the vulnerabilities are mitigated by the design—especially when robust partitioning is proactively considered during the development phases. Each vulnerability mitigated by design should trace to the requirement(s) that demonstrate the mitigation (i.e., there should be a mapping between vulnerabilities and requirements that provide the mitigations). Additionally, each mitigation should be verified during testing.

  5. Identify potential vulnerabilities that are mitigated by process. Some potential vulnerabilities may be mitigated by process (e.g., the use of a qualified tool to verify accuracy of configuration data or a design standard that confirms certain memory assignments).

  6. Identify potential vulnerabilities that cannot be mitigated by design or process. These will need to be communicated to the integrator and possibly the application developer. Such communication is normally documented in the data sheet and appropriate user information, so the integrator or application developer can take the appropriate action.

  7. Coordinate with safety and systems personnel throughout the partitioning analysis. The partitioning analysis is essentially an extension of the safety assessment and should be closely coordinated with the system safety personnel.

  8. Ensure that all potential vulnerabilities have been mitigated, particularly those that were not addressed by design or process. The goal is to minimize mitigation activities to be taken by the integrator or applications; however, in some cases there may be special actions required. For example, the integrator may have to perform some special verification steps or impose special restrictions through the configuration files.
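The bookkeeping behind tasks 3 through 8 can be sketched as a simple traceability check. This is a minimal illustration, not a prescribed method: the Vulnerability record, the open_items and shared_resources helpers, and all identifiers such as V-001 or SRS-MEM-012 are hypothetical. It assumes each vulnerability is classified as mitigated by design, by process, or by the integrator, and it flags entries with no mitigation, design mitigations that lack a requirement trace, and design mitigations not yet verified by test. It also lists the resources used by more than one partition, which are the ones a shared-resource trace would need to cover.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vulnerability:
    ident: str
    claim: str                         # "memory", "time", or "resource"
    description: str
    mitigation: Optional[str] = None   # "design", "process", "integrator", or None
    requirement: Optional[str] = None  # requirement that provides a design mitigation
    verified: bool = False             # True once the mitigation has been tested


def open_items(vulns):
    """Return (id, reason) pairs for vulnerabilities that still need attention."""
    findings = []
    for v in vulns:
        if v.mitigation is None:
            findings.append((v.ident, "no mitigation identified"))
        elif v.mitigation == "design":
            if not v.requirement:
                findings.append((v.ident, "design mitigation lacks a requirement trace"))
            elif not v.verified:
                findings.append((v.ident, "design mitigation not verified by test"))
    return findings


def shared_resources(usage):
    """Return the resources used by more than one partition, with their users."""
    return {res: sorted(users) for res, users in usage.items() if len(users) > 1}


# Hypothetical analysis data, for illustration only.
vulns = [
    Vulnerability("V-001", "memory", "DMA write outside partition RAM",
                  "design", "SRS-MEM-012", verified=True),
    Vulnerability("V-002", "time", "Frame overrun by a looping task",
                  "design", "SRS-TIME-004"),
    Vulnerability("V-003", "resource", "Invalid configuration data", "process"),
    Vulnerability("V-004", "resource", "Shared I/O register contention"),
    Vulnerability("V-005", "time", "Cache interference between partitions",
                  "integrator"),
]
usage = {
    "RAM bank 0": {"P1"},
    "UART0": {"P1", "P2"},
    "L2 cache": {"P1", "P2", "P3"},
}

findings = open_items(vulns)
shared = shared_resources(usage)
print(findings)
print(shared)
```

In practice this information usually lives in a requirements management tool; the point of the sketch is only that every vulnerability should end with a classified mitigation and, for design mitigations, a requirement and a test behind it.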

References

1. Certification Authorities Software Team (CAST), Guidelines for assessing software partitioning/protection schemes, Position Paper CAST-2 (February 2001).

2. RTCA DO-178C, Software Considerations in Airborne Systems and Equipment Certification (Washington, DC: RTCA, Inc., December 2011).

3. RTCA DO-297, Integrated Modular Avionics (IMA) Development Guidance and Certification Considerations (Washington, DC: RTCA, Inc., November 2005).

4. J. Rushby, Partitioning in avionics architectures: Requirements, mechanisms, and assurance, DOT/FAA/AR-99/58 (Washington, DC: Office of Aviation Research, March 2000). Also published as NASA/CR-1999-209347 (Hampton, VA: Langley Research Center, March 2000).

5. B. L. Di Vito, A formal model of partitioning for integrated modular avionics, NASA/CR-1998-208703 (Hampton, VA: Langley Research Center, August 1998).

6. K. Driscoll, Integrated modular avionics (IMA) requirements and development, Integrated Modular Avionics Conference for the European Network of Excellence on Embedded Systems (Rome, Italy, 2007), p. 440.

7. S. H. VanderLeest, ARINC 653 hypervisor, IEEE Digital Avionics Systems Conference (Salt Lake City, UT, 2010), pp. 5.E.2-1–5.E.2-20.

8. J. Krodel and G. Romanski, Real-time operating systems and component integration considerations in integrated modular avionics systems report, DOT/FAA/AR-07/39 (Hampton, VA: Langley Research Center, August 2007).

9. V. Halwan and J. Krodel, Study of commercial off-the-shelf (COTS) real-time operating systems (RTOS) in aviation applications, DOT/FAA/AR-02/118 (Washington, DC: Office of Aviation Research, December 2002).

10. J. Littlefield-Lawwill and L. Kinnan, System considerations for robust time and space partitioning in integrated modular avionics, IEEE Digital Avionics Systems Conference (Orlando, FL, October 2008).

11. J. Krodel, Commercial off-the-shelf real-time operating system and architectural considerations, DOT/FAA/AR-03/77 (Washington, DC: Office of Aviation Research, February 2004).

12. RTCA DO-248C, Supporting Information for DO-178C and DO-278A (Washington, DC: RTCA, Inc., December 2011).

13. J. Krodel and G. Romanski, Handbook for Real-Time Operating Systems Integration and Component Integration Considerations in Integrated Modular Avionics Systems, DOT/FAA/AR-07/48 (Washington, DC: Office of Aviation Research, January 2008).

*CAST is a team of international certification authorities who strive to harmonize their positions on airborne software and aircraft electronic hardware in CAST papers.

*This same example was provided earlier in the chapter but is repeated for completeness.
