o·s·o·pho·bi·a n. A common fear among embedded systems programmers.
Many embedded systems today incorporate an operating system. This can range from a small kernel to a full-featured operating system—typically called a real-time operating system or RTOS (pronounced “are-toss”). Either way, you’ll need to know what features are the most important and how they are used with the rest of your software. At the very least, you need to understand what a real-time operating system looks like on the outside. In this chapter, we take a detailed look at the mechanisms found in most operating systems and how to use them.
The information in this chapter is very general and does not include specific code examples. That’s because the features and APIs that implement the activities in this chapter are different on each operating system. Subsequent chapters show you what to do on Linux and eCos, two popular operating systems used in embedded environments.
In the early days of computing, there was no such thing as an operating system. Application programmers were completely responsible for controlling and monitoring the state of the processor and other hardware. In fact, the purpose of the first operating systems was to provide a virtual hardware platform that made application programs easier to write. To accomplish this goal, operating system developers needed only provide a loose collection of routines—much like a modern software library—for resetting the hardware to a known state, reading the state of the inputs, and changing the state of the outputs.
Modern operating systems add to this the ability to execute multiple software tasks simultaneously on a single processor—a feature called multitasking. Each such task (also commonly called a thread) is a piece of the software that can be separated from the rest of the operating system and run independently. A set of embedded software requirements can usually be broken down into a small number of such independent pieces. For example, the printer server device described in Chapter 2 contains two main software tasks:
Receiving data from the computers attached to the Ethernet port.
Formatting and sending the data to the printer attached to the parallel port.
Tasks provide a key software abstraction that makes the design and implementation of embedded software easier, and the resulting source code simpler to understand and maintain. By breaking the larger program into smaller pieces, the programmer can more easily concentrate her energy and talents on the unique features of the system under development.
Strictly speaking, an operating system is not a required component
of any computer system—embedded or otherwise. It is always possible to
perform the same functions from within the application program itself.
Indeed, all of the examples so far in this book have done just that.
There is one path of execution—starting at main
—that runs on the system. This is the
equivalent of having only one task. But as the complexity of the
application expands beyond the simple task of blinking an LED, the
benefits of an operating system far outweigh the associated
costs.
One common part of all operating systems is the kernel. In most operating systems, the kernel consists of the scheduler, the routine to handle switching between the different tasks running in the system, and the mechanisms of communication between tasks.
We have already talked about multitasking and the idea that an operating system makes it possible to execute multiple “programs” at the same time. But what does that mean? How is it possible to execute several tasks concurrently? In actuality, the tasks are not executed at the same time. Rather, they are executed in pseudoparallel. They merely take turns using the processor. This is similar to the way several people might read the same copy of a book. Only one person can actually use the book at a given moment, but each person can read it if everyone takes turns.
An operating system is responsible for deciding which task gets to use the processor at a particular moment. The piece of the operating system that decides which of the tasks has the right to use the processor at a given time is called the scheduler. The scheduler is the heart and soul of any operating system. Some of the more common scheduling algorithms are:
This scheduling (also called cooperative multitasking) allows each task to run until it is finished, and only after that is the next task started.
This algorithm allows each task to run until it completes or suspends itself; the next task selected is the one that will require the least amount of processor time to complete.
This algorithm is typically used in real-time operating systems. Each task is assigned a priority that is used to determine when the task executes once it is ready. Priority scheduling can be either preemptive or nonpreemptive. Preemptive means that any running task can be interrupted by the operating system if a task of higher priority becomes ready.
In this scheduling algorithm, each task runs for some predetermined amount of time called a time slice. After that time interval has elapsed, the running task is preempted by the operating system, and the next task in line gets its chance to run. The preempted task doesn’t get to run again until each of the other tasks in that round has had its chance.
The scheduler in some operating systems is real-time. While the schedulers in the previous section take only priorities and time slices into account, a real-time scheduler takes into account that some tasks have deadlines. Some tasks in a real-time system may not have deadlines or might have low penalties for missing deadlines. Tasks such as these can use background processing time to do their work.
In this type of operating system, the real-time scheduler should know the deadlines of all the tasks. The scheduler should base decisions on a comparison of the deadlines of tasks that are in the ready queue.
There are several real-time schedulers, including:
Each task is assigned a unique timeslot in a periodically repeating pattern. A real-time executive is more static than an operating system and assumes that the programmer knows how long each task takes. If the executive knows the task can complete its work in the allotted time, its deadlines can all be met. This won’t work if you need to create and exit tasks on the fly or run tasks at irregular intervals.
The operating system tracks the time of the next deadline for all tasks. At each scheduling point, the scheduler selects the task with the closest deadline. This is a priority-based scheduler, but with the addition that it calculates the deadlines of real-time tasks at each timer tick and adjusts priorities as appropriate. If new priorities are assigned, this might mean new tasks are run and, therefore, that a context switch (which we discuss later in this chapter in the section “Context switch”) must take place. One disadvantage to this scheduler is that it has a large computational overhead.
The operating system tracks deadlines and the time to complete the remaining work for each task. The scheduler selects the task with least laxity, where laxity is the available time minus the remaining work time, and the available time is the deadline time minus the current time. The theory behind this scheduling algorithm is that the task with the least “time to spare” can least afford to yield the processor. The exact order of which tasks run may differ from the earliest-deadline-first scheduler.
When a task is created, it states its requirements in terms of deadlines, processor usage, and other relevant resources. The operating system should admit the new task only if the system can guarantee those resources without making any other (already admitted) tasks fail.
The scheduling algorithm you choose depends on your goals. Different algorithms yield different results. Let’s suppose you’re given 10 jobs, and each will take a day to finish. In 10 days, you will have all of them done. But what if one or more has a deadline? If the ninth task given to you has a deadline in three days, doing the tasks in the order you receive them will cause you to miss that deadline.
The purpose of a real-time scheduling algorithm is to ensure that critical timing constraints, such as deadlines and response time, are met. When necessary, decisions are made that favor the most critical timing constraints, even at the cost of violating others.
To demonstrate the importance of a scheduling algorithm, consider a system with only two tasks, which we’ll call Task 1 and Task 2. Assume these are both periodic tasks with periods T1 and T2, and for each task, the deadline is the beginning of its next cycle. Task 1 has T1 = 50 ms and a worst-case execution time of C1 = 25 ms. Task 2 has T2 = 100 ms and C2 = 40 ms.
Does the processor have enough time to handle both tasks? We can start answering this question by looking at how much of the processor time each task needs in its worst case—its utilization. The utilization of task i is:
Ui = Ci/Ti |
Thus, if U1 = 50 percent and U2 = 40 percent, the total requested utilization is:
U = U1 + U2 = 90 percent |
It seems logical that if utilization is less than 100 percent, there should be enough available CPU time to execute both tasks.
Let’s consider a static priority scheduling algorithm, where task priorities are unique and cannot change at runtime. With two tasks, there are only two possibilities:
Case 1: Priority(T1) > Priority(T2)
Case 2: Priority(T1) < Priority(T2)
The two cases are shown in Figure 10-1. In Case 1, both tasks meet their respective deadlines. In Case 2, however, Task 1 misses a deadline, despite 10 percent idle time, because Task 2’s higher priority required it to be scheduled first. This illustrates the importance of priority assignment.
Real-time systems sometimes require a way to share the processor that allows the most important tasks to grab control of the processor as soon as they need it. Therefore, most real-time operating systems utilize a priority-based scheduling algorithm that supports preemption. This is a fancy way of saying that at any given moment, the task that is currently using the processor is guaranteed to be the highest-priority task that is ready to do so. Lower-priority tasks must wait until higher-priority tasks are finished using the processor before resuming their work. The scheduler detects such conditions through interrupts or other events caused by the software; these events are called scheduling points.
Figure 10-2 shows a scenario of two tasks running at different priority levels.
There are two tasks in Figure 10-2: Task A and Task B. In this scenario, Task B has higher priority than Task A. At the start, Task A is running. When Task B is ready to run, the scheduler preempts Task A and allows Task B to run. Task B runs until it has completed its work. At this point, control of the processor returns to Task A, which continues its work from where it left off.
When a priority-based scheduling algorithm is used, it is also necessary to have a backup scheduling policy for tasks with the same priority. A common scheduling algorithm for tasks with the same priority is round robin. In Figure 10-3, we show a scenario in which three tasks are running in order to demonstrate round robin scheduling with a time slice. In this example, Tasks B and C are at the same priority, which is higher than that of Task A.
In the scenario shown in Figure 10-3, Task A is running when Task B becomes ready to run. Task A is preempted so that Task B can run. While Task B is running, Task C also becomes ready to run. The scheduler therefore allows Task B to run for a time slice period. After this period expires, the scheduler gives Task C an opportunity to run. Then Task C completes its work. Because Task B still has work to complete and has the highest priority of all ready tasks, the scheduler allows Task B to run. Once Task B completes its work, Task A can finish.
You might be asking yourself how the scheduler—which is also a piece of software—gets an opportunity to execute and do its job. That is where scheduling points enter in. Simply stated, scheduling points are the set of operating system events that result in a run of the scheduler code. Following is a list of scheduling points:
When a task is created, the scheduler is called to select the next task to be run. If the currently executing task still has the highest priority of all the ready tasks, it will be allowed to continue using the processor. Otherwise, the highest-priority ready task will be executed next.
As in task creation, the scheduler is called during task deletion. However, in the case of task deletion, if the currently running task is being deleted, a new task is always selected by virtue of the fact that the old one no longer exists.
The clock tick is the heartbeat of the system. It is a periodic event that is triggered by a timer interrupt (similar to the one we have already discussed). The clock tick provides an opportunity to awaken tasks that are waiting for a certain length of time to pass before taking their next action.
When the running task makes a system call that blocks, that task is no longer able to use the processor. Thus, the scheduler is invoked to select a new task to run.
A task might be waiting for some event to take place, such as for data to be received. Once the event occurs, the blocked task becomes ready to execute and thus is immediately eligible to be considered for execution.
Real-time operating systems can allow a task to lock the scheduler. Locking the scheduler prohibits it from executing and, in turn, keeps other tasks from running (since they cannot be scheduled). The task that locks the scheduler maintains control of the processor whether higher-priority tasks are ready to run or not. This allows the task with the scheduler lock to perform operations without worrying about thread-safe issues with other tasks. Interrupts still function and ISRs still run.
Care must be taken when locking the scheduler, because it can hinder the responsiveness of the system. In general, locking the scheduler should be avoided whenever possible.
Every lock of the scheduler must have an unlock counterpart—otherwise, the system will stop running. The following is an example of locking and unlocking the scheduler:
/* Perform task operations. */ os_scheduler_lock(); /* Perform work while scheduler cannot run another task. */ os_scheduler_unlock(); /* Perform other task operations. */
The operating system keeps track of the scheduler lock state with a variable. This variable is incremented when calls are made to lock the scheduler and decremented when unlock calls are made. The scheduler knows it can run when the lock state variable is set to 0.
Different types of tasks can run in an operating system. For example, a task can be periodic, where it exits after its work is complete. The task can then be restarted when there is more work to be done. Typically, however, a task runs forever, similar to the infinite loop discussed in Chapter 3. Each task also has its own stack that is typically allocated statically.
Remember how we said that only one task could actually use the processor at a given time? That task is said to be the running task, and no other task can be in that same state at the same time. Tasks that are ready to run—but are not currently using the processor—are in the ready state, and tasks that are waiting for some event external to themselves to occur before going on are in the waiting state. Figure 10-4 shows the relationships between these three states.
A transition between the ready and running states occurs whenever the operating system selects a new task to run during a scheduling point. The task that was previously running leaves the running state, and the new task (selected from the queue of tasks in the ready state) is promoted to running. Once it is running, a task will leave that state only if it terminates, if a higher-priority task becomes ready, or if it needs to wait for some event external to itself to occur before continuing. In the latter case, the task is said to block, or wait, until that event occurs. A task can block by waiting for another task or for an I/O device, or it can block by sleeping (waiting for a specific time period to elapse).
When the task blocks, it enters the waiting state, and the operating system selects one of the ready tasks to be run. So, although there may be any number of tasks in each of the ready and waiting states, there will always be exactly one task in the running state at any time.
It is important to note that only the scheduler can promote a task to the running state. Newly created tasks and tasks that are finished waiting for their external event are placed into the ready state first. The scheduler will then include these new ready tasks in its future decision-making.
In order to keep track of the tasks, the operating system typically implements queues for each of the waiting and ready states. The ready queue is often sorted by priority so that the highest-priority task is at the head of the queue. The scheduler can then easily pick the highest-priority task to run next.
The actual process of changing from one task to another is called a context switch. Task contexts are processor-specific and sometimes compiler-specific, as is the code that implements the context switch. That means it must always be written in assembly language and is very hardware-specific. Operating systems, especially for embedded systems, emphasize the goal of minimizing the time needed to switch task contexts. Figure 10-5 shows an example of a context switch operation between two tasks: A (at priority 150) and B (at a higher priority, 200).
If there are no tasks in the ready state when the scheduler is called, the idle task is executed. The idle task looks the same in every operating system. It is simply an infinite loop that does nothing and never blocks. The idle task is completely hidden from the application developer. Sometimes, however, the operating system does assign it a valid task ID and priority. The idle task is always considered to be in the ready state (when it is not running), and because of its low priority, it may be found at the tail of the ready list. Other tasks are sometimes referred to as user tasks to distinguish them from the idle task.
The scheduler maintains information about the state of each task. This information is called the task context and serves a purpose similar to a bookmark. In the earlier analogy of multiple readers, each reader of the book is presumed to have his own bookmark. The bookmark’s owner must be able to recognize it (e.g., it has his name written on it), and it must indicate where he stopped reading when he last gave up control of the book. This is the reader’s context.
A task’s context records the state of the processor just prior to the point at which another task takes control of it. This usually consists of a pointer to the next instruction to be executed (the instruction pointer), the address of the current top of the stack (the stack pointer), and the contents of the processor’s flag and general-purpose registers.
To keep tasks and their contexts organized, the operating system maintains some information about each task. Operating systems written in C often keep this information in a data structure called the task control block. The task control block contains a pointer to the task’s context, the current state of the task, the task priority, the task entry-point function, and any task-specific data (such as parameters and task identification).
Setting the priorities of the tasks in a system is important. Care needs to be taken so that lower-priority tasks get to do their work, just as the higher-priority tasks do. Otherwise, starvation of a task can occur, where a low-priority task is kept from doing any work at all.
There are several reasons that starvation may occur in a system, including the following:
Processor overload occurs when high-priority tasks monopolize the processor and are always running or ready to run.
Low-priority tasks are always at the end of a priority-based event queue and, therefore, may be permanently blocked from executing.
A task may be prevented from running by a bug in another task; for example, one task fails to signal when it is supposed to.
There are solutions to these problems, such as:
Using a faster processor.
Using a FIFO queue of tasks rather than priority-based scheduling. This may not be possible, depending on the implementation of the operating system. In addition, it might be appropriate in some systems to starve low-priority tasks, but this must be a conscious decision.
Fixing all bugs (this is sometimes easier said than done).
The rate monotonic algorithm (RMA) is a procedure to determine the optimal priority of each periodic task in a system. This procedure assigns fixed priorities to tasks to maximize their “schedulability.” A task set is considered schedulable if all tasks meet all deadlines all the time. The algorithm is straightforward:
Assign the priority of each task according to its period, so that the shorter the period the higher the priority.
The reasoning behind this algorithm is that the task with the shortest period has the least time to do its work once it becomes ready again. In the next example, the period of Task 1 is shorter than the period of Task 2. Following RMA’s rule, Task 1 is assigned the higher priority. This corresponds to Case 1 in Figure 10-1, which is the priority assignment that succeeded in meeting both deadlines.
RMA is an example of a static priority algorithm. The alternative to a static priority algorithm is the much more complicated class of dynamic schedulers, which appear on sophisticated, commercial-grade operating systems; we don’t recommend that you try to design a dynamic scheduler of your own. RMA is the optimal static priority algorithm. If a particular set of tasks cannot be scheduled using the RMA, that task set cannot be scheduled using any static priority algorithm.
One major limitation of fixed-priority scheduling is that it is not always possible to fully utilize the processor. Even though RMA is the optimal fixed-priority scheme, it has a worst-case schedulable bound of the following:
W n = n (2(1/ n ) – 1) |
where n is the number of tasks in a system. As you would expect, the worst-case schedulable bound for one task is 100 percent. But as the number of tasks increases, the schedulable bound decreases, eventually approaching its limit of about 69.3 percent (ln 2, to be precise).
It is theoretically possible for a set of tasks to require only 70 percent CPU utilization in sum and still not meet all its deadlines. For example, consider the case shown in Figure 10-6. The only change from the example shown in Figure 10-1 is that both the period and execution time of Task 2 have been lowered. Based on RMA, Task 1 is assigned higher priority. Despite only 90 percent utilization, Task 2 misses its first deadline. Reversing priorities would not have improved the situation.
In this case, the only way to meet all deadlines is to use a dynamic scheduling algorithm, which, because it increases system complexity, is not available in many operating systems.
Sometimes a particular set of tasks will have total utilization above the worst-case schedulable bound and still be schedulable with fixed priorities. Case 1 in Figure 10-1 is a perfect example. Schedulability then depends on the specifics of the periods and execution times of each task in the set, which must be analyzed by hand. Only if the total utilization is less than W n can you skip that manual analysis step and assume that all the tasks will meet all their deadlines.
RMA degrades gracefully in that if just one deadline will be missed, it is sure to be the one of the lowest-priority task with outstanding work. The “critical set” is the set of tasks that won’t ever miss any deadlines. So, RMA works nicely for a mix of hard and soft real-time tasks, where the hard deadlines correspond to the highest-frequency tasks.
Earlier in this chapter, we discussed how the operating system makes it appear as if tasks are executing simultaneously. When writing code for tasks, it is best to keep this in mind and write tasks as if they run in parallel. The basic operation of a task is shown in Figure 10-7.
As shown in Figure 10-7, the first part initializes any task-specific variables or other resources used by the task. Then an infinite loop is used to perform the task work. First, the task waits for some type of event to occur. At this point, the task is blocked. It is in the waiting state and is put on the waiting queue. There are various types of events: a signal from another task, a signal from an ISR, and the expiration of a timer set by the application. Once one of these events occurs, the task is in the ready state, and the scheduler can run the task when processor time becomes available.
The code for this basic task (using imaginary functions) would look something like this:
void taskBasicExample(void) { /* Initalize all task-specific resources. */ taskBasicInit(); while (1) { /* Wait for the event to happen. */ os_wait_for_event(event_type); /* Perform the work for this task. */ } }
Though we frequently talk about the tasks in a multitasking operating system as completely independent entities, that’s not completely accurate. All of the tasks are working together to solve a larger problem and must occasionally communicate with one another to synchronize their activities. For example, in the print-server device, the printer task doesn’t have any work to do until new data is supplied to it by one of the computer tasks. So the printer and computer tasks must communicate with one another to coordinate their access to common data buffers.
Operating systems contain various mechanisms that aid in synchronization between two tasks. Each of the mechanisms is useful for different scenarios. A particular operating system may offer additional or slightly different types of methods for synchronization, so we will cover the ones commonly found in real-time operating systems.
In Chapter 8, we introduced the concept of a critical section, which is simply a segment of code that must be executed in its entirety before the operating system is allowed to run anything else. Often a critical section consists of a single C-language statement that sets or reads a variable. We saw some examples using interrupts in Chapter 8; another example could be a status variable that is shared between a printer task and other tasks in the system.
Remember that in a multitasking environment, you generally don’t know in which order the tasks will be executed at runtime. One task might be writing some data into a memory buffer when it is suddenly interrupted by a higher-priority task. If the higher-priority task were to modify that same region of memory, then bad things could happen. At the very least, some of the lower-priority task’s data would be overwritten. When a round robin scheduler is in use—as it normally is on multitasking operating systems—even a task of the same priority can interrupt and access resources.
One way to ensure the instructions that make up the critical section are executed in order and without interruption is to disable interrupts. However, disabling interrupts when using an operating system may not be permitted and should be avoided; other mechanisms should be used to execute these atomic operations. We’ll explore the ones you’ll find in most operating systems.
One form of synchronization is a mutex (short for mutual exclusion). A mutex ensures exclusive access to shared variables or hardware. A mutex is analogous to a bathroom key in a high-traffic rest stop. Only one person (task) is allowed to use the bathroom (shared resource) at a time. That person (task) must first acquire the bathroom key (mutex), of which there’s only one copy. The shopkeeper, who owns the key, is analogous to the operating system, which owns the mutex.
You can think of a mutex as being nothing more than a multitasking-aware binary flag. The meaning associated with a particular mutex must, therefore, be chosen by the software designer and understood by each of the tasks that use it. For example, the data buffer that is shared by the printer and computer task in the print server would have a mutex associated with it. When this binary flag is set, it is assumed that one of the tasks is using the shared data buffer. All other tasks must wait until that flag is cleared (and then until they set it again themselves) before reading or writing any of the data within that buffer. A task must wait for and acquire a mutex before releasing the mutex.
We say that mutexes are multitasking-aware because the processes of setting and clearing the binary flag are atomic. A task can safely change the state of the mutex without risking that a context switch will occur in the middle of the modification. If a context switch were to occur, the binary flag might be left in an unpredictable state, and a deadlock between the tasks could result. The atomicity of the mutex set and clear operations is enforced by the operating system, which disables interrupts before reading or modifying the state of the binary flag.
Mutexes are used for the protection of shared resources between tasks in an operating system. Shared resources are global variables, memory buffers, or device registers that are accessed by multiple tasks. A mutex can be used to limit access to such a resource to one task at a time.
In the operating system’s code, interrupts can be disabled during the critical section. But generally, tasks should not disable interrupts. If they were allowed to do so, other tasks—even higher-priority tasks that didn’t share the same resource—would not be able to execute during that interval. Mutexes provide a mechanism to protect critical sections within tasks without disabling interrupts.
Another synchronization mechanism is called a semaphore. A semaphore is used for intertask synchronization and is similar to the mutex. However, a semaphore’s value can be any nonnegative integer value.
Let’s go back to the bathroom analogy. Imagine now that there are two bathrooms (shared resources) and two keys (represented by the semaphore). We want to allow two people (tasks), but no more than two, to simultaneously use the bathrooms. If a key is available (the semaphore’s value is not zero), the person can acquire the key and use the bathroom. If all keys are being used (the semphore’s value is zero), the next arriving person (task) must wait until a key becomes available. Each time a semphore is signaled, its value is incremented. When a semaphore is acquired (after the task has waited for it), its value is decremented.
Semaphores are generally used as synchronization mechanisms, with one task signaling and another waiting. In our example of the print server, a semaphore can be used to signal to a task that incoming data is present. As data comes into the print server, it is buffered. Once a certain amount of data is accumulated, a semaphore can be used to signal the computer task that it is time to process that data. This example is similar to a relay race. In this case, the baton is the semaphore. Only the runner (task) with the baton is able to run. The computer task waiting for the incoming data is like the runner waiting for the baton. Once the baton is passed, the next runner (computer task) can proceed.
Mutexes should exclusively be used for controlling access to shared resources. While semaphores are typically used as signalling devices. A semaphore can be used to signal a task from another task or from an ISR—for example, to synchronize activities.
Mutexes are powerful tools for synchronizing access to shared resources. However, they are not without their own dangers. Two of the most important problems to watch out for are deadlock and priority inversion.
Deadlock can occur whenever there is a circular dependency between tasks and resources. The simplest example is that of two tasks: 1 and 2. Each task requires two mutexes: A and B. If Task 1 takes mutex A and waits for mutex B while Task 2 takes mutex B and waits for mutex A, then each task is waiting for the other to release the mutex. Here is an example of code demonstrating this deadlock problem:
void task1(void) { os_wait_for_mutex(mutexA); os_wait_for_mutex(mutexB); /* Other task1 work. */ } void task2(void) { os_wait_for_mutex(mutexB); os_wait_for_mutex(mutexA); /* Other task2 work. */ }
These tasks may run without problems for a long time, but eventually one task may be preempted in between the wait calls, and the other task will run. In this case, Task 1 needs mutex B to be released by Task 2, while Task 2 needs mutex A to be released by Task 1. Neither of these events will ever happen.
When a deadlock occurs, it essentially brings both tasks to a halt, though other tasks might continue to run. The only way to end the deadlock is to reboot the entire system, and even that won’t prevent it from happening again.
Priority inversion occurs whenever a higher-priority task is blocked, waiting for access to a shared resource that is currently not being used. This might not sound like too big of a deal—after all, the mutex is just doing its job of arbitrating access to the shared resource—because the higher-priority task is written with the knowledge that at times the lower-priority task will be using the resource they share. However, consider what happens when there is a third task with a priority level somewhere between those two. This situation is illustrated in Figure 10-8.
In Figure 10-8, there are three tasks: Task H (high priority), Task M (medium priority), and Task L (low priority). Task L becomes ready first and, shortly thereafter, takes the mutex. Now, when Task H becomes ready, it must block until Task L is done with their shared resource. The problem is that Task M, which does not even require access to that resource, gets to preempt Task L and run, though it will delay Task H from using the processor. Once Task M completes, Task L runs until it releases the semaphore. Finally, at this point Task H gets its chance to run. This example shows how the task priorities can be violated because of the mutex sharing between Task H and Task L.
Several solutions to this problem have been developed. One of the most widely used solutions is called priority inheritance. This technique mandates that a lower-priority task inherit the priority of any higher-priority task that is waiting on a resource they share. This priority change should take place as soon as the higher-priority task begins to wait; it should end when the resource is released. This requires help from the operating system. If we apply this to the preceding example, the priority of Task L is increased to that of Task H as soon as Task H begins waiting for the mutex. Once Task L releases the mutex, its priority is set to what it was before. Task L cannot be preempted by Task M until it releases the mutex, and Task H cannot be delayed unnecessarily.
Another solution is called priority ceilings. In this case, a priority value is associated with each resource; the scheduler then transfers that priority to any task that accesses the resource. The priority assigned to the resource is the priority of its highest-priority user, plus one. Once a task finishes with the resource, its priority returns to normal. One disadvantage of using priority ceilings is that the priority level for tasks using the mutex must be known ahead of time so the proper ceiling value can be set. This means the operating system cannot do the job automatically for you. Another disadvantage is that if the ceiling value is set too high, other unrelated tasks with priority levels below the ceiling can be locked out from executing. The priority ceiling is used even when priority inversion is not occurring, as a prophylactic measure.
The need for synchronization mechanisms needs to be thought out during the design of the software. Once designed into the software, everyone must use the mutex properly in order to access the shared resource. If someone breaks the rule, the software will not always operate as designed. In large, multiprogrammer projects, it can be hard for different programmers to realize that a resource is shared.
While semaphores can be used to signal from one task to another, there may be times when data needs to be passed in addition to the signal. For this purpose, operating systems frequently provide another mechanism called a message queue (or mailbox).
The operating system handles the buffering and buffer management for message passing as well as the safe communication of data between tasks. Message passing is therefore an alternative to the simple expedient of storing data in a global variable, and offers a much cleaner and more bug-free method of data exchange. Real-time operating systems typically use pointers for accessing the message data for reasons of speed and memory conservation.
Many applications that use message queues consist of a producer task that sends the data and a consumer task that receives it. There can be multiple producer and/or consumer tasks. The message content is typically understood between the sender and receiver ahead of time.
If the message queue is full, the operating system can block the sending task until space is available for the message. Similarly, if no message is present when the receiver attempts to read the message, the operating system blocks the receiving task. The size of the queue may vary depending on the message traffic.
We have now covered the basic mechanisms that are commonly found in most real-time operating systems. Several other features may be included, depending on the operating system, to perform various other useful operations. Some of the other mechanisms commonly found in real-time operating systems are:
These allow a task to wait for multiple events to occur before unblocking. Either all of the events must occur (the events are ANDed together) or any of the events may occur (the events are ORed together).
These are similar to a counting semaphore where a task signals another task to wake up; however, unlike a semaphore, if no task is currently waiting on the condition variable when it is signaled, the signal is lost.
These are similar to a mutex and are typically used in symmetric multiprocessing (SMP) systems. Like a mutex, a spinlock is a binary flag that a task attempts to claim. If the flag is not set, the task is able to obtain the spinlock. If the flag is set, the task will spin in a loop, constantly checking to see when the flag is not set. This might seem wasteful (and it can be); however, it is assumed that the spinlock is only held for a very short period of time, and this CPU must wait for software running on the other CPU to progress first anyway.
A counter keeps track of the number of times a specific event has occurred. An alarm is used in conjunction with a counter to wake up a task (so that it will take action) when a particular number of events have occurred.
There are several issues you need to be aware of when handling interrupts in embedded systems that use an operating system, including:
Interrupts have the highest priority in a system—even higher than the highest operating system task. Interrupts are not scheduled; the ISR executes outside of the operating system’s scheduler.
Because the operating system code must guarantee its data structures’ integrity when they are accessed, the operating system disables interrupts during operations that alter internal operating system data, such as the ready list. This increases the interrupt latency. The responsiveness of the operating system comes at the price of longer interrupt latency.
When a task disables interrupts, it prevents the scheduler from doing its job. Tasks should not disable interrupts on their own.
Some operating systems have a separate stack space for the execution of ISRs. This is important because, if interrupts are stored on the same stack as regular tasks, each task’s stack must accommodate the worst-case interrupt nesting scenario. Such large stacks increase RAM requirements across all n tasks.
Because ISRs execute outside of the scheduler, they are not allowed to make any operating system calls that can block. For example, an ISR cannot wait for a semaphore, though it can signal one.
Some operating systems use a split interrupt handling scheme, where the interrupt processing is divided into two parts. The first part is an ISR that handles the bare minimum processing of the interrupt. The idea is to keep the ISR short and quick.
The second part is handled by a DSR. The DSR handles the more extensive processing of the interrupt event. It runs when task scheduling is allowed; however, the DSR still has a higher priority than any task in the system. The DSR is able to signal a task to perform work triggered by the interrupt event.
For example, in the print server device, an interrupt might be used to handle incoming data from the computers on the Ethernet network. The Ethernet controller would interrupt the processor when a packet is received. Using the split interrupt handling scheme, the ISR would handle the minimal initial work: determining the interrupt event, masking further Ethernet interrupts, and acknowledging the interrupt. The ISR would then tell the operating system to run the DSR, which would then handle the low-level data packet processing before passing the data on to a task for further processing.
Engineers often use the term real-time to describe computing problems for which a late answer is as bad as a wrong one. These problems are said to have deadlines, and embedded systems frequently operate under such constraints. For example, if the embedded software that controls your antilock brakes misses one of its deadlines, you might find yourself in an accident. So it is extremely important that the designers of real-time embedded systems know everything they can about the behavior and performance of their hardware and software. In this section, we will discuss the performance characteristics of real-time operating systems.
The designers of real-time systems spend a large amount of their time worrying about worst-case performance. They must constantly ask themselves questions such as: what is the worst-case amount of time between the moment a human operator presses the brake pedal and the moment an interrupt signal arrives at the processor? What is the worst-case interrupt latency? And what is the worst-case amount of time for the software to respond by triggering the braking mechanism? Average or expected-case analysis simply will not suffice in such systems.
Most of the real-time operating systems available today are designed for possible inclusion in real-time systems. Ideally, their worst-case performance is well understood and documented. To earn the distinctive title of “real-time operating system,” an operating system should be deterministic and have guaranteed worst-case interrupt latency and context switch times. Given these characteristics and the relative priorities of the tasks and interrupts in your system, it is possible to analyze the worst-case performance of the software.
An operating system is said to be deterministic if the worst-case execution time of each of the system calls is calculable. Operating system designers who take real-time behavior seriously usually publish a data sheet that provides the minimum, average, and maximum number of clock cycles required by each system call. These numbers are usually different for different processors. Therefore, it is important when comparing these numbers that equivalent (or better yet, the same) hardware is used. But it is reasonable to expect that if the algorithm is deterministic on one processor, it will be so on any other. The actual times can differ.
Interrupt latency is a key in determining the responsiveness of an RTOS. An RTOS adds to the interrupt latency because it must do some processing once an interrupt occurs. The amount of time this processing takes is an important characteristic of the RTOS for a real-time system. When an interrupt occurs, the processor must take several steps before executing the ISR. First, the processor must finish executing the current instruction. That probably takes less than one clock cycle, but some complex instructions require more time than that. Next, the interrupt type must be recognized. This is done by the processor hardware and does not slow or suspend the running task. Then, the RTOS must process the interrupt and determine which ISR is called. Finally, and only if interrupts are enabled, the ISR that is associated with the interrupt is started.
Of course, if interrupts are ever disabled within the operating system, the worst-case interrupt latency increases by the maximum amount of time that they are turned off. But operating systems have certain places where interrupts must be disabled. These are the internal critical sections—relating to operating system structures—described earlier; there are no alternative methods for the operating system to protect them. Each operating system will disable interrupts for a different length of time, so it is important that you know what your system’s requirements are. One real-time project might require a guaranteed interrupt response time as short as 10 ms, while another requires only 100 ms.
The third real-time characteristic of an operating system is the amount of time required to perform a context switch. This is important because it represents overhead across your entire system. For example, imagine that the average execution time of any task before it blocks is 100 ms but that the context switch time is also 100 ms. In that case, fully one-half of the processor’s time is spent within the context switch routine! Again, the actual times are usually processor-specific because they are dependent on the number of registers that must be saved. Be sure to get these numbers for any operating system you are thinking of using. That way, there won’t be any last-minute surprises.
The answer is...it depends. In many cases, there is no clear-cut answer to this question. Many embedded systems can (and do) operate exactly as they need to by using an infinite loop, as we discussed in Chapter 3. These embedded systems do not need to be complicated by adding additional software, such as an RTOS. There’s no prize for making an embedded system more complicated.
Each project should be evaluated on its own. Start with the notion that you do not need an RTOS. Then take a look at the overall system requirements. Make a list of the different software modules the system will need in order to meet these requirements.
Let’s go back to the example of the print server device. The data-flow diagram in Chapter 2 is a good starting point. Some of the modules needed for the print server are an interrupt subsystem to handle timer and peripheral interrupts, a handler for the parallel port to communicate and send data to the printer, a networking stack for communication with computers via the Ethernet controller, a debug module that uses a serial port for output (this is not required, but it is helpful), and possibly a monitor and control command-line interface.
Now you can ask some questions about these modules in the system to find out the responsibilities of each. It might help to draw this out in your project notebook. How will these modules interact with each other? Are the modules independent and standalone or do they have interdependencies? Will they need to share memory or other hardware resources?
In the print server example, a networking stack is required. This might not be something you would want to create from scratch. Several networking stacks are available and many RTOSes include them as well.
There may not be easy answers to some of these questions. They are not solely based on technical issues. In the absence of a clear-cut winner, it’s probably best to err on the side of what makes the software easier to read and implement. Making a list of pros and cons might aid in the decision process.
Now, if your decision is to use an RTOS, move on to the next section for a discussion of some criteria for determining which RTOS is best for the project.
The previous edition of this book showed how to build your own RTOS. Despite this, we strongly recommend using an existing operating system. Let us say that again: we highly recommend using an off-the-shelf operating system rather than writing your own. A wide variety of operating systems are available to suit nearly every project and pocketbook. Using an off-the-shelf operating system allows your software team to focus on the development of the application for the product. Granted, there will be a learning curve to get up to speed on using the operating system.
In this section, we will discuss the process of selecting the operating system that best fits the needs of your project. Selecting an RTOS can be tricky. There are plenty of criteria to consider when making this decision. Of course, the criteria are typically weighted differently from project to project and company to company.
Let’s take a look at some of the important criteria used in making an RTOS selection:
The processor is typically the first choice in the hardware design on a project. Most RTOSes support the popular processors (or at least processor families) used in embedded systems. If the processor used on your project is not supported, you need to determine whether porting the RTOS to that processor is an option or if it is necessary to choose a different RTOS. Porting an RTOS is not always trivial.
We have already covered the real-time characteristics of an RTOS, which include interrupt latency, context switch time, and the execution time of each system call. These are technical criteria that are inherent to the system and cannot be changed.
RTOSes span the cost spectrum from open source and royalty free to tens of thousands of dollars per developer seat plus royalties for each unit shipped. You need to understand what your costs are in both cases. Open source might mean no upfront costs, but there might be costs associated with getting support when needed. You also need to understand the licensing details of the RTOS you choose.
Clearly, in an embedded environment, memory constraints are a frequent concern. A few RTOSes can be scaled to fit the smallest of embedded systems—for example, by removing features to create a smaller footprint. Others require a minimum set of resources comparable to a low-end PC. It is important to keep in mind the potential need to change an RTOS in the future, when memory is not as plentiful or costs need to be reduced.
The device drivers included with an RTOS can aid in keeping the development on schedule. This reduces the amount of code you need to develop for particular peripherals. Many RTOSes support the common devices found in embedded systems.
If additional features are needed, such as networking support, graphics libraries, web interfaces, and filesystems, an RTOS might include these and have the code already integrated and tested. Some RTOSes might require more fees for using these added features. If the necessary features are not included, you will need to identify third parties that provide them so that these components can be integrated into the system.
This may include a number of incidents or a period of phone support. Some RTOSes require you to pay an annual fee to maintain a service contract. For open source RTOSes, an open forum or mailing list might be provided. If more specialized support is needed, you’ll have to search around to see what is available. Popular open source RTOSes have companies dedicated to providing support.
Make sure the RTOS works with the assembler, compiler, linker, and debugger you have already obtained. If the RTOS does not support tools that you or your team are familiar with, the learning curve will take more time.
No matter which RTOS you choose, our advice is to get the source code if you can. The reason for this is that if you can’t get support when you need it (say, at 1 A.M. for a deadline coming at 8 A.M., or if the operating system vendor stops supporting the product), you’ll be glad to be able to find and fix the problem yourself. Some proprietary RTOSes provide only object code. Find out what is provided before you make your final decision.
With such a wide variety of operating systems and features to choose from, it can be difficult to decide which is the best for your project. Try putting your processor, real-time performance, and budgetary requirements first. These are criteria that you probably cannot change, so you can use them to narrow the possible choices to a dozen or fewer products. Then you can focus on the more detailed technical information.
At this point, many developers make their decision based on compatibility with existing cross-compilers, debuggers, and other development tools. But it’s really up to you to decide what additional features are most important for your project. No matter what you decide, the basic kernel and task mechanisms will be pretty much the same as those described in this chapter. The differences will most likely be measured in processors supported, minimum and maximum memory requirements, availability of add-on software modules (networking protocol stacks, device drivers, and filesystems are common examples), and compatibility with third-party development tools.
Another good set of criteria for RTOS selection can be found in the March 1999 Embedded Systems Programming article “Selecting a Real-Time Operating System,” which can be found online at http://www.embedded.com. The list of vendors might be a bit outdated, but the information is still very useful.
If you would like to dig deeper into the inner workings of real-time operating systems, here are two resources we suggest: MicroC/OS-II: The Real-Time Kernel, by Jean J. Labrosse (CMP Books) and Real-Time Concepts for Embedded Systems, by Qing Li and Caroline Yao (CMP Books).
44.222.82.133