Chapter 10. Preemptive Run-to-Completion Kernel

Simplicity is the soul of efficiency.

—R. Austin Freeman (in The Eye of Osiris)

In Section 6.3.8 of Chapter 6 I mentioned a perfect match between the active object computing model and a super-simple, run-to-completion (RTC) preemptive kernel. In this chapter I describe such a kernel, called QK, which is part of the QP event-driven platform and is tightly integrated with the QF real-time framework.

I begin this chapter by enumerating good and bad reasons for choosing a preemptive kernel in the first place. I then follow with an introduction to RTC kernels. Next I describe the implementation of the QK preemptive kernel and how it integrates with the QF real-time framework. I then move on to the advanced QK features, such as the priority-ceiling mutex, extended context switch to support various coprocessors (e.g., the 80x87 floating point coprocessor), and thread-local storage (e.g., used in the Newlib standard library). Finally, I describe how to port the QK kernel to various CPUs and compilers. As usual, I illustrate all the features by means of executable examples that you can actually run on any x86-based PC.

Reasons for Choosing a Preemptive Kernel

Before I go into the details of QK, let me make absolutely clear that preemptive multitasking opens up an entirely new dimension of complexity in the design and debugging of the application, to say the least. It's simply much easier to analyze and debug a program in which tasks cannot preempt each other at every instruction and instead can only yield to one another after each RTC step. Allowing task preemptions can lead to a variety of tricky problems, all ultimately rooted in resource sharing among tasks. You must be extremely careful because the resource sharing might be more camouflaged than you think. For example, without realizing it, you might be using some nonreentrant code from the standard libraries or other sources. Moreover, preemptive multitasking always costs more in terms of the stack usage (RAM) and CPU cycles for scheduling and context switching than does nonpreemptive scheduling, such as the “vanilla” cooperative kernel (see Section 7.11 in Chapter 7).

When you choose a preemptive kernel, such as QK or any other preemptive RTOS for that matter, I want you to do it for good reasons. Let me begin with the bad reasons for choosing a preemptive kernel. First, in the active object computing model you don't need a preemptive kernel for partitioning the problem. Though this is by far the most common rationale for choosing an RTOS in the traditional sequential programming model, it's not a valid reason in a system of active objects. Active objects already divide the original problem, regardless of the underlying kernel or RTOS.

Second, you don't need a preemptive kernel to implement efficient blocking because event-driven systems generally don't block (see Chapter 6). Third, since event-driven systems don't poll or block, the RTC steps tend to be naturally quite short. Therefore, chances are that you can achieve adequate task-level response with the simple nonpreemptive kernel (I discuss the execution profile of a nonpreemptive kernel in Section 7.11 in Chapter 7). Often you can easily improve the task-level response of the vanilla kernel by breaking up long RTC steps into short enough pieces by using the “Reminder” state pattern described in Chapter 5. And finally, you don't need a preemptive kernel to take advantage of the low-power sleep modes of your MCU. As described in Section 7.11.1 in Chapter 7, the cooperative vanilla kernel allows you to use low-power sleep modes safely.

Having said all this, however, I must also say that a preemptive kernel can be a very powerful and an indispensable tool—for a specific class of problems. A preemptive kernel executes higher-priority tasks virtually independently of tasks of lower priority. When a high-priority task needs to run, it simply preempts right away any lower-priority task that might be currently running, so the low-priority processing becomes effectively transparent to all tasks of higher priority. Therefore, a preemptive, priority-based kernel decouples high-priority tasks from the low-priority tasks in the time domain. This unique ability is critical in control-type applications.

In Chapter 1, I mentioned a GPS receiver example in which the hard real-time control loops of GPS signal tracking must execute in parallel to slow, floating point-intensive number crunching, graphic LCD access, and other I/O. This type of application is not well served by a nonpreemptive kernel, in which the task-level response is the longest RTC step in the whole system. It is simply impractical to identify and break up all the low-priority RTC steps into short enough pieces to meet the tight timing constraints of the control loops. Using a nonpreemptive kernel would also result in a fragile design because any change in the low-priority task could impact the timing of high-priority control tasks. In contrast, with a preemptive kernel you can be sure that high-priority processing is virtually insensitive to changes in the low-priority tasks. If you have a control application like that, a preemptive kernel might be actually the simplest, most elegant, and most robust way to design and implement your system.

Note

The choice of a kernel type is really a tradeoff between the coupling in the time domain and sharing of resources. A nonpreemptive kernel permits you to share resources among tasks but couples the tasks in the time domain. A preemptive kernel decouples the tasks in the time domain but is unforgiving for sharing resources among tasks. Under a preemptive kernel, any mechanism that allows you to share resources safely (such as a mutex) introduces some coupling among tasks in the time domain.

Introduction to RTC Kernels

All event-driven systems handle events in discrete RTC steps. Ironically, most conventional RTOSs force programmers to model these simple, one-shot event reactions using tasks structured as continuous endless loops. This serious mismatch is caused by the sequential programming paradigm underlying all traditional blocking kernels (see Section 6.2.2 in Chapter 6).

Though the event-driven, active object computing model can be made to work with a traditional blocking kernel, as described in Section 6.3 in Chapter 6, it really does not use the capabilities of such a kernel efficiently. An active object task structured as an endless event loop (Figure 6.5(B)) really blocks in just one place in the loop, and under one condition only—when the event queue is empty. Thus, it is obviously overkill to use sophisticated machinery capable of blocking at any number of places in the task's execution path to block at just one a priori known point.

The event-driven paradigm calls for a different, much simpler, type of truly event-driven, run-to-completion kernel. A kernel of this type breaks entirely with the loop structure of the tasks and instead uses tasks structured as one-shot, discrete, run-to-completion functions, very much like ISRs [Samek+ 06]. In fact, an RTC kernel views interrupts very much like tasks of a “super-high” priority, except that interrupts are prioritized in hardware by the interrupt controller, whereas tasks are prioritized in software by the RTC kernel (see Figure 6.9 in Chapter 6).

Note

The one-shot RTC tasks correspond directly to active objects, as described in Section 6.3.8 in Chapter 6. Therefore, in the following discussion I use the terms active object and task interchangeably.

Preemptive Multitasking with a Single Stack

To be able to efficiently block anywhere in the task code, a conventional real-time kernel maintains relatively complex execution contexts—including separate stack spaces—for each running task, as shown in Figure 6.2 in Chapter 6. Keeping track of the details of these contexts and switching among them requires lots of bookkeeping and sophisticated mechanisms to implement the context switch magic. In contrast, an RTC kernel can be ultra simple because it doesn't need to manage multiple stacks and all the associated bookkeeping.

By requiring that all tasks run to completion and enforcing fixed-priority scheduling, an RTC kernel can instead manage all context information using the machine's natural stack protocol. Whenever a task posts an event to a higher-priority task, an RTC kernel uses a regular C function call to build the higher-priority task context on top of the preempted-task context. Whenever an interrupt preempts a task and the interrupt posts an event to a higher-priority task, the RTC kernel uses the already established interrupt stack frame on top of which to build the higher-priority task context, again using a regular C function call.

This simple form of context management is adequate because every task, just like every ISR, runs to completion. Because the preempting task must also run to completion, the lower-priority context will never be needed until the preempting task (and any higher-priority tasks that might preempt it) has completed and returned—at which time the preempted task will, naturally, be at the top of the stack, ready to be resumed.

At this point, it is interesting to observe that most prioritized interrupt controllers (e.g., the 8259A inside the PC, the AIC in AT91-based ARM MCUs from Atmel, the NVIC in ARM Cortex-M3, and many others) implement in hardware the same asynchronous scheduling policy for interrupts as an RTC kernel implements in software for tasks. In particular, any prioritized interrupt controller allows only higher-priority interrupts to preempt the currently serviced interrupt. All interrupts must run to completion and cannot “block.” All interrupts nest on the same stack.

This close similarity should help you understand the operation of an RTC kernel because it is based on exactly the same principle widely used and documented in hardware design (just pick up a datasheet of any of the aforementioned processors). Also, the similarity further reinforces the symmetry between RTC tasks and interrupts illustrated in Figure 6.9 in Chapter 6.

Nonblocking Kernel

One obvious consequence of the simplistic stack-management policy, and the most severe limitation of an RTC kernel, is that the RTC tasks cannot block. The kernel cannot leave a high-priority task context on the stack and at the same time resume a lower-priority task. The lower-priority task context simply won't be accessible on top of the stack unless the higher-priority task completes. Of course the inability to block disqualifies an RTC kernel for use with the traditional sequential programming paradigm, which is all about blocking and waiting for events at various points in the task's code.

But the inability to block in the middle of an RTC step is not really a problem for event-driven active objects because they don't need to block anyway. In other words, an active object computing model can benefit from the simplicity and excellent performance of an RTC kernel while being insensitive to the limitations of such a kernel.

Synchronous and Asynchronous Preemptions

As a fully preemptive multitasking kernel, an RTC kernel must ensure that at all times the CPU executes the highest-priority task that is ready to run. Fortunately, only two scenarios can lead to readying a higher-priority task:

  • 1 When a lower-priority task posts an event to a higher-priority task, the kernel must immediately suspend the execution of the lower-priority task and start the higher-priority task. This type of preemption is called synchronous preemption because it happens synchronously with posting an event to the task's event queue.

  • 2 When an interrupt posts an event to a task of higher priority than the interrupted task, upon completion of the ISR the kernel must start execution of the higher-priority task instead of resuming the lower-priority task. This type of preemption is called asynchronous preemption because it can happen asynchronously, any time interrupts are not explicitly locked.

Figure 10.1 illustrates the synchronous preemption scenario caused by posting an event from a low-priority task to a high-priority task.

Figure 10.1. Synchronous preemption by a high priority task in an RTC kernel.

  • (1) The low-priority task is executing.

  • (2) At some point during normal execution, a low-priority task posts or publishes an event to a high-priority task, thus making it ready to run. Posting an event to a queue engages the scheduler of the RTC kernel.

  • (3) The scheduler detects that a high-priority task becomes ready to run, so it calls the high-priority task function. Note that the scheduler does not return.

  • (4) The high-priority task runs, but at some time it too posts an event, this time to a task of lower priority than itself.

  • (5) Event posting engages the RTC scheduler, but this time the scheduler does not find any task with a priority higher than the current priority. The scheduler returns to the high-priority task.

  • (6) The high-priority task runs to completion.

  • (7) The high-priority task naturally returns to the RTC scheduler invoked at step 2.

  • (8) The scheduler checks once more for a higher-priority task to start, but it finds none. The RTC scheduler returns to the low-priority task.

  • (9) The low-priority task continues.

Obviously, the synchronous preemption is not limited to only one level. If the high-priority task posts or publishes events to a still higher-priority task in point 5 of Figure 10.1, the high-priority task will be synchronously preempted and the scenario will recursively repeat itself at a higher level of nesting.

Figure 10.2 illustrates the asynchronous preemption scenario caused by an interrupt.

Figure 10.2. Asynchronous preemption by an interrupt and a high-priority task in an RTC kernel.

  • (1) A low-priority task is executing and interrupts are unlocked.

  • (2) An asynchronous event interrupts the processor. The interrupt immediately preempts any executing task.

  • (3) The interrupt service routine (ISR) executes the RTC kernel-specific entry, which saves the priority of the interrupted task into a stack-based variable and raises the current priority of the RTC kernel to the ISR level (above any task). The raising of the current priority informs the RTC kernel that it executes in the ISR context.

  • (4) The ISR continues to perform some work and, among other things, posts or publishes an event to the high-priority task. Posting an event engages the RTC scheduler, which immediately returns because no task has a higher priority than the current priority.

  • (5) The ISR continues and finally executes the RTC kernel-specific exit.

  • (6) The RTC kernel-specific ISR exit sends the end-of-interrupt (EOI) instruction to the interrupt controller, restores the saved priority of the interrupted task into the current priority, and invokes the RTC scheduler. (The EOI instruction is understood here generically and denotes a machine instruction that stops prioritizing the current interrupt nesting level.)

  • (7) Now the RTC scheduler detects that a high-priority task is ready to run, so it enables interrupts and calls the high-priority task. Note that the RTC scheduler does not return.

  • (8) The high-priority task runs to completion, unless it also gets interrupted.

  • (9) After completion, the high-priority task naturally returns to the scheduler, which now executes at a task priority level because the EOI instruction to the interrupt controller issued at step 5 lowered the hardware priority. Note that the system priority is at task level, even though the interrupt return hasn't been executed yet.

  • (10) The original interrupt executes the interrupt return (IRET) instruction. (The IRET instruction is understood here generically and denotes a machine instruction for returning from an interrupt.) The IRET restores the context of the low-priority task, which has been asynchronously preempted all this time. Note that the interrupt return matches the interrupt preemption in step 2.

  • (11) Finally, the low-priority task continues and eventually runs to completion.

It is important to point out that conceptually the interrupt handling ends in the RTC kernel-specific interrupt exit (5), even though the interrupt stack frame still remains on the stack and the IRET instruction has not been executed yet. The interrupt ends because the EOI instruction is issued to the interrupt controller. Before the EOI instruction, the interrupt controller allows only interrupts of higher priority than the currently serviced interrupt. After the EOI instruction followed by the call to the RTC scheduler, the interrupts get unlocked and the interrupt controller allows all interrupt levels, which is exactly the behavior expected at the task level.

Note

Some processor architectures (e.g., ARM Cortex-M3) hardwire the EOI and the IRET instructions together, meaning that EOI cannot be issued independently from IRET. (Note that I treat the instructions EOI and IRET generically in this discussion.) In this case an extra dummy interrupt stack frame must be synthesized, so the EOI/IRET instruction will leave the original interrupt stack frame on the stack. However, such CPU architectures are actually rare, and most processors allow lowering the hardware interrupt priority level without issuing the IRET instruction.

Consequently, the asynchronous preemption is not limited to only one level. The high-priority task runs with interrupts unlocked (Figure 10.2(8)), so it too can be asynchronously preempted by an interrupt, including the same level interrupt as the low-priority task in step 2. If the interrupt posts or publishes events to a still higher-priority task, the high-priority task will be asynchronously preempted and the scenario will recursively repeat itself at a higher level of nesting.

Stack Utilization

Charting the stack utilization over time provides another, complementary view of the synchronous and asynchronous preemption scenarios depicted in Figures 10.1 and 10.2, respectively. To demonstrate the essential behavior, I ignore the irrelevant function calls and other unrelated stack activity.

Figure 10.3 illustrates the stack utilization across the synchronous preemption scenario. The timeline and labels used in Figure 10.3 are identical to those used in Figure 10.1 to allow you to easily correlate these two diagrams.

Figure 10.3. Stack utilization during the synchronous preemption scenario.

  • (1) Initially, the stack pointer points to the low-priority task stack frame.

  • (2) At some point during normal execution, the low-priority task posts or publishes an event to a high-priority task, which engages the RTC scheduler. A stack frame of the scheduler is pushed on the stack.

  • (3) The scheduler detects that a high-priority task becomes ready to run, so it calls the high-priority task. A stack frame of the high-priority task is pushed on the stack.

  • (4) High-priority task executes and at some point posts an event to the low-priority task.

  • (5) Event posting engages the RTC scheduler, so another scheduler stack frame is pushed on the stack. The scheduler does not find any higher-priority tasks ready to run, so it immediately returns.

  • (6) The high-priority task runs to completion.

  • (7) The high-priority task naturally returns to the RTC scheduler invoked at step 2, so the task's stack frame is popped off the stack.

  • (8) The scheduler checks once more for a higher-priority task to start, but it finds none, so the RTC scheduler returns to the low-priority task, popping its stack frame off the stack.

  • (9) The low-priority task continues.

Figure 10.4 illustrates the stack utilization during the asynchronous preemption scenario. The timeline and labels used in Figure 10.4 are identical to those used in Figure 10.2 so that you can easily correlate the two diagrams.

Figure 10.4. Stack utilization during the asynchronous preemption scenario.

  • (1) Initially, the stack pointer points to the low-priority task stack frame.

  • (2) An asynchronous event interrupts the processor. The interrupt immediately preempts any executing task and the hardware arranges for pushing the interrupt stack frame onto the stack (zigzag arrow). The interrupt service routine (ISR) starts executing and possibly pushes some more context onto the stack (dashed up-arrow). The ISR stack frame is fully built.

  • (3-5) The ISR runs to completion and executes the RTC kernel-specific exit, which sends the EOI command to the interrupt controller.

  • (6) The RTC scheduler is called, which pushes its stack frame on the stack.

  • (7) The scheduler detects that a high-priority task is ready to run, so it enables interrupts and calls the high-priority task. The call to the high-priority task function pushes the task's stack frame on the stack.

  • (8) The high-priority task runs to completion and returns to the scheduler. The return pops the task's function stack frame off the stack.

  • (9) The scheduler resumes and checks for more high-priority tasks to execute, but it does not find any and returns, popping its stack frame off the stack.

  • (10) The ISR stack frame gets popped off the stack (the dashed down-arrow). Next the hardware executes the IRET instruction, which causes the interrupt stack frame to pop off the stack.

  • (11) The interrupt return exposes the preempted low-priority stack, which is now resumed and continues to run.

As you can see, all contexts (both interrupt and task contexts) are kept on a single stack. This forces the kernel to be nonblocking. The scheduler can never access anything but the topmost context in the stack. Thus, the scheduler can only choose from two alternatives: launch a new task or resume the topmost task context saved in the stack.

Comparison to Traditional Preemptive Kernels

If you have some experience with traditional preemptive kernels, an RTC kernel will require some getting used to and perhaps rethinking some basic semantics of the “task” and “interrupt” concepts.

Conventional preemptive kernels maintain separate stack spaces for each running task, as explained in Chapter 6. Keeping track of the details of these contexts and switching among them requires a lot of bookkeeping and sophisticated mechanisms to implement the context switch. In general, an ISR stores the interrupt context on one task's stack and restores the context from another task's stack. After restoring the task's context into the CPU registers, the traditional scheduler always issues the IRET instruction (understood here generically as the machine instruction that causes the return from an interrupt). The key point is that the interrupt context remains saved on the preempted task's stack, so the saved interrupt context outlives the duration of the interrupt handler. Therefore, defining the duration of an interrupt as lasting from saving the interrupt context to restoring the context is problematic.

The situation is not really that much different under an RTC kernel, such as QK. An ISR stores the interrupt context on the stack, which happens to be common for all tasks and interrupts. After some processing, the ISR issues the EOI instruction (again understood generically as the machine instruction that stops prioritizing the current interrupt nesting level) to the interrupt controller and calls the RTC scheduler. If no higher-priority tasks are ready to run, the scheduler exits immediately, in which case the ISR restores the context from the stack and executes the IRET instruction to return to the original task exactly at the point of preemption. Otherwise, the RTC scheduler unlocks interrupts and calls a higher-priority task. The interrupt context remains saved on the stack, just as in the traditional kernel.

The point here is that the ISR is defined from the time of storing interrupt context to the time of issuing the EOI instruction and enabling interrupts inside the RTC scheduler, not necessarily to the point of restoring the interrupt context via the IRET instruction. This definition is more precise and universal because under any kernel the interrupt context remains stored on one stack or another and typically outlives the duration of an interrupt processing.

Note

The definition of ISR duration is not purely academic but has tangible practical implications. In particular, ROM monitor-based debugging at the ISR level is much more challenging than debugging at the task level. Even though all context nests on the same stack, debugging RTC tasks is as easy as debugging the main() task, because the interrupts are unlocked and the hardware interrupt priority at the interrupt controller level is set to the task level.

By managing all task and interrupt contexts in a single stack, an RTC kernel can run with far less RAM than a typical blocking kernel. (In one case of a specialized GPS receiver application, an RTC kernel brought an almost 80 percent reduction in stack space compared to a traditional preemptive kernel running the same event-driven application [Montgomery 06].) Because tasks don't have private stacks, there is no unused private stack space associated with suspended tasks. Furthermore, a traditional kernel does not distinguish between the synchronous and asynchronous preemptions and makes all preemptions look like the more stack-intensive asynchronous preemptions. Finally, an RTC kernel does not need to maintain the task control blocks (TCBs; see Figure 6.2 in Chapter 6) for each task.

Because of this simplicity, context switches in an RTC kernel (especially the “synchronous preemptions”) can involve much less stack space and CPU overhead than in any traditional kernel. But even the “asynchronous preemptions” in an RTC kernel end up typically using significantly less stack space and fewer CPU cycles. A traditional kernel must typically save all CPU registers in strictly defined order and in “one swoop” onto the private task stack, to be able to restore the registers in an orderly fashion, also in “one swoop.” In contrast, an RTC kernel doesn't really care about the order of registers stored and whether they are stored in “one swoop” or piecemeal. The only relevant aspect is that the CPU state be restored exactly to the previous status, but it's irrelevant how this happens. This means that the basic ISR entry and exit sequences that most embedded C compilers are capable of generating are typically adequate for an RTC kernel while being inadequate for most traditional kernels. The C compiler is in a much better position to optimize interrupt stack frames for specific ISRs by saving only the actually clobbered registers and not saving the preserved registers. In this respect, an RTC kernel can take advantage of the C compiler capabilities whereas a traditional kernel can't.

The last point is perhaps best illustrated by a concrete example. All C compilers for the traditional ARM architecture adhere to the ARM Procedure Call Standard (APCS), which prescribes which registers must be preserved across a C function call and which can be clobbered. (The newer ARMv7 architecture, e.g., Cortex-M3, saves registers in hardware upon interrupt entry, so a C compiler is not involved; even in this case, however, the hardware-generated interrupt stack frame takes the APCS into account because the hardware pushes only the eight clobbered ARM registers onto the stack.) The C compiler-generated ISR entry initially saves only the registers that might be clobbered in a C function, which is less than half of all ARM registers. The rest of the registers get saved later, inside C functions invoked from the ISR, if and only if such registers are actually used. This is an example of a context save occurring “piecemeal,” which is perfectly suitable for an RTC kernel. In contrast, a traditional kernel must save all ARM registers in “one swoop” upon ISR entry, and if an ISR calls C functions (which it typically does), many registers are saved again. Needless to say, such a policy requires more RAM for the stacks and more CPU cycles for a context switch (perhaps by a factor of two) than an RTC kernel.

QK Implementation

QK is a lightweight, priority-based, RTC kernel specifically designed to provide preemptive multitasking capabilities to the QF real-time framework. QK is not a standalone kernel but rather is just an add-on to QF, similar to the “vanilla” kernel described in Chapter 7. QK is provided as one of the components of the QP event-driven platform.

In this section, I describe the platform-independent QK source code, focusing on the basic kernel functions, such as keeping track of tasks and interrupts, scheduling, and context switching.

QK Source Code Organization

Listing 10.1 shows the directories and files comprising the QK preemptive kernel in C. The structure of the C++ version is almost identical, except the implementation files have the .cpp extension. The general source code organization of all QP components is described in Section 8.1.3 in Chapter 8.

Listing 10.1. QK source code organization

<qp>\qpc                 - QP/C root directory (<qp>\qpcpp for QP/C++)
  |
  +-include              - QP platform-independent header files
  | +-qk.h               - QK platform-independent interface
  | +-. . .
  |
  +-qk                   - QK preemptive kernel
  | +-source             - QK platform-independent source code (*.c files)
  | | +-qk_pkg.h         - internal interface for the QK implementation
  | | +-qk.c             - definition of QK_getVersion() and QActive_start()
  | | +-qk_sched.c       - definition of QK_schedule_()
  | | +-qk_mutex.c       - definition of QK_mutexLock()/QK_mutexUnlock()
  | | +-qk_ext.c         - definition of QK_scheduleExt_()
  | |
  | +-lint               - QK options for lint
  |   +-opt_qk.lnt       - PC-lint options for linting QK
  |
  +-ports                - Platform-specific QP ports
  | +-80x86              - Ports to the 80x86 processor
  | | +-qk               - Ports to the QK preemptive kernel
  | |   +-tcpp101        - Ports with the Turbo C++ 1.01 compiler
  | |     +-l            - Ports using the Large memory model
  | |       +-dbg        - Debug build
  | |       | +-qf.lib   - QF library
  | |       | +-qep.lib  - QEP library
  | |       +-rel        - Release build
  | |       +-spy        - Spy build (with software instrumentation)
  | |       +-make.bat   - batch script for building the QP libraries
  | |       +-qep_port.h - QEP platform-dependent include file
  | |       +-qf_port.h  - QF platform-dependent include file
  | |       +-qk_port.h  - QK platform-dependent include file
  | |       +-qs_port.h  - QS platform-dependent include file
  | |       +-qp_port.h  - QP platform-dependent include file
  | +-cortex-m3          - Ports to the Cortex-M3 processor
  |   +-qk               - Ports to the QK preemptive kernel
  |     +-iar            - Ports with the IAR compiler
  |
  +-examples             - Platform-specific QP examples
    +-80x86              - Examples for the 80x86 processor
    | +-qk               - Examples for the QK preemptive kernel
    |   +-. . .
    +-cortex-m3          - Examples for the Cortex-M3 processor
    | +-qk               - Examples for the QK preemptive kernel
    |   +-. . .
    +-. . .

The qk.h Header File

The qk.h header file, shown in Listing 10.2, integrates the QK kernel with the QF framework. The structure of qk.h closely resembles the vanilla kernel header file qvanilla.h discussed in Section 7.11.2 in Chapter 7. The QK kernel uses many of the same basic building blocks provided in QF. Specifically, the QK kernel uses the native QF active object event queues (see Section 7.8.3 in Chapter 7), the QF native memory pool (see Section 7.9), and the QF priority set (see Section 7.10) to keep track of all active object event queues. Additionally, the central element of the QK design is the current systemwide priority, which is just a byte. Figure 10.5 shows the QK data elements.

Listing 10.2. The QK preemptive kernel interface (<qp>\qpc\include\qk.h)

  • #ifndef qk_h

  • #define qk_h


Figure 10.5. Data elements used by the QK preemptive kernel.

  • (1) The QK kernel uses the native QF event queue, so it needs to include the qequeue.h header file.

  • (2) The QK kernel uses the native QF memory pool, so it needs to include the qmpool.h header file.

  • (3) The QK kernel uses the native QF priority set, so it needs to include the qpset.h header file.

  • (4) The global variable QK_readySet_ is a priority set that maintains the global status of all active object event queues, as shown in Figure 10.5. QK_readySet_ is declared as volatile because it can change asynchronously in ISRs.

  • (5) The global variable QK_currPrio_ represents the global systemwide priority of the currently running task or interrupt. QK_currPrio_ is declared as volatile because it can change asynchronously in ISRs.

  • (6) The global variable QK_intNest_ represents the global systemwide interrupt nesting level. QK_intNest_ is declared as volatile because it can change asynchronously in ISRs.

  • (7) The QK kernel uses QEQueue as the event queue for active objects (see also Listing 7.7(8)).

  • (8) In QK, the QActive data member osObject is used as a bitmask of flags representing various properties of the thread. For example, a bit of the osObject bitmask might indicate whether the thread uses a particular coprocessor. (Refer to Section 10.4.3.)

  • (9) In QK, the QActive data member thread is used to point to the thread local storage for that thread. (Refer to Section 10.4.2.)

  • (10) The QK kernel never blocks. The QK scheduler calls QActive_get_() only when it knows for sure that the event queue contains at least one event (see Listing 10.4(22)). Because this is a certainty in this type of kernel, the QACTIVE_EQUEUE_WAIT_() macro (see Listing 7.24(2) in Chapter 7) merely asserts that the event queue is indeed not empty.

  • (11) The macro QACTIVE_EQUEUE_SIGNAL_() is called from QActive_postFIFO() or QActive_postLIFO() when an event is posted to an empty queue (see Listing 7.25(5) in Chapter 7). Note that the macro is invoked inside a critical section. (Also, because I know exactly the context in which the macro is used, I don't bother surrounding the macro body with the do {…} while(0) loop.)

  • (12) The active object becomes ready to run, so its priority is inserted into the ready-set QK_readySet_.

  • (13) This if statement tests the QK interrupt nesting level because if the event posting occurs at the task level, the QK scheduler must be invoked to handle a potential synchronous preemption (see Section 10.2.3). The scheduler is not called from an interrupt, because a task certainly cannot preempt an interrupt.

  • (14) The QK scheduler is called via the macro QK_SCHEDULE_(), defined in lines (30) or (33), depending on the interrupt-locking policy used.

Note

The QK scheduler is always called from a critical section, that is, with interrupts locked. The scheduler might unlock interrupts internally, but always returns with interrupts locked.

  • (15) The macro QACTIVE_EQUEUE_ONEMPTY_() is called from QActive_get_() when the queue is becoming empty (see Listing 7.24(12) in Chapter 7). This is exactly when the priority of the active object needs to be removed from the ready-set QK_readySet_ because the active object is no longer ready to run. Note that QACTIVE_EQUEUE_ONEMPTY_() is called from a critical section.

  • (16-20) The QK kernel uses QMPool as the QF event pool. The platform abstraction layer (PAL) macros are set to access the QMPool operations (see Section 7.9 in Chapter 7).

  • (21) The QK kernel initialization is invoked from QF_init(). The QK_init() performs CPU-specific initialization and is defined at the QK port level.

  • (22) The QK idle loop calls the QK_onIdle() callback to give the application a chance to customize the idle processing.

    Note

    The QK_onIdle() callback is distinctively different from the QF_onIdle() callback used by the cooperative vanilla kernel, because a preemptive kernel handles idle processing differently than a nonpreemptive one. Specifically, the QK_onIdle() callback is always called with interrupts unlocked and does not need to unlock interrupts.

  • (23) The QK_getVersion() function allows you to obtain the current version of the QK kernel as a constant string “x.y.zz,” where x is the one-digit major number (e.g., 3), y is the one-digit minor number (e.g., 5), and zz is the two-digit release number (e.g., 00).

  • (24) This typedef defines the QMutex type for the priority-ceiling mutex. I describe the QK mutex implementation in Section 10.4.1.

  • (25,26) The functions QK_mutexLock() and QK_mutexUnlock() perform mutex locking and unlocking, respectively. Again, I describe them in Section 10.4.1.

    The QK kernel, just like any other real-time kernel, uses the simplest and most efficient way to protect critical sections of code from disruptions, which is to lock interrupts on entry to the critical section and unlock interrupts again on exit. QK uses the same critical section mechanism as the QF real-time framework; in fact, QK defines the critical section mechanism for QF in the file qk_port.h. (See Section 7.3 in Chapter 7 for the description of the QF critical section policies and macros.)

  • (27) As I mentioned at step 13, the QK scheduler is always invoked from a critical section but might need to unlock interrupts internally. Therefore, the signature of the QK scheduler function depends on the interrupt-locking policy used, which is determined by the QF_INT_KEY_TYPE, as described in Section 7.3 in Chapter 7.

  • (28) When QF_INT_KEY_TYPE is not defined, the simple “unconditional interrupt locking and unlocking” policy is used, in which case the QK scheduler QK_schedule_() takes no parameters.

  • (29) Similarly, the extended QK scheduler QK_scheduleExt_() takes no parameters. I discuss the extended QK scheduler in the upcoming Section 10.4.3.

  • (30,33) The macro QK_SCHEDULE_() invokes the QK scheduler, hiding the actual interrupt-locking policy used.

  • (31) When QF_INT_KEY_TYPE is defined, the policy of “saving and restoring interrupt status” is used, in which case the QK scheduler QK_schedule_() takes the interrupt status key as a parameter.

  • (32) Similarly, the extended QK scheduler QK_scheduleExt_() takes the same interrupt status key as the parameter. I discuss the extended QK scheduler in the upcoming Section 10.4.3.
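The two queue hooks described in steps (11)-(15) boil down to very little code. The following standalone sketch models them as plain C functions with hypothetical names (queueSignal(), queueOnEmpty(), and a stub schedule()); the real versions are macros in qk.h that execute inside critical sections:

```c
/* Toy model of the QACTIVE_EQUEUE_SIGNAL_()/QACTIVE_EQUEUE_ONEMPTY_()
 * hooks.  All names here are hypothetical stand-ins; the real hooks are
 * macros in qk.h invoked inside critical sections. */
#include <assert.h>
#include <stdint.h>

static uint8_t readySet;     /* bit p-1 set => queue of priority p not empty */
static uint8_t intNest;      /* models QK_intNest_ */
static int schedCalls;       /* counts scheduler invocations */

static void schedule(void) { ++schedCalls; }    /* stands in for QK_SCHEDULE_() */

/* models QACTIVE_EQUEUE_SIGNAL_(): an event was posted to an EMPTY queue */
static void queueSignal(uint8_t prio) {
    readySet |= (uint8_t)(1U << (prio - 1U));   /* the task becomes ready */
    if (intNest == (uint8_t)0) {                /* posting at the task level? */
        schedule();      /* handle a potential synchronous preemption */
    }
}

/* models QACTIVE_EQUEUE_ONEMPTY_(): the queue has just become empty */
static void queueOnEmpty(uint8_t prio) {
    readySet &= (uint8_t)~(1U << (prio - 1U));  /* no longer ready to run */
}
```

Note how the scheduler is invoked only for task-level posting; an ISR merely marks the task ready and leaves the preemption decision to the interrupt-exit path.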

Interrupt Processing

Interrupt processing is always specific to your particular application, so obviously it cannot be programmed generically in a platform-independent manner. However, handling interrupts is critical to understanding how the QK kernel works, so here I explain it in general terms.

The most important thing you need to understand about interrupt processing under any preemptive kernel, not just QK, is that the kernel must be notified about entering and exiting an interrupt. Specifically, every interrupt must call the QK scheduler upon exit, to give the kernel a chance to handle the asynchronous preemption, as described in Section 10.2.3.

Unlike most conventional preemptive kernels, QK can typically work with interrupt service routines synthesized by the C compiler, which most embedded C cross-compilers support. Listing 10.3 shows the pseudocode for an ISR; Figure 10.6 shows the timeline for executing this code.

Listing 10.3. ISRs in QK; boldface indicates QK-specific interrupt entry and exit

  • (1) void interrupt YourISR(void) {  /* typically entered with interrupts locked */

  • (2)        Clear the interrupt source, if necessary

  • (3)        ++QK_intNest_;          /* account for one more interrupt nesting level */

  • (4)        Unlock interrupts (depending on the interrupt policy used)

  • (5)        Execute ISR body, including calling QF services, such as:

  •        Q_NEW(), QActive_postFIFO(), QActive_postLIFO(), QF_publish(), or QF_tick()

  • (6)        Lock interrupts, if they were unlocked in step (4)

  • (7)        Send the EOI instruction to the interrupt controller

  • (8)        --QK_intNest_;          /* account for one less interrupt nesting level */

  • (9)        if (QK_intNest_ == (uint8_t)0) {      /* coming back to the task level? */

  • (10)               QK_schedule_();         /* handle potential asynchronous preemption */

  •          }

             }


Figure 10.6. Timeline of servicing an interrupt and asynchronous preemption in QK. Black rectangles represent code executed with interrupts locked.

  • (1) An ISR must usually be defined with a special extended keyword (such as “interrupt” in this case). Typically, an ISR is entered with interrupts locked, but some processor architectures (e.g., ARM Cortex-M3) don't lock interrupts. (Check your device's datasheet.)

  • (2) If the interrupt source needs clearing, it's best to do it right away.

  • (3) You need to tell QK that you are servicing an ISR, so that QK won't try to handle preemption at the ISR level. You inform the kernel by incrementing the global interrupt nesting level QK_intNest_. This must be done in a critical section, so if your processor does not lock interrupts automatically upon ISR entry (see line (1)), you need to explicitly lock interrupts before incrementing the nesting level.

  • (4) Depending on the interrupt-locking policy used (see Section 7.3 in Chapter 7) and when an interrupt controller is present, you might need to unlock the interrupt at this point.

    Note

    Steps 3 and 4 constitute the QK-specific interrupt entry, and you can encapsulate them in a macro QK_ISR_ENTRY(), as shown in Section 10.5.

  • (5) Execute the ISR body, including calling the indicated QF services. Note that all these services use critical sections internally. Therefore, if your interrupt-locking policy does not support nesting of critical sections, you must make sure that interrupts are not locked.

  • (6) You need to lock interrupts if you unlocked them in line (4), because the following code must execute atomically.

  • (7) You need to send the EOI instruction to the interrupt controller to inform the hardware to stop prioritizing this interrupt level.

  • (8) The interrupt nesting level QK_intNest_ is decremented to account for leaving the interrupt.

  • (9) If the interrupt nesting level indicates that the interrupt returns to the task level, as opposed to another interrupt …

  • (10) The QK scheduler is called to handle potential asynchronous preemption. Note that the scheduler is called with interrupts locked.

Note

Steps 6–10 constitute the QK-specific interrupt exit, and you can encapsulate them in a macro QK_ISR_EXIT(), as shown in Section 10.5.
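The nesting-counter protocol of steps (3) and (8)-(10) can be exercised in isolation. In this sketch, isrEnter(), isrExit(), and the stub schedule() are hypothetical stand-ins; a real port performs these steps inside the QK_ISR_ENTRY()/QK_ISR_EXIT() macros with interrupts locked:

```c
/* Toy model of the QK interrupt entry/exit protocol from Listing 10.3:
 * the interrupt nesting counter decides whether the exit path invokes
 * the scheduler.  All names are hypothetical. */
#include <assert.h>
#include <stdint.h>

static uint8_t intNest;        /* models QK_intNest_ */
static int schedCalls;         /* counts scheduler invocations */

static void schedule(void) { ++schedCalls; }   /* stands in for QK_schedule_() */

static void isrEnter(void) {
    ++intNest;                 /* account for one more interrupt nesting level */
}

static void isrExit(void) {
    --intNest;                 /* account for one less interrupt nesting level */
    if (intNest == (uint8_t)0) {   /* coming back to the task level? */
        schedule();            /* handle potential asynchronous preemption */
    }
}
```

The point of the model is that a nested interrupt never triggers scheduling on exit; only the last interrupt returning to the task level does.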

Figure 10.6 shows the timeline of interrupt servicing and asynchronous preemption under the QK preemptive kernel. I'd like to highlight two interesting points. First, the interrupt response under the QK kernel is as fast as under any other preemptive kernel; it is dominated mostly by the longest critical section in the system and by the time the hardware needs to save the interrupt context to the stack. Second, the task-level response of the high-priority task is generally faster than under any conventional preemptive kernel, because the interrupt context does not need to be restored entirely from the stack and an interrupt return does not need to be executed to start the high-priority task. In an RTC kernel, all this is replaced by a function call, which typically is much faster than restoring the whole register set from the stack and executing the IRET instruction.

The qk_sched.c Source File (QK Scheduler)

The source file qk_sched.c implements the QK scheduler, which is the most important part of the QK kernel. As explained in Section 10.2.3, the QK scheduler is called at two junctures: (1) when an event is posted to an event queue of an active object (synchronous preemption), and (2) at the end of ISR processing (asynchronous preemption). In the qk.h header file (Listing 10.2(14)), you saw how the QK scheduler gets invoked to handle the synchronous preemptions. In the previous section, you also saw how the scheduler gets called from an interrupt context to handle the asynchronous preemption. Here, I describe the QK scheduler itself.

The QK scheduler is simply a regular C-function QK_schedule_(), whose job is to efficiently find the highest-priority active object that is ready to run and to execute it, as long as its priority is higher than the currently serviced QK priority. To perform this job, the QK scheduler relies on two data elements: the set of tasks that are ready to run QK_readySet_ (Listing 10.2(4)) and the currently serviced priority QK_currPrio_ (Listing 10.2(5)).

Figure 10.5 shows the relationship between the QK data elements and QF active objects. The variable QK_currPrio_ is an integer of type uint8_t that holds the value of the currently serviced priority level. The QK ready-set QK_readySet_ is of type QPSet64 (see Section 7.10 in Chapter 7), which is capable of representing up to 64 elements numbered 1 through 64. As shown in Figure 10.5, each bit in the QK_readySet_ priority set represents one QF active object. The bit number n in QK_readySet_ is 1 if the event queue of the active object of priority n is not empty. Conversely, bit number m in QK_readySet_ is 0 if the event queue of the active object of priority m is empty or the priority level m is not used. Both variables QK_currPrio_ and QK_readySet_ are always accessed in a critical section to prevent data corruption.
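To illustrate how a priority set can discover the highest-priority nonempty queue in constant time, here is a minimal 8-level sketch using a log-base-2 lookup table. All names (readySet, findMax(), and so on) are hypothetical; the real QPSet64 in qpset.h covers 64 priority levels:

```c
/* A minimal 8-priority sketch of the priority-set idea: one bit per
 * priority level, plus a 256-entry lookup table that maps a byte to the
 * 1-based position of its most significant 1-bit (0 means no bit set). */
#include <assert.h>
#include <stdint.h>

static uint8_t readySet;        /* bit p-1 set => priority p is ready to run */
static uint8_t log2Lkup[256];   /* log2Lkup[x] == 1-based MSB position of x */

static void initLkup(void) {
    for (unsigned x = 1U; x < 256U; ++x) {
        uint8_t n = 0U;
        uint8_t v = (uint8_t)x;
        while (v != 0U) {       /* count shifts until the byte clears */
            ++n;
            v >>= 1;
        }
        log2Lkup[x] = n;        /* 1..8; log2Lkup[0] stays 0 */
    }
}

static void setReady(uint8_t prio)   { readySet |=  (uint8_t)(1U << (prio - 1U)); }
static void clearReady(uint8_t prio) { readySet &= (uint8_t)~(1U << (prio - 1U)); }

/* highest ready priority; 0 means the set is empty (the idle level) */
static uint8_t findMax(void) { return log2Lkup[readySet]; }
```

The lookup replaces a loop over the bits with a single table access, which is why the scheduler can locate the highest-priority task in a handful of instructions.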

Listing 10.4 shows the complete implementation of the QK_schedule_() function.

Listing 10.4. QK scheduler implementation (<qp>\qpc\qk\source\qk_sched.c)

  • (1) #include "qk_pkg.h"

  • /* Public-scope objects -------------------------------------------------*/

  • (2) QPSet64 volatile QK_readySet_;                              /* QK ready-set */

  •                                            /* start with the QK scheduler locked */

  • (3) uint8_t volatile QK_currPrio_ = (uint8_t)(QF_MAX_ACTIVE + 1);

  • (4) uint8_t volatile QK_intNest_;              /* start with nesting level of 0 */

  •  /*......................................................................*/

  •  /* NOTE: the QK scheduler is entered and exited with interrupts LOCKED */

  • (5) #ifndef QF_INT_KEY_TYPE

  • (6) void QK_schedule_(void) {

  •  #else

  • (7) void QK_schedule_(QF_INT_KEY_TYPE intLockKey_) {

  •  #endif

  •      uint8_t p;

  •                            /* the QK scheduler must be called at task level only */

  • (8)      Q_REQUIRE(QK_intNest_ == (uint8_t)0);

  • (9)      if (QPSet64_notEmpty(&QK_readySet_)) {

  •                  /* determine the priority of the highest-priority task ready to run */

  • (10)       QPSet64_findMax(&QK_readySet_, p);

  • (11)       if (p > QK_currPrio_) {                 /* do we have a preemption? */

  • (12)              uint8_t pin = QK_currPrio_;        /* save the initial priority */

  •             QActive *a;

  • (13) #ifdef QK_TLS                                 /* thread-local storage used? */

  • (14)              uint8_t pprev = pin;

  • #endif

  • (15)              do {

  •                      QEvent const *e;

  • (16)               a = QF_active_[p];          /* obtain the pointer to the AO */

  • (17)               QK_currPrio_ = p; /* this becomes the current task priority */

  • #ifdef QK_TLS                                 /* thread-local storage used? */

  • (18)               if (p != pprev) {               /* are we changing threads? */

  • (19)                     QK_TLS(a);           /* switch new thread-local storage */

  • (20)                     pprev = p;

  •              }

  • #endif

  • (21)               QK_INT_UNLOCK_();                  /* unlock the interrupts */

  • (22)               e = QActive_get_(a);      /* get the next event for this AO */

  • (23)               QF_ACTIVE_DISPATCH_(&a->super, e);    /* dispatch to the AO */

  • (24)               QF_gc(e);        /* garbage collect the event, if necessary */

  • (25)               QK_INT_LOCK_();

  •                                /* determine the highest-priority AO ready to run */

  • (26)               if (QPSet64_notEmpty(&QK_readySet_)) {

  • (27)                     QPSet64_findMax(&QK_readySet_, p);

  •              }

  •              else {

  • (28)                      p = (uint8_t)0;

  •              }

  • (29)              } while (p > pin);  /* is the new priority higher than initial? */

  • (30)           QK_currPrio_ = pin;             /* restore the initial priority */

  •  #ifdef QK_TLS                                 /* thread-local storage used? */

  • (31)              if (pin != (uint8_t)0) {   /* no extended context for idle loop */

  • (32)                    a = QF_active_[pin];

  • (33)                  QK_TLS(a);                      /* restore the original TLS */

  •                  }

  •  #endif

  •      }

  • (34) }

  • (1) Like every QK source file, qk_sched.c includes the wider “package-scope” QK interface qk_pkg.h, located in <qp>\qpc\qk\source\. The qk_pkg.h header file includes the platform-specific QK port header file qk_port.h, but it additionally defines some internal macros and objects shared only internally within QK.

  • (2) The global variable QK_readySet_ is a priority set that maintains the global status of all active object event queues, as shown in Figure 10.5.

  • (3) The global variable QK_currPrio_ represents the global systemwide priority of the currently running task or interrupt.

  • (4) The global variable QK_intNest_ represents the global systemwide interrupt nesting level.

  • (5) The QK scheduler is always invoked with interrupts locked but might need to unlock interrupts internally. Therefore, the signature of the QK scheduler function depends on the interrupt-locking policy used, which is determined by the QF_INT_KEY_TYPE, as described in Section 7.3 in Chapter 7.

  • (6) When QF_INT_KEY_TYPE is not defined, the simple “unconditional interrupt locking and unlocking” policy is used, in which case the QK scheduler QK_schedule_() takes no parameters.

  • (7) When QF_INT_KEY_TYPE is defined, the policy of “saving and restoring interrupt status” is used, in which case the QK scheduler QK_schedule_() takes the interrupt status key as the parameter.

  • (8) The QK scheduler should only be called at the task level.

  • (9) If the ready-set QK_readySet_ is not empty, the QK kernel has some events to process.

  • (10) The priority set quickly discovers the highest-priority, not-empty event queue, as I described in Section 7.10 in Chapter 7.

  • (11) The QK scheduler can preempt the currently running task only when the new priority is higher than the priority of the currently executing task.

Note

The QK scheduler is an indirectly recursive function. The scheduler calls task functions, which might post events to other tasks; the posting, in turn, calls the scheduler again. However, this recursion can continue only as long as the priority of the tasks keeps increasing. Posting an event to a lower- or equal-priority task (including posting to self) stops the recursion because of the if statement in line (11).

Note

Steps 22–24 represent the body of the one-shot, RTC task in QK. Note that the RTC task is executed with interrupts unlocked.
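The scheduling policy of Listing 10.4 can be exercised with a small standalone model. The sketch below strips out the critical sections and TLS and replaces event dispatching with recording the task priority; the hard-coded rule "task 2 posts to task 4" stands in for a task posting an event during its RTC step. All names are hypothetical:

```c
/* Executable model of the QK_schedule_() loop: find the highest-priority
 * ready task, run it to completion if it is above the initial priority,
 * and repeat.  "Dispatching" just records the priority in trace[]. */
#include <assert.h>
#include <stdint.h>

static uint16_t readySet;       /* bit p-1 set => queue of priority p not empty */
static uint8_t  currPrio;       /* models QK_currPrio_ */
static uint8_t  trace[16];      /* order in which tasks ran */
static uint8_t  traceLen;

static uint8_t findMax(void) {  /* 1-based position of highest set bit, 0 if none */
    uint8_t p = 0U;
    uint16_t s;
    for (s = readySet; s != 0U; s >>= 1) {
        ++p;
    }
    return p;
}

static void post(uint8_t prio) {        /* models posting an event */
    readySet |= (uint16_t)(1U << (prio - 1U));
}

static void schedule(void) {            /* models QK_schedule_() */
    uint8_t p = findMax();
    if (p > currPrio) {                 /* do we have a preemption? */
        uint8_t pin = currPrio;         /* save the initial priority */
        do {
            currPrio = p;               /* p becomes the current priority */
            readySet &= (uint16_t)~(1U << (p - 1U)); /* one event per queue here */
            trace[traceLen++] = p;      /* "run" the RTC step of task p */
            if (p == 2U) {              /* demo rule: task 2 posts to task 4 ... */
                post(4U);
                schedule();             /* ... causing synchronous preemption */
            }
            p = findMax();
        } while (p > pin);              /* higher than the initial priority? */
        currPrio = pin;                 /* restore the initial priority */
    }
}
```

Running the model with tasks 1 and 2 ready shows task 2 running first, task 4 preempting it synchronously mid-step, and task 1 running last, exactly the order the text describes.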

The qk.c Source File (QK Startup and Idle Loop)

The qk.c source file, shown in Listing 10.5, defines the QK initialization, cleanup, startup, and idle loop.

Listing 10.5. QK startup and idle loop (<qp>\qpc\qk\source\qk.c)

  • (1) #include "qk_pkg.h"

  • #include "qassert.h"

  •      Q_DEFINE_THIS_MODULE(qk)

  •      /*......................................................................*/

  •      void QF_init(void) {

  •          /* nothing to do for the QK preemptive kernel */

  • (2)    QK_init();                              /* might be defined in assembly */

  •  }

  •  /*......................................................................*/

  •  void QF_stop(void) {

  • (3)    QF_onCleanup();                                     /* cleanup callback */

  •   /* nothing else to do for the QK preemptive kernel */

  • }

  • /*......................................................................*/

  • (4) void QF_run(void) {

  •    QK_INT_LOCK_KEY_

  •    QK_INT_LOCK_();

  • (5)     QK_currPrio_ = (uint8_t)0;     /* set the priority for the QK idle loop */

  • (6)      QK_SCHEDULE_();                   /* process all events produced so far */

  •          QK_INT_UNLOCK_();

  • (7)      QF_onStartup();                                     /* startup callback */

  • (8)      for (;;) {                                          /* the QK idle loop */

  • (9)           QK_onIdle();                      /* invoke the QK on-idle callback */

  •             }

  •      }

  •      /*......................................................................*/

  • (10) void QActive_start(QActive *me, uint8_t prio,

  •                         QEvent const *qSto[], uint32_t qLen,

  • (11)                     void *tls,

  • (12)                     uint32_t flags,

  •                         QEvent const *ie)

  •  {

  •          Q_REQUIRE(((uint8_t)0 < prio) && (prio <= (uint8_t)QF_MAX_ACTIVE));

  • (13)      QEQueue_init(&me->eQueue, qSto, (QEQueueCtr)qLen);

  • (14)      me->prio = prio;

  • (15)     QF_add_(me);                     /* make QF aware of this active object */

  •      #if defined(QK_TLS) || defined(QK_EXT_SAVE)

  • (16)    me->osObject = (uint8_t)flags;    /* osObject contains the thread flags */

  • (17)    me->thread   = tls; /* contains the pointer to the thread-local storage */

  •      #else

  •    Q_ASSERT((tls == (void *)0) && (flags == (uint32_t)0));

  •      #endif

  • (18)  QF_ACTIVE_INIT_(&me->super, ie);          /* execute initial transition */

  •      }

  •      /*......................................................................*/

  •      void QActive_stop(QActive *me) {

  •          QF_remove_(me);                /* remove this active object from the QF */

  •      }

  • (1) Like every QK source file, qk.c includes the wider “package-scope” QK interface qk_pkg.h, located in <qp>\qpc\qk\source\. The qk_pkg.h header file includes the platform-specific QK port header file qk_port.h, but it additionally defines some internal macros and objects shared only internally within QK.

  • (2) The function QF_init() initializes the QF framework and the underlying kernel. In the case of the QK kernel, this function has nothing to do except invoke the QK_init() function to give the QK kernel a chance to initialize itself. QK_init() is defined in the QK port.

  • (3) The function QF_stop() stops execution of the QF framework. In the case of the QK kernel, this function has nothing to do except invoke the QF_onCleanup() callback function to give the application a chance to clean up and exit to the underlying operating system (e.g., consider the QK kernel running on top of DOS). All QF callback functions are summarized in Section 8.1.8 in Chapter 8.

  • (4) Applications call the function QF_run() from main() to transfer the control to the framework. This function implements the startup and idle loop of the QK kernel.

  • (5) The current QK priority is reduced from the initial value of QF_MAX_ACTIVE+1 (see Listing 10.4(3)) to zero, which corresponds to the priority of the QK idle loop.

Note

The QK current priority value of QF_MAX_ACTIVE+1 effectively locks the QK scheduler, so the scheduler is not even called upon event posting or exit from ISRs.

Note

When no interrupts are running and all event queues are empty, the QK kernel has nothing to do. The kernel then executes the idle loop. The idle loop is the only “task” structured as an endless loop in QK. The QK priority associated with the idle loop is zero and is the absolute lowest priority level in the system, which is not accessible to the RTC tasks. The task priorities in QK start at 1.

Note

As a preemptive kernel, QK handles idle processing differently than a nonpreemptive vanilla kernel. Specifically, the QK_onIdle() callback is always called with interrupts unlocked and does not need to unlock interrupts (as opposed to the QF_onIdle() callback). Furthermore, a transition to a low-power sleep mode inside QK_onIdle() does not need to occur with interrupts locked. Such a transition is safe and does not cause any race conditions, because a preemptive kernel never switches the context back to the idle loop as long as events are available for processing.

Advanced QK Features

Simple as it is, the QK kernel supports quite advanced features, which you find only in the more sophisticated real-time kernels. In this section I cover mutual exclusion that is robust against priority inversions, thread-local storage useful for thread-safe libraries, and extended context switching to support various coprocessors. If you happen to know how other kernels implement these features, I hope you'll appreciate the simple elegance of the QK implementation.

Note

All advanced QK features covered in this section are only necessary when you must share resources among multiple QK tasks (active objects). If you use strict encapsulation (as advised in Chapter 9) and never share memory, nonreentrant libraries, or coprocessors among active objects, you don't need to use any of these advanced features.

Priority-Ceiling Mutex

QK is a preemptive kernel, and as with all such kernels, you must be very careful with any resource sharing among QK tasks. Ideally, the QF active objects (i.e., QK tasks) should communicate exclusively via events and otherwise should not share any resources, which I have been advocating all along (see Chapter 9). This ideal situation allows you to program all active objects without ever worrying about mutual exclusion mechanisms to protect shared resources.

However, at the cost of increased coupling among active objects, you might choose to share selected resources. If you go this path, you take the burden on yourself to interlock the access to such resources (shared memory or devices).

One powerful method of guaranteeing mutually exclusive access to resources at your disposal is the critical section mechanism implemented with the QF macros QF_INT_LOCK() and QF_INT_UNLOCK(), as described in Section 7.3 in Chapter 7. For very short accesses this might well be the most efficient synchronization mechanism.

However, you can also use a much less intrusive mechanism available in QK. QK supports a priority-ceiling mutex to prevent task-level preemptions while accessing a shared resource. Priority-ceiling mutex is immune to priority inversions [Kalinsky 05] but is a more selective mechanism than interrupt locking because all tasks (and interrupts) of priority higher than the priority ceiling run as usual. Listing 10.6 shows an example of using a QK mutex to protect a shared resource.

Listing 10.6. Protecting a shared resource with a QK priority-ceiling mutex

  • void your_function(arguments) {

  • (1)       QMutex mutex;

              . . .

  • (2)       mutex = QK_mutexLock(PRIO_CEILING);

  • (3)       You can safely access the shared resource here

  • (4)       QK_mutexUnlock(mutex);

              . . .

          }

  • (1) You need to provide a temporary mutex variable of type QMutex (which is just a byte).

  • (2) You lock the mutex by calling QK_mutexLock(). This function requires the priority ceiling parameter, which you choose to be the priority of the highest-priority task that may use the shared resource you want to protect. Typically, this priority is known at compile time because all QK tasks have fixed priorities assigned statically usually at system startup.

  • (3) You access the shared resource.

  • (4) You unlock the mutex by calling QK_mutexUnlock().

As you can see, the mutex variables are used only temporarily, and there is no limitation on how many mutexes you can use in your application. In principle, mutex locks can even nest, so your code in Listing 10.6(3) could use another priority-ceiling mutex. Note that I mention this only as a theoretical possibility, not necessarily as a good or recommended design.

Before explaining how the QK mutex protects the resource and why it is a nonblocking mechanism, I simply show its implementation in Listing 10.7.

Listing 10.7. QK mutex (<qp>\qpc\qk\source\qk_mutex.c)

  • QMutex QK_mutexLock(uint8_t prioCeiling) {

  •          uint8_t mutex;

  •          QK_INT_LOCK_KEY_

  •          QK_INT_LOCK_();

  • (1)         mutex = QK_currPrio_;             /* the original QK priority to return */

  • (2)       if (QK_currPrio_ < prioCeiling) {

  • (3)                QK_currPrio_ = prioCeiling;                /* raise the QK priority */

  •      }

  •      QK_INT_UNLOCK_();

  • (4)        return mutex;

  •   }

  •      /*..............................................................*/

  •      void QK_mutexUnlock(QMutex mutex) {

  •       QK_INT_LOCK_KEY_

  •          QK_INT_LOCK_();

  • (5)      if (QK_currPrio_ > mutex) {

  • (6)        QK_currPrio_ = mutex;                 /* restore the saved priority */

  • (7)              QK_SCHEDULE_();

  •          }

  •          QK_INT_UNLOCK_();

  •      }

  • (1) Inside a critical section, the current QK priority is saved in the temporary variable mutex to be returned from the QK_mutexLock().

  • (2) If the priority ceiling provided as the function argument exceeds the current QK priority …

  • (3) The current QK priority is raised to the priority ceiling.

  • (4) The original QK priority is returned to the caller.

  • (5) Inside a critical section, the current QK priority is compared to the mutex argument.

  • (6) If the current priority exceeds the mutex, the current QK priority is reduced to the level of the mutex.

  • (7) Reducing the current QK priority might “expose” some ready-to-run tasks that have a higher priority than the reduced QK_currPrio_ level. The QK scheduler is called to process these potential synchronous preemptions. Note that the scheduler is called with interrupts locked.

As you can see, locking the mutex boils down to raising the current QK priority to the priority ceiling level. Recall that the QK scheduler can only launch tasks with priorities higher than the initial priority with which the scheduler was entered (Listing 10.4(11)). This means that temporarily increasing the current QK priority prevents preemptions from all tasks with priorities lower than or equal to the priority ceiling. This is exactly what a priority ceiling mutex is supposed to do to protect your resource.

Note that the QK mutex is a nonblocking mechanism. If a task that needs to protect a shared resource is running at all, it means that all tasks of higher priority have no events to process. Consequently, simply preventing launch of higher-priority tasks that might access the resource is sufficient to guarantee the mutually exclusive access to the resource. Of course, you don't need to worry about any lower-priority tasks that might be preempted because they never resume until the current task runs to completion.
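The following standalone model (with hypothetical names mutexLock(), mutexUnlock(), and canPreempt()) mirrors the logic of Listing 10.7 without the critical sections, so the preemption rules can be checked directly, including nesting of mutexes:

```c
/* Toy model of the QK priority-ceiling mutex: locking raises the current
 * priority to the ceiling; unlocking restores the saved priority.  A task
 * of priority p can preempt only when p exceeds the current priority
 * (the rule from Listing 10.4(11)). */
#include <assert.h>
#include <stdint.h>

typedef uint8_t QMutexModel;   /* saved priority, just a byte */

static uint8_t currPrio;       /* models QK_currPrio_ */

static QMutexModel mutexLock(uint8_t prioCeiling) {
    QMutexModel m = currPrio;          /* the original priority to return */
    if (currPrio < prioCeiling) {
        currPrio = prioCeiling;        /* raise the current priority */
    }
    return m;
}

static void mutexUnlock(QMutexModel m) {
    if (currPrio > m) {
        currPrio = m;                  /* restore the saved priority */
    }
}

/* can a task of priority p preempt right now? */
static int canPreempt(uint8_t p) { return p > currPrio; }
```

While the mutex is held, every task at or below the ceiling is shut out, but tasks above the ceiling (and all interrupts) run as usual, which is exactly the selectivity the text attributes to the priority-ceiling mutex.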

Thread-Local Storage

Thread-local storage (TLS) is a mechanism by which variables are allocated such that there is one instance of the variable per extant thread. The canonical example of when TLS could be useful is the popular Newlib (www.sourceware.org/newlib/) standard C runtime library intended for use in embedded devices. Newlib's facilities are reentrant, but only when properly integrated into a multithreaded environment [Gatliff 01]. Because QK is a preemptive kernel, care must be taken to preserve the reentrant character of Newlib.

For example, consider the errno facility specified in the ANSI C standard. The runtime library sets errno when an error occurs within the library. Once set, errno's value persists until the application clears it, which simplifies error notification by the library but can create reentrancy problems when multiple threads use the library at the same time. If errno were just a single global variable shared among all threads, no thread could tell which thread's library call generated the error.

Newlib addresses this problem by redefining errno as a macro that (indirectly) references a global pointer called _impure_ptr (see Figure 10.7). The Newlib's _impure_ptr points to a structure of type struct _reent. This structure contains the traditional errno value specified by ANSI, but it also contains a lot of other elements, including signal handler pointers and file handles for standard input, output, and error streams.

Pointer to thread-local storage (TLS) switched around by the kernel.

Figure 10.7. Pointer to thread-local storage (TLS) switched around by the kernel.

The central idea of the Newlib design is that every thread in the application has its own copy of the _reent structure (shown as TLS in Figure 10.7) and that the _impure_ptr pointer is switched during context switches to always point at the _reent structure of the currently active thread. Obviously, to perform the switching of the _impure_ptr, you need a helping hand from the kernel.
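The pointer-switching idea can be sketched in a few lines of C. The names below (my_reent, my_impure_ptr, tls_switch) are hypothetical stand-ins for Newlib's actual definitions:

```c
#include <assert.h>

/* a miniature of the Newlib scheme: my_errno is a macro that
 * dereferences a per-thread pointer, which the kernel re-aims on
 * every context switch (as QK_TLS() does in QK) */
struct my_reent {
    int errno_val;                      /* the per-thread errno */
};

static struct my_reent *my_impure_ptr;  /* plays the role of _impure_ptr */
#define my_errno (my_impure_ptr->errno_val)

/* the kernel-side hook: point at the current thread's TLS area */
static void tls_switch(struct my_reent *tls) {
    my_impure_ptr = tls;
}
```

With this arrangement, code that reads or writes my_errno always touches the structure of the currently active thread, so an error raised in one thread cannot leak into another.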

QK supports the TLS concept by providing a context-switch hook QK_TLS(), which is invoked every time a different task priority is processed (see Listing 10.4(19,33)). The macro QK_TLS() receives from the kernel a pointer to the current active object. The following code fragment from qk_port.h defines the macro QK_TLS() for re-assigning the Newlib's _impure_ptr during context switches:

#define QK_TLS(act_)  (_impure_ptr = (struct _reent *)(act_)->thread)

Though the QK_TLS() macro will switch the _impure_ptr automatically, you are responsible for allocating the _reent structure in each active object. You also need to tell QK where the TLS is for every active object during startup by passing the pointer to the TLS as the fifth parameter of the QActive_start() function (see Listing 10.5(17)).

Note

The current implementation of the TLS in QK assumes that the thread-local storage is accessed neither in the ISRs nor in the idle loop (inside the QK_onIdle() callback function).

The TLS support in QK is generic and allows you to handle any number of libraries like Newlib. In the upcoming Section 10.6, I provide the dining philosophers application example for QK, which demonstrates the switching of two “impure pointers” for two hypothetical reentrant libraries.

Extended Context Switch (Coprocessor Support)

The C compiler-generated context save and restore for interrupts typically includes only the CPU core registers but does not include the registers of various coprocessors, such as floating-point coprocessors, specialized DSP engines, dedicated baseband processors, video accelerators, or other specialized coprocessors (perhaps implemented in FPGAs) that surround the CPU core. This ever-growing conglomerate of complex register architectures extends far beyond the core CPU registers, which poses a problem for a preemptive kernel if the various coprocessors are used by multiple tasks. The solution offered by advanced preemptive kernels is to include the various coprocessor registers in the context switch process, thus allowing sharing of the coprocessors among multiple tasks.

The QK kernel supports such an extended context switch in a generic way that you can easily customize for various coprocessors and hardware accelerators. The QK design of the extended context switch carefully minimizes the added overhead by saving and restoring the extended context only when necessary. The basic simplifying assumption is that neither the ISRs nor the QK idle loop use the coprocessor(s). Consequently, the extended context needs to be preserved only when a task preempts another task. Moreover, synchronous context switches generally don't need to be extended, because in this case the context switch is just a simple function call (see Section 10.2.3), which cannot happen in the middle of accessing a coprocessor.
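The save-only-when-necessary policy can be modeled on the host with memcpy() standing in for the coprocessor save/restore instructions. The type and flag names below are illustrative, not the actual QK definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FPU_REGS_SIZE 108U              /* x87 FSAVE image size in bytes */
enum { QK_FPU_THREAD = 0x04 };          /* "this task uses the FPU" flag */

/* a fake register bank standing in for the coprocessor hardware */
static uint8_t fpu_regs[FPU_REGS_SIZE];

typedef struct {
    uint8_t flags;                      /* which contexts this task uses */
    uint8_t fpu[FPU_REGS_SIZE];         /* per-task save area (in TLS) */
} TaskCtx;

/* save the coprocessor context, but only for tasks that use it */
static void ext_save(TaskCtx *t) {
    if ((t->flags & QK_FPU_THREAD) != 0U) {
        memcpy(t->fpu, fpu_regs, FPU_REGS_SIZE);   /* FSAVE stand-in */
    }
}

/* restore the coprocessor context for a task that uses it */
static void ext_restore(TaskCtx *t) {
    if ((t->flags & QK_FPU_THREAD) != 0U) {
        memcpy(fpu_regs, t->fpu, FPU_REGS_SIZE);   /* FRSTOR stand-in */
    }
}
```

Gating the copy on a per-task flag is what keeps the overhead low: tasks that never touch the coprocessor pay nothing for the extended context switch.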

This leaves only the asynchronous preemptions as really requiring the extended context switch. As described in Section 10.3.3, asynchronous preemptions are handled upon the exit from interrupts in the QK_ISR_EXIT() macro. Listing 10.8 shows pseudocode of the QK_ISR_EXIT() macro, which calls the extended scheduler QK_scheduleExt_() instead of QK_schedule_(), as shown in Listing 10.3(10).

Listing 10.8. QK_ISR_EXIT() macro with the extended context switch

  • #define QK_ISR_EXIT() do { \

  •     Lock interrupts \

  •     Send the EOI instruction to the interrupt controller \

  •     --QK_intNest_; \

  •    if (QK_intNest_ == 0) { \

  •         QK_scheduleExt_(); \

  •    } \

  • } while (0)

Figure 10.8 shows the additional extended context save and restore steps implemented in the extended scheduler QK_scheduleExt_(). The per-active object extended context is simply added to the TLS area, which is accessible via the thread data member of the QActive class.

Extended context switch saves and restores coprocessor registers in the TLS area.

Figure 10.8. Extended context switch saves and restores coprocessor registers in the TLS area.

Listing 10.9 shows the extended scheduler QK_scheduleExt_(). In the explanation section following this listing, I describe only the highlighted differences from the regular scheduler QK_schedule_(), which I already explained in Listing 10.4.

Listing 10.9. QK extended scheduler implementation (<qp>\qpc\qk\source\qk_ext.c)

  • #ifndef QF_INT_KEY_TYPE

  • (1) void QK_scheduleExt_(void) {

  • #else

  • (2) void QK_scheduleExt_(QF_INT_KEY_TYPE intLockKey_) {

  • #endif

  •          uint8_t p;

  •                            /* the QK scheduler must be called at task level only */

  •          Q_REQUIRE(QK_intNest_ == (uint8_t)0);

  •          if (QPSet64_notEmpty(&QK_readySet_)) {

  •              /* determine the priority of the highest-priority task ready to run */

  •              QPSet64_findMax(&QK_readySet_, p);

  •              if (p > QK_currPrio_) {                 /* do we have a preemption? */

  •                  uint8_t pin = QK_currPrio_;        /* save the initial priority */

  •                  QActive *a;

  • #ifdef QK_TLS                                 /* thread-local storage used? */

  •                  uint8_t pprev = pin;

  • #endif

  • (3) #ifdef QK_EXT_SAVE                         /* extended context-switch used? */

  • (4)              if (pin != (uint8_t)0) {/*no extended context for the idle loop */

  • (5)                  a = QF_active_[pin];     /* the pointer to the preempted AO */

  • (6)                  QK_EXT_SAVE(a);                /* save the extended context */

  •             }

  • #endif

  •             do {

  •                      QEvent const *e;

  •                      a = QF_active_[p];          /* obtain the pointer to the AO */

  •                      QK_currPrio_ = p; /* this becomes the current task priority */

  • #ifdef QK_TLS                                 /* thread-local storage used? */

  •                      if (p != pprev) {               /* are we changing threads? */

  •                          QK_TLS(a);           /* switch new thread-local storage */

  •                          pprev = p;

  •                      }

  • #endif

  •                      QK_INT_UNLOCK_();                  /* unlock the interrupts */

  •                      e = QActive_get_(a);      /* get the next event for this AO */

  •                      QF_ACTIVE_DISPATCH_(&a->super, e);    /* dispatch to the AO */

  •                      QF_gc(e);        /* garbage collect the event, if necessary */

  •                      QK_INT_LOCK_();

  •                                /* determine the highest-priority AO ready to run */

  •                      if (QPSet64_notEmpty(&QK_readySet_)) {

  •                          QPSet64_findMax(&QK_readySet_, p);

  •                      }

  •                      else {

  •                           p = (uint8_t)0;

  •                      }

  •         } while (p > pin);  /* is the new priority higher than initial? */

  •         QK_currPrio_ = pin;             /* restore the initial priority */

  • (7) #if defined(QK_TLS) || defined(QK_EXT_RESTORE)

  • (8)        if (pin != (uint8_t)0) {/*no extended context for the idle loop */

  • (9)                      a = QF_active_[pin];     /* the pointer to the preempted AO */

  •        #ifdef QK_TLS                             /* thread-local storage used? */

  •                      QK_TLS(a);                      /* restore the original TLS */

  •        #endif

  •        #ifdef QK_EXT_RESTORE                  /* extended context-switch used? */

  • (10)                  QK_EXT_RESTORE(a);          /* restore the extended context */

  •        #endif

  •                  }

  • #endif

  •                       }

  •            }

  •    }

  • (1,2) The signature of the extended scheduler depends on the interrupt locking policy used, just like the regular scheduler.

  • (3) If the macro QK_EXT_SAVE() is defined, the extended scheduler invokes the macro to save the extended context.

  • (4) The extended context needs to be saved only if a task has been preempted. The priority ‘pin’ of zero corresponds to the QK idle loop, and I assume that the idle loop does not use the coprocessor(s).

Note

The idle loop does not correspond to an active object, so it does not have the TLS memory area to save the extended context.

The QK_EXT_SAVE() and QK_EXT_RESTORE() macros allow you to save and restore as many coprocessor contexts as necessary for a given task. As shown in Figure 10.8, you need to provide per-task memory for all the extended contexts that you use. In the next section, I describe the QK port to 80x86 with the 80x87 floating point coprocessor (FPU) and I provide examples of the macros QK_EXT_SAVE() and QK_EXT_RESTORE() for the 80x87 FPU.

Porting QK

When you use QF with the QK preemptive kernel, you don't need to port the QF framework to the kernel because QF and QK are already integrated. However, you still need to port the QK kernel to the target CPU and compiler that you are using. Fortunately, this is quite easy due to the simple nature of the QK kernel. All you need to provide is the compiler-specific exact-width integer types in qep_port.h, the QF configuration in qf_port.h, and finally the interrupt-locking policy and interrupt entry/exit in qk_port.h. You often don't need to write any platform-specific QK source files because most of the time QK can work with the ISRs generated by the C compiler.

Note that the preemptive QK kernel puts more demands on the target CPU and the compiler than the simple vanilla kernel described in Chapter 7. Generally, QK can be ported to a processor and compiler if they satisfy the following requirements:

  • 1 The processor supports a hardware stack that can accommodate a fair amount of data (at least 256 bytes).

  • 2 The C or C++ compiler can generate reentrant code. In particular, the compiler must be able to allocate automatic variables on the stack.

  • 3 Interrupts can be locked and unlocked from C.

  • 4 The system provides a clock tick interrupt (typically 10 to 100Hz).

For example, some older CPU architectures, such as the 8-bit PIC microcontrollers, don't have a C-friendly stack architecture and consequently cannot easily run QK. Note, however, that in most cases you can use the nonpreemptive vanilla kernel.

In this section I show an example of a QK kernel port to the 80x86 CPU under DOS, with the legacy Turbo C++ 1.01 compiler configured to generate code for the “large” memory model. The port also demonstrates the advanced features, such as thread-local storage and the extended context switch for the 80x87 FPU. This port is located in <qp>\qpc\ports\80x86\qk\tcpp101\l.

The qep_port.h Header File

Listing 10.10 shows the qep_port.h header file for 80x86/QK/Turbo C++ 1.01/Large memory model. The legacy Turbo C++ 1.01 is a prestandard compiler, so I typedef the six platform-specific exact-width integer types used in QP.

Listing 10.10. The qep_port.h header file for 80x86/QK/Turbo C++ 1.01/Large memory model

  • #ifndef qep_port_h

  • #define qep_port_h

  •  /* Exact-width integer types for DOS/Turbo C++ 1.01/Large memory model */

  • typedef signed   char  int8_t;

  • typedef signed   int   int16_t;

  • typedef signed   long  int32_t;

  • typedef unsigned char  uint8_t;

  • typedef unsigned int   uint16_t;

  • typedef unsigned long  uint32_t;

  • #include "qep.h"        /* QEP platform-independent public interface */

  • #endif                                                        /* qep_port_h */
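Because the typedefs above encode assumptions about the compiler (16-bit int, 32-bit long), it is worth verifying them at compile time. A prestandard compiler has no <stdint.h> and no static_assert, but the classic negative-array-size trick works everywhere; ASSERT_WIDTH below is a hypothetical helper, not part of QP:

```c
#include <assert.h>

/* C89-compatible compile-time width check: if the width is wrong, the
 * array size becomes -1 and compilation fails with an error */
#define ASSERT_WIDTH(name_, type_, bits_) \
    typedef char name_[((sizeof(type_) * 8) == (bits_)) ? 1 : -1]

/* these two widths hold on Turbo C++ 1.01 as well as modern compilers */
ASSERT_WIDTH(check_int8, signed char, 8);
ASSERT_WIDTH(check_int16, short, 16);
```

On the Turbo C++ 1.01 target you would additionally check that int is 16 bits and long is 32 bits; those checks are omitted here because they would fail on a 32-bit host compiler.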

The qf_port.h Header File

Listing 10.11 shows the qf_port.h header file for 80x86/QK/Turbo C++ 1.01/Large memory model. You always need to configure the maximum number of active objects QF_MAX_ACTIVE, and you need to include the qep_port.h, qk_port.h, and qf.h header files.

Listing 10.11. The qf_port.h header file for 80x86/QK/Turbo C++ 1.01/Large memory model

  • #ifndef qf_port_h

  • #define qf_port_h

  •                       /* The maximum number of active objects in the application */

  • #define QF_MAX_ACTIVE               63

  • #include "qep_port.h"                                           /* QEP port */

  • #include "qk_port.h"                                             /* QK port */

  • #include "qf.h"                        /* QF platform-independent interface */

  • #endif                                                         /* qf_port_h */

The qk_port.h Header File

The actual porting of QK to the CPU/Compiler of your choice happens in the qk_port.h header file. The first porting decision you need to make is the policy for locking and unlocking interrupts. To make this decision correctly, you need to learn a bit about your target CPU and the compiler to find out the most efficient way of enabling and disabling interrupts from C or C++. Generally, your first choice should be the safe policy of “saving and restoring the interrupt status” (Section 7.3.1 in Chapter 7). However, if you find out that it is safe to unlock interrupts inside ISRs because your target system can prioritize interrupts in hardware, you can use the simple and fast policy of “unconditional interrupt unlocking” (Section 7.3.2 in Chapter 7). With the fast policy you must always make sure that QF functions are invoked with interrupts unlocked, or more generally, that critical sections don't nest.

The next decision, related to the first, is the QK-specific interrupt entry and exit. Again, you need to find out whether your CPU enters ISRs with interrupts locked or unlocked (most CPUs lock interrupts before vectoring to ISRs). If you decided to use the fast interrupt-locking policy, you must unlock interrupts in QK_ISR_ENTRY() and lock them again in QK_ISR_EXIT() to avoid nesting of critical sections when you call any QF services. If your system has an interrupt controller, you might decide to unlock interrupts inside ISRs even if you're using the safe policy of “saving and restoring the interrupt status.” I would generally recommend leaving interrupts locked throughout the whole ISR on systems that don't have interrupt controllers. Obviously, in the latter case you must be using the safe policy of “saving and restoring the interrupt status,” because most QF services that you call from ISRs use a critical section internally.
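The interrupt-nesting bookkeeping that QK_ISR_ENTRY() and QK_ISR_EXIT() perform can be simulated on the host as follows. This is a sketch that elides the interrupt locking and the EOI write; sched_calls stands in for invoking the QK scheduler:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t QK_intNest_;     /* interrupt nesting level, as in QK */
static int sched_calls;         /* counts scheduler invocations */

/* body of QK_ISR_ENTRY(): bump the nesting level */
static void isr_entry(void) {
    ++QK_intNest_;
}

/* body of QK_ISR_EXIT(): only the outermost ISR invokes the scheduler */
static void isr_exit(void) {
    --QK_intNest_;
    if (QK_intNest_ == 0U) {
        ++sched_calls;          /* stand-in for QK_scheduleExt_() */
    }
}
```

The nesting counter is why interrupts may nest freely: a nested ISR returning to another ISR must not call the scheduler, because the preempted ISR still has to finish before any task can run.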

Finally, you need to customize the advanced features, such as the TLS and the extended context switch, if you plan to use them in your applications. Here, you need to find out which libraries require TLS support (e.g., Newlib). You also need to find what kind of coprocessors you want to support and how to save and restore their registers from C.

Listing 10.12 shows the qk_port.h header file for 80x86/QK/Turbo C++ 1.01/Large memory model. I decided to use the simple “unconditional interrupt unlocking” policy because the standard PC is equipped with the external 8259A Programmable Interrupt Controller (PIC) and the Turbo C++ 1.01 compiler provides the pair of functions disable() and enable() to unconditionally lock and unlock interrupts, respectively. With this simple interrupt-locking policy, I must unlock interrupts in QK_ISR_ENTRY() and lock them again in QK_ISR_EXIT(). I also use the 80x87 floating point coprocessor (FPU) and two libraries that require TLS support.

Listing 10.12. The qk_port.h header file for 80x86/QK/Turbo C++ 1.01/Large memory model

  • #ifndef qk_port_h

  • #define qk_port_h

  •                                                /* QF critical section entry/exit */

  • (1) /* QF_INT_KEY_TYPE not defined */

  • (2) #define QF_INT_LOCK(dummy)   disable()

  • (3) #define QF_INT_UNLOCK(dummy) enable()

  •                                                /* QK-specific ISR entry and exit */

  • (4) #define QK_ISR_ENTRY() do { \

  •         ++QK_intNest_; \

  • (5)         enable(); \

  •         } while (0)

  • (6) #define QK_ISR_EXIT() do { \

  • (7)        disable(); \

  • (8)        outportb(0x20, 0x20); \

  •          --QK_intNest_; \

  •          if (QK_intNest_ == 0) { \

  • (9)         QK_scheduleExt_(); \

  •          } \

  •    } while (0)

  •     /* demonstration of advanced QK features: TLS and extended context switch   */

  • (10) typedef struct Lib1_contextTag {         /* an example of a library context */

  •          double  x;

  •      } Lib1_context;

  • (11) extern Lib1_context * volatile impure_ptr1;

  • (12) typedef struct Lib2_contextTag {         /* an example of a library context */

  •          double  y;

  •      } Lib2_context;

  • (13) extern Lib2_context * volatile impure_ptr2;

  • (14) typedef union FPU_contextTag {

  •          uint32_t align;

  • (15)     uint8_t  x87 [108];               /* the x87 FPU context takes 108-bytes */

  •      } FPU_context;

  • (16) typedef struct ThreadContextTag {

  •          Lib1_context lib1;                                  /* library1 context */

  •          Lib2_context lib2;                                  /* library2 context */

  •          FPU_context  fpu;                                    /* the FPU context */

  •      } ThreadContext;

  • (17) enum QKTaskFlags {

  •          QK_LIB1_THREAD = 0x01,

  •          QK_LIB2_THREAD = 0x02,

  •          QK_FPU_THREAD  = 0x04

  •      };

  •                                                       /* QK thread-local storage */

  • (18) #define QK_TLS(act_) \

  • (19)      impure_ptr1 = &((ThreadContext *)(act_)->thread)->lib1; \

  • (20)      impure_ptr2 = &((ThreadContext *)(act_)->thread)->lib2

  •                                        /* QK extended context (FPU) save/restore */

  • (21) #define QK_EXT_SAVE(act_) \

  • (22)      if (((act_)->osObject & QK_FPU_THREAD) != 0) \

  • (23)          FPU_save(&((ThreadContext *)(act_)->thread)->fpu)

  • (24) #define QK_EXT_RESTORE(act_) \

  • (25)      if (((act_)->osObject & QK_FPU_THREAD) != 0) \

  • (26)          FPU_restore(&((ThreadContext *)(act_)->thread)->fpu)

  • (27) void FPU_save   (FPU_context *fpu);                  /* defined in assembly */

  • (28) void FPU_restore(FPU_context *fpu);                  /* defined in assembly */

  • #include <dos.h>                                              /* see NOTE01 */

  • #undef outportb /*don't use the macro because it has a bug in Turbo C++ 1.01*/

  • (29) #include "qk.h"                 /* QK platform-independent public interface */

  • (30) #include "qf.h"                 /* QF platform-independent public interface */

  • #endif                                                         /* qk_port_h */

  • (1) The macro QF_INT_KEY_TYPE is not defined, meaning that the fast policy of “unconditional interrupt unlocking” is used. This is possible because the standard PC is equipped with the 8259A Programmable Interrupt Controller (PIC), which allows unlocking interrupts inside ISRs.

  • (2) The macro QF_INT_LOCK() is defined as the Turbo C++ function disable().

  • (3) The macro QF_INT_UNLOCK() is defined as the Turbo C++ function enable().

  • (4) As described in Section 10.3.3, the macro QK_ISR_ENTRY() is called upon the entry to every ISR.

  • (5) The 80x86 CPU enters ISRs with interrupts locked. However, interrupts must be unlocked before any QF or QK service can be used in the ISR body, because the fast interrupt-locking policy does not support nesting critical sections.

  • (6) As described in Section 10.3.3, the macro QK_ISR_EXIT() is called upon exit from every ISR.

  • (7) The interrupts unlocked upon entry must be locked again to prevent corruption of the QK variables.

  • (8) This output statement writes the EOI instruction to the master 8259A interrupt controller.

  • (9) As described in Section 10.3.3, the QK scheduler must be called at the exit from every interrupt to handle the asynchronous preemption. Here I use the extended QK scheduler because this port supports the extended context switch for the 80x87 FPU.

Note

If you don't define the macros QK_EXT_SAVE() and QK_EXT_RESTORE(), the extended QK scheduler is equivalent to the regular scheduler QK_schedule_(). You can always use the extended scheduler in the QK_ISR_EXIT() macro without any performance penalty, but if you want to save a little code space, you might want to use the regular scheduler.

Saving and Restoring FPU Context

The functions FPU_save() and FPU_restore(), declared in Listing 10.12(27,28), are part of the QK port. They are defined in the assembly file <qp>\qpc\ports\80x86\qk\tcpp101\l\src\fpu.asm. Both functions are just shells for executing the 80x87 machine instructions FSAVE and FRSTOR, respectively.

Testing the QK Port

As usual, I use the dining philosopher problem (DPP) application to test the port. For QK, I have extended the basic DPP application discussed in Chapter 9 to demonstrate and test the advanced QK features, such as the priority-ceiling mutex, thread-local storage for multiple reentrant libraries, and the extended context switch for the 80x87 FPU. The DPP application for QK is located in the directory <qp>\qpc\examples\80x86\qk\tcpp101\l\dpp.

Asynchronous Preemption Demonstration

As it turns out, an interesting asynchronous preemption is not that easy to observe in the DPP application. By an interesting preemption, I mean a task asynchronously preempting another task, as opposed to a task merely preempting the idle loop. Figure 10.9 illustrates why. The DPP application is mostly driven by the system clock tick interrupt (ISR_tmr), which posts the time events to the Philosopher active objects. Typically, the interrupts and state machines execute so quickly that all processing happens very close to the clock tick and the CPU quickly goes back to executing the QK idle loop. With the code executing so fast, ISR_tmr() has no chance to actually preempt any QK task, just the idle loop. Consequently, an asynchronous task preemption cannot happen.

Execution profile of the DPP application with QK.

Figure 10.9. Execution profile of the DPP application with QK.

Therefore, to increase the odds of asynchronous preemptions in this application, I have added a second interrupt (ISR_kbd, triggered by the keyboard), which is asynchronous with respect to the clock tick and always posts an event to the Table active object. I also added some artificial CPU loading in the form of various busy-wait functions called from the state machines and interrupts (I show examples of these functions later in this section). Finally, I've instrumented the ISRs to report preemptions caused by interrupts to the screen.

Since I cannot foresee the speed of your CPU, I have provided a command-line parameter to the DPP application that determines the delay incurred by the various busy-wait functions. On my 2GHz PC, I've been using the value of 100 iterations, which allowed me to easily catch several asynchronous preemptions. You should be careful not to go overboard with this parameter, though, because you can overload the CPU, or more scientifically stated, you can create an unschedulable set of tasks. In this case, QF will eventually overflow an event queue and assert.
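A busy-wait loading function of the kind mentioned above might look like the following hypothetical sketch; the actual functions in the example code may differ:

```c
#include <assert.h>
#include <stdint.h>

/* burn CPU cycles proportionally to 'iterations'; the volatile
 * accumulator keeps the compiler from optimizing the loop away */
static uint32_t busy_delay(uint32_t iterations) {
    volatile uint32_t sink = 0U;
    uint32_t i;
    for (i = 0U; i < iterations; ++i) {
        sink += i;                  /* meaningless work */
    }
    return (uint32_t)sink;
}
```

The iteration count would come from the command-line parameter, so the same executable can be tuned to CPUs of very different speeds.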

With all this scaffolding, you actually have a chance to observe an interesting asynchronous preemption, such as the instance shown in Figure 10.9(1). You need to run the DPP application (with the command-line parameter of 100 or so). As explained before, you will never get asynchronous preemptions unless you start typing on the keyboard. When the keyboard interrupt happens to come close enough after the clock tick, it might just manage to preempt one of the philosopher tasks. The keyboard ISR always posts an event to the Table object, and because Table has the highest priority in the system, upon the exit from ISR_kbd(), QK performs an asynchronous context switch to the Table active object. When this happens, you'll see that the preemption counter for one of the Philosopher tasks will increment on the screen (see Figure 10.10).

DPP application with QK running in a DOS console.

Figure 10.10. DPP application with QK running in a DOS console.

If you want to examine an asynchronous preemption more closely, you can use the debugger built into the Turbo C++ IDE. You load the project DPP-DBG.PRJ (located in <qp>\qpc\examples\80x86\qk\tcpp101\l\dpp) into the IDE and open the file qk_ext.c, which I have specifically added to this project even though it is already included in the qk.lib library. You set a breakpoint inside QK_scheduleExt_(), indicated as label (1) in Figure 10.11 (see also Listing 10.12(9)). Next, select Run | Arguments… to define a command-line argument of around 100. Now you can run the program. When you start typing on the keyboard, you should eventually hit the breakpoint (asynchronous preemption). You can step through the code from there. Figure 10.11 shows an example of my debug session.

Asynchronous preemption examined in the Turbo C++ debugger.

Figure 10.11. Asynchronous preemption examined in the Turbo C++ debugger.

  • (1) The original breakpoint is set at the instruction that is only executed when a task preempts another task, but not the idle loop (inside QK_scheduleExt_()).

  • (2) When you step into the QF_ACTIVE_DISPATCH_() macro, you get inside the function QHsm_dispatch(). Keep stepping (F7) until you reach the highlighted line. This line calls the current state-handler function of the high-priority active object.

  • (3) When you step into again, you end up inside the Table active object.

  • (4) The Call Stack window shows the call tree information, which is a nice byproduct of QK using just a single stack. You see that the extended QK scheduler is engaged and detects preemption, so it calls QHsm_dispatch() for the Table active object. Finally, the Table_serving() state handler processes the TEST event.

Note

This example should convince you that debugging QK tasks is straightforward, even though they nest on the interrupt stack frame, as shown in the Call Stack window (see Section 10.2.5). In contrast, debugging ISRs is hard because the Turbo C++ debugger freezes when interrupts are locked at the CPU or the 8259A PIC level.

Priority-Ceiling Mutex Demonstration

To demonstrate the QK priority-ceiling mutex, I've extended the Philosopher active object to allow random think and eat timeouts for philosophers rather than the fixed timeouts used in the basic implementation. To implement the feature, I use the pseudorandom number (PRN) generator provided in Turbo C++ 1.01 (random()). This generator, like most PRN generators, is not reentrant because it must preserve its state from one call to the next. To prevent corruption of this internal state, I protect the generator with the QK mutex, as shown in Listing 10.13.

Listing 10.13. Protecting the PRN generator with the priority-ceiling mutex (file <qp>\qpc\examples\80x86\qk\tcpp101\l\dpp\philo.c)

  • QState Philo_thinking(Philo *me, QEvent const *e) {

  •        switch (e->sig) {

  •            case Q_ENTRY_SIG: {

  •                   QTimeEvtCtr think_time;

  •                  QMutex mutex;

  •                  mutex = QK_mutexLock(N_PHILO);

  •                  think_time = (QTimeEvtCtr)(random(THINK_TIME) + 1);

  •                  QK_mutexUnlock(mutex);

  •                  QTimeEvt_postIn(&me->timeEvt, (QActive *)me, think_time);

  •                  return (QState)0;

  •                }

  •               . . .

  •        }

  •         return (QState)&QHsm_top;

  • }

Note that I use the number of philosophers N_PHILO as the priority ceiling to lock the mutex. This ceiling corresponds to the highest-priority Philosopher active object that can access the PRN generator (Philosopher active objects have priorities 1..N_PHILO). Since the priority of the Table active object (N_PHILO + 1) is above the ceiling, the mutex does not affect Table, which is exactly what I wanted to achieve. Table does not use the resource (PRN generator in this case), so it should not be affected by the mutex.
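You can check the ceiling arithmetic against QK's basic preemption rule, which admits a task only when its priority strictly exceeds the current priority. The following is a host-side sketch, not kernel code:

```c
#include <assert.h>
#include <stdint.h>

#define N_PHILO 5U          /* number of Philosopher active objects */

/* QK's preemption rule: a ready task runs only if its priority is
 * strictly higher than the current QK priority */
static int can_preempt(uint8_t taskPrio, uint8_t currPrio) {
    return taskPrio > currPrio;
}
```

While the mutex is held, the current priority equals the ceiling N_PHILO, so every Philosopher (priorities 1..N_PHILO) is held off, but Table at priority N_PHILO + 1 still preempts.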

TLS Demonstration

In the QK port header file qk_port.h (Listing 10.12), you saw the two contexts of two hypothetical libraries lib1 and lib2, which need TLS support in the same way as Newlib does. The QK port implemented the switching of the two “impure pointers” in the macro QK_TLS (Listing 10.12(18)). At the application level, I need to add the contexts to the active objects and I need to inform QK where these TLS contexts are located. I also need to call the library functions so that they are shared among all Philosopher active objects. Listing 10.14 shows these steps.

Listing 10.14. Incorporating the TLS context inside active objects (file <qp>\qpc\examples\80x86\qk\tcpp101\l\dpp\philo.c)

  •         typedef struct PhiloTag {

  •          QActive super;                   /* derives from the QActive base class */

  • (1) ThreadContext context;                                /* thread context */

  •         QTimeEvt timeEvt;                   /* for timing out thinking or eating */

  •      } Philo;

  •      . . .

  •      /*.......................................................................*/

  •      void Philo_start(uint8_t n,

  •                       uint8_t p, QEvent const *qSto[], uint32_t qLen)

  •      {

  •          Philo *me = &l_philo[n];

  •          Philo_ctor(me);                                          /* instantiate */

  • (2)      impure_ptr1 = &me->context.lib1;       /* initialize reentrant library1 */

  • (3)     lib1_reent_init(p);

  • (4)      impure_ptr2 = &me->context.lib2;       /* initialize reentrant library2 */

  • (5)     lib2_reent_init(p);

  •          QActive_start((QActive *)me, p, qSto, qLen,

  • (6)                                 &me->context,

  • (7)                                (uint8_t)(QK_LIB1_THREAD | QK_LIB2_THREAD | QK_FPU_THREAD),

  •                        (QEvent *)0);

  •      }

  •      QState Philo_thinking(Philo *me, QEvent const *e) {

  •          switch (e->sig) {

  •              . . .

  •              case TIMEOUT_SIG: {

  • (8)              lib1_test();

  • (9)              lib2_test();

  •                  return (QState)0;

  •              }

  •              . . .

  •          }

  •          return Q_SUPER(&QHsm_top);

  •      }

  •      /*.......................................................................*/

  •      QState Philo_hungry(Philo *me, QEvent const *e) {

  •          switch (e->sig) {

  •              . . .

  •              case EAT_SIG: {

  •                  if (((TableEvt const *)e)->philoNum == PHILO_ID(me)) {

  • (10)                  lib1_test();

  • (11)                  lib2_test();

  •                      return (QState)0;

  •                  }

  •                  break;

  •              }

  •              . . .

  •          }

  •          return Q_SUPER(&QHsm_top);

  •      }

  •      /*.......................................................................*/

  •      QState Philo_eating(Philo *me, QEvent const *e) {

  •          switch (e->sig) {

  •              . . .

  •              case TIMEOUT_SIG: {

  • (12)              lib1_test();

  • (13)              lib2_test();

  •                  return Q_TRAN(&Philo_thinking);

  •              }

  •              . . .

  •          }

  •          return Q_SUPER(&QHsm_top);

  •      }

  • (1) I place the data member context of ThreadContext type (defined in Listing 10.12(16)) directly inside the Philo class. That way I can be sure that every Philo object has the private ThreadContext area.

  • (2) Upon the Philo active object initialization, I aim the “impure pointer” of library lib1 at the TLS context for this library.

  • (3) I then let the library initialize the context.

  • (4,5) I repeat the same two steps for the library lib2.

  • (6) I pass the pointer to the TLS as the fifth parameter to the QActive_start() function, to inform QK about the location of TLS for each active object (see also Listing 10.5(17)).

  • (7) I set the thread attributes to inform QK that this active object uses library1, library2, and the FPU.

  • (8-13) I pepper the Philo state machine with calls to the libraries. These calls take a long time to run and provide the CPU load necessary to test asynchronous preemptions.

Note

I made similar changes to the Table active object, so that it too shares the libraries with all Philosophers.

Finally, I need to define the library functions lib1_test() and lib2_test() that actually use the “impure pointers.” Listing 10.15 shows the test code.

Listing 10.15. Using the TLS context inside the libraries (file <qp>\qpc\examples\80x86\qk\tcpp101\l\dpp\bsp.c)

  • #include <math.h>

  •      . . .

  •      /*-----------------------------------------------------------------------*/

  •      void lib1_reent_init(uint8_t prio) {

  • (1)      impure_ptr1->x = (double)prio * (M_PI / 6.0);

  •      }

  •      /*.......................................................................*/

  •      void lib1_test(void) {

  • (2)          uint32_t volatile i = l_delay;

  • (3)          while (i-- > 0UL) {

  • (4)              volatile double r = sin(impure_ptr1->x) * sin(impure_ptr1->x)

  •                                    + cos(impure_ptr1->x) * cos(impure_ptr1->x);

  • (5)                  Q_ASSERT(fabs(r - 1.0) < 1e-99);             /* assert the identity */

  •          }

  •      }

  •      /*-----------------------------------------------------------------------*/

  •      void lib2_reent_init(uint8_t prio) {

  • (6)      impure_ptr2->y = (double)prio * (M_PI / 6.0) + M_PI;

  •      }

  •      /*.......................................................................*/

  •      void lib2_test(void) {

  •          uint32_t volatile i = l_delay;

  •          while (i-- > 0UL) {

  •              volatile double r = sin(impure_ptr2->y) * sin(impure_ptr2->y)

  •                                  + cos(impure_ptr2->y) * cos(impure_ptr2->y);

  •              Q_ASSERT(fabs(r - 1.0) < 1e-99);             /* assert the identity */

  •          }

  •      }

  • (1) I initialize the per-thread context of library lib1 (the x variable) so it depends on the priority of the task, which is different for each task.

  • (2,3) The parameter l_delay is set from the command line (see Section 10.6.1) and determines the number of iterations performed in this function (i.e., the CPU load the function causes).

  • (4) I use the mathematical identity sin^2(x) + cos^2(x) == 1.0 to compute the value of r based on the impure pointer impure_ptr1. This expression makes extensive use of the 80x87 FPU.

  • (5) I assert the identity. This assertion would fail if the impure pointer were switched incorrectly or if the FPU computed the expression incorrectly.

  • (6) I initialize the per-thread context of library lib2 (the y variable) similarly to lib1, except that I add a phase shift, so that the per-thread values differ from those for lib1.

Extended Context Switch Demonstration

As discussed in Section 10.5.4, the 80x87 FPU context saving and restoring is handled automatically in the QK extended scheduler. At the application level, you need to include the per-thread FPU context in every active object, which is done in Listing 10.14(1). You also need to set the QK_FPU_THREAD flag for every task that uses the FPU (see Listing 10.14(7)). And finally, you must use the FPU to test it. Though Listing 10.15 performs a lot of floating-point operations, it is also important to use the correct compiler options to select the FPU. I've compiled both the QP libraries and the DPP application with the -f287 option, which instructs the Turbo C++ compiler to generate FPU hardware instructions.

Summary

A certain class of real-time embedded (RTE) systems, such as control applications, can vastly benefit from preemptive multitasking. The QP event-driven platform contains a lightweight, priority-based, preemptive kernel called QK.

QK is a special kind of preemptive kernel, called a run-to-completion (RTC) or single-stack kernel, in which tasks are one-shot, RTC functions as opposed to endless loops as in most conventional RTOSs. The biggest limitation of RTC kernels is the inability to block in the middle of a task, but this limitation is irrelevant for executing event-driven active objects, because active objects don't block in the middle of RTC steps anyway.

When applied in active object applications, the QK kernel provides the same execution profile as any other conventional, priority-based, preemptive kernel or RTOS. In fact, QK most likely outperforms all conventional preemptive RTOSs in all respects, such as speed, stack usage, code size (ROM footprint), complexity, ease of use, or any other metrics you want to apply.

QK supports advanced features, such as priority-ceiling mutex, thread-local storage, and extended context switch, which you can find only in sophisticated RTOSs. All these advanced features are helpful when you need to share memory, libraries, or devices among active object threads.

QK is also easier to port to new CPUs and compilers than most RTOSs, mainly because QK can work with compiler-generated interrupts that virtually all embedded C/C++ compilers support. Most of the time, you can complete a QK port without writing assembly code. In this chapter, I discussed the QK port to 80x86 CPU with the legacy C compiler running in real mode. Although this port can be valuable in itself, here I used it mostly to demonstrate QK capabilities. This book's accompanying Website at www.quantum-leaps.com/psicc2 contains links to many other QK ports to popular embedded processors and compilers.
