Race Conditions

Whenever a variable or other data item is modified at interrupt time, there is the possibility that the driver will operate inconsistently because of race conditions. A race condition happens whenever an operation is not executed atomically, but it still needs to count on data coherence throughout its execution. The ``race'' therefore is between the non-atomic operation and other code that might be executed in the meantime. Typically, race conditions appear in three situations: implicit calls to schedule from within a function, blocking operations, and access to data shared by interrupt code and system calls. The last situation is the most frequent, and that’s why race conditions are dealt with in this chapter.

Dealing with race conditions is one of the trickiest aspects of programming, because the related bugs are subtle and very difficult to reproduce, and it’s hard to tell when there is a race condition between interrupt code and the driver methods. The programmer must take great care to avoid corruption of data or metadata.

In general, the techniques used to prevent race conditions are implemented in the driver methods, which must be sure to handle the data items correctly if they unexpectedly change. The interrupt handler, on the other hand, doesn’t usually need special care, because it operates atomically with respect to the device methods.

Different techniques can be employed to prevent data corruption, and I’m going to introduce the most common ones. I won’t show complete code because the best code for each situation depends on the operating mode of the device being driven, and on the programmer’s taste.

The most common ways of protecting data from concurrent access are:

  • Using a circular buffer and avoiding shared variables.

  • Temporarily disabling interrupts in the method whenever the shared variables are accessed.

  • Using lock variables, which are atomically incremented and decremented.

Whatever approach you choose, you still need to decide what to do when accessing a variable that can be modified at interrupt time. Such variables can be declared as volatile, to prevent the compiler from optimizing access to their values (for example, it prevents the compiler from holding a value in a register for the whole duration of a function). However, the compiler generates horrible code whenever volatile variables are involved, so you might choose to resort to cli and sti instead. The Linux implementation of these functions uses a gcc directive to ensure the processor is in a safe state before the interrupt flag is modified.

Using Circular Buffers

Using a circular buffer is an effective way of handling concurrent-access problems: the best way to deal with concurrent access is to perform no concurrent access whatsoever.

The circular buffer uses an algorithm called ``producer and consumer''--one player pushes data in and the other pulls data out. Concurrent access is avoided if there is exactly one producer and exactly one consumer. There are two examples of producer and consumer in the short module. In one case, the reading process is waiting to consume data that is produced at interrupt time; in the other, the bottom half consumes data produced by the top half.

Two pointers are used to address a circular buffer: head and tail. head is the point at which data is being written, and it is updated only by the producer of the data. Data is read from tail, which is updated only by the consumer. As I mentioned above, if data is written at interrupt time, you must be careful when accessing head multiple times: you should either declare it as volatile or disable interrupts before entering the critical section of code.

The circular buffer runs smoothly, except when it fills up. If that happens, things become hairy, and you can choose among different possible solutions. The short implementation just loses data; there’s no check for overflow, and if head goes beyond tail, a whole buffer of data is lost. Some alternative implementations are to drop the last item; overwrite the buffer tail, as printk does (see Section 4.1.2 in Chapter 4); hold up the producer, as scullpipe does; or allocate a temporary extra buffer to back up the main buffer. The best solution depends on the importance of your data and other situation-specific questions, so I won’t cover it here.

Although the circular buffer appears to solve the problem of concurrent access, there is still the possibility of a race condition when the read function goes to sleep. This code shows where the problem appears in short:

while (short_head == short_tail) {
    interruptible_sleep_on(&short_queue);
    /* ... */
}

When executing this statement, it is possible that new data will arrive after the while condition is evaluated as true and before the process goes to sleep. Information carried in by the interrupt won’t be read by the process; the process goes to sleep even though head != tail, and it isn’t awakened until the next data item arrives.

I didn’t implement correct locking for short because the source of short_read is included in Section 8.2.2 in Chapter 8, and at that point this discussion was not worth introducing. Also, the data involved is not worth the effort.

While the data that short collects is not vital, and the likelihood of getting an interrupt in the time lapse between two successive instructions is often negligible, sometimes you just can’t take the risk of going to sleep when data is pending.

This problem is general enough to deserve special treatment; it is deferred to Section 9.7.4 later in this chapter, where I'll discuss it in detail.

It’s interesting to note that only a producer and consumer situation can be addressed with a circular buffer. A programmer must often deal with more complex data structures to solve the concurrent-access problem. The producer/consumer situation is actually the simplest class of these problems; other structures, such as linked lists, simply don’t lend themselves to a circular buffer implementation.

Disabling Interrupts

A commonly used method of acquiring unique access to shared data is to call cli to disable interrupt reporting in the processor. Whenever a data item (like a linked list) is modified at interrupt time and by a function living in the normal computational flow, interrupts must be disabled by the latter function before it touches shared data.

The race condition in this case is between the instruction reading a shared item and the instruction using the knowledge just acquired. For example, the following loop may fail reading a linked list if the list is modified at interrupt time:

for (ptr = listHead; ptr; ptr = ptr->next)
    /* do something */ ;

An interrupt may change the value of ptr after it has been read but before it is used. If this happens, you’ll have problems as soon as you use ptr because the current value of the pointer is no longer related to the list.

One possible solution is to disable interrupts for the duration of the critical loop. Although the code for disabling interrupts was introduced in Chapter 2, in Section 2.5.2, it is worth repeating it here:

unsigned long flags;
save_flags(flags);
cli();
/* critical code */
restore_flags(flags);

As a matter of fact, in the device methods, the simpler cli/sti pair can be used instead because you can count on interrupts being enabled when a process enters a system call. However, you have to use the safer save_flags/restore_flags solution in a function that is called by other code, where you can’t make any assumptions about the current value of the interrupt flag.

Using Lock Variables

The third approach to shared data variables is to use locks that are accessed through atomic instructions. Whenever one of two unrelated entities (such as an interrupt handler and the read system call, or two processors in a Symmetric Multi-Processor computer) needs to access a shared data item, it must first acquire the lock. If the lock can't be acquired, the entity must wait until it is released.

The Linux kernel exports two sets of functions to deal with locks: bit operations and access to the ``atomic'' data type.

Bit operations

It’s quite common to have single-bit lock variables or to update device status flags at interrupt time--while a process may be accessing them. The kernel offers a set of functions that modify or test single bits atomically. Because the whole operation happens in a single step, no interrupt can interfere.

Atomic bit operations are very fast, as they usually perform the operation using a single machine instruction without disabling interrupts. The functions are architecture-dependent and are declared in <asm/bitops.h>. They are guaranteed to be atomic even on SMP computers and are therefore the suggested way to keep coherence across processors.

Unfortunately, data typing in these functions is architecture-dependent as well. Both the nr argument and the return value are unsigned long for the Alpha and the Sparc and int for the other architectures. The following list describes the bit operations as they appear in versions 1.2 through 2.1.37. The list changed in 2.1.38, as detailed in Section 17.6 in Chapter 17.

set_bit(nr, void *addr);

This function sets bit number nr in the data item pointed to by addr. The function acts on an unsigned long, even though addr is a pointer to void. The value returned is the previous value of the bit—0 or non-zero.

clear_bit(nr, void *addr);

The function clears the specified bit in the unsigned long datum that lives at addr. Its semantics are the same as set_bit.

change_bit(nr, void *addr);

This function toggles the bit. Otherwise, it is identical to set_bit and clear_bit, above.

test_bit(nr, void *addr);

This function is the only bit operation that doesn’t need to be atomic; it simply returns the current value of the bit.

When these functions are used to access and modify a shared flag, you don’t have to do anything except call them. Using bit operations to manage a lock variable that controls access to a shared variable, on the other hand, is more complicated, and deserves an example.

A code segment that needs to access a shared data item tries to acquire the lock atomically using set_bit, and releases it with clear_bit. The usual implementation is shown below; it assumes that the lock lives at bit nr of address addr. It also assumes that the bit is 0 when the lock is free and non-zero when the lock is busy.

/* try to set lock */
while (set_bit(nr, addr) != 0)
    wait_for_a_while();

/* do your work */

/* release lock, and check... */
if (clear_bit(nr, addr) == 0)
    something_went_wrong(); /* already released: error */

The downside of accessing shared data this way is that both the contending parties must be able to wait. This is not easily achieved when one of the parties is an interrupt handler.

Atomic integer operations

Kernel programmers often need to share an integer variable between an interrupt handler and other functions. We have just seen how atomic access to bits is not sufficient to guarantee that everything will work well (in the previous case, cli must be used anyway if one of the parties is an interrupt handler).

The need to prevent race conditions, actually, is so compelling that the kernel developers devoted an entire header to the problem: <asm/atomic.h>. This header is quite recent and is missing in Linux 1.2. It is therefore not available to drivers meant to be backward compatible.

The facility offered by atomic.h is much stronger than the bit operations just described. atomic.h defines a new data type, atomic_t, which can be accessed only through atomic operations.

atomic_t is currently defined as int on all supported architectures. The following operations are defined for the type and are guaranteed to be atomic with respect to all processors of an SMP computer. The operations are very fast because they compile to a single machine instruction whenever possible.

void atomic_add(atomic_t i, atomic_t *v);

Add i to the atomic variable pointed to by v. The return value is void, as most of the time there’s no need to know the new value. This function is used by the networking code to update statistics about memory usage in sockets.

void atomic_sub(atomic_t i, atomic_t *v);

Subtract i from *v. The i argument for both these functions is declared as int in recent 2.1 kernels, but this change is mainly aesthetic and shouldn’t affect source code.

void atomic_inc(atomic_t *v);
void atomic_dec(atomic_t *v);

Increment or decrement an atomic variable.

int atomic_dec_and_test(atomic_t *v);

This function was added in version 1.3.84 and is useful for keeping track of usage counts. The return value is 1 if the variable *v is 0 after being decremented, 0 otherwise.

As suggested above, atomic_t data items must be accessed only through these functions. If you pass an atomic item to a function that expects an integer argument, you’ll get a compiler warning. Needless to say, you are allowed to read the current value of an atomic item and to cast atomic variables to other types.

Going to Sleep Without Races

The one race condition that has been omitted so far in this discussion is the problem of going to sleep. It is a problem far more general than interrupt-driven I/O, and an efficient solution requires a little knowledge of the internals of sleep_on.

This particular race condition occurs between the time the condition is checked for going to sleep and the actual invocation of sleep_on. This test case is the same statement used above, but I feel it is worth showing once again:

while (short_head == short_tail) {
    interruptible_sleep_on(&short_queue);
    /* ... */
}

To perform the comparison and go to sleep safely, first disable interrupt reporting, then test the condition and go to sleep. This way, the variable being tested cannot be modified between the test and the sleep. The kernel allows the process to go to sleep after issuing cli: it simply reenables interrupt reporting just before calling schedule, after inserting the process into its wait queue.

The code examples introduced here use a while loop to allow for signal handling. If a signal is reported to the process, interruptible_sleep_on returns, and the test in the while statement is performed again. The following is one possible implementation:

while (short_head == short_tail) {
    cli();
    if (short_head == short_tail)
        interruptible_sleep_on(&short_queue);
    sti();
    /* .... signal decoding .... */
}

If an interrupt happens after the cli, the interrupt remains pending while the current process is being put to sleep. By the time the interrupt is eventually reported to the processor, the process is already asleep and can be awakened safely.

In this case, I used cli/sti because the sample code is designed to live within the read method; the safer save_flags, cli, and restore_flags functions should be used otherwise.

If you don’t want to disable interrupt reporting while going to sleep, there is another way to perform the same job (one that Linus likes a lot). Anyway, you can skip the following discussion if you’d like, since it’s slightly elaborate.

The idea is that the process can add itself to the wait queue, declare itself to be sleeping, and then perform its tests. This is the typical implementation:

struct wait_queue wait = { current, NULL };

add_wait_queue(&short_queue, &wait);
current->state = TASK_INTERRUPTIBLE;
while (short_head == short_tail
        && !(current->signal & ~current->blocked)) {
    schedule();
    current->state = TASK_INTERRUPTIBLE;
}
current->state = TASK_RUNNING;
remove_wait_queue(&short_queue, &wait);

if (current->signal & ~current->blocked) /* a signal arrived */
    return -ERESTARTSYS;

This code is somewhat like an unrolling of the internals of sleep_on. The wait variable is explicitly declared because it is needed to put a process to sleep; this fact was explained in Section 5.2.3, in Chapter 5. This example introduced several new symbols:

current->state

This field is a hint for the scheduler. Whenever the scheduler is invoked, it looks at the state field of all processes to decide what to do. Each process is free to modify its state, but the change won’t take effect until the scheduler runs.

#include <linux/sched.h>
TASK_RUNNING
TASK_INTERRUPTIBLE
TASK_UNINTERRUPTIBLE

These symbolic names represent the most common values of current->state. TASK_RUNNING means that the process is running, and the other two mean that it is sleeping.

void add_wait_queue(struct wait_queue **p, struct wait_queue *wait);
void remove_wait_queue(struct wait_queue **p, struct wait_queue *wait);
void __add_wait_queue(struct wait_queue **p, struct wait_queue *wait);
void __remove_wait_queue(struct wait_queue **p, struct wait_queue *wait);

Insert and remove a process from a wait queue. The wait argument must point into the process’s stack page. The functions with the leading underscores are faster but can only be called when interrupts are disabled (for example, from a fast interrupt handler).

With this background, let's look at what happens when an interrupt arrives. The handler calls wake_up_interruptible(&short_queue); for Linux, this means ``set the state of every process sleeping interruptibly on short_queue to TASK_RUNNING.'' Therefore, if an interrupt is reported between the while condition and the call to schedule, the task will already have been marked as running, and schedule won't put it to sleep; no data is lost.

If, on the other hand, no wake_up has occurred and the process is still marked ``interruptible'' when schedule runs, it is put to sleep.

It’s interesting to note that the wake_up call doesn’t remove the process from the wait queue. It’s sleep_on that adds and removes the process from the wait queue. The code must call add_wait_queue and remove_wait_queue explicitly, because sleep_on hasn’t been used in this case.
