Task Queues

One feature many drivers need is to schedule execution of some tasks at a later time without resorting to interrupts. Linux offers two different interfaces for this purpose: task queues and kernel timers. Task queues provide a flexible utility for scheduling execution at a later time, with various meanings for later; they are most useful when writing interrupt handlers, and we’ll see them again in Section 9.4, in Chapter 9. Kernel timers are used to schedule a task to run at a specific time in the future and are dealt with later in this chapter, in Section 6.5.

A typical situation in which you might use task queues is for managing hardware that cannot generate interrupts but still allows blocking reads. You need to poll the device, while taking care not to burden the CPU with unnecessary operations. Waking the reading process at fixed time intervals (for example, using current->timeout) isn’t a suitable approach, because each poll would require two context switches, and often a suitable polling mechanism can be implemented only outside of a process’s context.

A similar problem is giving timely input to a simple hardware device. For example, you might need to feed steps to a stepper motor that is directly connected to the parallel port--the motor needs to be moved by single steps on a timely basis. In this case, the controlling process talks to your device driver to dispatch a movement, but the actual movement should be performed step by step after returning from write.

The preferred way to perform such floating operations quickly is to register a task for later execution. The kernel supports task ``queues,'' where tasks accumulate to be ``consumed'' when the queue is run. You can declare your own task queue and trigger it at will, or you can register your tasks in predefined queues, which are run (triggered) by the kernel itself.

The next section first describes task queues, then introduces the predefined task queues, which provide a good start for some interesting tests (and can hang the computer if something goes wrong), and finally explains how to run your own task queues.

The Nature of Task Queues

A task queue is a list of tasks, each task being represented by a function pointer and an argument. When a task is run, it receives a single void * argument and returns void. The pointer argument can be used to pass along a data structure to the routine, or it can be ignored. The queue itself is a list of structures (the tasks) that are owned by the kernel module declaring and queueing them. The module is completely responsible for allocating and deallocating the structures; static structures are commonly used for this purpose.
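
For instance, a task routine might look like the following minimal sketch; the structure and field names are invented for illustration:

/* a hypothetical task routine: it receives the void * argument
   stored in the tq_struct and casts it back to its own data */
struct my_dev {
    int pending;                    /* work left to do */
};

void my_task_fn(void *ptr)
{
    struct my_dev *dev = (struct my_dev *)ptr;
    dev->pending = 0;               /* the deferred work goes here */
}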

A queue element is described by the following structure, copied directly from <linux/tqueue.h>:

struct tq_struct {
    struct tq_struct *next;         /* linked list of active bh's */
    int sync;                       /* must be initialized to zero */
    void (*routine)(void *);        /* function to call */
    void *data;                     /* argument to function */
};

The ``bh'' in the first comment means bottom-half. A bottom-half is ``half of an interrupt handler''; we’ll discuss this topic thoroughly when we deal with interrupts in Section 9.4 in Chapter 9.

Task queues are an important resource for dealing with asynchronous events, and most interrupt handlers schedule part of their work to be executed when task queues are run. On the other hand, some task queues are bottom halves, in that their execution is triggered by the do_bottom_half function. While I intend for this chapter to make sense even if you don’t understand bottom halves, I do refer to them when necessary.

The most important fields in the data structure shown above are routine and data. To queue a task for later execution, you need to set both these fields before queueing the structure, while next and sync should be cleared. The sync flag in the structure is used to prevent queueing the same task more than once, as this would corrupt the next pointer. Once the task has been queued, the structure is considered ``owned'' by the kernel and shouldn’t be modified.

The other data structure involved in task queues is task_queue, which is currently just a pointer to struct tq_struct; the decision to typedef this pointer to another symbol permits the extension of task_queue in the future, should the need arise.
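
In the kernel headers of this era, the definition amounts to roughly the following:

typedef struct tq_struct * task_queue;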

The following list summarizes the operations that can be performed on struct tq_structs; all the functions are inlines.

void queue_task(struct tq_struct *task, task_queue *list);

As its name suggests, this function queues a task. It disables interrupts to prevent race conditions and can be called from any function in your module.

void queue_task_irq(struct tq_struct *task, task_queue *list);

This function is similar to the previous one, but it can be called only from a non-reentrant function (such as an interrupt handler, whence the name). It is slightly faster than queue_task because it doesn’t disable interrupts while queueing. If you call this routine from a reentrant function, you risk corruption of the queue due to the unmasked race condition. However, this function does protect against queueing-while-running (queueing a task at the exact moment the queue is being consumed).

void queue_task_irq_off(struct tq_struct *task, task_queue *list);

This function can be called only when interrupts are disabled. It is faster than the previous two, but doesn’t prevent either concurrent-queueing or queueing-while-running race conditions.

void run_task_queue(task_queue *list);

run_task_queue is used to consume a queue of accumulated tasks. You won’t need to call it yourself unless you declare and maintain your own queue.

Both queue_task_irq and queue_task_irq_off have been removed in version 2.1.30 of the kernel, as the speed gain was not worth the effort. See Section 17.4 in Chapter 17, for details.

Before delving into the details of the queues, I’d better explain some of the subtleties that hide behind the scenes. Queued tasks execute asynchronously with respect to system calls; this asynchronous execution requires additional care and is worth explaining first.

Task queues are executed at a safe time. Safe here means that there are no stringent requirements on execution time. The code doesn’t need to be extremely fast, because hardware interrupts are enabled during execution of task queues. Queued functions should be reasonably fast anyway, because nothing but hardware interrupt handlers can run while a queue is being consumed.

Another concept related to task queues is that of interrupt time. In Linux, interrupt time is a software concept, enforced by the global kernel variable intr_count. This variable keeps a count of the number of nested interrupt handlers being executed at any time.[21]

During the normal computation flow, when the processor is executing on behalf of a process, intr_count is 0. When intr_count is not 0, on the other hand, the code is asynchronous with the rest of the system. Asynchronous code might be handling a hardware interrupt or a ``software interrupt''--a task that is executed independently of any process, which we’ll refer to as running ``at interrupt time.'' Such code is not allowed to perform certain operations; in particular, it cannot put the current process to sleep, because the value of the current pointer is not related to the software interrupt code being run.

A typical example is code executed on exit from a system call. If for some reason there is a job scheduled for execution, the kernel can dispatch it as soon as a process returns from a system call. This is a ``software interrupt,'' and intr_count is incremented before dealing with the pending job. The function being dispatched is at ``interrupt time'' because the main instruction stream has been interrupted.

When intr_count is non-zero, the scheduler can’t be invoked. This also means that kmalloc(GFP_KERNEL) is not allowed. Only atomic allocations (see Section 7.1.1 in Chapter 7) can be performed at interrupt time, and atomic allocations are more prone to fail than ``normal'' allocations.
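
As a minimal sketch (MY_BUF_SIZE is an invented constant), code that may run at interrupt time must request an atomic allocation and be prepared for it to fail:

char *buf = kmalloc(MY_BUF_SIZE, GFP_ATOMIC);  /* never sleeps */
if (!buf)
    return;     /* atomic allocations fail more often: give up or retry later */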

If code being executed at interrupt time calls schedule, an error message like ``Aiee: scheduling in interrupt'' is printed on the console, followed by the hexadecimal address of the calling instruction. From version 2.1.37 onwards, this message is followed by an oops, to help debug the problem by analyzing the registers. Trying to allocate memory at interrupt time with non-atomic priority generates an error message that includes the caller’s address.

Predefined Task Queues

The easiest way to perform deferred execution is to use the queues that are already maintained by the kernel. There are four of these queues, described below, but your driver can use only the first three. The queues are declared in <linux/tqueue.h>, which you should include in your source.

tq_scheduler

This queue is consumed whenever the scheduler runs. Since the scheduler runs in the context of the process being scheduled out, tasks that run in the scheduler queue can do almost anything; they are not executed at interrupt time.

tq_timer

This queue is run by the timer tick. Since the tick (the function do_timer) runs at interrupt time, any task within this queue runs at interrupt time as well.

tq_immediate

The immediate queue is run as soon as possible, either on return from a system call or when the scheduler is run, whichever comes first. The queue is consumed at interrupt time.

tq_disk

This queue, not present in version 1.2 of the kernel, is used internally in memory management and can’t be used by modules.

The timeline of a driver using a task queue is represented in Figure 6-1. The figure shows a driver that queues a function in tq_scheduler from an interrupt handler.

Figure 6-1. Timeline of task-queue usage
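
In code, the situation depicted in the figure boils down to something like the following sketch; the names are invented, and queue_task_irq is used because the call happens from within an interrupt handler:

struct tq_struct my_task;       /* routine and data are set in init_module() */

void my_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    /* acknowledge the device here, then defer the slow work */
    queue_task_irq(&my_task, &tq_scheduler);  /* runs in process context later */
}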

How the examples work

Examples of deferred computation are available in the jiq (Just In Queue) module, from which the source in this section has been extracted. This module creates /proc files that can be read using dd or other tools; this is similar to jit. The sample module can’t run on Linux 1.2 because it uses dynamic /proc files.

The process reading a jiq file is put to sleep until the buffer is full.[22] The buffer is filled by successive runs of a task queue. Each pass through the queue appends a text string to the buffer being filled; each string reports the current time (in jiffies), the process that is current during this pass, and the value of intr_count.

For best results, the file should be read in one shot, with the command dd count=1; if you use a command like cat, the read method is invoked several times, and the results overlap, as explained in Section 4.2.1, in Chapter 4.

The code for filling the buffer is confined to the jiq_print function, which executes at each run through the queue being used. The printing function is not interesting and is not worth showing here; instead, let’s look at the initialization of the task to be inserted in a queue:

struct tq_struct jiq_task; /* global: initialized to zero */

/* these lines are in init_module() */
jiq_task.routine = jiq_print;
jiq_task.data = (void *)&jiq_data;

There’s no need to clear the sync and next fields of jiq_task because static variables are initialized to 0 by the compiler.

The scheduler queue

The easiest queue to use is tq_scheduler because queued tasks are not constrained by being executed at interrupt time.

/proc/jiqsched is a sample file that uses tq_scheduler. The read function for the file dispatches everything to the task queue, in the following way:

int jiq_read_sched(char *buf, char **start, off_t offset,
                   int len, int unused)
{

    jiq_data.len = 0;               /* nothing printed, yet */
    jiq_data.buf = buf;             /* print in this place */
    jiq_data.jiffies = jiffies;     /* initial time */

    /* jiq_print will queue_task() again in jiq_data.queue */
    jiq_data.queue = &tq_scheduler;

    queue_task(&jiq_task, &tq_scheduler); /* ready to run */
    interruptible_sleep_on(&jiq_wait);    /* sleep till completion */

    return jiq_data.len;
}

Reading /proc/jiqsched is interesting, because it shows when the scheduler is run--the value of jiffies shown is the value when the scheduler gets invoked. If CPU-bound processes are active on the system, there is a delay between successive runs of the queue; the scheduler won’t preempt the processes before several clock ticks have elapsed. Reading the file can thus take several seconds, since the file is roughly 100 lines long (or twice that on the Alpha).

The simplest way to test this situation is to run a process that executes an empty loop. The load50 program is a load-raising utility that runs 50 concurrent busy loops in user space; you’ll find its source in the sample programs. When load50 is running on the system, head extracts the following from /proc/jiqsched:

time  delta intr_count pid command
1643733   0        0     701 head
1643747  14        0     658 load50
1643747   0        0       3 kswapd
1643755   8        0     655 load50
1643761   6        0     666 load50
1643764   3        0     650 load50
1643767   3        0     661 load50
1643769   2        0     659 load50
1643769   0        0       6 loadmonitor

Note that the scheduler queue is run immediately after entering schedule, and thus the current process is the one that is being scheduled out. That’s why the first line in /proc/jiqsched always represents the process reading the file; it has just gone to sleep and is being scheduled out. Note also that both kswapd and loadmonitor (a program I run on my system) execute for less than 1 time tick, while load50 is preempted when its time quantum expires, several clock ticks after it acquires the processor.

When no process is actually running, the current process is always the idle task (process 0, historically called ``swapper'') and the queue is run either continuously or once every timer tick. The scheduler, and thus the queue, runs continuously if the processor can’t be put into a ``halted'' state; it runs at every timer tick only if the processor is halted by process 0. A halted processor can be awakened only by an interrupt. When this happens, the idle task runs the scheduler (and the associated queue). The following shows the results of head /proc/jiqsched run on an unloaded system:

time  delta intr_count pid command
1704475   0        0     730 head
1704476   1        0       0 swapper
1704477   1        0       0 swapper
1704478   1        0       0 swapper
1704478   0        0       6 loadmonitor
1704479   1        0       0 swapper
1704480   1        0       0 swapper
1704481   1        0       0 swapper
1704482   1        0       0 swapper

The timer queue

Using the timer queue is not too different from using the scheduler queue. The main difference is that, unlike the scheduler queue, the timer queue executes at interrupt time. Additionally, you’re guaranteed that the queue will run at the next clock tick, thus overcoming any dependency on system load. The following is what head /proc/jiqtimer returned while my system was compiling:

time  delta intr_count pid command
1760712   1        1     945 cc1
1760713   1        1     945 cc1
1760714   1        1     945 cc1
1760715   1        1     946 as
1760716   1        1     946 as
1760717   1        1     946 as
1760718   1        1     946 as
1760719   1        1     946 as
1760720   1        1     946 as

One feature of the current implementation of task queues is that a task can requeue itself in the same queue it is run from. For instance, a task being run from the timer tick can reschedule itself to be run on the next tick. Rescheduling is possible because the head of the queue is replaced with a NULL pointer before consuming queued tasks. This implementation dates back to kernel version 1.3.70. In earlier versions (such as 1.2.13), rescheduling was not possible because the kernel didn’t trim the queue before running it. Trying to reschedule a task with Linux 1.2 hangs the system in a tight loop. The ability to reschedule is the only relevant difference in task-queue management from 1.2.13 to 2.0.x.
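
A task that reschedules itself on the timer queue thus looks roughly like the following sketch; the names and the step counter are invented, and jiq_print in the sample module follows the same pattern:

int my_steps_left;
struct tq_struct my_tick_task;          /* its routine points to my_tick() */

void my_tick(void *unused)
{
    /* do one step's worth of work here, at interrupt time */
    if (--my_steps_left > 0)
        queue_task(&my_tick_task, &tq_timer);  /* run again on the next tick */
}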

Although rescheduling the same task over and over might appear to be a pointless operation, it is sometimes useful. For example, my own computer moves a pair of stepper motors one step at a time by means of a task that reschedules itself on the timer queue until the target has been reached. Another example is the jiq module, where the printing function reschedules itself to show each pass through the queues.

The immediate queue

The last predefined queue that can be used by modularized code is the immediate queue. It works like a bottom-half interrupt handler, and thus it must be ``marked'' with mark_bh(IMMEDIATE_BH). For efficiency, bottom-halves are run only if they are marked. Note that the handler must be marked after the call to queue_task; otherwise a race condition is created. See Section 9.4 in Chapter 9 for more detail.
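
In practice, queueing on the immediate queue is therefore a two-step operation, performed in this order (my_task being whatever tq_struct the driver owns):

queue_task(&my_task, &tq_immediate);   /* first queue the task...      */
mark_bh(IMMEDIATE_BH);                 /* ...then mark the bottom half */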

The immediate queue is the fastest queue in the system--it’s executed soonest and is consumed after incrementing intr_count. The queue is so ``immediate'' that if you re-register your task, it is rerun as soon as it returns. The queue is run over and over until it is empty. If you read /proc/jiqimmed, you’ll see that the reason it is so fast is that it keeps control of the CPU during the entire reading process.

The queue is consumed either by the scheduler or as soon as one process returns from its system call. It’s interesting to note that the scheduler (at least with the 2.0 kernel) doesn’t keep rerunning the immediate queue until it is empty; this happens only when the queue is run on return from a system call. You can see this behavior in the next sample output--the first line of jiqimmed shows head as the current process, while the next lines don’t.

time  delta intr_count pid command
1975640   0        1    1060 head
1975641   1        1       0 swapper
1975641   0        1       0 swapper
1975641   0        1       0 swapper
1975641   0        1       0 swapper
1975641   0        1       0 swapper
1975641   0        1       0 swapper
1975641   0        1       0 swapper
1975641   0        1       0 swapper

It’s clear that the queue can’t be used to delay the execution of a task--it’s an ``immediate'' queue. Instead, its purpose is to execute a task as soon as possible, but at a ``safe time.'' This feature makes it a great resource for interrupt handlers, because it offers them an entry point for executing program code outside of the actual interrupt management routine.

Although /proc/jiqimmed re-registers its task in the queue, this technique is discouraged in real code; such uncooperative behavior ties up the processor for as long as the task keeps re-registering itself, with no advantage over completing the work in one pass.

Running Your Own Task Queues

Declaring a new task queue is not difficult. A driver is free to declare a new task queue, or even several of them; tasks are queued just as we’ve seen with tq_scheduler.

Unlike a predefined task queue, a custom queue is not automatically triggered by the kernel. The programmer who maintains a queue must arrange for a way of triggering it.

The following macro declares the queue and needs to be expanded where you want your task queue to be declared:

DECLARE_TASK_QUEUE(tq_custom);

After declaring the queue, you can invoke the usual functions to queue tasks. The call above pairs naturally with the following:

queue_task(&custom_task, &tq_custom);

And the following one will run tq_custom:

run_task_queue(&tq_custom);

If you want to experiment with custom queues now, you need to register a function to trigger the queue in one of the predefined queues. Although this may look like a roundabout way to do things, it isn’t. A custom queue can be useful whenever you need to accumulate jobs and execute them all at the same time, even if you use another queue to select that ``same time.''
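
Putting the pieces together, a sketch of such an arrangement might use the timer queue as the trigger; all the names here, except the kernel functions and queues, are invented:

DECLARE_TASK_QUEUE(tq_custom);

struct tq_struct trigger_task;          /* its routine is trigger_custom() */

void trigger_custom(void *unused)
{
    run_task_queue(&tq_custom);         /* consume whatever has accumulated */
}

void my_defer(struct tq_struct *job)    /* called by the rest of the driver */
{
    queue_task(job, &tq_custom);              /* accumulate the job */
    queue_task(&trigger_task, &tq_timer);     /* run the queue at the next tick */
}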



[21] Version 2.1.34 of the kernel got rid of intr_count. See Section 17.5 in Chapter 17 for details on this.

[22] The buffer of a /proc file is a page of memory: 4KB or 8KB.
