Bottom Halves

One of the main problems with interrupt handling is how to perform longish tasks within a handler. Linux resolves this problem by splitting the interrupt handler into two halves: the so-called ``top half'' is the routine you register through request_irq, and the ``bottom half'' (``bh'' for short) is a routine that is scheduled by the top half, to be executed later, at a safer time.

But what is a bottom half useful for?

The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bh—that’s why it runs at a ``safer'' time. In the typical scenario, the top half saves device data to a device-specific buffer, marks its bottom half, and exits: this is very fast. The bh then dispatches newly arrived data to the processes, awakening them if necessary. This setup permits the top half to service a new interrupt while the bottom half is still working. New data arriving before the top half terminates, on the other hand, are lost because the IRQ line is disabled in the interrupt controller.

Every serious interrupt handler is split this way. For instance, when a network interface reports the arrival of a new packet, the handler just retrieves the data and pushes it up to the protocol layer; actual processing of the packet is performed in a bottom half.

This kind of job should be reminiscent of task queues; actually, task queues have evolved from an older implementation of bottom halves. Even version 1.0 of the kernel had bottom halves, while task queues didn’t yet exist.

Unlike task queues, which are dynamic, bottom halves are limited in number and predefined in the kernel; this is similar to the old kernel timers. The static nature of bottom halves is not a problem because some of them evolve into a dynamic object by running a task queue. In <linux/interrupt.h>, you’ll find the list of available bottom halves; the most interesting of them are discussed below.

The Design of Bottom Halves

The bottom halves exist as an array of function pointers and a bitmask--that’s why there are no more than 32 of them. When the kernel is ready to deal with asynchronous events, it calls do_bottom_half. We have seen how this happens on return from a system call and on exiting a slow handler; both events occur frequently. The decision to use a bitmask is mainly dictated by performance: checking the bitmask takes only one machine instruction and minimizes overhead.

Whenever some code wants to schedule a bottom half for running, it calls mark_bh, which sets a bit in the bitmask variable to queue the corresponding function for execution. A bottom half can be scheduled by an interrupt handler or any other function. When the bottom-half handler is executed, it is automatically unmarked.

Marking bottom halves is defined in <linux/interrupt.h> as:

void mark_bh(int nr);

Here, nr is the ``number'' of the bh to activate. The number is a symbolic constant defined in <linux/interrupt.h> that identifies which bit needs to be set in the bitmask. The function that corresponds to each bh is provided by the driver that owns the bottom half. For example, when mark_bh(KEYBOARD_BH) is called, the function being scheduled for execution is kbd_bh, which is part of the keyboard driver.
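To make the mechanism concrete, here is a minimal user-space model of the bitmask and the array of function pointers. It is only a sketch, not the kernel's code: every name beginning with model_ is invented for the illustration, the bh numbers are arbitrary, and the handlers simply count how often they run.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_BH 32                 /* one bit per bottom half in the mask */

/* Hypothetical bh numbers; the real ones live in <linux/interrupt.h>. */
enum { MODEL_NET_BH = 1, MODEL_IMMEDIATE_BH = 2 };

static unsigned long model_bh_active;             /* pending bits */
static void (*model_bh_base[MODEL_NR_BH])(void);  /* one function per bit */

static int model_net_runs, model_imm_runs;
static void model_net_bh(void) { model_net_runs++; }
static void model_imm_bh(void) { model_imm_runs++; }

/* Install the function that belongs to a bh number. */
void model_init_bh(int nr, void (*routine)(void))
{
    assert(nr >= 0 && nr < MODEL_NR_BH);
    model_bh_base[nr] = routine;
}

/* Marking only sets a bit, so marking twice queues a single run. */
void model_mark_bh(int nr)
{
    assert(nr >= 0 && nr < MODEL_NR_BH);
    model_bh_active |= 1UL << nr;
}

/* Run every marked bh once; returns how many functions were called. */
int model_do_bottom_half(void)
{
    int nr, executed = 0;

    for (nr = 0; nr < MODEL_NR_BH; nr++) {
        unsigned long bit = 1UL << nr;

        if (!(model_bh_active & bit))
            continue;
        model_bh_active &= ~bit;       /* unmark before running */
        if (model_bh_base[nr]) {
            model_bh_base[nr]();
            executed++;
        }
    }
    return executed;
}

/* Mark the same bh twice, then run the pending ones. */
int model_demo(void)
{
    model_init_bh(MODEL_NET_BH, model_net_bh);
    model_init_bh(MODEL_IMMEDIATE_BH, model_imm_bh);
    model_mark_bh(MODEL_NET_BH);
    model_mark_bh(MODEL_NET_BH);
    return model_do_bottom_half();
}
```

In this model, model_demo runs the network handler exactly once even though it was marked twice, and the unmarked handler does not run at all. A second call to model_do_bottom_half finds nothing pending, which mirrors the automatic unmarking described above.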

Since bottom halves are static objects, a modularized driver won’t be able to register its own bottom half. There’s currently no support for dynamic allocation of bottom halves, and it’s unlikely there ever will be, as the immediate task queue can be used instead.

The rest of this section lists some of the most interesting bottom halves. It then describes how the kernel runs a bottom half, which you should understand in order to use bottom halves properly.

Several bottom halves declared by the kernel are worth looking at, and a few can even be used by a driver, as introduced above:

IMMEDIATE_BH

This is the most important bh for driver writers. The function being scheduled consumes a task queue, tq_immediate. A driver (like a custom module) that doesn’t own a bottom half can use the immediate queue as if it were its own bh. After registering a task in the queue, the driver must mark the bh in order to have its code actually executed; how to do this was introduced in Section 6.4.2.4, in Chapter 6.

TQUEUE_BH

This bh is activated at each timer tick if a task is registered in tq_timer. In practice, a driver can implement its own bottom half using tq_timer; the timer queue introduced in Section 6.4.2.3, in Chapter 6, is a bottom half, but there’s no need to call mark_bh for it. TQUEUE_BH is always executed later than IMMEDIATE_BH.

NET_BH

Network drivers should mark this bh to notify the upper network layers of events. The bh itself is part of the network layer and not accessible to modules. We’ll see how to use it properly in Section 14.7, in Chapter 14.

CONSOLE_BH

The console performs tty switching in a bottom half. This operation can involve process control. For instance, switching between the X Window system and text mode is controlled by the X server. Moreover, if the keyboard driver asks for a console change, the console switching can’t be done during the interrupt. It also can’t be done while a process is writing to the console. Using a bh fits the task because bottom halves can be disabled at the driver’s will; in this case, console_bh is disabled during a console write.[26]

TIMER_BH

This bh is marked by do_timer, the function in charge of the clock tick. The function that this bh executes is the one that drives the kernel timers. There is no way to use this facility for a driver short of using add_timer.

The remaining bottom halves are used by specific kernel drivers. There are no entry points in them for a module, and it wouldn’t make sense for there to be any.

Once a bh has been activated, it is executed when do_bottom_half (kernel/softirq.c) is invoked, which happens within ret_from_sys_call. The latter procedure is executed whenever a process exits from a system call or when a slow interrupt handler exits. The bottom halves are not executed on exit from a fast handler; whenever a driver needs fast execution of its bottom half, it should register a slow handler.

ret_from_sys_call is also executed at every clock tick; thus, if a fast handler marks a bh, the actual function will be executed at most 10ms later (less than 1ms later on the Alpha, whose clock tick runs at 1024 Hz).
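The two latency figures are just the length of one clock tick. The helper below is a hypothetical illustration (model_tick_usec is not a kernel function) that computes the tick period in microseconds from the tick frequency, assuming the 100 Hz tick implied by the 10ms figure.

```c
#include <assert.h>

/* Length of one clock tick in microseconds, rounded down. */
unsigned long model_tick_usec(unsigned long hz)
{
    assert(hz > 0);
    return 1000000UL / hz;
}
```

With a 100 Hz tick the result is 10000 microseconds, the 10ms worst case quoted above; with the Alpha's 1024 Hz tick it is 976 microseconds, just under 1ms.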

After a bottom half has run, the scheduler is called if the need_resched variable is set; the variable is set by the various wake_up functions. The top half can thus leave to the bottom half any work related to awakening processes--they’ll be scheduled right away. This is what happens, for example, when a telnet packet arrives from the network. net_bh awakens telnetd, and the scheduler gives it processor time with no additional delays.

Writing a Bottom Half

Bottom-half code runs at a safe time--safer than when the top-half handler runs. Nonetheless, some care is necessary because a bh is still at ``interrupt time''; intr_count is not 0 because the bottom half executes outside the context of a process. The limitations outlined in Section 6.4.1, in Chapter 6, thus apply to code executing in a bottom half.

The main problem with the bottom halves shown is that they often need to share data structures with the top-half interrupt handler, and race conditions must be prevented. This might mean temporarily disabling interrupt reporting or using locking techniques.

It’s quite apparent from the previous list of available bottom halves in Section 9.4.1 that a new driver implementing a bottom half should attach its code to IMMEDIATE_BH, by using the immediate queue. If your driver is important enough, however, you can even have your own bh number assigned in the kernel itself. Important drivers are a minority, however, and I won’t go into detail about them. Three functions exist to deal with privately owned bottom halves: init_bh, enable_bh, and disable_bh. If you’re interested, you’ll find them in the kernel sources.

Actually, using the immediate queue is no different from managing your own bottom half--the immediate queue is a bottom half. When IMMEDIATE_BH is marked, the function in charge of the immediate bottom half just consumes the immediate queue. If your interrupt handler queues its bh handler to tq_immediate and marks the bottom half, the queued task will be called at just the right time. Since in all recent kernels you can queue the same task multiple times without trashing the task queue, you can queue your bottom half every time the top-half handler runs. We’ll see this behavior in a while.
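The reason multiple queueing is harmless can be shown with a small user-space model of a task queue. This is a sketch under stated assumptions rather than the kernel implementation: the model_ names are invented, the list ends with NULL, and the "already queued" flag is a plain int instead of an atomically updated bit.

```c
#include <assert.h>
#include <stddef.h>

struct model_task {
    struct model_task *next;
    int sync;                       /* nonzero while the task is queued */
    void (*routine)(void *);
    void *data;
};

static struct model_task *model_tq_immediate;   /* head of the model queue */

/* Link the task in unless it is already queued; returns 1 if linked. */
int model_queue_task(struct model_task *t, struct model_task **q)
{
    if (t->sync)
        return 0;                   /* second queueing is a no-op */
    t->sync = 1;
    t->next = *q;
    *q = t;
    return 1;
}

/* Consume the queue; returns the number of routines called. */
int model_run_task_queue(struct model_task **q)
{
    struct model_task *t = *q;
    int calls = 0;

    *q = NULL;                      /* detach the list before running it */
    while (t) {
        struct model_task *next = t->next;
        void (*routine)(void *) = t->routine;
        void *data = t->data;

        t->sync = 0;                /* the task may be queued again from now on */
        routine(data);
        calls++;
        t = next;
    }
    return calls;
}

static void model_count(void *data) { ++*(int *)data; }

/* Queue the same task from three simulated interrupts, then consume it. */
int model_tq_demo(void)
{
    int hits = 0, i;
    struct model_task task;

    task.next = NULL;
    task.sync = 0;
    task.routine = model_count;
    task.data = &hits;

    for (i = 0; i < 3; i++)
        model_queue_task(&task, &model_tq_immediate);
    model_run_task_queue(&model_tq_immediate);
    return hits;
}
```

In the model, three queueing attempts lead to a single call of the routine, which is why a bottom half usually needs some counter of its own if it wants to know how many interrupts it is serving.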

Drivers with exotic configurations--multiple bottom halves or other setups that can’t easily be handled with a plain tq_immediate--can be satisfied by using a custom task queue. The interrupt handler queues the tasks in its own queue, and when it’s ready to run them, a simple queue-consuming function is inserted into the immediate queue. See Section 6.4.3 in Chapter 6 for details.

Let’s now look at the short implementation. When loaded with bh=1, the module installs an interrupt handler that uses a bottom half.

short performs split interrupt management as follows: the top half (the handler) saves the current time value in a circular buffer and schedules the bottom half. The bh prints accumulated time values to the text buffer and then awakens any reading process.

The top half turns out to be really simple:

void short_bh_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    do_gettimeofday(tv_head);
    tv_head++;

    if (tv_head == (tv_data + NR_TIMEVAL) ) 
        tv_head = tv_data; /* wrap */

    /* Queue the bh. Don't care for multiple queueing */
    queue_task_irq_off(&short_task, &tq_immediate);
    mark_bh(IMMEDIATE_BH);

    short_bh_count++; /* record that an interrupt arrived */
}

As expected, this code calls queue_task_irq_off without checking whether the task is already queued. This behavior doesn’t work with Linux 1.2, and if you compile short against 1.2 headers, it uses a different handler, which queues the task only when short_bh_count is 0.

The bottom half, then, performs the rest of the work. It also records the number of times the top half was invoked before the bottom half actually ran. The number is always 1 if the top half is a ``slow'' handler, because pending bottom halves are always run whenever a slow handler exits, as described above.

void short_bottom_half(void *unused)
{
    int savecount = short_bh_count;
    short_bh_count = 0; /* we've already been removed from the queue */
    /*
     * The bottom half reads the tv array, filled by the top half,
     * and prints it to the circular text buffer, which is then consumed
     * by reading processes
     */

    /* First write the no. of interrupts that occurred before this bh */

    short_head += sprintf((char *)short_head,
                          "bh after %6i\n", savecount);
    if (short_head == short_buffer + PAGE_SIZE)
        short_head = short_buffer; /* wrap */

    /*
     * Then, write the time values. Write exactly 16 bytes at a time,
     * so it aligns with PAGE_SIZE
     */

    do {
        short_head += sprintf((char *)short_head,"%08u.%06u\n",
                              (int)(tv_tail->tv_sec % 100000000),
                              (int)(tv_tail->tv_usec));
        if (short_head == short_buffer + PAGE_SIZE)
            short_head = short_buffer; /* wrap */
        
        tv_tail++;
        if (tv_tail == (tv_data + NR_TIMEVAL) ) 
            tv_tail = tv_data; /* wrap */
        
    } while (tv_tail != tv_head);

    wake_up_interruptible(&short_queue); /* awake any reading process */
}
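The claim that 16-byte records line up with PAGE_SIZE can be checked in user space. The sketch below reuses the format string of the bottom half but is otherwise hypothetical: the model_ names are invented, the page size is assumed to be 4096 bytes, and the buffer carries one spare byte because sprintf also writes a terminating NUL after each record.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_PAGE_SIZE 4096            /* assumed page size for this sketch */

static char model_buffer[MODEL_PAGE_SIZE + 1];  /* +1 for sprintf's NUL */
static char *model_head = model_buffer;

/* Append one timestamp record; returns its length in bytes. */
int model_put_record(unsigned sec, unsigned usec)
{
    int len = sprintf(model_head, "%08u.%06u\n",
                      sec % 100000000u, usec % 1000000u);

    model_head += len;
    if (model_head == model_buffer + MODEL_PAGE_SIZE)
        model_head = model_buffer;      /* wrap */
    return len;
}

/* Write one page worth of records; returns 1 if head wrapped to the start. */
int model_fill_page(void)
{
    int i, records = MODEL_PAGE_SIZE / 16;

    for (i = 0; i < records; i++)
        if (model_put_record(50588804u, (unsigned)i) != 16)
            return 0;
    return model_head == model_buffer;
}
```

Each record is 8 digits, a dot, 6 digits, and a newline, that is, 16 bytes, so 256 of them fill the assumed page exactly and the equality test in the wrap check is enough. If a record could ever have a different length, the head pointer would step over the end of the buffer without wrapping.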

Timings taken on my oldish computer show that, using a bottom half, the interval between two interrupts has shrunk from 53 microseconds to 27, since less work is performed in the top-half handler. While the total work needed to handle the interrupt is the same, a faster top half has the advantage that the interrupt remains disabled for a shorter time. This is not an issue for short because the write function generating interrupts is restarted only after the handler is done, but timing might be relevant for real hardware interrupts.

Here’s an example of what you see when loading short by specifying bh=1:

morgana% echo 1122334455 > /dev/shortint ; cat /dev/shortint
bh after      5
50588804.876653
50588804.876693
50588804.876720
50588804.876747
50588804.876774


[26] The function disable_bh can be used by drivers that own a bottom half, as mentioned in the discussion of writing a bottom half.
