Handling Requests

The most important function in a block driver is the request function, which performs the low-level operations related to reading and writing data. This section introduces the design of such a procedure.

When the kernel schedules a data transfer, it queues the ``request'' in a linked list, ordered to maximize system performance. The list of requests is then passed to the driver’s request function, which should perform the following tasks for each request in the list:

  • Check the validity of the current request. This task is performed by the macro INIT_REQUEST, defined in blk.h.

  • Perform the actual data transfer. The CURRENT variable (macro, actually) can be used to retrieve the details of the outstanding request. CURRENT is a pointer to struct request, whose fields are described in the next section.

  • Clean up the current request. This operation is performed by end_request, a static function whose code resides in blk.h. The driver passes the function a single argument, which is 1 in case of success and 0 in case of failure. When end_request is called with an argument of zero, an ``I/O error'' message is delivered to the system logs (via printk).

  • Loop back to the beginning, to consume the next request. A goto, a surrounding for(;;), or a surrounding while(1) can be used, as the programmer prefers.

In practice, the code for the request function is structured like this:

void sbull_request(void)
{
    while(1) {
        INIT_REQUEST;
        printk("request %p: cmd %i sec %li (nr. %li), next %p\n",
               CURRENT,
               CURRENT->cmd,
               CURRENT->sector,
               CURRENT->current_nr_sectors,
               CURRENT->next);
        end_request(1); /* success */
    }
}

Although this code does nothing but print messages, running this function provides good insight into the basic design of data transfer. The only unclear part of the code at this point should be the exact meaning of CURRENT and its fields, which I’ll describe in the next section.

My first sbull implementation contained exactly the empty code just shown. I managed to make a filesystem on the ``nonexistent'' device and use it for a while, as long as data remained in the buffer cache. Looking at the system logs while running a verbose request function like this one can help you understand how the buffer cache works.

This empty-and-verbose function can still be run in sbull by defining the symbol SBULL_EMPTY_REQUEST at compile time. If you want to understand how the kernel handles different block sizes, you can experiment with blksize= on the insmod command line. The empty request function uncovers the internal kernel workings by printing the details of each request. You might also play with hardsect=, but currently this is disabled because it’s dangerous (see Section 12.1 at the beginning of this chapter).

The code in a request function doesn’t explicitly issue return(), because INIT_REQUEST does it for you when the list of pending requests is exhausted.
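The hidden return can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel's code: struct sim_request, sim_queue, SIM_CURRENT, SIM_INIT_REQUEST, and sim_end_request are hypothetical stand-ins invented here to mimic the roles of struct request, the request list, CURRENT, INIT_REQUEST, and end_request. The real INIT_REQUEST also performs validity checks that are omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct request. */
struct sim_request {
    int cmd;                      /* placeholder: 0 = read, 1 = write */
    unsigned long sector;         /* first sector of the request */
    struct sim_request *next;     /* next pending request */
};

/* Head of the pending list; plays the role of the driver's queue. */
static struct sim_request *sim_queue = NULL;
static int sim_served = 0;        /* number of requests completed */

/* Plays the role of CURRENT: the request to be serviced first. */
#define SIM_CURRENT (sim_queue)

/* Like INIT_REQUEST, this macro hides a return statement: it leaves
 * the enclosing function when no request is pending. */
#define SIM_INIT_REQUEST \
    do { if (SIM_CURRENT == NULL) return; } while (0)

/* Like end_request, this retires the current request and advances. */
static void sim_end_request(int uptodate)
{
    assert(SIM_CURRENT != NULL);
    if (uptodate)
        sim_served++;
    sim_queue = sim_queue->next;
}

/* Same shape as the request function in the text: no explicit return. */
static void sim_request_fn(void)
{
    while (1) {
        SIM_INIT_REQUEST;         /* returns once the list is exhausted */
        sim_end_request(1);       /* success */
    }
}
```

Queuing two linked sim_request structures and calling sim_request_fn() once consumes both and then returns through the macro, leaving sim_queue empty.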

Performing the Actual Data Transfer

In order to build a working data transfer for sbull, let’s look at how the kernel describes a request within a struct request. The structure is defined in <linux/blkdev.h>. By accessing the fields in CURRENT, the driver can retrieve all the information needed to transfer data between the buffer cache and the physical block device.

CURRENT is a macro that is used to access the current request (the one to be serviced first). As you might guess, CURRENT is a short form of blk_dev[MAJOR_NR].current_request.

The following fields of the current request carry useful information for the request function:

kdev_t rq_dev;

The device accessed by the request. A single request function serves every device managed by the driver, whatever its minor number; rq_dev can be used to extract the minor device being acted upon. Linux 1.2 called this field dev, so you should access it through the macro CURRENT_DEV, which is portable to any kernel version in the range we are addressing.

int cmd;

This field is either READ or WRITE.

unsigned long sector;

The first sector the request refers to.

unsigned long current_nr_sectors;
unsigned long nr_sectors;

The number of sectors (the size) of the current request. The driver should refer to current_nr_sectors and ignore nr_sectors (which is listed here just for completeness). See the next section, Section 12.3.2, for more detail.

char *buffer;

The area in the buffer cache to which data should be written (cmd==READ) or from which data should be read (cmd==WRITE).

struct buffer_head *bh;

The structure describing the first buffer in the list for this request. We’ll use this field in Section 12.3.2.

There are other fields in the structure, but they are primarily meant for internal use in the kernel; the driver is not expected to use them.

The implementation for the working request function in the sbull device is shown below. In the following code, sbull_devices is like scull_devices, introduced in Section 3.5.1 in Chapter 3.

void sbull_request(void)
{
    Sbull_Dev *device;
    u8 *ptr;
    int size;

    while(1) {
        INIT_REQUEST;

        /* Check if the minor number is in range */
        if (DEVICE_NR(CURRENT_DEV) >= sbull_devs) { /* valid indexes: 0..sbull_devs-1 */
            static int count = 0;
            if (count++ < 5)  /* print the message at most 5 times */
                printk(KERN_WARNING
                           "sbull: request for unknown device\n");
            end_request(0);
            continue;
        }

        /* pointer to device structure, from the global array */
        device = sbull_devices + DEVICE_NR(CURRENT_DEV);
        ptr = device->data + CURRENT->sector * sbull_hardsect;
        size = CURRENT->current_nr_sectors * sbull_hardsect;
        if (ptr + size > device->data + sbull_blksize*sbull_size) {
            static int count = 0;
            if (count++ < 5)
                printk(KERN_WARNING
                           "sbull: request past end of device\n");
            end_request(0);
            continue;
        }

        switch(CURRENT->cmd) {
          case READ:
            /* from sbull to buffer */
            memcpy(CURRENT->buffer, ptr, size);
            break;
          case WRITE:
            /* from buffer to sbull */
            memcpy(ptr, CURRENT->buffer, size);
            break;
          default:
            /* can't happen */
            end_request(0);
            continue;
        }


        end_request(1); /* success */
    }
}

Since sbull is just a RAM disk, its ``data transfer'' reduces to a memcpy call. The only ``strange'' feature of the function is the conditional statement that limits it to reporting five errors. This is intended to avoid clobbering the system logs with too many messages, since end_request(0) already prints an ``I/O error'' message when the request fails. The static counter is a standard way to limit message reporting and is used several times in the kernel.
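The static-counter idiom can be tried in isolation. This is a minimal user-space sketch under stated assumptions: report_limited and MAX_REPORTS are hypothetical names introduced here, and fprintf to stderr stands in for printk.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_REPORTS 5   /* same limit used by the sbull code */

/* Print msg at most MAX_REPORTS times over the life of the program.
 * Returns 1 if the message was printed, 0 if it was suppressed. */
static int report_limited(const char *msg)
{
    static int count = 0;         /* keeps its value across calls */

    assert(msg != NULL);
    if (count < MAX_REPORTS) {
        count++;
        fprintf(stderr, "%s\n", msg);
        return 1;
    }
    return 0;
}
```

Unlike the count++ < 5 form in the listing, this variant stops incrementing once the limit is reached, so the counter cannot wrap around after a very large number of failures. The behavior within the first five calls is the same.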

Clustered Requests

Each iteration of the loop in the request function above transfers a number of sectors--usually the number that makes up one ``block'' of data, whose size depends on how the data is used. For instance, swapping is performed PAGE_SIZE bytes at a time, while an extended-2 filesystem transfers 1KB blocks.

Although a block is the most convenient data size for I/O, you can get a significant performance boost by clustering the reading or writing of adjacent blocks. In this context, ``adjacent'' refers to the location of blocks on the disk, while ``consecutive'' refers to consecutive memory areas.

There are two advantages to clustering adjacent blocks. First, clustering speeds up the transfer (for example, the floppy driver assembles adjacent blocks and transfers a whole track at a time). Second, it can save memory in the kernel by avoiding allocation of redundant request structures.

You can, if you want, completely ignore clustering. The skeletal request function shown above works flawlessly, independent of clustering. If you want to exploit clustering, on the other hand, you need to deal in greater detail with the internals of struct request.

Unfortunately, all kernels I know of (up to at least 2.1.51) don’t perform clustering for custom drivers, just for internal drivers like SCSI and IDE. If you aren’t interested in the internals of the kernel, you can skip the rest of this section. On the other hand, clustering might be available to modules in the future, and it is an interesting way to increase data-transfer performance by reducing inter-request delays for adjacent sectors.

Before I describe how a driver can exploit clustered requests, let’s look at what happens when a request is queued.

When the kernel requests the transfer of a data block, it scans the linked list of active requests for the target device. If the new block is adjacent on the disk to a block that has already been requested, the new block is clustered to the first block; the existing request is enlarged without creating a new one.
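The merge decision can be sketched in user space. The code below is a deliberately reduced model, not the kernel's make_request: struct sim_req, sim_reqs, and sim_queue_block are hypothetical names, the list is a fixed-size array, and only the case of a new block starting exactly where an existing request ends is handled. The transfer direction, the target device, and the buffers themselves are ignored.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_MAX_REQ 8

/* Hypothetical, stripped-down request: a run of sectors on disk. */
struct sim_req {
    unsigned long sector;      /* first sector of the run */
    unsigned long nr_sectors;  /* length of the run, in sectors */
};

static struct sim_req sim_reqs[SIM_MAX_REQ];
static size_t sim_nreqs = 0;

/* Queue a block of count sectors starting at sector.
 * Returns 1 if the block was merged into an existing request,
 * 0 if a new request was created, -1 if the table is full. */
static int sim_queue_block(unsigned long sector, unsigned long count)
{
    size_t i;

    assert(count > 0);
    for (i = 0; i < sim_nreqs; i++) {
        /* Adjacent on disk: the block starts where the request ends. */
        if (sim_reqs[i].sector + sim_reqs[i].nr_sectors == sector) {
            sim_reqs[i].nr_sectors += count;   /* enlarge, don't allocate */
            return 1;
        }
    }
    if (sim_nreqs == SIM_MAX_REQ)
        return -1;
    sim_reqs[sim_nreqs].sector = sector;
    sim_reqs[sim_nreqs].nr_sectors = count;
    sim_nreqs++;
    return 0;
}
```

Queuing sectors 0-1 and then 2-3 yields a single request covering four sectors, while a later block at sector 10 gets a request of its own.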

Unfortunately, the fact that the contents of two data buffers are adjacent on disk doesn’t necessarily mean that they are consecutive in memory. This observation, plus the need to efficiently manage the buffer cache, led to the creation of a buffer_head structure. One buffer_head is associated with each data buffer.

A ``clustered'' request, then, is a single struct request that refers to a linked list of buffer_head structures. The end_request function knows how to step through this list, and that’s why the request function shown earlier works independent of clustering. In other words, end_request either cleans up the current request and prepares to service the next one, or prepares to deal with the next buffer in the same request. Clustering is therefore transparent to a device driver that doesn’t care about it; the sbull function above is one such example.

A driver may want to benefit from clustering by dealing with the whole linked list of buffer heads at each pass through the loop in its request_fn function. To do this, the driver should refer to both CURRENT->current_nr_sectors (the field I already used above in sbull_request) and CURRENT->nr_sectors, which contains the number of adjacent sectors that are clustered in the ``current'' list of buffer_heads.

The current buffer head is CURRENT->bh, while the data block is CURRENT->bh->b_data. The latter pointer is cached in CURRENT->buffer for drivers like sbull that ignore clustering.

Request clustering is implemented in drivers/block/ll_rw_block.c, in the function make_request; however, as suggested above, clustering is performed only for a few drivers (floppy, IDE, and SCSI), according to their major number. I’ve been able to see how clustering works by loading sbull with major=34 because 34 is IDE3_MAJOR, and I don’t have the third IDE controller on my system.[30]

The following list summarizes what needs to be done when scanning a clustered request. bh is the buffer head being processed--the first in the list. For every buffer head in the list, the driver should carry out the following sequence of operations:

  • Transfer the data block at address bh->b_data, of size bh->b_size bytes. The direction of the data transfer is CURRENT->cmd, as usual.

  • Retrieve the next buffer head in the list: bh->b_reqnext. Then detach the buffer just transferred from the list, by zeroing its b_reqnext--the pointer to the new buffer you just retrieved.

  • Tell the kernel you’re done with the previous buffer, by calling mark_buffer_uptodate(bh,1); unlock_buffer(bh);. These calls keep the buffer cache in a sane state, without wild pointers lying around. The ``1'' argument to mark_buffer_uptodate indicates success; if the transfer failed, substitute ``0''.

  • Loop back to the beginning to transfer the next adjacent block.

When you are done with the clustered request, CURRENT->bh must be updated to point to the first buffer that was ``processed but not unlocked.'' If all the buffers in the list were processed and unlocked, CURRENT->bh can be set to NULL.

At this point, the driver can call end_request. If CURRENT->bh is valid, the function unlocks it before moving to the next buffer--this is what happens for non-clustered operation, where end_request takes care of everything. If the pointer is NULL, the function just moves to the next request.
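The sequence of operations listed above can be sketched as a user-space mock. Everything prefixed with sim_ or SIM_ is hypothetical: struct sim_bh only imitates the buffer_head fields named in the text, the two helper functions merely record that they were called, and disk is a plain memory array standing in for the device. The sketch assumes every transfer succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SIM_READ  0
#define SIM_WRITE 1

/* Hypothetical user-space mock of the fields named in the text. */
struct sim_bh {
    char *b_data;               /* the data block */
    size_t b_size;              /* size of the block, in bytes */
    struct sim_bh *b_reqnext;   /* next buffer in the same request */
    int uptodate;               /* set by the mock below */
    int locked;                 /* cleared by the mock below */
};

static void sim_mark_buffer_uptodate(struct sim_bh *bh, int on)
{
    bh->uptodate = on;
}

static void sim_unlock_buffer(struct sim_bh *bh)
{
    bh->locked = 0;
}

/* Transfer every buffer of a clustered request to or from disk,
 * starting at byte offset pos. Returns the value to store back in
 * the request's bh pointer: NULL, because all buffers were unlocked. */
static struct sim_bh *sim_transfer_cluster(struct sim_bh *bh, int cmd,
                                           char *disk, size_t pos)
{
    assert(cmd == SIM_READ || cmd == SIM_WRITE);
    while (bh != NULL) {
        struct sim_bh *next;

        if (cmd == SIM_READ)
            memcpy(bh->b_data, disk + pos, bh->b_size);  /* disk to buffer */
        else
            memcpy(disk + pos, bh->b_data, bh->b_size);  /* buffer to disk */
        pos += bh->b_size;      /* adjacent blocks follow on the disk */

        next = bh->b_reqnext;   /* retrieve the next buffer head... */
        bh->b_reqnext = NULL;   /* ...then detach the one just done */

        sim_mark_buffer_uptodate(bh, 1);
        sim_unlock_buffer(bh);
        bh = next;
    }
    return NULL;
}
```

In a real driver the returned pointer would be assigned to CURRENT->bh before calling end_request, so that end_request moves straight on to the next request.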

A full-featured implementation of clustering appears in drivers/block/floppy.c, while a summary of the operations required appears in end_request, in blk.h. Neither floppy.c nor blk.h is easy to understand, but the latter is a better place to start.



[30] While this is a handy trick to play dirty games on one’s home computer, I strongly discourage doing it in a production driver.
