Chapter 12. Loading Block Drivers

As outlined in Section 1.3 of Chapter 1, Unix device drivers are not limited to char drivers. This chapter introduces the second main class of device drivers: block drivers. A block-oriented device is one that transfers data only in blocks (for example, a floppy disk or a hard drive); the hardware block is usually called a "sector." The word "block," on the other hand, will be used to denote a software concept: the driver often uses 1KB blocks even if the sector size is 512 bytes.

In this chapter, we’ll build a full-featured block driver called sbull, short for "Simple Block Utility for Loading Localities." This driver is similar to scull in that it uses the computer’s memory as the hardware device. In other words, it’s a RAM-disk driver. sbull should run on any Linux computer (although I have been able to test it only on a limited set of platforms).

If you need to write a driver for a real block device, use the information in Chapters 8 and 9 to supplement this chapter.

Registering the Driver

Like a char driver, a block driver in the kernel is identified by a major number. The functions used to register and unregister the driver are:

int register_blkdev(unsigned int major, const char *name, 
                    struct file_operations *fops);
int unregister_blkdev(unsigned int major, const char *name);

The meaning of the arguments is the same as for char drivers, and dynamic assignment of the major number can be performed in the same way. Therefore, the sbull device registers itself just like scull:

result = register_blkdev(sbull_major, "sbull", &sbull_fops);
if (result < 0) {
    printk(KERN_WARNING "sbull: can't get major %d\n", sbull_major);
    return result;
}
if (sbull_major == 0) sbull_major = result; /* dynamic */
major = sbull_major; /* Use 'major' later on to save typing */
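Because register_blkdev is a kernel call, its contract can only be illustrated outside the kernel with a stand-in. The userspace sketch below uses hypothetical names (mock_register_blkdev, mock_unregister_blkdev, mock_sbull_init, MOCK_MAX_BLKDEV) that belong to no kernel API. It models only the behavior the registration code relies on: passing 0 asks for a dynamic major, which comes back as the return value, while in this mock an explicit major returns 0 on success and a negative error code on failure.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical userspace model: the table size is illustrative only. */
#define MOCK_MAX_BLKDEV 64

/* Slot i holds the name registered for major i, or NULL if free. */
static const char *mock_blkdevs[MOCK_MAX_BLKDEV];

/* Major 0 requests a dynamic number, which is returned on success;
 * an explicit major returns 0 on success. Errors are negative. */
int mock_register_blkdev(unsigned int major, const char *name)
{
    assert(name != NULL);
    if (major >= MOCK_MAX_BLKDEV)
        return -EINVAL;
    if (major == 0) {
        /* This mock scans downward for a free slot; 0 is never handed out. */
        for (major = MOCK_MAX_BLKDEV - 1; major > 0; major--) {
            if (mock_blkdevs[major] == NULL) {
                mock_blkdevs[major] = name;
                return (int)major;
            }
        }
        return -EBUSY;
    }
    if (mock_blkdevs[major] != NULL)
        return -EBUSY;
    mock_blkdevs[major] = name;
    return 0;
}

/* Succeeds only if 'major' is registered under the same name. */
int mock_unregister_blkdev(unsigned int major, const char *name)
{
    if (major >= MOCK_MAX_BLKDEV || mock_blkdevs[major] == NULL ||
        strcmp(mock_blkdevs[major], name) != 0)
        return -EINVAL;
    mock_blkdevs[major] = NULL;
    return 0;
}

/* The idiom from the text: adopt the return value only when 0 was passed. */
int mock_sbull_init(int *sbull_major)
{
    int result = mock_register_blkdev((unsigned int)*sbull_major, "sbull");

    if (result < 0)
        return result;
    if (*sbull_major == 0)
        *sbull_major = result; /* dynamic */
    return 0;
}
```

With sbull_major left at 0, the mock hands out the highest free slot; asking twice for the same explicit major fails with -EBUSY, which is why the real code checks result before using it.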

The fops argument to register_blkdev is similar to the one we used for char drivers. The read, write, and fsync operations, however, need not be driver-specific: the generic functions block_read, block_write, and block_fsync are used in their place. In addition, check_media_change and revalidate make sense for a block driver, and both are defined in sbull_fops.

The fops structure used in sbull is:

struct file_operations sbull_fops = {
    NULL,          /* lseek: default */
    block_read,
    block_write,
    NULL,          /* sbull_readdir */
    NULL,          /* sbull_select */
    sbull_ioctl,
    NULL,          /* sbull_mmap */
    sbull_open,
    sbull_release,
    block_fsync,
    NULL,          /* sbull_fasync */
    sbull_check_change,
    sbull_revalidate
};

The generic read and write operations are used to achieve better performance: they go through the buffer cache, and that buffering is not available to char drivers. Block drivers can be buffered because their data serves the computer’s file hierarchy and is not accessed directly by applications, whereas data belonging to char drivers is.

However, when the buffer cache cannot satisfy a read request or pending writes must be flushed to the physical disk, the driver must be called to perform the actual data transfer. The fops structure doesn’t carry an entry point other than read and write, so an additional structure, blk_dev_struct, is used to deliver requests for actual data transfer.

This structure is defined in <linux/blkdev.h> and has several fields, but only the first field needs to be set by the driver.

This is the definition of the structure as found in 2.0 kernels:

struct blk_dev_struct {
    void (*request_fn)(void);
    struct request * current_request;
    struct request   plug;
    struct tq_struct plug_tq;
};

extern struct blk_dev_struct blk_dev[MAX_BLKDEV];

When the kernel needs to initiate an I/O operation for the sbull device, it calls the function blk_dev[sbull_major].request_fn. The initialization function for this module should therefore set this field to point to its own request function. The remaining fields of the structure are used internally by the kernel functions and macros; you don’t need to refer to them explicitly in your code.
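The dispatch through a per-major table of function pointers can be modeled in a few lines of userspace C. Everything here is hypothetical: mock_blk_dev keeps only the request_fn field the driver sets, and mock_start_io stands in for whatever kernel code decides that I/O must start. The real kernel reaches the request function through its own internal paths; the sketch shows only the table lookup.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_BLKDEV 64 /* illustrative size, not the kernel's */

/* Reduced blk_dev_struct: only the field a driver is expected to set. */
struct mock_blk_dev_struct {
    void (*request_fn)(void);
};

static struct mock_blk_dev_struct mock_blk_dev[MOCK_MAX_BLKDEV];
static int sbull_requests_seen; /* counts calls, for illustration */

/* Hypothetical driver-side request function. */
static void mock_sbull_request(void)
{
    sbull_requests_seen++;
}

/* Kernel-side stand-in: look up the major and call its request function.
 * Returns -1 when no driver has installed one. */
int mock_start_io(unsigned int major)
{
    assert(major < MOCK_MAX_BLKDEV);
    if (mock_blk_dev[major].request_fn == NULL)
        return -1;
    mock_blk_dev[major].request_fn();
    return 0;
}
```

Setting mock_blk_dev[major].request_fn = mock_sbull_request plays the role of the assignment in init_module; clearing it again mirrors the cleanup code.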

The relationship between a block-driver module and the kernel is shown in Figure 12-1.


Figure 12-1. Registering a block device driver

In addition to blk_dev, several other arrays hold information about block drivers. These arrays are indexed by major, and sometimes also minor, number. They are declared and described in drivers/block/ll_rw_block.c.

int blk_size[][];

This array is indexed by the major and minor numbers. It describes the size of each device, in kilobytes. If blk_size[major] is NULL, no checking is performed on the size of the device (i.e., the kernel might request data transfers past end-of-device).
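Since blk_size is expressed in kilobytes while transfers are counted in sectors, a bounds check must convert between the two. The following userspace sketch, with the hypothetical name mock_request_in_bounds, assumes 512-byte sectors (so 1KB is two sectors) and treats a NULL table as "no checking," as the text describes. It is not the kernel's own check, only an illustration of the arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if sectors [first_sector, first_sector + nr_sectors) fit on
 * the device, 0 otherwise. Assumes 512-byte sectors: 1 KB == 2 sectors.
 * A NULL table means the size is unknown and nothing is checked. */
int mock_request_in_bounds(const int *sizes_kb, int minor,
                           unsigned long first_sector,
                           unsigned long nr_sectors)
{
    unsigned long dev_sectors;

    if (sizes_kb == NULL)
        return 1;
    assert(minor >= 0);
    dev_sectors = (unsigned long)sizes_kb[minor] * 2;
    /* Written this way to avoid overflow in first_sector + nr_sectors. */
    return first_sector <= dev_sectors &&
           nr_sectors <= dev_sectors - first_sector;
}
```

For a 2048KB device (4096 sectors), a two-sector request starting at sector 4094 fits exactly, while one starting at 4095 runs past the end.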

int blksize_size[][];

The size of the block used by each device, in bytes. Like the previous one, this two-dimensional array is indexed by both major and minor numbers. If blksize_size[major] is a null pointer, a block size of BLOCK_SIZE (currently 1KB) is assumed. The block size for the device must be a power of two, because the kernel uses bit-shift operators to convert offsets to block numbers.
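The power-of-two requirement is what lets an offset be split into a block number and an in-block offset with a shift and a mask instead of a division. The sketch below is plain userspace C with hypothetical helper names (block_shift, offset_to_block); it shows the arithmetic, not kernel code.

```c
#include <assert.h>

/* log2 of a power-of-two block size, or -1 if the size is not a power
 * of two (and therefore cannot be used with shifts). */
int block_shift(unsigned int blksize)
{
    int bits = 0;

    if (blksize == 0 || (blksize & (blksize - 1)) != 0)
        return -1;
    while ((1u << bits) != blksize)
        bits++;
    return bits;
}

/* Split a byte offset into a block number (returned) and the offset
 * inside that block (*within), using only a shift and a mask. */
unsigned long offset_to_block(unsigned long offset, unsigned int blksize,
                              unsigned long *within)
{
    int bits = block_shift(blksize);

    assert(bits >= 0);
    *within = offset & (unsigned long)(blksize - 1);
    return offset >> bits;
}
```

With the default 1KB block, block_shift returns 10, so byte offset 5000 lies in block 4 at offset 904; a size such as 1000 is rejected because no shift can replace division by it.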

int hardsect_size[][];

Like the others, this data structure is indexed by the major and minor numbers. The default value for the hardware sector size is 512 bytes. Up to and including version 2.0.x, variable sector sizes aren’t really supported, because some kernel code still assumes that the sector size is half a kilobyte; it’s nonetheless very likely that variable sector size will be truly implemented in version 2.2.

int read_ahead[];

This array is indexed by the major number and defines the number of sectors to be read in advance by the kernel when a file is being read sequentially. Reading data before a process asks for it helps system performance and overall throughput. A slower device should specify a bigger read-ahead value, while fast devices will be happy even with a smaller value. The bigger the read-ahead value, the more memory the buffer cache uses. There is one read-ahead value for each major number, and it applies to all its minor numbers. The value can be changed via the driver’s ioctl method; hard-disk drivers usually set it to 8 sectors, which corresponds to 4KB.
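Because read_ahead is a one-dimensional array indexed only by major number, every minor of a driver shares the same value, and the prefetch window in bytes is simply that value times the sector size. The sketch below uses a hypothetical mock_read_ahead array and assumes 512-byte sectors, as in the 2.0 kernels discussed here.

```c
#include <assert.h>

#define MOCK_MAX_BLKDEV 64 /* illustrative size */
#define MOCK_SECTOR_SIZE 512 /* assumption: the default sector size */

static int mock_read_ahead[MOCK_MAX_BLKDEV]; /* sectors, one per major */

/* Bytes read ahead for any minor of 'major'. The minor is ignored on
 * purpose: read-ahead is configured per major number. */
long readahead_bytes(unsigned int major, unsigned int minor)
{
    (void)minor;
    assert(major < MOCK_MAX_BLKDEV);
    return (long)mock_read_ahead[major] * MOCK_SECTOR_SIZE;
}
```

A hard-disk major set to 8 sectors prefetches 4096 bytes for every minor, matching the 4KB figure above; sbull’s default of 2 sectors prefetches 1KB.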

The sbull device allows you to set these values at load time, and they apply to all the minor numbers of the sample driver. The variable names and their default values in sbull are as follows:

size=2048 (kilobytes)

Each RAM disk created by sbull takes two megabytes of RAM.

blksize=1024 (bytes)

The software "block" used by the module is one kilobyte, like the system default.

hardsect=512 (bytes)

The sbull sector size is the usual half-kilobyte value. Changing hardsect is disabled because, as mentioned above, other sector sizes aren’t supported. If you try to change it anyway, by removing the sanity check in sbull/sbull.c, be prepared for severe memory corruption unless variable sector-size support has been added by the time you try it.

rahead=2 (sectors)

Since the RAM disk is a fast device, the default read-ahead value is small.

The sbull device also allows you to choose the number of devices to install. devs, the number of devices, defaults to 2, resulting in a default memory usage of 4MB (two disks at 2MB each).
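The load-time parameters and the memory arithmetic can be collected in one place. This is a hypothetical userspace grouping: struct sbull_params, sbull_defaults, sbull_total_kb, and sbull_params_ok are illustrative names, and the validation rules are limited to what the text states (a power-of-two block size and a fixed 512-byte sector); it does not reproduce the actual check in sbull/sbull.c.

```c
#include <assert.h>

/* Hypothetical grouping of sbull's load-time parameters. */
struct sbull_params {
    int size;     /* kilobytes per device */
    int blksize;  /* bytes per software block */
    int hardsect; /* bytes per hardware sector */
    int rahead;   /* read-ahead, in sectors */
    int devs;     /* number of devices */
};

/* Defaults as listed in the text. */
static const struct sbull_params sbull_defaults = { 2048, 1024, 512, 2, 2 };

/* Total RAM, in kilobytes, consumed by all the devices. */
long sbull_total_kb(const struct sbull_params *p)
{
    assert(p != 0);
    return (long)p->devs * p->size;
}

/* Illustrative sanity check based only on the constraints in the text. */
int sbull_params_ok(const struct sbull_params *p)
{
    if (p->hardsect != 512)
        return 0; /* other sector sizes are not supported */
    if (p->blksize <= 0 || (p->blksize & (p->blksize - 1)) != 0)
        return 0; /* block size must be a power of two */
    if (p->size <= 0 || p->devs <= 0 || p->rahead < 0)
        return 0;
    return 1;
}
```

With the defaults, sbull_total_kb yields 4096KB, the 4MB mentioned above.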

The implementation of init_module for the sbull device is as follows (excluding registration of the major number and error recovery):

blk_dev[major].request_fn = sbull_request;
read_ahead[major] = sbull_rahead;
result = -ENOMEM; /* for the possible errors */

sbull_sizes = kmalloc(sbull_devs * sizeof(int), GFP_KERNEL);
if (!sbull_sizes)
    goto fail_malloc;
for (i=0; i < sbull_devs; i++) /* all the same size */
    sbull_sizes[i] = sbull_size;
blk_size[major]=sbull_sizes;

sbull_blksizes = kmalloc(sbull_devs * sizeof(int), GFP_KERNEL);
if (!sbull_blksizes)
    goto fail_malloc;
for (i=0; i < sbull_devs; i++) /* all the same blocksize */
    sbull_blksizes[i] = sbull_blksize;
blksize_size[major]=sbull_blksizes;

sbull_hardsects = kmalloc(sbull_devs * sizeof(int), GFP_KERNEL);
if (!sbull_hardsects)
    goto fail_malloc;
for (i=0; i < sbull_devs; i++) /* all the same hardsect */
    sbull_hardsects[i] = sbull_hardsect;
hardsect_size[major]=sbull_hardsects;
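The initialization code above funnels every allocation failure to a single fail_malloc label. The same pattern can be tried in userspace, with malloc standing in for kmalloc. The names per_minor_tables, alloc_filled, tables_init, and tables_free are hypothetical; the sketch relies on free(NULL) being a no-op, so one label can release whatever was allocated before the failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Userspace stand-ins for blk_size, blksize_size, and hardsect_size
 * entries of one major (hypothetical structure). */
struct per_minor_tables {
    int *sizes;     /* kilobytes */
    int *blksizes;  /* bytes */
    int *hardsects; /* bytes */
};

/* Allocate n ints, all set to 'value' (every minor gets the same). */
static int *alloc_filled(int n, int value)
{
    int *a = malloc((size_t)n * sizeof(int));

    if (a != NULL)
        for (int i = 0; i < n; i++)
            a[i] = value;
    return a;
}

/* Release all tables; free(NULL) is harmless, so partial setups work. */
void tables_free(struct per_minor_tables *t)
{
    free(t->sizes);
    free(t->blksizes);
    free(t->hardsects);
    t->sizes = t->blksizes = t->hardsects = NULL;
}

int tables_init(struct per_minor_tables *t, int devs,
                int size, int blksize, int hardsect)
{
    t->sizes = t->blksizes = t->hardsects = NULL;
    assert(devs > 0);
    if ((t->sizes = alloc_filled(devs, size)) == NULL)
        goto fail_malloc;
    if ((t->blksizes = alloc_filled(devs, blksize)) == NULL)
        goto fail_malloc;
    if ((t->hardsects = alloc_filled(devs, hardsect)) == NULL)
        goto fail_malloc;
    return 0;

fail_malloc:
    tables_free(t);
    return -ENOMEM; /* the same code the text prepares in 'result' */
}
```

Initializing all three pointers to NULL before the first allocation is what makes the single cleanup label safe regardless of which step failed.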

The corresponding cleanup function looks like this:

for (i=0; i<sbull_devs; i++)
    fsync_dev(MKDEV(sbull_major, i)); /* flush the devices */

blk_dev[major].request_fn = NULL;
read_ahead[major] = 0;
kfree(blk_size[major]);
blk_size[major] = NULL;
kfree(blksize_size[major]);
blksize_size[major] = NULL;
kfree(hardsect_size[major]);
hardsect_size[major] = NULL;

Here, the call to fsync_dev is needed to free all references to the device that the kernel keeps in various caches. In fact, fsync_dev is the engine behind block_fsync, which is the fsync "method" for block devices.
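One plausible reading of the ordering in the cleanup code is that flushing dirty buffers still needs the driver’s request function, so it must happen before request_fn is cleared. The toy model below is hypothetical throughout (mock_buffer, mock_cache, mock_request_fn, mock_fsync_dev); it is not the kernel’s buffer cache, only a demonstration of why the order matters.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_NBUF 4

/* Hypothetical cached buffer: owning minor and "still needs writing". */
struct mock_buffer {
    int minor;
    int dirty;
};

static struct mock_buffer mock_cache[MOCK_NBUF] = {
    { 0, 1 }, { 1, 1 }, { 0, 0 }, { 1, 1 }
};
static void (*mock_request_fn)(struct mock_buffer *); /* NULL until set */
static int mock_written; /* buffers that reached the "device" */

/* Hypothetical driver request function: "writes" one buffer. */
static void mock_sbull_request(struct mock_buffer *b)
{
    mock_written++;
    b->dirty = 0;
}

/* Flush one minor: every dirty buffer must pass through the request
 * function. Returns how many dirty buffers could not be written. */
int mock_fsync_dev(int minor)
{
    int stuck = 0;

    assert(minor >= 0);
    for (int i = 0; i < MOCK_NBUF; i++) {
        if (mock_cache[i].minor != minor || !mock_cache[i].dirty)
            continue;
        if (mock_request_fn != NULL)
            mock_request_fn(&mock_cache[i]);
        else
            stuck++;
    }
    return stuck;
}
```

Flushing every minor while the request function is installed writes all dirty buffers; if request_fn were cleared first, the model reports the buffers it could not write.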
