As outlined in Section 1.3 of Chapter 1, Unix device drivers are not limited to char drivers. This chapter introduces the second main class of device drivers: block drivers. A block-oriented device is one that transfers data only in blocks (for example, the floppy disk or the hard drive), where the hardware block is usually called a “sector.” The word “block,” on the other hand, will be used to denote a software concept: the driver often uses 1KB blocks even if the sector size is 512 bytes.
In this chapter, we’ll build a full-featured block driver called sbull, short for “Simple Block Utility for Loading Localities.” This driver is similar to scull in that it uses the computer’s memory as the hardware device. In other words, it’s a RAM-disk driver. sbull can be executed on any Linux computer (although I have been able to test it only on a limited set of platforms).
If you need to write a driver for a real block device, use the information in Chapter 8 and Chapter 9 to supplement this chapter.
Like a char driver, a block driver in the kernel is identified by a major number. The functions used to register and unregister the driver are:
int register_blkdev(unsigned int major, const char *name, struct file_operations *fops);
int unregister_blkdev(unsigned int major, const char *name);
The meaning of the arguments is the same as for char drivers, and dynamic assignment of the major number can be performed in the same way. Therefore, the sbull device registers itself just like scull:
result = register_blkdev(sbull_major, "sbull", &sbull_fops);
if (result < 0) {
    printk(KERN_WARNING "sbull: can't get major %d\n", sbull_major);
    return result;
}
if (sbull_major == 0) sbull_major = result; /* dynamic */
major = sbull_major; /* Use 'major' later on to save typing */
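The dynamic-assignment convention used by the snippet above can be tried out in user space. The following is only a sketch, not kernel code: mock_register_blkdev, mock_table, MOCK_MAX_BLKDEV, and mock_sbull_get_major are hypothetical names, and the sketch assumes that a request for major 0 is satisfied by scanning for a free slot and returning its number, while a request for a specific major returns 0 on success.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MOCK_MAX_BLKDEV 64   /* hypothetical table size for this sketch */

/* Hypothetical stand-in for the kernel's table of registered block drivers. */
static const char *mock_table[MOCK_MAX_BLKDEV];

/*
 * Mock registration: major == 0 asks for a dynamic number, which is
 * returned as a positive value; a specific major yields 0 on success.
 * A negative return value is an error code in both cases.
 */
int mock_register_blkdev(unsigned int major, const char *name)
{
    assert(name != NULL);
    if (major == 0) {
        /* Scan downward for the first free slot; slot 0 is never handed out. */
        for (major = MOCK_MAX_BLKDEV - 1; major > 0; major--) {
            if (mock_table[major] == NULL) {
                mock_table[major] = name;
                return (int)major;
            }
        }
        return -EBUSY;
    }
    if (major >= MOCK_MAX_BLKDEV)
        return -EINVAL;
    if (mock_table[major] != NULL)
        return -EBUSY;
    mock_table[major] = name;
    return 0;
}

/* Caller-side logic, mirroring the sbull registration snippet. */
int mock_sbull_get_major(int requested_major)
{
    int result = mock_register_blkdev((unsigned int)requested_major, "sbull");

    if (result < 0)
        return result;               /* registration failed */
    if (requested_major == 0)
        requested_major = result;    /* dynamic: the return value is the major */
    return requested_major;
}
```

The point of the sketch is the asymmetry in the return value: it carries the assigned number only when the caller asked for major 0, which is why the driver copies result into sbull_major only in that case.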
The fops argument to register_blkdev is similar to the one we used for char drivers. The operations for read, write, and fsync, however, need not be driver-specific. The general functions block_read, block_write, and block_fsync are always used in place of the driver-specific functions. In addition, check_media_change and revalidate make sense for a block driver, and both are defined in sbull_fops.
The fops structure used in sbull is:
struct file_operations sbull_fops = {
    NULL,          /* lseek: default */
    block_read,
    block_write,
    NULL,          /* sbull_readdir */
    NULL,          /* sbull_select */
    sbull_ioctl,
    NULL,          /* sbull_mmap */
    sbull_open,
    sbull_release,
    block_fsync,
    NULL,          /* sbull_fasync */
    sbull_check_change,
    sbull_revalidate
};
General read and write operations are used to achieve better performance. The speed-up is achieved through buffering, which is not available to char drivers. Block drivers can be buffered because their data serves the computer’s file hierarchy, and is never accessed directly by the applications, while data belonging to char drivers is.
However, when the buffer cache cannot satisfy a read request or pending writes must be flushed to the physical disk, the driver must be called to perform the actual data transfer. The fops structure doesn’t carry an entry point other than read and write, so an additional structure, blk_dev_struct, is used to deliver requests for actual data transfer.
This structure is defined in <linux/blkdev.h> and has several fields, but only the first field needs to be set by the driver. This is the definition of the structure as found in 2.0 kernels:
struct blk_dev_struct {
    void (*request_fn)(void);
    struct request * current_request;
    struct request plug;
    struct tq_struct plug_tq;
};
extern struct blk_dev_struct blk_dev[MAX_BLKDEV];
When the kernel needs to spawn an I/O operation for the sbull device, it calls the function blk_dev[sbull_major].request_fn. The initialization function for this module should therefore set this field to point to its own request function. The remaining fields of the structure are used internally by the kernel functions and macros; you don’t need to explicitly refer to them in your code.
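The dispatch through a table indexed by major number can be illustrated outside the kernel. This is a minimal user-space sketch under stated assumptions: struct mock_blk_dev is a reduced, hypothetical version of blk_dev_struct that keeps only the request hook, and every mock_ name is invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_BLKDEV 64           /* hypothetical table size */

/* Reduced, hypothetical version of blk_dev_struct: only the hook matters here. */
struct mock_blk_dev {
    void (*request_fn)(void);
};

static struct mock_blk_dev mock_blk_dev_table[MOCK_MAX_BLKDEV];

static int mock_requests_served;     /* counts calls, for illustration only */

/* Stand-in for the driver's request function (sbull_request in the text). */
static void mock_sbull_request(void)
{
    mock_requests_served++;
}

/* What the driver's initialization does: install its request function. */
void mock_driver_init(unsigned int major)
{
    assert(major < MOCK_MAX_BLKDEV);
    mock_blk_dev_table[major].request_fn = mock_sbull_request;
}

/* What the driver's cleanup does: remove the hook again. */
void mock_driver_cleanup(unsigned int major)
{
    assert(major < MOCK_MAX_BLKDEV);
    mock_blk_dev_table[major].request_fn = NULL;
}

/*
 * What the kernel side does when data must actually be transferred:
 * look up the entry by major number and call through the pointer.
 * Returns 0 if a request function was called, -1 if none is installed.
 */
int mock_start_io(unsigned int major)
{
    assert(major < MOCK_MAX_BLKDEV);
    if (mock_blk_dev_table[major].request_fn == NULL)
        return -1;
    mock_blk_dev_table[major].request_fn();
    return 0;
}

/* Accessor for the call counter. */
int mock_served(void)
{
    return mock_requests_served;
}
```

Note that the request function takes no arguments, as in the 2.0 definition shown earlier; a real driver finds the work to do through the kernel's request queue rather than through parameters.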
The relationship between a block-driver module and the kernel is shown in Figure 12.1.
In addition to blk_dev, several other arrays hold information about block drivers. These arrays are indexed by major, and sometimes also minor, number. They are declared and described in drivers/block/ll_rw_blk.c.
int blk_size[][];
This array is indexed by the major and minor numbers. It describes the size of each device, in kilobytes. If blk_size[major] is NULL, no checking is performed on the size of the device (i.e., the kernel might request data transfers past end-of-device).
int blksize_size[][];
The size of the block used by each device, in bytes. Like the previous one, this two-dimensional array is indexed by both major and minor numbers. If blksize_size[major] is a null pointer, a block size of BLOCK_SIZE (currently 1KB) is assumed. The block size for the device must be a power of two, because the kernel uses bit-shift operators to convert offsets to block numbers.
int hardsect_size[][];
Like the others, this data structure is indexed by the major and minor numbers. The default value for the hardware sector size is 512 bytes. Up to and including version 2.0.x, variable sector sizes aren’t really supported, because some kernel code still assumes that the sector size is half a kilobyte; it’s nonetheless very likely that variable sector size will be truly implemented in version 2.2.
int read_ahead[];
This array is indexed by the major number and defines the number of sectors to be read in advance by the kernel when a file is being read sequentially. Reading data before a process asks for it helps system performance and overall throughput. A slower device should specify a bigger read-ahead value, while fast devices will be happy even with a smaller value. The bigger the read-ahead value, the more memory the buffer cache uses. There is one read-ahead value for each major number, and it applies to all its minor numbers. The value can be changed via the driver’s ioctl method; hard-disk drivers usually set it to 8 sectors, which corresponds to 4KB.
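The way these arrays are consulted can be sketched with a few user-space helpers. This is an illustration under stated assumptions, not the kernel's code: all mock_ names are hypothetical, each pointer argument stands for one row of the corresponding array (for example, blk_size[major]), and the sector size is taken to be the 512-byte default.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_BLOCK_SIZE 1024   /* default software block size, in bytes */

/*
 * Size check in the spirit of blk_size: 'sizes_kb' plays the role of
 * blk_size[major]; a NULL pointer means no checking is performed.
 * Returns 1 if a transfer of 'nr_sectors' 512-byte sectors starting
 * at 'sector' fits within the device (or cannot be checked), 0 otherwise.
 */
int mock_request_fits(const int *sizes_kb, int minor,
                      unsigned long sector, unsigned long nr_sectors)
{
    unsigned long dev_sectors;

    if (sizes_kb == NULL)
        return 1;                       /* no size information: no check */
    dev_sectors = (unsigned long)sizes_kb[minor] * 2;  /* 1KB = 2 sectors */
    return sector + nr_sectors <= dev_sectors;
}

/*
 * Block-size lookup in the spirit of blksize_size: a NULL pointer
 * selects the default block size.
 */
int mock_block_size(const int *blksizes, int minor)
{
    return blksizes ? blksizes[minor] : MOCK_BLOCK_SIZE;
}

/* Number of bits to shift for a power-of-two block size (1024 gives 10). */
int mock_block_shift(int blksize)
{
    int shift = 0;

    assert(blksize > 0 && (blksize & (blksize - 1)) == 0); /* power of two */
    while ((1 << shift) < blksize)
        shift++;
    return shift;
}

/* Convert a byte offset to a block number with a shift, not a division. */
unsigned long mock_offset_to_block(unsigned long offset, int blksize)
{
    return offset >> mock_block_shift(blksize);
}

/* Bytes covered by a read-ahead value expressed in 512-byte sectors. */
unsigned long mock_read_ahead_bytes(int sectors)
{
    return (unsigned long)sectors * 512;
}
```

The shift-based conversion is the reason for the power-of-two requirement: with a block size such as 1000 bytes there is no shift count that yields the right block number.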
The sbull device allows you to set these values at load time, and they apply to all the minor numbers of the sample driver. The variable names and their default values in sbull are as follows:
size=2048 (kilobytes)
Each ramdisk created by sbull takes two megabytes of RAM.
blksize=1024 (bytes)
The software “block” used by the module is one kilobyte, like the system default.
hardsect=512 (bytes)
The sbull sector size is the usual half-kilobyte value. Changing hardsect is disabled because, as mentioned above, other sector sizes aren’t supported. If you try to change it anyway, by removing the security check in sbull/sbull.c, be prepared to experience severe memory corruption unless variable sector-size support has been added by the time you try it.
rahead=2 (sectors)
Since the RAM disk is a fast device, the default read-ahead value is small.
The sbull device also allows you to choose the number of devices to install. devs, the number of devices, defaults to 2, resulting in a default memory usage of 4 megabytes (2 disks at 2 megabytes each).
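The arithmetic behind these defaults is easy to check. The helpers below are a hypothetical user-space sketch (none of the mock_ names come from sbull); they assume sizes are given in kilobytes and block and sector sizes in bytes, as in the list above.

```c
#include <assert.h>

/* Defaults quoted in the text. */
enum {
    MOCK_DEVS     = 2,      /* number of RAM disks */
    MOCK_SIZE_KB  = 2048,   /* size of each disk, in kilobytes */
    MOCK_BLKSIZE  = 1024,   /* software block size, in bytes */
    MOCK_HARDSECT = 512     /* hardware sector size, in bytes */
};

/* Total RAM taken by all the disks, in kilobytes. */
unsigned long mock_total_kb(int devs, int size_kb)
{
    assert(devs > 0 && size_kb > 0);
    return (unsigned long)devs * (unsigned long)size_kb;
}

/* Number of software blocks on one device. */
unsigned long mock_blocks_per_dev(int size_kb, int blksize)
{
    assert(size_kb > 0 && blksize > 0);
    return (unsigned long)size_kb * 1024UL / (unsigned long)blksize;
}

/* Number of hardware sectors on one device. */
unsigned long mock_sectors_per_dev(int size_kb, int hardsect)
{
    assert(size_kb > 0 && hardsect > 0);
    return (unsigned long)size_kb * 1024UL / (unsigned long)hardsect;
}
```

With the defaults, mock_total_kb(MOCK_DEVS, MOCK_SIZE_KB) gives 4096 kilobytes, matching the 4 megabytes quoted in the text, and each disk holds 2048 one-kilobyte blocks or 4096 half-kilobyte sectors.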
The implementation of init_module for the sbull device is as follows (excluding registration of the major number and error recovery):
blk_dev[major].request_fn = sbull_request;
read_ahead[major] = sbull_rahead;
result = -ENOMEM; /* for the possible errors */

sbull_sizes = kmalloc(sbull_devs * sizeof(int), GFP_KERNEL);
if (!sbull_sizes)
    goto fail_malloc;
for (i=0; i < sbull_devs; i++) /* all the same size */
    sbull_sizes[i] = sbull_size;
blk_size[major]=sbull_sizes;

sbull_blksizes = kmalloc(sbull_devs * sizeof(int), GFP_KERNEL);
if (!sbull_blksizes)
    goto fail_malloc;
for (i=0; i < sbull_devs; i++) /* all the same blocksize */
    sbull_blksizes[i] = sbull_blksize;
blksize_size[major]=sbull_blksizes;

sbull_hardsects = kmalloc(sbull_devs * sizeof(int), GFP_KERNEL);
if (!sbull_hardsects)
    goto fail_malloc;
for (i=0; i < sbull_devs; i++) /* all the same hardsect */
    sbull_hardsects[i] = sbull_hardsect;
hardsect_size[major]=sbull_hardsects;
The corresponding cleanup function looks like this:
for (i=0; i<sbull_devs; i++)
    fsync_dev(MKDEV(sbull_major, i)); /* flush the devices */
blk_dev[major].request_fn = NULL;
read_ahead[major] = 0;
kfree(blk_size[major]);
blk_size[major] = NULL;
kfree(blksize_size[major]);
blksize_size[major] = NULL;
kfree(hardsect_size[major]);
hardsect_size[major] = NULL;
Here, the call to fsync_dev is needed to free all references to the device that the kernel keeps in various caches. Actually, fsync_dev is the engine that operates behind block_fsync, which is the fsync “method” for block devices.
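The initialization code shown earlier omits error recovery. One possible shape for it, sketched in user space, is given here. This is an assumption about how the fail_malloc path might be organized, not the code from sbull: malloc and free stand in for kmalloc and kfree, and struct mock_tables and the mock_ functions are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space counterparts of sbull's three per-minor arrays. */
struct mock_tables {
    int *sizes;      /* plays the role of blk_size[major] */
    int *blksizes;   /* plays the role of blksize_size[major] */
    int *hardsects;  /* plays the role of hardsect_size[major] */
};

/* Allocate one array of 'n' ints, every element set to 'value'. */
static int *mock_filled_array(int n, int value)
{
    int i;
    int *p = malloc((size_t)n * sizeof(int));

    if (p == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        p[i] = value;
    return p;
}

/* Release whatever was allocated; in user space free(NULL) is a no-op. */
void mock_tables_cleanup(struct mock_tables *t)
{
    assert(t != NULL);
    free(t->sizes);
    t->sizes = NULL;
    free(t->blksizes);
    t->blksizes = NULL;
    free(t->hardsects);
    t->hardsects = NULL;
}

/* Returns 0 on success, -1 if any allocation fails (nothing is leaked). */
int mock_tables_init(struct mock_tables *t, int devs,
                     int size, int blksize, int hardsect)
{
    assert(t != NULL && devs > 0);
    t->sizes = t->blksizes = t->hardsects = NULL;

    t->sizes = mock_filled_array(devs, size);
    if (t->sizes == NULL)
        goto fail_malloc;
    t->blksizes = mock_filled_array(devs, blksize);
    if (t->blksizes == NULL)
        goto fail_malloc;
    t->hardsects = mock_filled_array(devs, hardsect);
    if (t->hardsects == NULL)
        goto fail_malloc;
    return 0;

fail_malloc:
    mock_tables_cleanup(t);   /* frees only what was actually allocated */
    return -1;
}
```

Resetting every pointer to NULL before the first allocation is what lets a single failure label serve all three allocations, since the cleanup routine can then be called at any stage.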