Read and Write

Reading and writing a scull device means transferring data between the kernel address space and the user address space. The operation cannot be carried out through pointers in the usual way, or through memcpy, because pointers operate in the current address space, and the driver’s code is executing in kernel space, while the data buffers are in user space.

If the target device is an expansion board instead of RAM, the same problem arises, because the driver must nonetheless copy data between user buffers and kernel space. In fact, the role of a device driver is mainly managing data transfers between devices (kernel space) and applications (user space).

Cross-space copy is performed in Linux by special functions, which are defined in <asm/segment.h>. The functions devoted to performing such a copy are optimized for different data sizes (char, short, int, long); most of them will be introduced in Section 5.1.4 in Chapter 5.

Driver code for read and write in scull needs to copy a whole segment of data to or from the user address space. This capability is offered by the following functions, which copy an arbitrary array of bytes:

void memcpy_fromfs(void *to, const void *from, unsigned long count);
void memcpy_tofs(void *to, const void *from, unsigned long count);

The names of the functions date back to the first Linux versions, when the only supported architecture was the i386 and there was a lot of assembler code peeking through the C. On Intel platforms, Linux addresses user space through the FS segment register, and the two functions have kept the old name through Linux 2.0. Things did change with Linux 2.1, but 2.0 is the main target of this book. See Section 17.3 in Chapter 17 for details.

Although the functions introduced above look like normal memcpy functions, a little extra care must be used when accessing user space from kernel code; the user pages being addressed might not be currently present in memory, and the page-fault handler can put the process to sleep while the page is being transferred into place. This happens, for example, when the page must be retrieved from swap space. The net result for the driver writer is that any function that accesses user space must be reentrant and must be able to execute concurrently with other driver functions. That’s why the scull implementation refuses to release device memory when dev->usage is not 0: the read and write methods increment the usage counter before using either memcpy function.

As far as the actual device methods are concerned, the task of the read method is to copy data from the device to user space (using memcpy_tofs), while the write method must copy data from user space to the device (using memcpy_fromfs). Each read or write system call requests transfer of a specific number of bytes, but the driver is free to transfer less data; the exact rules are slightly different for reading and writing.

Both read and write return a negative value if an error occurs. A number greater than or equal to zero tells the calling program how many bytes have been successfully transferred. If some data is transferred correctly and then an error happens, the return value must be the count of bytes successfully transferred, while the error does not get reported until the next time the function is called.
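The return-value rule above can be sketched as a small helper. This is only an illustration under stated assumptions: transfer_result is a hypothetical name that does not appear in scull, and it assumes the caller tracks the bytes moved so far and a positive errno-style code for any failure.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not part of scull): choose the value a read or
 * write method should return after a transfer attempt.
 *   done - bytes successfully transferred so far (never negative)
 *   err  - 0 on success, or a positive errno value such as EFAULT
 */
long transfer_result(long done, int err)
{
    assert(done >= 0 && err >= 0);

    if (done > 0)
        return done;  /* partial success wins; the error waits for the next call */
    if (err)
        return -err;  /* nothing was transferred: report the error now */
    return 0;         /* nothing transferred and no error */
}
```

With this sketch, a transfer that moved 40 bytes before hitting EFAULT returns 40, and the error is reported as -EFAULT only by a later call that transfers nothing.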

The role of the different arguments to read is depicted in Figure 3-2.

Figure 3-2. The arguments to read

While kernel functions return a negative number to signal an error, and the value of the number indicates the kind of error that occurred (as introduced in Chapter 2, in Section 2.4.1), programs that run in user space always see -1 as the error return value. These application programs need to access the errno variable to find out what happened. The difference in behavior is dictated by the library conventions on one hand and the advantage of not dealing with errno in the kernel on the other hand.
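The split between the two conventions can be modeled in user space. The following is a minimal sketch, not the actual library code: syscall_return is a hypothetical name, and the bound checked by the assertion (error codes no smaller than -4095) is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for the library side of a system call.
 * kernel_ret is the value the driver method returned: a byte count,
 * or a negative error code such as -EFAULT.
 */
long syscall_return(long kernel_ret)
{
    /* Assumption for this sketch: error codes are small negative numbers. */
    assert(kernel_ret >= -4095);

    if (kernel_ret < 0) {
        errno = (int)-kernel_ret;  /* the kind of error goes into errno */
        return -1;                 /* the application always sees -1 */
    }
    return kernel_ret;             /* byte counts pass through unchanged */
}
```

The driver never touches errno; it only returns the negative code, and the conversion happens on the user-space side.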

As far as portability is concerned, it’s interesting to note that the count argument to both the read and write methods was declared as int through Linux 2.0, but changed to unsigned long with release 2.1.0 of the kernel. Also, the return value for the methods changed from int to long, because it represents either a count or a negative error code.

This type change is beneficial: unsigned long is a better choice than int for a count item because of its wider range. The choice is so good that the Alpha team changed the typing before 2.1 was released (mainly because the GNU C library uses unsigned long in its definition of system calls).

Although beneficial, this change introduces some platform dependency in driver code. To circumvent the problem, all the sample modules available on the O’Reilly FTP site use the following definitions (from sysdep.h):

#if defined(__alpha__) || (LINUX_VERSION_CODE >= VERSION_CODE(2,1,0))
# define count_t unsigned long
# define read_write_t long
#else
# define count_t int
# define read_write_t int
#endif

After the macros have been evaluated, the count argument to read and write is always declared as count_t, and the return value as read_write_t. I chose to use a preprocessor definition instead of typedef because the typedef introduces more compiler warnings than it removes (see Section 10.3 in Chapter 10). On the other hand, an uppercase type name in function prototypes looks really bad, so I named the new “type” using the standard typedef convention.

Portability to version 2.1 is more thoroughly described in Chapter 17.

The read Method

The return value for read is interpreted by the calling program as follows:

  • If the value equals the count argument passed to the read system call, the requested number of bytes has been transferred. This is the optimal case.

  • If the value is positive, but smaller than count, only part of the data has been transferred. This may happen for a number of reasons, depending on the device. Most often, the program will retry the read. For instance, if you read using the fread function, the library function reissues the system call until the requested data transfer is complete.

  • If the value is zero, it is interpreted to mean that end-of-file was reached.

  • A negative value means there was an error. The value specifies what the error was, according to <linux/errno.h>.

What is missing from the preceding list is the case of “there is no data, but it may arrive later.” In this case, the read system call should block. We won’t deal with blocking input until Section 5.2 in Chapter 5.

The scull code takes advantage of these rules. In particular, it takes advantage of the partial-read rule. Each invocation of scull_read deals only with a single data quantum, without implementing a loop to gather all the data; this makes the code shorter and easier to read. If the reading program really wants more data, it reiterates the call. If the standard library is used to read the device, the application won’t even notice the quantization of the data transfer.
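What “reiterating the call” looks like from the application side can be sketched with the standard read system call. This is an illustration, not the library’s actual implementation of fread; read_fully is a hypothetical helper name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read until `want` bytes have arrived,
 * end-of-file is reached, or an error occurs. Returns the number of bytes
 * gathered, or -1 (errno is set by read) if nothing could be read.
 */
long read_fully(int fd, char *buf, size_t want)
{
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, buf + got, want - got);

        if (n > 0)
            got += (size_t)n;             /* partial read: ask for the rest */
        else if (n == 0)
            break;                        /* end-of-file */
        else if (errno != EINTR)
            return got ? (long)got : -1;  /* keep what was gathered, if any */
    }
    return (long)got;
}
```

A program that reads a scull device through such a loop sees one contiguous transfer, even though each individual call stops at a quantum boundary.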

If the current read position is greater than the device size, the read method of scull returns 0 to signal that there’s no data available (in other words, we’re at end-of-file). This situation can happen if process A is reading the device while process B opens it for writing, thus truncating the device to a length of 0. Process A suddenly finds itself past end-of-file, and the next read call returns 0.

Here is the code for read:

read_write_t scull_read (struct inode *inode, struct file *filp,
                         char *buf, count_t count)
{
    Scull_Dev *dev = filp->private_data; /* the first listitem */
    Scull_Dev *dptr;
    int quantum = dev->quantum;
    int qset = dev->qset;
    int itemsize = quantum * qset; /* how many bytes in the listitem */
    unsigned long f_pos = (unsigned long)(filp->f_pos);
    int item, s_pos, q_pos, rest;

    if (f_pos > dev->size)
        return 0;
    if (f_pos + count > dev->size)
        count = dev->size - f_pos;
    /* find listitem, qset index, and offset in the quantum */
    item = f_pos / itemsize;
    rest = f_pos % itemsize;
    s_pos = rest / quantum; q_pos = rest % quantum;

    /* follow the list up to the right position (defined elsewhere) */
    dptr = scull_follow(dev, item);

    if (!dptr->data)
        return 0; /* don't fill holes */
    if (!dptr->data[s_pos])
        return 0;
    if (count > quantum - q_pos)
        count = quantum - q_pos; /* read only up to */
                                 /* the end of this quantum */

    dev->usage++; /* the following call may sleep; count it on the list head */
    memcpy_tofs(buf, dptr->data[s_pos]+q_pos, count);
    dev->usage--;

    filp->f_pos += count;
    return count;
}
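The position arithmetic used by scull_read can be tried out in user space. The sketch below mirrors the item, s_pos, and q_pos computation; struct scull_pos and scull_locate are hypothetical names introduced only for this example, and the quantum and qset values in the note that follows are illustrative.

```c
#include <assert.h>

/* Where a byte offset falls inside a scull-like layout. */
struct scull_pos {
    int item;   /* index of the list item */
    int s_pos;  /* index of the quantum within that item's quantum set */
    int q_pos;  /* byte offset within the quantum */
};

/* Hypothetical helper: split f_pos the same way scull_read does. */
struct scull_pos scull_locate(unsigned long f_pos, int quantum, int qset)
{
    struct scull_pos p;
    unsigned long itemsize, rest;

    assert(quantum > 0 && qset > 0);
    itemsize = (unsigned long)quantum * (unsigned long)qset;
    rest = f_pos % itemsize;                         /* offset inside the list item */

    p.item  = (int)(f_pos / itemsize);
    p.s_pos = (int)(rest / (unsigned long)quantum);
    p.q_pos = (int)(rest % (unsigned long)quantum);
    return p;
}
```

With a quantum of 4000 bytes and a quantum set of 1000 pointers, offset 4010500 lands in list item 1, quantum 2, byte 2500: each list item covers 4000000 bytes, which leaves 10500 bytes, or two full quanta plus 2500.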

The write Method

write, like read, can transfer less data than was requested, according to the following rules for the return value:

  • If the value equals count, the requested number of bytes has been transferred.

  • If the value is positive, but smaller than count, only part of the data has been transferred. Again, the program will most likely retry writing the rest of the data.

  • If the value is zero, nothing was written. This result is not an error, and there is no reason to return an error code. Once again, the standard library retries the call to write. We’ll examine the significance of this case in a later chapter, when blocking write is introduced.

  • A negative value means an error occurred; the semantics are the same as for read.

Unfortunately, there are a few misbehaving programs that issue an error message and abort when a partial transfer is performed. Most notably, a not-so-old version of the GNU file utilities has such a bug. If your installation dates back to 1995 (for example, Slackware 2.3), your cp will fail to handle scull. You’ll know you have this version if you see the message /dev/scull0: no such file or directory when cp writes a data chunk bigger than the scull quantum. The GNU dd implementation refuses to read or write partial blocks by design, and cat refuses to write partial blocks. Therefore, cat shouldn’t be used with the scull module and dd should be passed a block size equal to scull’s quantum. Note that this limitation in the scull implementation could be fixed, but I didn’t want to complicate the code more than necessary.
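A well-behaved program copes with partial writes by resubmitting the remaining data. The following is a minimal sketch of such a loop built on the standard write system call; write_fully is a hypothetical helper name, and the sketch retries a zero return immediately, which a real program might want to limit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write until all `len` bytes are
 * accepted or an error occurs. Returns the number of bytes written,
 * or -1 (errno is set by write) if nothing could be written.
 */
long write_fully(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n >= 0)
            done += (size_t)n;              /* partial or zero write: retry the rest */
        else if (errno != EINTR)
            return done ? (long)done : -1;  /* keep what was written, if any */
    }
    return (long)done;
}
```

A copy program written this way would need several calls to store a chunk larger than the scull quantum, but it would not fail.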

The scull code for write deals with a single quantum at a time, as the read method does:

read_write_t scull_write (struct inode *inode, struct file *filp,
                          const char *buf, count_t count)
{
    Scull_Dev *dev = filp->private_data;
    Scull_Dev *dptr;
    int quantum = dev->quantum;
    int qset = dev->qset;
    int itemsize = quantum * qset;
    unsigned long f_pos = (unsigned long)(filp->f_pos);
    int item, s_pos, q_pos, rest;

    /* find listitem, qset index and offset in the quantum */
    item = f_pos / itemsize;
    rest = f_pos % itemsize;
    s_pos = rest / quantum; q_pos = rest % quantum;

    /* follow the list up to the right position */
    dptr = scull_follow(dev, item);
    if (!dptr->data) {
        dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
        if (!dptr->data)
            return -ENOMEM;
        memset(dptr->data, 0, qset * sizeof(char *));
    }
    if (!dptr->data[s_pos]) {
        dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
        if (!dptr->data[s_pos])
            return -ENOMEM;
    }
    if (count > quantum - q_pos)
        count = quantum - q_pos; /* write only up to */
                                 /* the end of this quantum */

    dev->usage++; /* the following call may sleep */
    memcpy_fromfs(dptr->data[s_pos]+q_pos, buf, count);
    dev->usage--;

    /* update the size */
    if (dev->size < f_pos + count)
        dev->size = f_pos + count;
    filp->f_pos += count;
    return count;
}
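The effect of handling a single quantum per call can be modeled in user space. The sketch below is an illustration only: quantum_chunk and calls_needed are hypothetical names, and the model ignores allocation failures and the device size limit that applies when reading.

```c
#include <assert.h>

/*
 * Hypothetical model of the clamp in scull_read and scull_write: one call
 * never transfers past the end of the quantum that contains f_pos.
 */
unsigned long quantum_chunk(unsigned long f_pos, unsigned long count,
                            int quantum)
{
    unsigned long room;

    assert(quantum > 0);
    room = (unsigned long)quantum - f_pos % (unsigned long)quantum;
    return count < room ? count : room;
}

/* Number of calls a retrying program needs to move `count` bytes. */
int calls_needed(unsigned long f_pos, unsigned long count, int quantum)
{
    int calls = 0;

    while (count > 0) {
        unsigned long n = quantum_chunk(f_pos, count, quantum);

        f_pos += n;   /* the file position advances by the bytes moved */
        count -= n;
        calls++;
    }
    return calls;
}
```

For example, with a quantum of 4000 bytes, a 10000-byte request starting at offset 0 is served in three calls of 4000, 4000, and 2000 bytes.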