Reading and writing a scull device means transferring data between the kernel address space and the user address space. The operation cannot be carried out through pointers in the usual way, or through memcpy, because pointers operate in the current address space, and the driver’s code is executing in kernel space, while the data buffers are in user space.
If the target device is an expansion board instead of RAM, the same problem arises, because the driver must nonetheless copy data between user buffers and kernel space. In fact, the role of a device driver is mainly managing data transfers between devices (kernel space) and applications (user space).
Cross-space copy is performed in Linux by special functions, which are defined in <asm/segment.h>. The functions devoted to performing such a copy are optimized for different data sizes (char, short, int, long); most of them will be introduced in Section 5.1.4 in Chapter 5.
Driver code for read and write in scull needs to copy a whole segment of data to or from the user address space. This capability is offered by the following functions, which copy an arbitrary array of bytes:
    void memcpy_fromfs(void *to, const void *from, unsigned long count);
    void memcpy_tofs(void *to, const void *from, unsigned long count);
The names of the functions date back to the first Linux versions, when the only supported architecture was the i386 and there was a lot of assembler code peeking through the C. On Intel platforms, Linux addresses user space through the FS segment register, and the two functions have kept the old name through Linux 2.0. Things did change with Linux 2.1, but 2.0 is the main target of this book. See Section 17.3 in Chapter 17 for details.
Although the functions introduced above look like normal memcpy functions, a little extra care must be used when accessing user space from kernel code; the user pages being addressed might not be currently present in memory, and the page-fault handler can put the process to sleep while the page is being transferred into place. This happens, for example, when the page must be retrieved from swap space. The net result for the driver writer is that any function that accesses user space must be reentrant and must be able to execute concurrently with other driver functions. That’s why the scull implementation refuses to release device memory when dev->usage is not 0: the read and write methods increment the usage counter before using either memcpy function.
As far as the actual device methods are concerned, the task of the read method is to copy data from the device to user space (using memcpy_tofs), while the write method must copy data from user space to the device (using memcpy_fromfs). Each read or write system call requests transfer of a specific number of bytes, but the driver is free to transfer less data--the exact rules are slightly different for reading and writing.
Both read and write return a negative value if an error occurs. A number greater than or equal to zero tells the calling program how many bytes have been successfully transferred. If some data is transferred correctly and then an error happens, the return value must be the count of bytes successfully transferred, while the error does not get reported until the next time the function is called.
The role of the different arguments to read is depicted in Figure 3.2.
While kernel functions return a negative number to signal an error, and the value of the number indicates the kind of error that occurred (as introduced in Chapter 2, in Section 2.4.1), programs that run in user space always see -1 as the error return value. These application programs need to access the errno variable to find out what happened. The difference in behavior is dictated by the library conventions on one hand and the advantage of not dealing with errno in the kernel on the other hand.
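The user-space side of this convention can be seen with a small sketch. The helper name failing_read is hypothetical; it simply issues a read on a descriptor that cannot be valid, so the call fails and the C library stores the kernel's error code in errno.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: issue a read() that is bound to fail and report
 * what user space sees. The return value is -1 on failure; the kind of
 * error is left in errno by the C library.
 */
int failing_read(int *saved_errno)
{
    char buf[16];
    ssize_t ret;

    errno = 0;
    ret = read(-1, buf, sizeof(buf));   /* -1 is never a valid descriptor */
    *saved_errno = errno;               /* EBADF for a bad descriptor */
    return (int)ret;
}
```

Inside the kernel, the same failure would be expressed by returning -EBADF directly; the library turns that into the -1/errno pair shown here.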
As far as portability is concerned, it’s interesting to note that the count argument to both the read and write methods has always been int, but changed to unsigned long with release 2.1.0 of the kernel. Also, the return value for the methods has been changed from int to long, because it represents either a count or a negative error code.
This type change is beneficial: unsigned long is a better choice than int for a count item because of its wider range. The choice is so good that the Alpha team changed the typing before 2.1 was released (mainly because the GNU C library uses unsigned long in its definition of system calls).
Although beneficial, this change introduces some platform dependency in driver code. To circumvent the problem, all the sample modules available on the O’Reilly FTP site use the following definitions (from sysdep.h):
    #if defined(__alpha__) || (LINUX_VERSION_CODE >= VERSION_CODE(2,1,0))
    # define count_t unsigned long
    # define read_write_t long
    #else
    # define count_t int
    # define read_write_t int
    #endif
After the macros have been evaluated, the count argument to read and write is always declared as count_t, and the return value as read_write_t. I chose to use a preprocessor definition instead of typedef because the typedef introduces more compiler warnings than it removes (see Section 10.3 in Chapter 10). On the other hand, an uppercase type name in function prototypes is really bad-looking, so I named the new "type" using the standard typedef convention.
Portability to version 2.1 is more thoroughly described in Chapter 17.
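As a rough user-space illustration of how these macros shape a declaration, the sketch below compiles the same conditional outside the kernel. The VERSION_CODE definition, the pretend value of LINUX_VERSION_CODE, and the demo_read function are all assumptions made for the example; in a real module the version code comes from the kernel headers and the macro from sysdep.h.

```c
#include <string.h>

/* Assumed stand-ins for what the kernel headers and sysdep.h provide. */
#define VERSION_CODE(vers, rel, seq) (((vers) << 16) | ((rel) << 8) | (seq))
#ifndef LINUX_VERSION_CODE
#  define LINUX_VERSION_CODE VERSION_CODE(2, 0, 0)  /* pretend: kernel 2.0.0 */
#endif

#if defined(__alpha__) || (LINUX_VERSION_CODE >= VERSION_CODE(2, 1, 0))
#  define count_t unsigned long
#  define read_write_t long
#else
#  define count_t int
#  define read_write_t int
#endif

/*
 * Hypothetical method: the prototype is written once with the portable
 * names and picks up the right types on every platform.
 */
read_write_t demo_read(char *buf, count_t count)
{
    memset(buf, 0, count);      /* stand-in for a real data transfer */
    return count;               /* bytes "transferred" */
}
```

With the pretend 2.0.0 version code on a non-Alpha machine, count_t and read_write_t both expand to int; building with -DLINUX_VERSION_CODE=131328 (the value the assumed macro yields for 2.1.0) switches them to unsigned long and long without touching the function.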
The return value for read is interpreted by the calling program as follows:
If the value equals the count argument passed to the read system call, the requested number of bytes has been transferred. This is the optimal case.
If the value is positive, but smaller than count, only part of the data has been transferred. This may happen for a number of reasons, depending on the device. Most often, the program will retry the read. For instance, if you read using the fread function, the library function reissues the system call till completion of the requested data transfer.
If the value is zero, it is interpreted to mean that end-of-file was reached.
A negative value means there was an error. The value specifies what the error was, according to <linux/errno.h>.
What is missing from the preceding list is the case of "there is no data, but it may arrive later." In this case, the read system call should block. We won’t deal with blocking input until Section 5.2 in Chapter 5.
The scull code takes advantage of these rules. In particular, it takes advantage of the partial-read rule. Each invocation of scull_read deals only with a single data quantum, without implementing a loop to gather all the data; this makes the code shorter and easier to read. If the reading program really wants more data, it reiterates the call. If the standard library is used to read the device, the application won’t even notice the quantization of the data transfer.
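A program that bypasses the standard library can reiterate the call itself. The following is a minimal sketch of such a loop; read_fully is a hypothetical helper name, and the EINTR retry is an addition that the text above does not discuss.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling read() until `count` bytes have
 * arrived, end-of-file is reached, or an error occurs.
 * Returns the number of bytes read, or -1 on error (errno is set).
 */
ssize_t read_fully(int fd, void *buf, size_t count)
{
    char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = read(fd, p + done, count - done);
        if (n == 0)
            break;              /* end-of-file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;      /* partial read: ask for the rest */
    }
    return (ssize_t)done;
}
```

When used on a scull device, each pass through the loop would collect at most one quantum, and the caller sees a single complete transfer.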
If the current read position is greater than the device size, the read method of scull returns 0 to signal that there’s no data available (in other words, we’re at end-of-file). This situation can happen if process A is reading the device while process B opens it for writing, thus truncating the device to a length of 0. Process A suddenly finds itself past end-of-file, and the next read call returns 0.
Here is the code for read:
    read_write_t scull_read (struct inode *inode, struct file *filp,
                             char *buf, count_t count)
    {
        Scull_Dev *dev = filp->private_data; /* the first listitem */
        int quantum = dev->quantum;
        int qset = dev->qset;
        int itemsize = quantum * qset; /* how many bytes in the listitem */
        unsigned long f_pos = (unsigned long)(filp->f_pos);
        int item, s_pos, q_pos, rest;

        if (f_pos > dev->size)
            return 0;
        if (f_pos + count > dev->size)
            count = dev->size - f_pos;

        /* find listitem, qset index, and offset in the quantum */
        item = f_pos / itemsize;
        rest = f_pos % itemsize;
        s_pos = rest / quantum;
        q_pos = rest % quantum;

        /* follow the list up to the right position (defined elsewhere) */
        dev = scull_follow(dev, item);
        if (!dev->data)
            return 0; /* don't fill holes */
        if (!dev->data[s_pos])
            return 0;
        if (count > quantum - q_pos)
            count = quantum - q_pos; /* read only up to */
                                     /* the end of this quantum */

        dev->usage++; /* the following call may sleep */
        memcpy_tofs(buf, dev->data[s_pos]+q_pos, count);
        dev->usage--;

        filp->f_pos += count;
        return count;
    }
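The position arithmetic in scull_read can be tried out in isolation. The sketch below repeats the same divisions and the same clamping in user space; the struct scull_pos type and the scull_locate and scull_clamp names are hypothetical and exist only for this illustration.

```c
/* Position of a byte inside the list of quantum sets. */
struct scull_pos {
    int item;   /* which list item */
    int s_pos;  /* index into that item's quantum set */
    int q_pos;  /* byte offset inside the quantum */
};

/* Same arithmetic as in scull_read, isolated in a pure function. */
struct scull_pos scull_locate(unsigned long f_pos, int quantum, int qset)
{
    struct scull_pos pos;
    int itemsize = quantum * qset;      /* bytes held by one list item */
    int rest = f_pos % itemsize;        /* offset inside that item */

    pos.item  = f_pos / itemsize;
    pos.s_pos = rest / quantum;
    pos.q_pos = rest % quantum;
    return pos;
}

/* Cap count at the end of the current quantum, as scull_read does. */
unsigned long scull_clamp(unsigned long count, int quantum, int q_pos)
{
    if (count > (unsigned long)(quantum - q_pos))
        count = quantum - q_pos;
    return count;
}
```

Taking a quantum of 4000 bytes and a quantum set of 1000 pointers as illustrative values, a file position of 4010500 falls in list item 1, at index 2 of the quantum set, 2500 bytes into the quantum. A request for 4096 bytes at that position is cut down to the 1500 bytes that remain in the quantum.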
write, like read, can transfer less data than was requested, according to the following rules for the return value:
If the value equals count, the requested number of bytes has been transferred.
If the value is positive, but smaller than count, only part of the data has been transferred. Again, the program will most likely retry writing the rest of the data.
If the value is zero, nothing was written. This result is not an error, and there is no reason to return an error code. Once again, the standard library retries the call to write. We’ll examine the significance of this case in a later chapter, when blocking write is introduced.
A negative value means an error occurred; the semantics are the same as for read.
Unfortunately, there are a few misbehaving programs that issue an error message and abort when a partial transfer is performed. Most notably, a not-so-old version of the GNU file utilities has such a bug. If your installation dates back to 1995 (for example, Slackware 2.3), your cp will fail to handle scull. You’ll know you have this version if you see the message /dev/scull0: no such file or directory when cp writes a data chunk bigger than the scull quantum. The GNU dd implementation refuses to read or write partial blocks by design, and cat refuses to write partial blocks. Therefore, cat shouldn’t be used with the scull module and dd should be passed a block size equal to scull’s quantum. Note that this limitation in the scull implementation could be fixed, but I didn’t want to complicate the code more than necessary.
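By contrast, a program that copes with partial writes only needs a short loop around the system call. This is a minimal sketch under the rules listed above; write_fully is a hypothetical name, and the EINTR handling is an addition not covered in the text.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: keep calling write() until all `count` bytes
 * have been accepted or an error occurs.
 * Returns count on success, or -1 on error (errno is set).
 */
ssize_t write_fully(int fd, const void *buf, size_t count)
{
    const char *p = buf;
    size_t done = 0;

    while (done < count) {
        ssize_t n = write(fd, p + done, count - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        /* n may be smaller than requested, or even 0: just retry */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Against a scull device, each iteration would store at most the remainder of one quantum, so a large buffer is written in several passes.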
The scull code for write deals with a single quantum at a time, as the read method does:
    read_write_t scull_write (struct inode *inode, struct file *filp,
                              const char *buf, count_t count)
    {
        Scull_Dev *dev = filp->private_data;
        Scull_Dev *dptr;
        int quantum = dev->quantum;
        int qset = dev->qset;
        int itemsize = quantum * qset;
        unsigned long f_pos = (unsigned long)(filp->f_pos);
        int item, s_pos, q_pos, rest;

        /* find listitem, qset index and offset in the quantum */
        item = f_pos / itemsize;
        rest = f_pos % itemsize;
        s_pos = rest / quantum;
        q_pos = rest % quantum;

        /* follow the list up to the right position */
        dptr = scull_follow(dev, item);
        if (!dptr->data) {
            dptr->data = kmalloc(qset * sizeof(char *), GFP_KERNEL);
            if (!dptr->data)
                return -ENOMEM;
            memset(dptr->data, 0, qset * sizeof(char *));
        }
        if (!dptr->data[s_pos]) {
            dptr->data[s_pos] = kmalloc(quantum, GFP_KERNEL);
            if (!dptr->data[s_pos])
                return -ENOMEM;
        }
        if (count > quantum - q_pos)
            count = quantum - q_pos; /* write only up to */
                                     /* the end of this quantum */

        dev->usage++; /* the following call may sleep */
        memcpy_fromfs(dptr->data[s_pos]+q_pos, buf, count);
        dev->usage--;

        /* update the size */
        if (dev->size < f_pos + count)
            dev->size = f_pos + count;
        filp->f_pos += count;
        return count;
    }