A module can’t accomplish its task without using system resources, such as memory, I/O ports, and interrupt lines, as well as DMA channels if you use the mainboard’s DMA controller.
As a programmer, you are already accustomed to managing memory allocation, and writing kernel code is no different in this regard. Your program obtains a memory area using kmalloc and releases it using kfree. These functions behave like malloc and free, except that kmalloc takes an additional argument, the priority. Most of the time, a priority of GFP_KERNEL will do. The GFP acronym stands for ``Get Free Page.''
Requesting I/O ports and interrupt lines, on the other hand, looks strange at first, because normally a programmer accesses them with explicit instructions in the code, without telling the operating system about it. ``Allocating'' ports and interrupts is different from memory allocation in that memory is allocated from a pool, and every address behaves the same; I/O ports have individual roles, and a driver needs to work with specific ports, not just some ports.
The job of a typical driver is, for the most part, writing and reading ports. This is true both at initialization time and during normal work. A device driver must be guaranteed exclusive access to its ports in order to prevent interference from other drivers—if a module probing for its hardware should happen to write to ports owned by another device, weird things would undoubtedly happen.
The developers of Linux chose to implement a request/free mechanism for ports, mainly as a way to prevent collisions between different devices. However, unauthorized port access doesn’t produce any error condition equivalent to ``segmentation fault''—the hardware can’t enforce port registration.
Information about registered ports is available in text form in the file /proc/ioports, which looks like the following:
    0040-005f : timer
    0060-006f : kbd
    0070-007f : rtc
    00f0-00ff : npu
    0170-0177 : ide1
    01f0-01f7 : ide0
    02f8-02ff : serial(auto)
    0300-031f : NE2000
    0376-0376 : ide1
    03c0-03df : vga+
    03f6-03f6 : ide0
Each entry in the file specifies (in hex) a range of ports locked by a driver. No other driver should try to access those ports before they are released by the driver holding them.
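Because every line follows the same "first-last : owner" layout, the listing is easy to process from a program. The following user-space C sketch parses one such line. The struct ioport_entry type and the helpers parse_ioport_line and ioport_count are hypothetical names introduced here for illustration; they are not part of any kernel or library interface.

```c
#include <stdio.h>
#include <string.h>

/* One parsed entry of the /proc/ioports listing (hypothetical type). */
struct ioport_entry {
    unsigned int first;   /* first port in the range, inclusive */
    unsigned int last;    /* last port in the range, inclusive */
    char owner[32];       /* name the driver registered under */
};

/*
 * Parse one line such as "0300-031f : NE2000".
 * Returns 0 on success, -1 if the line does not match the layout.
 */
int parse_ioport_line(const char *line, struct ioport_entry *e)
{
    /* Both numbers are hexadecimal; the owner runs to the end of the line. */
    if (sscanf(line, "%x-%x : %31[^\n]", &e->first, &e->last, e->owner) != 3)
        return -1;
    if (e->last < e->first)
        return -1;
    return 0;
}

/* Number of ports covered by an entry; both ends are inclusive. */
unsigned int ioport_count(const struct ioport_entry *e)
{
    return e->last - e->first + 1;
}
```

Note that the upper bound is inclusive, so the NE2000 entry above covers 0x20 ports, not 0x1f.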
Collision is avoided in two ways. First, a user adding a new device to the system can check /proc/ioports in order to configure the new device to use free ports (this assumes the device is configured by moving jumpers around). Later, when the software driver initializes itself, it can autodetect the new device without risking harm to other devices: the driver won’t probe I/O ports already in use by other drivers.
In practice, collision avoidance based on the I/O registry works best for modularized drivers, while it may fail for drivers directly linked into the kernel. Although we’re not concerned with such drivers, it’s worth noting that a driver initializing itself at boot time might misconfigure other devices by using ports that will be registered at a later time. Nonetheless, there’s no way for a compliant driver to interact with hardware that has already been configured in the system, unless the previously loaded driver didn’t register its ports. As a matter of fact, ISA probing is a risky task, and several drivers distributed with the official Linux kernel refuse to perform probing when loaded as modules, to avoid corrupting system operation by interacting with hardware whose module has yet to be loaded.
The problem with device probing is that the only way to identify the hardware is by trying to write and read the candidate ports—the processor (and thus any program) can only look at electric signals on its data lines. Driver writers know that if a device is connected to a particular port, it will reply to queries with particular codes. But if another device is connected to the port, it will nonetheless be written to, and nobody can foresee how it will react to the unexpected probing. Sometimes port probing can be avoided by reading the peripheral’s BIOS looking for a known string; this technique is exploited by several SCSI drivers, but not every device carries its own BIOS.
A compliant driver should call check_region to find out if a range of ports is already locked by other drivers, request_region to lock ports for later use, and release_region when it’s done. The prototypes for these functions reside in <linux/ioport.h>.
The typical sequence for registering ports is the following (the function skull_probe_hw embeds device-specific code and is not shown here):
    #include <linux/ioport.h>
    #include <linux/errno.h>

    static int skull_detect(unsigned int port, unsigned int range)
    {
        int err;

        if ((err = check_region(port, range)) < 0)
            return err; /* busy */
        if (skull_probe_hw(port, range) != 0)
            return -ENODEV; /* not found */
        request_region(port, range, "skull"); /* always succeeds */
        return 0;
    }
Ports are released by cleanup_module:
    static void skull_release(unsigned int port, unsigned int range)
    {
        release_region(port, range);
    }
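To make the bookkeeping behind these calls concrete, here is a minimal user-space model of a port registry. It is only a sketch under stated assumptions: the model_ names, the fixed-size table, and the return codes are all hypothetical, and the real kernel implementation is organized differently. In particular, the modeled request function reports failure, whereas the request_region call shown above returns nothing.

```c
#include <errno.h>
#include <stddef.h>

#define MODEL_MAX_REGIONS 16   /* arbitrary table size for the model */

struct model_region {
    unsigned int from;   /* first port of the range */
    unsigned int num;    /* number of ports; 0 marks an unused slot */
    const char *name;    /* owner, as it would appear in /proc/ioports */
};

static struct model_region model_table[MODEL_MAX_REGIONS];

/* Return 0 if [from, from+num) is free, -EBUSY if it overlaps an owner. */
int model_check_region(unsigned int from, unsigned int num)
{
    size_t i;

    for (i = 0; i < MODEL_MAX_REGIONS; i++) {
        const struct model_region *r = &model_table[i];

        if (r->num && from < r->from + r->num && r->from < from + num)
            return -EBUSY;
    }
    return 0;
}

/* Record the range under the given name if it is free. */
int model_request_region(unsigned int from, unsigned int num, const char *name)
{
    size_t i;

    if (num == 0)
        return -EINVAL;
    if (model_check_region(from, num) < 0)
        return -EBUSY;
    for (i = 0; i < MODEL_MAX_REGIONS; i++) {
        if (model_table[i].num == 0) {
            model_table[i].from = from;
            model_table[i].num = num;
            model_table[i].name = name;
            return 0;
        }
    }
    return -ENOMEM;   /* table full */
}

/* Forget a range previously recorded with the same bounds. */
void model_release_region(unsigned int from, unsigned int num)
{
    size_t i;

    for (i = 0; i < MODEL_MAX_REGIONS; i++)
        if (model_table[i].num == num && model_table[i].from == from)
            model_table[i].num = 0;
}
```

The model also shows what the registry cannot do: nothing here touches the hardware, so a driver that skips the check can still read and write the ports.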
A similar request/free policy is used for interrupt lines, but managing interrupts is trickier than handling ports, and a detailed explanation of the whole story is deferred to Chapter 9.
The request/free approach to resources is similar to the register/unregister task described earlier for facilities and fits well into the goto-based implementation scheme already outlined.
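As a reminder of how the two ideas combine, the following user-space sketch claims two resources in order and, on failure, releases only what was already obtained. Every name in it (claim_ports, free_ports, claim_irq, sketch_init, and the simulate_busy_irq switch) is a hypothetical stand-in; a real driver would call the kernel's own request and free functions instead.

```c
#include <errno.h>

/* Hypothetical stand-ins for two resources a driver must claim. */
static int ports_held;
static int irq_held;
static int simulate_busy_irq;   /* nonzero: pretend the interrupt line is taken */

static int claim_ports(void)
{
    ports_held = 1;
    return 0;
}

static void free_ports(void)
{
    ports_held = 0;
}

static int claim_irq(void)
{
    if (simulate_busy_irq)
        return -EBUSY;
    irq_held = 1;
    return 0;
}

/* Claim both resources, or neither: failures unwind in reverse order. */
int sketch_init(void)
{
    int err;

    err = claim_ports();
    if (err)
        goto fail_ports;

    err = claim_irq();
    if (err)
        goto fail_irq;

    return 0;   /* both resources are held */

fail_irq:
    free_ports();   /* undo only what was already acquired */
fail_ports:
    return err;
}
```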
The problems related to probing are not encountered by programmers writing drivers for PCI devices. I’ll introduce PCI in Chapter 15.
This section is quite technical, and can easily be skipped if you are not (yet) confident dealing with hardware issues.
On Intel platforms, target devices fitting into an ISA slot may offer on-board memory in the range 640KB-1MB (0xA0000 to 0xFFFFF); this is another kind of system resource used by a device driver.
This memory layout dates back to the old days of the 8086 processor, which could address only one megabyte of memory. The designers of the PC decided that the low 640KB would host the RAM, while the other 384KB would be reserved for ROM and memory-mapped devices. Nowadays even the most powerful personal computers still have this memory hole in the first megabyte. The PC version of Linux marks the region as reserved and simply doesn’t consider it. The code presented in this section of the book can be used to access the memory in this range, though its use is limited to the x86 platforms and to Linux kernels up to and including version 2.0.x, for any ``x.'' Version 2.1 changed the way physical memory is accessed, such that I/O memory in the 640KB-1MB range can’t be accessed in this way any more. The correct way to access I/O memory is the topic of Section 8.3.1 in Chapter 8 and is outside the scope of this chapter.
Although the kernel supports a request/free mechanism for ports and interrupts, it doesn’t currently support anything similar for I/O memory ranges, so you’re on your own. This won’t ever change, if I understand Linus’ attitude towards the PC architecture.
Sometimes it happens that a driver needs to detect ISA memory during initialization; for example, I needed to locate a free memory area to tell a frame grabber where to map the grabbed image. The problem is that, without probing, you can’t tell how address areas in that range are used. The probe needs to be able to identify three different cases: RAM is mapped to the address, ROM is there (the VGA BIOS, for example), or the area is free.
The skull sample source shows a way to deal with such memory, but since skull is not related to any physical device, it just prints information about the 640KB-1MB memory region and then exits. However, the code used to analyze memory is worth describing, because it must deal with race conditions. A race condition is a situation where two tasks might contend for the same resource, and where unsynchronized access can cause system damage.
While driver writers don’t need to handle multitasking, we must always remember that interrupts can happen in the middle of our code, and an interrupt handler can modify global items without telling us about it. Although the kernel offers several utilities to deal with race conditions, the following simple rules state the general way to deal with the problem; a complete treatment of the issue appears in Section 9.7 in Chapter 9.
If the shared item is read and not written, declare it as volatile, asking the compiler to skip optimization. Thus, the compiled code actually reads the item every time the source code reads it.
If the code needs to check and change the value, interrupts must be disabled during the operation to prevent other processes from changing the item after we have checked it, but before our change takes effect.
The suggested sequence for temporarily disabling interrupts is the following:
    unsigned long flags;

    save_flags(flags);
    cli();

    /* critical code */

    restore_flags(flags);
where cli means ``clear interrupt flag.'' The functions shown above are defined in <asm/system.h>.
The classic sequence cli and sti should be avoided, as there are times when you can’t tell if interrupts are enabled before you disable them. Calling sti in such situations can lead to sporadic errors that are very difficult to track down.
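The difference can be seen in a small user-space model in which a single variable stands in for the processor's interrupt-enable flag. This is a sketch only: all model_ names and the two touch_ functions are hypothetical, and the modeled save function takes a pointer, whereas the kernel's save_flags is invoked with the variable itself, as in the sequence above.

```c
/* One variable stands in for the CPU interrupt-enable flag. */
static int model_if = 1;   /* 1 = interrupts enabled */

static void model_cli(void) { model_if = 0; }
static void model_sti(void) { model_if = 1; }

static void model_save_flags(unsigned long *flags)
{
    *flags = (unsigned long)model_if;
}

static void model_restore_flags(unsigned long flags)
{
    model_if = (int)flags;
}

/* Helpers for inspecting and presetting the modeled flag. */
int model_interrupts_enabled(void) { return model_if; }
void model_set_interrupts(int on) { model_if = on ? 1 : 0; }

/* Critical section bracketed by cli/sti: always exits with interrupts on. */
void touch_with_sti(void)
{
    model_cli();
    /* critical code */
    model_sti();
}

/* Critical section using save/restore: exits with the flag as it found it. */
void touch_with_restore(void)
{
    unsigned long flags;

    model_save_flags(&flags);
    model_cli();
    /* critical code */
    model_restore_flags(flags);
}
```

If a caller had already disabled interrupts, touch_with_sti silently re-enables them on return, which is exactly the kind of sporadic error described above; touch_with_restore leaves the caller's state untouched.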
The code to check for RAM segments makes use of both volatile declarations and cli, because these regions can be identified only by physically writing and rereading data, and real RAM might be changed by other drivers in the middle of our tests during an interrupt. The following code is not completely foolproof, because it might mistake RAM on acquisition boards for empty regions if a device is actively writing to its own memory while this code is scanning the area. However, this situation is quite unlikely to happen.
In the source code below, each printk is prefixed with the KERN_INFO symbol. This symbol is a priority string that gets concatenated to the format string and is defined in <linux/kernel.h>. Its expansion is similar to the <1> strings used in hello.c at the beginning of this chapter.
    volatile unsigned char *ptr;  /* pointed data is volatile */
    unsigned char oldval, newval; /* values read from memory */
    unsigned long flags;          /* used to hold system flags */
    unsigned long add, i;

    /* probe all the memory hole in 2KB steps */
    for (add = 0xA0000; add < 0x100000; add += 2048) {
        ptr = (unsigned char *)add;

        save_flags(flags);
        cli();
        oldval = *ptr;        /* read a byte */
        *ptr = oldval ^ 0xff; /* change it */
        newval = *ptr;        /* re-read */
        *ptr = oldval;        /* restore */
        restore_flags(flags);

        /* FIXME--user getmem_fromio or such */
        if ((oldval ^ newval) == 0xff) { /* we re-read our change: it's ram */
            printk(KERN_INFO "%lx: RAM\n", (long)ptr);
            continue;
        }
        if ((oldval ^ newval) != 0) { /* random bits changed: it's empty */
            printk(KERN_INFO "%lx: empty\n", (long)ptr);
            continue;
        }

        /*
         * Expansion rom (executed at boot time by the bios)
         * has a signature of 0x55, 0xaa, and the third byte tells
         * the size of such rom
         */
        if ((*ptr == 0x55) && (*(ptr + 1) == 0xaa)) {
            int size = 512 * *(ptr + 2);
            printk(KERN_INFO "%lx: Expansion ROM, %i bytes\n",
                   (long)ptr, size);
            add += ((size + 2047) & ~2047) - 2048; /* skip it */
            continue;
        }

        /*
         * If the tests above failed, we still don't know if it is ROM or
         * empty. Since empty memory can appear as 0x00, 0xff, or the low
         * address byte, we must probe multiple bytes: if at least one of
         * them is different from these three values, then this is rom
         * (though not boot rom).
         */
        printk(KERN_INFO "%lx: ", (long)ptr);
        for (i = 0; i < 5; i++) {
            ptr += 57; /* a "random" value */
            if (*ptr && *ptr != 0xFF && *ptr != ((long)ptr & 0xFF))
                break;
        }
        printk("%s\n", i == 5 ? "empty" : "ROM");
    }
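The decisions made inside the loop can be checked without touching real hardware. The following user-space sketch isolates them as pure functions; the names classify_probe, rom_skip, and byte_looks_empty and the probe_result enumeration are hypothetical and do not appear in the skull source.

```c
/* Outcome of the write-and-reread test on one byte (hypothetical type). */
enum probe_result { PROBE_RAM, PROBE_EMPTY, PROBE_UNKNOWN };

/*
 * oldval is the byte read first; newval is the byte re-read after
 * writing oldval ^ 0xff to the same address.
 */
enum probe_result classify_probe(unsigned char oldval, unsigned char newval)
{
    unsigned char diff = oldval ^ newval;

    if (diff == 0xff)
        return PROBE_RAM;      /* every bit flipped: the write was stored */
    if (diff != 0)
        return PROBE_EMPTY;    /* some bits changed at random: nothing there */
    return PROBE_UNKNOWN;      /* unchanged: ROM or empty, more tests needed */
}

/*
 * Extra increment applied to the scan address for an expansion ROM of
 * `size` bytes (assumed greater than zero): the size is rounded up to a
 * multiple of 2KB, minus the 2KB the loop adds by itself.
 */
unsigned long rom_skip(unsigned long size)
{
    return ((size + 2047UL) & ~2047UL) - 2048UL;
}

/* An unmapped byte tends to read as 0x00, 0xff, or the low address byte. */
int byte_looks_empty(unsigned char value, unsigned long addr)
{
    return value == 0x00 || value == 0xff ||
           value == (unsigned char)(addr & 0xff);
}
```

For example, a ROM whose third signature byte is 64 occupies 512 * 64 = 32768 bytes, so rom_skip returns 30720 and the loop's own 2048-byte step brings the scan to the first address past the ROM.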
Detecting memory doesn’t cause collisions with other devices, as long as you take care to restore any byte you modified while you were probing.[8]
An attentive reader might ask now about ISA memory in the 15MB-16MB address range. Unfortunately, that’s a more difficult issue, which we’ll discuss in Section 8.3.2 in Chapter 8.
[8] Note that in some cases writing to memory can have side effects, as some devices can map I/O registers to memory addresses. This and other considerations lead to the conclusion that the code just shown should definitely be avoided in production drivers. It is nonetheless a simple introductory module and is shown here as such.