Chapter 7. Getting Hold of Memory

Until now, we have always used kmalloc and kfree for memory allocation. However, sticking to these functions would be a simplistic approach to managing memory. This chapter describes other allocation techniques. We’re not interested yet in how the different architectures actually administer memory. Modules are not involved in issues of segmentation, paging, and so on, since the kernel offers a unified memory-management interface to the drivers. In addition, I won’t describe the internal details of memory management in this chapter, but will defer them to Section 13.1 in Chapter 13.

The Real Story of kmalloc

The kmalloc allocation engine is a powerful tool, and easily learned due to its similarity to malloc. The function is fast (unless it blocks), and it doesn’t clear the memory it obtains; the allocated region still holds its previous content. In the next few sections, I’ll talk in detail about kmalloc, so you can compare it to the memory allocation techniques that I’ll discuss later.
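Since kmalloc hands back memory with its previous content, a driver that needs a clean buffer has to clear it explicitly. The following is a minimal sketch of that idea, not code taken from the kernel: example_alloc_zeroed is a hypothetical helper, and the userspace branch (a fake kmalloc that fills the buffer with 0xAA, plus a placeholder value for GFP_KERNEL) exists only so the fragment can be built and tried outside the kernel. The header names in the kernel branch are what I'd expect on a 2.0-era tree and should be checked against your sources.

```c
#ifdef __KERNEL__
#include <linux/malloc.h>   /* kmalloc, kfree (assumed 2.0-era header) */
#include <linux/string.h>   /* memset */
#else
/* Userspace stand-ins, for illustration only (assumptions, not kernel code). */
#include <stdlib.h>
#include <string.h>
#define GFP_KERNEL 0        /* arbitrary placeholder value */
static void *kmalloc(unsigned int size, int priority)
{
    void *p = malloc(size);
    (void)priority;
    if (p)
        memset(p, 0xAA, size);   /* mimic stale content left in the region */
    return p;
}
static void kfree(void *p) { free(p); }
#endif

/* Hypothetical helper: allocate size bytes and clear them.
 * Returns NULL if the allocation fails. */
void *example_alloc_zeroed(unsigned int size, int priority)
{
    void *p = kmalloc(size, priority);

    if (p)
        memset(p, 0, size);      /* kmalloc itself never clears the region */
    return p;
}
```

A caller would use example_alloc_zeroed(len, GFP_KERNEL) wherever it would otherwise call kmalloc, and release the buffer with kfree as usual.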

The Priority Argument

The first argument to kmalloc is the size, which I’ll talk about in the next section. The second argument, the priority, is much more interesting, because it causes kmalloc to modify its behavior when it has difficulty finding a page.

The most-used priority, GFP_KERNEL, means that the allocation (internally performed by calling get_free_pages, which explains the name) is performed on behalf of a process running in kernel space. In other words, this means that the calling function is executing a system call on behalf of a process. Using GFP_KERNEL allows kmalloc to delay returning if free memory is under the low-water mark, min_free_pages. In low-memory situations, the function puts the current process to sleep to wait for a page.

The new page can be retrieved in one of several ways. One way is by swapping out another page; since swapping takes time, the process waits for it to complete while the kernel schedules other tasks. Because another process may enter the same code while the first one sleeps, every kernel function that calls kmalloc(GFP_KERNEL) should be reentrant. See Section 5.2.2 in Chapter 5 for more about reentrancy.

GFP_KERNEL isn’t always the right priority to use; sometimes kmalloc is called from outside a process’s context. This happens, for instance, in interrupt handlers, task queues, and kernel timers. In this case, the current process should not be put to sleep, and GFP_ATOMIC must be used as the kmalloc priority. Atomic allocations are allowed to use every last bit of free memory, independent of min_free_pages. In fact, the only reason the low-water mark exists is to be able to fulfill atomic requests. The kernel isn’t allowed to swap out data or shrink filesystem buffers to fulfill the allocation request, so some real free memory has to be available.
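The difference between the two priorities can be summarized with a toy model. This is only a sketch of the policy as described above, not the kernel's actual code: every name in it is hypothetical, and the "fail" outcome for an atomic request that finds no free page is my reading of the requirement that real free memory be available.

```c
/* Toy model of the allocation policy described in the text (not kernel code). */
enum example_priority { EX_GFP_KERNEL, EX_GFP_ATOMIC };
enum example_outcome  { EX_ALLOCATE, EX_SLEEP, EX_FAIL };

enum example_outcome example_page_request(unsigned long free_pages,
                                          unsigned long min_free_pages,
                                          enum example_priority prio)
{
    if (prio == EX_GFP_ATOMIC) {
        /* Atomic requests may dip into the reserve but can never wait. */
        return free_pages > 0 ? EX_ALLOCATE : EX_FAIL;
    }

    /* GFP_KERNEL: below the low-water mark the caller sleeps for a page. */
    return free_pages < min_free_pages ? EX_SLEEP : EX_ALLOCATE;
}
```

With 10 free pages and a low-water mark of 20, the model puts a GFP_KERNEL caller to sleep while a GFP_ATOMIC caller still gets its page, which is why the reserve exists.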

Other priorities are defined for kmalloc, but they aren’t used often and some of them are used only in internal memory management algorithms. The only other value of some interest is GFP_NFS, which allows the NFS filesystem to shrink the free list slightly below min_free_pages before putting the process to sleep. Needless to say, using GFP_NFS instead of GFP_KERNEL in order to get a "faster" driver degrades overall system performance.

In addition to the conventional priorities, kmalloc also recognizes a bitfield: GFP_DMA. The GFP_DMA flag should be used together with GFP_KERNEL or GFP_ATOMIC to allocate pages suitable for Direct Memory Access (DMA). We’ll see how to use this flag in Section 13.3 in Chapter 13.

The Size Argument

The kernel manages the system’s physical memory, which is available only in page-sized chunks. This fact leads to a page-oriented allocation technique to obtain the maximum flexibility from the computer’s RAM. A simple linear allocation technique similar to that used by malloc wouldn’t work; a linear allocation pool is hard to maintain in a page-oriented environment like a Unix kernel. Hole management would soon become a problem, resulting in memory waste and performance penalties.

Linux addresses the problem of managing kmalloc’s needs by administering a page pool, so that pages can be added to or removed from the pool easily. To be able to fulfill requests for more than PAGE_SIZE bytes, mm/kmalloc.c manages lists of page clusters. Each cluster holds a set of consecutive pages and is thus suitable for DMA allocations. I won’t talk about the low-level details, as the internal structures can change at any time without affecting the allocation semantics or the driver code. As a matter of fact, version 2.1.38 replaced the implementation of kmalloc with a completely new one. The 2.0 implementation of memory allocation can be seen in mm/kmalloc.c, while the new one lives in mm/slab.c. See Section 16.5.2 in Chapter 16 for a more complete overview of the 2.0 implementation.

The net result of the allocation policies used by Linux is that the kernel can allocate only certain predefined fixed-sized byte arrays. If you ask for an arbitrary amount of memory, you’re likely to get slightly more than you asked for.

The data sizes available are generally "slightly less than a power of two" (while the new implementation manages chunks of memory that are exactly a power of two). If you keep this fact in mind, you’ll use memory more efficiently. For example, if you need a buffer of about 2000 bytes and run Linux 2.0, you’re better off asking for 2000 bytes, rather than 2048. Requesting exactly a power of two is the worst possible case with any kernel older than 2.1.38: the kernel will allocate twice as much as you requested. This is why scull used 4000 bytes per quantum instead of 4096.
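The effect of the rounding is easy to see with a little arithmetic. The function below is a userspace sketch, not the kernel's table: it assumes every block is a power of two minus a fixed per-block overhead, with a smallest block of 32 bytes. Both the name bucket_for and the 16-byte overhead used in the examples are hypothetical; the real figures live in mm/kmalloc.c. Passing an overhead of 0 models the newer implementation, where blocks are exact powers of two.

```c
#include <stddef.h>

/* Smallest block able to hold `request` bytes when each block offers
 * (power of two - overhead) usable bytes. Assumes overhead < 32. */
size_t bucket_for(size_t request, size_t overhead)
{
    size_t block = 32;                 /* hypothetical smallest block */

    while (block - overhead < request)
        block <<= 1;                   /* try the next power of two */
    return block - overhead;
}
```

Under these assumptions, bucket_for(4000, 16) yields 4080, so a 4000-byte quantum wastes only 80 bytes, while bucket_for(4096, 16) yields 8176, nearly twice the request. With an overhead of 0, bucket_for(2048, 0) returns exactly 2048.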

You can find the exact values used for the allocation blocks in mm/kmalloc.c (or mm/slab.c), but remember that they can change again without notice. The trick of allocating less than 4KB works with both the current 2.0 and 2.1 kernels, but it’s not guaranteed to be optimal in the future.

In any case, the maximum size that can be allocated by kmalloc in Linux 2.0 is slightly less than 32 pages, that is, 256KB on the Alpha or 128KB on the Intel and other architectures. The limit is 128KB for any platform with 2.1.38 and newer kernels. If you need more than a few kilobytes, however, there are better ways to obtain memory, as outlined below.
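The two figures for Linux 2.0 follow directly from the page size: 8KB on the Alpha and 4KB on the Intel. As a quick check, the hypothetical helper below computes the 32-page ceiling; the real limit is slightly lower, and the exact shortfall isn't modeled here.

```c
/* Upper bound on a 2.0 kmalloc request: 32 pages of the given size.
 * The real limit is slightly less than this value. */
unsigned long example_kmalloc_ceiling(unsigned long page_size)
{
    return 32UL * page_size;
}
```

Calling it with 8192 gives 262144 bytes (256KB), and with 4096 gives 131072 bytes (128KB).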
