The files in the mm directory implement the
architecture-independent portion of memory management for the Linux
kernel. This directory contains the functions for paging,
allocation and deallocation of memory, and the various techniques that allow
user processes to map memory ranges into their address space.
Surprisingly, swap.c doesn’t actually implement the swapping
algorithm. Instead, it deals with the kernel command-line options
swap= and buff=. These options can also be tuned
via the sysctl system call or by writing to the
/proc/sys/vm files.
swap_state.c is in charge of maintaining the swap cache
and is the most difficult file in this directory; I won’t go into
detail about it, because its design is hard to understand without
a good prior knowledge of the relevant data structures and policies.
swapfile.c implements the management of swap files and devices.
The swapon and swapoff system calls are defined here,
the latter being very difficult code. For comparison, several Unix
systems don’t implement swapoff and can’t stop
swapping to a device or file without
rebooting. swapfile.c also defines get_swap_page, which
retrieves a free page from the swap pool.
vmscan.c is the code that implements paging policies.
The kswapd daemon is defined in this file, as well as all the
functions that scan memory and the running processes looking for pages
to swap out.
Finally, page_io.c implements the low-level data transfer
to and from swap space. The file manages the locking needed to ensure
system coherence and provides both synchronous and asynchronous I/O.
It also deals with problems related to the different block sizes used
by different devices. (In the early versions of Linux, it was impossible
to swap to a FAT partition, because 512-byte blocks were not supported.)
The memory allocation techniques described in Chapter 7
are all implemented in the mm directory. Let’s start once
again with the most frequently used function: kmalloc.
kmalloc.c implements the allocation and freeing of memory
areas. The memory pool for kmalloc is made up of “buckets,”
where each bucket is a list of memory areas of the same size. The
primary function of kmalloc.c is to manage the linked lists for
each bucket.
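
The bucket idea can be illustrated with a small user-space sketch. This is not the kernel's code: the names bucket_alloc and bucket_free, the power-of-two sizes, and the use of malloc as a stand-in for the page allocator are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NBUCKETS 8   /* hypothetical: area sizes 32, 64, ... 4096 bytes */
#define MIN_SIZE 32

/* Header placed in front of every memory area. */
struct area {
    struct area *next;  /* link used while the area sits on a free list */
    size_t bucket;      /* index of the bucket this area belongs to */
};

/* One singly linked free list per bucket. */
static struct area *free_list[NBUCKETS];

static size_t bucket_size(size_t b)
{
    return (size_t)MIN_SIZE << b;
}

/* Index of the smallest bucket that holds `size` bytes, or NBUCKETS. */
static size_t bucket_for(size_t size)
{
    size_t b = 0;

    while (b < NBUCKETS && bucket_size(b) < size)
        b++;
    return b;
}

void *bucket_alloc(size_t size)
{
    size_t b = bucket_for(size);
    struct area *a;

    if (b == NBUCKETS)
        return NULL;                /* too large for any bucket */
    a = free_list[b];
    if (a) {
        free_list[b] = a->next;     /* reuse a previously freed area */
    } else {
        /* Assumption: malloc stands in for the kernel's page allocator. */
        a = malloc(sizeof(*a) + bucket_size(b));
        if (!a)
            return NULL;
        a->bucket = b;
    }
    a->next = NULL;
    return a + 1;                   /* caller's memory follows the header */
}

void bucket_free(void *ptr)
{
    struct area *a;

    if (!ptr)
        return;
    a = (struct area *)ptr - 1;
    assert(a->bucket < NBUCKETS);
    a->next = free_list[a->bucket]; /* push back onto its bucket's list */
    free_list[a->bucket] = a;
}
```

In this sketch a request for 100 bytes and a later request for 120 bytes both land in the 128-byte bucket, so an area freed by the first caller can be handed straight to the second.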
When new pages are needed or pages are freed, the file makes use of
functions defined in page_alloc.c. Pages are retrieved from free
memory by __get_free_pages, which is a short function that
extracts pages from the free-page lists. If there’s no memory available on
the free lists, try_to_free_pages (vmscan.c) is called.
vmalloc.c implements the vmalloc,
vremap, and vfree functions. vmalloc returns
contiguous memory in the kernel virtual address space, while
vremap gives a new virtual address to a specific physical
address; it is used mainly to access PCI buffers in high memory. As its
name implies, vfree frees memory.
The most important functions of Linux memory management are part
of the memory.c file. These functions are generally
not accessible through system calls, because they deal with the
hardware paging mechanisms.
Module writers, on the other hand, do use some of these
functions. verify_area and remap_page_range are defined
in memory.c. Other interesting functions are do_wp_page
and do_no_page, which implement the kernel’s response to minor and
major page faults. The remaining functions in the file deal with page
tables and are extremely low-level.
Memory mapping is the other big task performed by files in the mm
directory. filemap.c is a complex piece of code. It
implements memory mapping of regular files, providing the ability to
support shared mappings. Mapped files are supported by means of
special struct vm_operations_struct structures
for the mapped pages, as described in "Section 13.1.2" in Chapter 13. This source file
also deals with asynchronous read-ahead; comments explain the
meaning of the four read-ahead fields in struct file. The
only system call that appears in this file is sys_msync.
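
From user space, the shared file mappings handled by this code are reached through mmap with MAP_SHARED, and sys_msync is reached through msync. The following is a minimal sketch of that path, assuming a POSIX system; the function name write_through_mapping is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Store `msg` in the file `path` by writing to a shared mapping and
 * flushing it with msync. Returns 0 on success, -1 on failure.
 */
int write_through_mapping(const char *path, const char *msg)
{
    size_t len;
    char *map;
    int fd, ret = -1;

    assert(path != NULL && msg != NULL);
    len = strlen(msg);
    if (len == 0)
        return -1;                      /* mmap rejects a zero length */

    fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* The file must be as long as the range we intend to map. */
    if (ftruncate(fd, (off_t)len) == 0) {
        map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (map != MAP_FAILED) {
            memcpy(map, msg, len);      /* plain stores, no write() call */
            if (msync(map, len, MS_SYNC) == 0)
                ret = 0;                /* data has been written back */
            munmap(map, len);
        }
    }
    close(fd);
    return ret;
}
```

After a successful call, an ordinary read of the file returns the bytes that were stored through the mapping.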
The top-level mmap interface to memory mapping (i.e.,
do_mmap) appears in mmap.c. This file begins by defining
the brk system call, which is used by a process to request that
its highest-allowed virtual address be increased or decreased.
The sys_brk code is informative, even if you’re not a
master of memory management.
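
To see what brk does from the caller's side, a process can move its own program break with the sbrk and brk library wrappers. The sketch below assumes a Linux system with glibc; the function name grow_and_shrink_break is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <unistd.h>

/*
 * Raise the program break by `bytes`, touch the new memory, then
 * restore the old break. Returns how far the break moved, or -1.
 */
long grow_and_shrink_break(long bytes)
{
    char *old_brk, *new_brk;
    long moved;

    assert(bytes > 0);
    old_brk = sbrk(0);              /* an increment of 0 reports the break */
    if (old_brk == (void *)-1)
        return -1;
    if (sbrk(bytes) == (void *)-1)  /* ask the kernel to raise the break */
        return -1;
    new_brk = sbrk(0);

    /* The range [old_brk, new_brk) is now part of the address space. */
    old_brk[0] = 42;
    old_brk[bytes - 1] = 42;

    moved = new_brk - old_brk;
    if (brk(old_brk) != 0)          /* lower the break back again */
        return -1;
    return moved;
}
```

Calling grow_and_shrink_break(4096) in a single-threaded program that makes no other allocations in between is expected to report that the break moved by exactly 4096 bytes.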
The rest of mmap.c is centered on do_mmap and
do_munmap. Memory mapping works, as you might expect, through
filp->f_op, though filp can be NULL for
do_mmap. This is how brk allocates new virtual space. It
falls back on memory-mapping the zero-page without needing
special code.
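
The closest user-space view of a mapping with no file behind it is an anonymous mmap, which reads back as zeros until it is written. The sketch below assumes Linux and its MAP_ANONYMOUS flag; the function name count_nonzero_in_anon_map is hypothetical, and the example shows only the observable behavior, not the kernel's internal path.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Map `len` bytes with no backing file and count the bytes that are
 * not zero. Returns the count, or -1 if the mapping fails.
 */
long count_nonzero_in_anon_map(size_t len)
{
    unsigned char *p;
    long nonzero = 0;
    size_t i;

    assert(len > 0);
    /* fd is -1 and the offset is 0: there is no file to map. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    for (i = 0; i < len; i++)       /* reads see zero-filled memory */
        if (p[i] != 0)
            nonzero++;

    p[0] = 1;                       /* the first write gets a private page */
    munmap(p, len);
    return nonzero;
}
```

A freshly created anonymous mapping is expected to contain no nonzero bytes, so the function returns 0 for any length that can be mapped.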
mremap.c includes sys_mremap. It is an easy file to
read if you’ve figured out mmap.c.
The four system calls related to memory locking and unlocking are
defined in mlock.c, which is a rather simple source. Similarly,
mprotect.c is in charge of performing sys_mprotect. The
files are similar in design, because they both modify the system
flags associated with the process’s pages.
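
Both families of calls can be exercised from user space on a single page. The sketch below assumes Linux; the function name lock_and_protect_page is hypothetical. Because mlock can legitimately fail when the process has a small locked-memory limit, its outcome is reported separately rather than treated as an error.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map one page, try to lock it in RAM, then make it read-only and
 * writable again. Returns 0 if both mprotect calls succeed, -1
 * otherwise. *locked is set to 1 if mlock succeeded, else 0.
 */
int lock_and_protect_page(int *locked)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *p;
    int ret = -1;

    assert(locked != NULL);
    *locked = 0;
    if (page <= 0)
        return -1;
    len = (size_t)page;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    if (mlock(p, len) == 0)         /* may fail under RLIMIT_MEMLOCK */
        *locked = 1;

    p[0] = 'x';
    if (mprotect(p, len, PROT_READ) == 0 &&     /* drop write permission */
        p[0] == 'x' &&                          /* reads are still allowed */
        mprotect(p, len, PROT_READ | PROT_WRITE) == 0) {
        p[0] = 'y';                             /* writable once more */
        ret = 0;
    }

    if (*locked)
        munlock(p, len);
    munmap(p, len);
    return ret;
}
```

A store to the page while it is read-only would raise SIGSEGV, which is why the sketch only reads from it between the two mprotect calls.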