Section 16.7, later in this chapter, explains what happens when pages are swapped out. As we indicated at the beginning of this chapter, swapping out pages is a last resort and is only one part of a general strategy to free memory that uses other tactics as well. In this section, we show how the kernel performs a swap out. This is achieved by a series of functions called in cascading fashion. Let’s start with the highest-level functions.
The swap_out( ) function receives three parameters but uses only one of them: classzone, which specifies the memory zone from which pages should be swapped out (see Section 7.1.2). The other two parameters, priority and gfp_mask, are not used.
The swap_out( ) function scans existing memory descriptors and tries to swap out the pages referenced in each process’s Page Tables. It terminates as soon as one of the following conditions occurs:
The function succeeds in releasing SWAP_CLUSTER_MAX page frames (by default, 32). A page frame is considered released when it is removed from the Page Tables of all processes that share it.
The function scans n memory descriptors, where n is the length of the memory descriptor list when the function starts.[111]
To ensure that all processes are evenly penalized by swap_out( ), the function starts scanning the list from the memory descriptor that was last analyzed in the previous invocation; the address of this memory descriptor is stored in the swap_mm global variable.
For each memory descriptor mm to be considered, the swap_out( ) function increments the usage counter mm->mm_users, thus ensuring that the memory descriptor cannot disappear from the list while the swapping algorithm is working on it. Then, swap_out( ) invokes the swap_out_mm( ) function, passing to it the memory descriptor address mm, the memory zone classzone, and the number of page frames still to be released. Once swap_out_mm( ) returns, swap_out( ) decrements the usage counter mm->mm_users, and then decides whether it should analyze the next memory descriptor in the list or just terminate.
swap_out_mm( ) returns the number of pages it has released from the process that owns the memory descriptor. The swap_out( ) function uses this value to update a counter of how many pages have been released since the beginning of its execution; if the counter reaches the value SWAP_CLUSTER_MAX, swap_out( ) terminates.
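The scanning loop just described can be sketched in user-space C. This is a minimal illustration, not the kernel's code: the struct mm_sketch type, the swappable field, the array standing in for the memory descriptor list, and the *_sketch function names are all assumptions made for this example, and the cursor argument plays the role of swap_mm.

```c
#include <assert.h>
#include <stddef.h>

#define SWAP_CLUSTER_MAX 32   /* release target named in the text */

/* Hypothetical stand-in for a memory descriptor; only what the sketch needs. */
struct mm_sketch {
    int mm_users;    /* usage counter, raised while the descriptor is scanned */
    int swappable;   /* pages this toy descriptor can still give up */
};

/* Hypothetical stand-in for swap_out_mm(): releases up to `wanted` pages. */
static int swap_out_mm_sketch(struct mm_sketch *mm, int wanted)
{
    int n = mm->swappable < wanted ? mm->swappable : wanted;

    mm->swappable -= n;
    return n;
}

/*
 * Round-robin scan. Starts at *cursor (the role played by swap_mm) and stops
 * when SWAP_CLUSTER_MAX pages are released or when nr descriptors have been
 * visited. Returns the number of pages released.
 */
static int swap_out_sketch(struct mm_sketch *mms, size_t nr, size_t *cursor)
{
    int released = 0;
    size_t scanned;

    assert(nr == 0 || *cursor < nr);
    for (scanned = 0; scanned < nr; scanned++) {
        struct mm_sketch *mm = &mms[*cursor];

        mm->mm_users++;                  /* pin the descriptor */
        released += swap_out_mm_sketch(mm, SWAP_CLUSTER_MAX - released);
        mm->mm_users--;                  /* unpin it */

        if (released >= SWAP_CLUSTER_MAX)
            break;                       /* next call resumes at this one */
        *cursor = (*cursor + 1) % nr;    /* move on to the next descriptor */
    }
    return released;
}
```

Calling swap_out_sketch( ) repeatedly on the same array shows the fairness property: each call picks up at the descriptor where the previous one stopped, so no single process absorbs every pass.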
The swap_out_mm( ) function scans the memory regions of the process that owns the memory descriptor mm passed as a parameter. Usually, the function starts analyzing the first memory region object in the mm->mmap list (remember that they are ordered by starting linear addresses). However, if mm is the memory descriptor that was analyzed last in the previous invocation of swap_out( ), swap_out_mm( ) does not restart from the first memory region, but from the memory region that includes the linear address last analyzed in the previous invocation. This linear address is stored in the swap_address field of the memory descriptor; if all memory regions of the process have been analyzed, then the field stores the conventional value TASK_SIZE.
For each memory region of the process that owns the memory descriptor mm, swap_out_mm( ) invokes the swap_out_vma( ) function, passing to it the number of pages yet to be released, the first linear address to analyze, the memory region object, and the memory descriptor. Again, swap_out_vma( ) returns the number of released pages belonging to the memory region. The loop of swap_out_mm( ) continues until either the requested number of pages is released or all memory regions are considered.
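The resume logic built around swap_address can be illustrated with a small simulation. It is a sketch under stated assumptions: the region and descriptor types, the array that replaces the mm->mmap list, the 4 KB page size, the 3 GB value used for TASK_SIZE, and the toy rule that every page is releasable are all hypothetical choices for this example.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096UL
#define TASK_SIZE_SKETCH 0xC0000000UL  /* assumed 3 GB of user address space */

struct vma_sketch {                    /* hypothetical region: [start, end) */
    unsigned long start, end;
};

struct mm_sketch {
    const struct vma_sketch *mmap;     /* regions sorted by start address */
    size_t nr_vmas;
    unsigned long swap_address;        /* where the previous scan stopped */
};

/* Toy swap_out_vma(): treats every page of the region as releasable. */
static int swap_out_vma_sketch(struct mm_sketch *mm,
                               const struct vma_sketch *vma,
                               unsigned long address, int wanted)
{
    int released = 0;

    assert(address >= vma->start);
    while (address < vma->end && released < wanted) {
        address += PAGE_SIZE_SKETCH;
        released++;
    }
    mm->swap_address = address;        /* remember the resume point */
    return released;
}

static int swap_out_mm_sketch(struct mm_sketch *mm, int wanted)
{
    unsigned long address = mm->swap_address;
    int released = 0;
    size_t i;

    for (i = 0; i < mm->nr_vmas && released < wanted; i++) {
        const struct vma_sketch *vma = &mm->mmap[i];

        if (address >= vma->end)
            continue;                  /* region fully scanned earlier */
        if (address < vma->start)
            address = vma->start;      /* skip the hole between regions */
        released += swap_out_vma_sketch(mm, vma, address, wanted - released);
        address = mm->swap_address;
    }
    if (released < wanted)             /* ran out of regions */
        mm->swap_address = TASK_SIZE_SKETCH;
    return released;
}
```

With two regions of three and two pages, successive calls asking for two pages each stop in the middle of the first region, then cross into the second one, and a final call that cannot be satisfied leaves TASK_SIZE_SKETCH in swap_address to signal that every region has been considered.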
The swap_out_vma( ) function checks that the memory region is swappable (e.g., the flag VM_RESERVED is cleared). It then starts a sequence in which it considers all entries in the process’s Page Global Directory that refer to linear addresses in the memory region. For each such entry, the function invokes the swap_out_pgd( ) function, which in turn considers all entries in a Page Middle Directory corresponding to address intervals in the memory region. For each such entry, swap_out_pgd( ) invokes the swap_out_pmd( ) function, which considers all entries in a Page Table referencing pages in the memory region. For each such entry, swap_out_pmd( ) invokes the try_to_swap_out( ) function, which finally attempts to swap out the page. As usual, this chain of function invocations breaks as soon as the requested number of released page frames is reached.
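The cascade can be pictured as three nested loops, one per paging level, each stopping early once enough pages are released. The following sketch assumes a toy three-level table with four entries per level; the table types and the *_sketch functions are hypothetical, and the clipping of each level to the linear addresses of the memory region is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRIES 4   /* toy fan-out per level; real tables are far larger */

struct pte_table { int present[ENTRIES]; };            /* Page Table */
struct pmd_table { struct pte_table *pte[ENTRIES]; };  /* Page Middle Directory */
struct pgd_table { struct pmd_table *pmd[ENTRIES]; };  /* Page Global Directory */

/* Toy try_to_swap_out(): releases any present page and reports success. */
static int try_to_swap_out_sketch(int *present)
{
    assert(present != NULL);
    if (!*present)
        return 0;
    *present = 0;
    return 1;
}

/* Scans one Page Table; returns the number of pages released. */
static int swap_out_pmd_sketch(struct pte_table *pt, int wanted)
{
    int released = 0;

    for (int i = 0; i < ENTRIES && released < wanted; i++)
        released += try_to_swap_out_sketch(&pt->present[i]);
    return released;
}

/* Scans one Page Middle Directory, descending into each Page Table. */
static int swap_out_pgd_sketch(struct pmd_table *pmd, int wanted)
{
    int released = 0;

    for (int i = 0; i < ENTRIES && released < wanted; i++)
        if (pmd->pte[i] != NULL)
            released += swap_out_pmd_sketch(pmd->pte[i], wanted - released);
    return released;
}

/* Scans the Page Global Directory, descending into each middle directory. */
static int swap_out_vma_sketch(struct pgd_table *pgd, int wanted)
{
    int released = 0;

    for (int i = 0; i < ENTRIES && released < wanted; i++)
        if (pgd->pmd[i] != NULL)
            released += swap_out_pgd_sketch(pgd->pmd[i], wanted - released);
    return released;
}
```

Because every level receives only the number of pages still wanted, reaching the target inside the innermost loop unwinds the whole chain without visiting further entries.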
The try_to_swap_out( ) function attempts to free a given page frame, either discarding or swapping out its contents. The function returns the value 1 if it succeeds in releasing the page, and 0 otherwise. Remember that by “releasing the page,” we mean that the references to the page frame are removed from the Page Tables of all processes that share the page. In this case, however, the page frame is not necessarily released to the buddy system; for instance, it could be referenced by the swap cache.
The parameters of the function are:
mm
    Memory descriptor address
vma
    Memory region object address
address
    Initial linear address of the page
page_table
    Address of the Page Table entry that maps address
page
    Page descriptor address
classzone
    The memory zone from which pages should be swapped out
The try_to_swap_out( ) function uses the Accessed and Dirty flags included in the Page Table entry. We stated in Section 2.4.1 that the Accessed flag is automatically set by the CPU’s paging unit at every read or write access, while the Dirty flag is automatically set at every write access. These two flags offer a limited degree of hardware support that allows the kernel to use a primitive LRU replacement algorithm.
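How two hardware-set flags can approximate LRU is easiest to see in a small simulation. This sketch is only an illustration: the pte_sketch_t type and the three functions are hypothetical, the touch function merely imitates what the paging unit does in hardware, and the bit values are the 80x86 positions of the Accessed and Dirty flags.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_ACCESSED 0x020u   /* Accessed flag position on the 80x86 */
#define PTE_DIRTY    0x040u   /* Dirty flag position on the 80x86 */

typedef uint32_t pte_sketch_t;  /* hypothetical Page Table entry */

/* Imitates the paging unit: every access sets Accessed, writes set Dirty. */
static void touch_sketch(pte_sketch_t *pte, int is_write)
{
    assert(pte);
    *pte |= PTE_ACCESSED;
    if (is_write)
        *pte |= PTE_DIRTY;
}

/* Reports whether the page was accessed since the last call, then clears
 * the Accessed flag so the next call observes only newer accesses. */
static int test_and_clear_young_sketch(pte_sketch_t *pte)
{
    int young;

    assert(pte);
    young = (*pte & PTE_ACCESSED) != 0;
    *pte &= ~PTE_ACCESSED;
    return young;
}

/* A page is a swap-out candidate only if it stayed untouched between scans. */
static int is_candidate_sketch(pte_sketch_t *pte)
{
    return !test_and_clear_young_sketch(pte);
}
```

A page touched between two scans is spared once and becomes a candidate only if the following scan finds the Accessed flag still clear; the Dirty flag survives the scans, so the kernel can later tell whether the contents must be saved.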
try_to_swap_out( ) must recognize many different situations demanding different responses, but the responses all share many of the same basic operations. In particular, the function performs the following steps:
1. Checks the Accessed flag of the page_table entry. If it is set, the page must be considered “young”; in this case, the function clears the flag, invokes mark_page_accessed( ) (see Section 16.7.2 later in this chapter), and returns 0. This check ensures that a page can be swapped out only if it was not accessed since the previous invocation of try_to_swap_out( ) on it.
2. If the memory region is locked (VM_LOCKED flag set), invokes mark_page_accessed( ) on the page, and returns 0.
3. If the PG_active flag in the page->flags field is set, the page is considered actively used and shouldn’t be swapped out; the function returns 0.
4. If the page does not belong to the memory zone specified by the classzone parameter, returns 0.
5. Tries to lock the page; if it is already locked (PG_locked flag set), it is not possible to swap out the page because it is involved in an I/O data transfer; the function returns 0.
6. At this point, the function knows that the page can be swapped out. Forces the value zero into the Page Table entry addressed by page_table and invokes flush_tlb_page( ) to invalidate the corresponding TLB entries.
7. If the Dirty flag in the Page Table entry was set, invokes the set_page_dirty( ) function to set the PG_dirty flag in the page descriptor. Moreover, this function moves the page into the dirty_pages list of the address_space object referenced by page->mapping, if any, and marks the inode page->mapping->host as dirty (see Section 14.1.2.2).
8. If the page belongs to the swap cache, it performs the following substeps:
   a. Gets the swapped-out page identifier from page->index.
   b. Invokes swap_duplicate( ) to verify whether the page slot index is valid and to increment the corresponding usage counter in swap_map.
   c. Stores the swapped-out page identifier in the Page Table entry addressed by page_table.
   d. Decrements the rss field of the memory descriptor mm.
   e. Unlocks the page.
   f. Decrements the page usage counter page->count.
   g. If the page is no longer referenced by any process, it returns 1; otherwise, it returns 0.[112]
   Notice that the function does not have to allocate a new page slot, because the page frame has already been swapped out when scanning the Page Tables of some other process.
9. If the function reaches this point, the page is not in the swap cache. Checks whether the page belongs to an address_space object (the page->mapping field is not null); in this case, the page belongs to a shared file memory mapping, so the function jumps to Step 8d to release the page frame, leaving the corresponding Page Table entry null.
   Notice that the page frame reference of the process is released even if the page is not saved into a swap area. This is because the page has an image on disk, and the function has already triggered, if necessary, the update of this image in Step 7. Notice also that the page frame is not released to the buddy system because the page is still owned by the page cache (see Section 14.1.2.3).
10. If the function reaches this point, the page is not inserted into the swap cache, and it does not belong to an address_space object. The function checks the status of the PG_dirty flag; if it is cleared, the function jumps to Step 8d to release the page frame, leaving the corresponding Page Table entry null.
    There is no need to save the page contents on a swap area because the process never wrote into the page frame. The kernel recognizes this case because the PG_dirty flag is cleared, and this flag is never reset if the page has no image on disk or if it belongs to a private memory mapping. When the process accesses the same page again, the kernel handles the Page Fault through the demand paging technique (see Section 8.4.3); then the new page frame is filled with exactly the same data as that stored in this released page frame.
11. If the function reaches this point, the page is not inserted into the swap cache, it does not have an image on disk, and it is dirty; here the function checks whether the page contains buffers (that is, whether it is a buffer page whose page->buffers field is not null). In this case, the function restores the original contents of the Page Table entry, unlocks the page, and returns 0.
    How could the page host some buffers if it doesn’t belong to an address_space object, that is, if it has no image on disk? Actually, this might occur in rare circumstances; for instance, if the page maps a portion of a file that has just been truncated. In these cases, try_to_swap_out( ) does nothing.
12. At this point, the page is not inserted into the swap cache, it does not have an image on disk, and it is dirty; the function must definitively swap it out in a new page slot. It invokes the get_swap_page( ) function to allocate a free page slot in an active swap area. If there are none, it restores the original content of the Page Table entry, unlocks the page, and returns 0.
13. Invokes add_to_swap_cache( ) to insert the page in the swap cache. The function might fail if another kernel control path is trying to swap in the page. As we shall see in the next section, this can happen even if the page slot is not referenced by any process. In this case, it invokes swap_free( ) to release the page slot and restarts from Step 12.
14. Sets the PG_uptodate flag of the page.
15. Invokes the set_page_dirty( ) function again (see Step 7 above) because add_to_swap_cache( ) resets the PG_dirty flag.
16. Jumps to Step 8c to store the swapped-out page identifier in the Page Table entry and to release the page frame.
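The order in which these checks are made can be condensed into a small decision function. This is a sketch of the control flow only, not the kernel's code: the swap_candidate structure, the swap_decision values, and classify_sketch( ) are hypothetical names, each boolean stands for a test described in the steps above, and side effects such as clearing the Page Table entry or unlocking the page are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the state the steps inspect; not a kernel type. */
struct swap_candidate {
    bool pte_accessed;    /* Accessed flag set in the Page Table entry */
    bool pte_dirty;       /* Dirty flag set in the Page Table entry */
    bool vma_locked;      /* VM_LOCKED set on the memory region */
    bool pg_active;       /* PG_active set in page->flags */
    bool in_classzone;    /* page belongs to the requested memory zone */
    bool pg_locked;       /* PG_locked already set by another path */
    bool in_swap_cache;   /* page already owns a page slot */
    bool has_mapping;     /* page->mapping is not null */
    bool pg_dirty;        /* PG_dirty set in the page descriptor */
    bool has_buffers;     /* page->buffers is not null */
    bool slot_available;  /* assumed outcome of get_swap_page( ) */
};

enum swap_decision {
    KEEP_PAGE,        /* the function gives up and returns 0 */
    REUSE_SLOT,       /* store the existing swapped-out page identifier */
    DROP_FILE_PAGE,   /* entry left null; the page cache still owns the page */
    DROP_CLEAN_PAGE,  /* entry left null; demand paging recreates the page */
    ALLOCATE_SLOT     /* new page slot, insertion in the swap cache */
};

static enum swap_decision classify_sketch(const struct swap_candidate *c)
{
    assert(c);
    /* Young page, locked region, active page, wrong zone, or locked page. */
    if (c->pte_accessed || c->vma_locked || c->pg_active ||
        !c->in_classzone || c->pg_locked)
        return KEEP_PAGE;
    if (c->in_swap_cache)
        return REUSE_SLOT;
    if (c->has_mapping)
        return DROP_FILE_PAGE;
    /* The Dirty flag of the entry has been folded into PG_dirty by now. */
    if (!(c->pg_dirty || c->pte_dirty))
        return DROP_CLEAN_PAGE;
    /* Dirty anonymous page: buffers or a full swap area block the swap out. */
    if (c->has_buffers || !c->slot_available)
        return KEEP_PAGE;
    return ALLOCATE_SLOT;
}
```

Reading the function top to bottom shows why the cheap rejections come first: only a page that survives them and is dirty, anonymous, and not yet in the swap cache costs a new page slot.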
The try_to_swap_out( ) function does not directly invoke rw_swap_page( ) to trigger the activation of the I/O data transfer. Rather, the function limits itself to inserting the page in the swap cache, if necessary, and to marking the page as dirty. However, we’ll see in Section 16.7.4, later in this chapter, that the kernel periodically flushes the disk caches to disk by invoking the writepage methods of the address_space objects that own the dirty pages.
As mentioned in Section 16.3, earlier in this chapter, the address_space object of the pages that belong to the swap cache is a special object stored in swapper_space. Its writepage method is implemented by the swap_writepage( ) function, which executes the following steps:
Checks whether the page is not included in the Page Tables of any process; in this case, it removes the page from the swap cache and releases the swap page slot.
Otherwise, it invokes rw_swap_page( ) on the page, specifying the WRITE command (see Section 16.4.1, earlier in this chapter).
[111] The swap_out( ) function can block, so memory descriptors might appear and disappear on the list during a single invocation of the function.
[112] The check is easily done by looking at the value of the page->count usage counter. Of course, the function must consider that the counter is incremented when the page is inserted into the swap cache (or the page cache), and when there are buffers allocated on the page (i.e., when the page->buffers field is not null).