Transferring swap pages wouldn’t be so complicated if there weren’t so many race conditions and other potential hazards to guard against. Here are some of the things that have to be checked regularly:
The process that owns a page may terminate while the page is being swapped in or out.
Another process may be in the middle of swapping in a page that the current one is trying to swap out (or vice versa).
Like any other disk access type, I/O data transfers for swap pages are blocking operations. Therefore, the kernel must take care to avoid simultaneous transfers involving the same page frame, the same page slot, or both.
Race conditions on the page frame can be avoided through the mechanisms discussed in Chapter 13. Specifically, before starting an I/O operation on the page frame, the kernel waits until its PG_locked flag is off. When the function that performs this wait returns, the page frame lock has been acquired, and therefore no other kernel control path can access the page frame’s contents during the I/O operation.
But the state of the page slot must also be tracked. The PG_locked flag of the page descriptor is used once again to ensure exclusive access to the page slot involved in the I/O data transfer. Before starting an I/O operation on a swap page, the kernel checks that the page frame involved is included in the swap cache; if not, it adds the page frame into the swap cache.
Let’s suppose some process tries to swap in a page while the same page is currently being transferred. Before doing any work related to the swap in, the kernel looks in the swap cache for a page frame associated with the given swapped-out page identifier. Since the page frame is found, the kernel knows that it must not allocate a new page frame, but must simply use the cached page frame. Moreover, since the PG_locked flag is set, the kernel suspends the kernel control path until the bit becomes 0, so that both the page frame’s contents and the page slot in the swap area are preserved until the I/O operation terminates.
In short, thanks to the swap cache, the PG_locked flag of the page frame also acts as a lock for the page slot in the swap area.
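The idea that one flag protects both the frame and the slot can be sketched in user-space C. This is only an illustration, not kernel code: the toy_ names, the bit position chosen for the lock flag, and the fixed-size array standing in for the swap cache are all assumptions made for the example. Because every path reaches the page frame through a lookup keyed by the swapped-out page identifier, any path interested in the same slot ends up testing the same lock bit.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PG_LOCKED   (1UL << 0)  /* hypothetical bit position */
#define TOY_CACHE_SLOTS 8           /* hypothetical toy cache size */

struct toy_page {
    atomic_ulong  flags;  /* holds the toy PG_locked bit */
    unsigned long index;  /* swapped-out page identifier */
};

/* Toy swap cache: a flat array searched by identifier. */
static struct toy_page *toy_swap_cache[TOY_CACHE_SLOTS];

/* Returns true only for the caller that flips the bit from 0 to 1. */
static bool toy_trylock_page(struct toy_page *page)
{
    unsigned long old = atomic_fetch_or(&page->flags, TOY_PG_LOCKED);
    return (old & TOY_PG_LOCKED) == 0;
}

static void toy_unlock_page(struct toy_page *page)
{
    atomic_fetch_and(&page->flags, ~TOY_PG_LOCKED);
}

static bool toy_page_locked(struct toy_page *page)
{
    return (atomic_load(&page->flags) & TOY_PG_LOCKED) != 0;
}

/* Finds the page frame cached for a given identifier, if any. */
static struct toy_page *toy_lookup_swap_cache(unsigned long id)
{
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
        if (toy_swap_cache[i] && toy_swap_cache[i]->index == id)
            return toy_swap_cache[i];
    return NULL;
}

/* Inserts a page; fails on a duplicate identifier or a full cache. */
static bool toy_add_to_swap_cache(struct toy_page *page)
{
    if (toy_lookup_swap_cache(page->index))
        return false;
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (!toy_swap_cache[i]) {
            toy_swap_cache[i] = page;
            return true;
        }
    }
    return false;
}
```

A second path that looks up the same identifier gets the same descriptor back, sees the lock bit set, and cannot take the lock until the first path releases it. The real kernel would put that path to sleep rather than let it poll.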
The rw_swap_page( ) function is used to swap in or swap out a page. It receives the following parameters:
rw
A flag specifying the direction of data transfer: READ for swapping in, WRITE for swapping out.
page
The address of a descriptor of a page in the swap cache.
Before invoking the function, the caller must ensure that the page is included in the swap cache and lock the page to prevent race conditions due to concurrent accesses to the page frame or to the page slot in the swap area, as described in the previous section. To be on the safe side, the rw_swap_page( ) function checks that these two conditions actually hold, and then gets the swapped-out page identifier from page->index and invokes the rw_swap_page_base( ) function, passing to it the page identifier, the page descriptor address page, and the direction flag rw.
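The wrapper logic just described can be sketched as follows. This is a user-space illustration under stated assumptions, not the kernel source: the toy_ types and the recording stub for the base function are hypothetical, assert( ) stands in for whatever consistency check the kernel performs, and unlocking the page when the transfer cannot be started is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_rw { TOY_READ, TOY_WRITE };

struct toy_page {
    bool          locked;         /* stands in for PG_locked */
    bool          in_swap_cache;  /* stands in for swap cache membership */
    unsigned long index;          /* swapped-out page identifier */
};

/* Recorded by the stub below so that a caller can inspect the call. */
static unsigned long toy_last_entry;
static enum toy_rw   toy_last_rw;

/* Hypothetical stub: records its arguments and reports success (1). */
static int toy_rw_swap_page_base(enum toy_rw rw, unsigned long entry,
                                 struct toy_page *page)
{
    (void)page;
    toy_last_rw = rw;
    toy_last_entry = entry;
    return 1;
}

static void toy_rw_swap_page(enum toy_rw rw, struct toy_page *page)
{
    /* The caller's two obligations are re-checked "to be on the safe side". */
    assert(page->locked);
    assert(page->in_swap_cache);

    /* The identifier is taken from the descriptor itself. */
    unsigned long entry = page->index;

    /* Assumption of this sketch: if the transfer cannot be started,
       nothing will unlock the page later, so it is unlocked here. */
    if (!toy_rw_swap_page_base(rw, entry, page))
        page->locked = false;
}
```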
The rw_swap_page_base( ) function is the core of the swapping algorithm; it performs the following steps:
1. If the data transfer is for a swap-in operation (rw set to READ), it clears the PG_uptodate flag of the page frame. The flag is set again only if the swap-in operation terminates successfully.
2. Gets the proper swap area descriptor and the slot index from the swapped-out page identifier.
3. If the swap area is a disk partition, gets the corresponding block device number from the swap_device field of the swap area descriptor. In this case, the slot index also represents the logical block number of the requested data because the block size of any swap disk partition is always equal to the page size (PAGE_SIZE).
4. Otherwise, if the swap area is a regular file, it executes the following substeps:
   a. Gets the number of the block device that stores the file from the i_dev field of its inode object (the swap_file->d_inode field in the swap area descriptor).
   b. Gets the block size of the device (the i_sb->s_blocksize field of the inode).
   c. Computes the file block number corresponding to the given slot index.
   d. Fills a local array with the logical block numbers of the blocks in the page slot; every logical block number is obtained by invoking the bmap method of the address_space object whose address is stored in the i_mapping field of the inode. If the bmap method fails, rw_swap_page_base( ) returns 0 (failure).
5. Invokes the brw_page( ) function to start a page I/O operation on the block (or blocks) identified in the previous steps and returns 1 (success).
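The arithmetic behind Steps 2 through 4 can be sketched in user-space C. Several things here are assumptions made for illustration: the bit layout of the swapped-out page identifier (area index in bits 1 to 6, slot index from bit 8 upward), the 4,096-byte page size, the toy_ names, and the two sample bmap callbacks, which merely imitate a filesystem mapping. A NULL callback stands for the disk partition case.

```c
#include <stddef.h>

#define TOY_PAGE_SIZE  4096UL  /* assumed page size */
#define TOY_MAX_BLOCKS 8       /* 4096 / 512, the smallest block size handled */

/* Assumed identifier layout: bits 1-6 area index, bits 8 and up slot index. */
static unsigned int toy_swp_type(unsigned long id)
{
    return (unsigned int)((id >> 1) & 0x3f);
}

static unsigned long toy_swp_offset(unsigned long id)
{
    return id >> 8;
}

/* Hypothetical stand-in for the bmap method: maps a file block number
   to a logical block number, returning 0 on failure. */
typedef long (*toy_bmap_fn)(unsigned long file_block);

/* Sample mapping: the file occupies consecutive blocks starting at 1000. */
static long toy_linear_bmap(unsigned long file_block)
{
    return 1000 + (long)file_block;
}

/* Sample mapping that always fails, as for an unmapped file block. */
static long toy_hole_bmap(unsigned long file_block)
{
    (void)file_block;
    return 0;
}

/* Fills blocks[] with the logical block numbers backing one page slot.
   Returns the number of blocks, or 0 on failure. */
static int toy_slot_to_blocks(unsigned long offset, unsigned long block_size,
                              toy_bmap_fn bmap,
                              unsigned long blocks[TOY_MAX_BLOCKS])
{
    if (bmap == NULL) {
        /* Partition: block size equals page size, so slot index == block. */
        blocks[0] = offset;
        return 1;
    }
    if (block_size == 0 || block_size > TOY_PAGE_SIZE ||
        TOY_PAGE_SIZE % block_size != 0)
        return 0;

    unsigned long per_page = TOY_PAGE_SIZE / block_size;
    if (per_page > TOY_MAX_BLOCKS)
        return 0;

    /* First file block of the slot, then one lookup per block. */
    unsigned long file_block = offset * per_page;
    for (unsigned long i = 0; i < per_page; i++) {
        long logical = bmap(file_block + i);
        if (logical <= 0)
            return 0;
        blocks[i] = (unsigned long)logical;
    }
    return (int)per_page;
}
```

With a 1,024-byte block size, for example, each page slot spans four file blocks, so slot 5 starts at file block 20.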
Since the page I/O operation activated by brw_page( ) is asynchronous, the rw_swap_page( ) function might terminate before the actual I/O data transfer completes. However, as described in Section 13.4.8.2, the kernel eventually executes the end_buffer_io_async( ) function (which verifies that all data transfers successfully completed), unlocks the page, and sets its PG_uptodate flag.
The read_swap_cache_async( ) function, which receives as a parameter a swapped-out page identifier, is invoked whenever the kernel must swap in a page. As we know, before accessing the swap partition, the function must check whether the swap cache already includes the desired page frame. Therefore, the function essentially executes the following operations:
1. Invokes find_get_page( ) to search for the page in the swap cache. If the page is found, it returns the address of its descriptor.
2. The page is not included in the swap cache. Invokes alloc_page( ) to allocate a new page frame. If no free page frame is available, it returns 0 (indicating the system is out of memory).
3. Invokes add_to_swap_cache( ) to insert the new page frame into the swap cache. As mentioned in Section 16.3.1, this function also locks the page.
4. The previous step might fail if add_to_swap_cache( ) finds a duplicate of the page in the swap cache. For instance, the process could block in Step 2, thus allowing another process to start a swap-in operation on the same page slot. In this case, the function releases the page frame allocated in Step 2 and restarts from Step 1.
5. Otherwise, the new page frame is inserted into the swap cache. Invokes rw_swap_page( ) to read the page’s contents from the swap area, passing the READ parameter and the page descriptor to that function.
6. Returns the address of the page descriptor.
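The retry logic in these steps can be sketched in user-space C. The sketch is single-threaded and rests on several assumptions: the toy_ names are hypothetical, the swap cache is a small array, the read completes immediately, and a flag (toy_race_armed) makes the allocator simulate a competing path that inserts the same identifier while the caller is "blocked" in the allocation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_CACHE_SLOTS 16  /* hypothetical toy cache size */

struct toy_page {
    unsigned long index;    /* swapped-out page identifier */
    bool          locked;
    bool          uptodate;
};

enum toy_add_result { TOY_ADDED, TOY_DUPLICATE, TOY_FULL };

static struct toy_page *toy_cache[TOY_CACHE_SLOTS];

/* Knobs used to simulate a racing swap-in of the same slot. */
static struct toy_page toy_racer_page;
static unsigned long   toy_racer_id;
static bool            toy_race_armed;

static struct toy_page *toy_find_get_page(unsigned long id)
{
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++)
        if (toy_cache[i] && toy_cache[i]->index == id)
            return toy_cache[i];
    return NULL;
}

/* Inserts and locks the page unless the identifier is already cached. */
static enum toy_add_result toy_add_to_swap_cache(struct toy_page *page,
                                                 unsigned long id)
{
    if (toy_find_get_page(id))
        return TOY_DUPLICATE;
    for (size_t i = 0; i < TOY_CACHE_SLOTS; i++) {
        if (!toy_cache[i]) {
            page->index = id;
            page->locked = true;
            toy_cache[i] = page;
            return TOY_ADDED;
        }
    }
    return TOY_FULL;
}

/* Allocation may "block"; when armed, a competing path wins the race. */
static struct toy_page *toy_alloc_page(void)
{
    struct toy_page *page = calloc(1, sizeof *page);
    if (toy_race_armed) {
        toy_race_armed = false;
        toy_add_to_swap_cache(&toy_racer_page, toy_racer_id);
    }
    return page;
}

/* Simplification: the read completes at once and unlocks the page. */
static void toy_rw_swap_page_read(struct toy_page *page)
{
    page->uptodate = true;
    page->locked = false;
}

static struct toy_page *toy_read_swap_cache_async(unsigned long id)
{
    for (;;) {
        struct toy_page *found = toy_find_get_page(id);   /* Step 1 */
        if (found)
            return found;

        struct toy_page *page = toy_alloc_page();         /* Step 2 */
        if (!page)
            return NULL;

        switch (toy_add_to_swap_cache(page, id)) {        /* Step 3 */
        case TOY_ADDED:
            toy_rw_swap_page_read(page);                  /* Step 5 */
            return page;                                  /* Step 6 */
        case TOY_DUPLICATE:
            free(page);                                   /* Step 4 */
            continue;                                     /* back to Step 1 */
        default:
            free(page);  /* toy cache full: give up rather than loop */
            return NULL;
        }
    }
}
```

When the race is armed, the first pass allocates a frame, loses the insertion, frees the frame, and the second pass returns the competitor's cached page instead.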
There is just one case in which the kernel wants to read a page from a swap area without putting it in the swap cache. This happens when servicing the swapon( ) system call: the kernel reads the first page of a swap area, which contains the swap_header union, and then immediately discards the page frame. Since the kernel is activating the swap area, no process can swap in or swap out a page on it, so there is no need to protect the access to the page slot.
The rw_swap_page_nolock( ) function receives as parameters the type of I/O operation (READ or WRITE), a swapped-out page identifier, and the address of a page frame (already locked). It performs the following operations:
1. Gets the page descriptor of the page frame passed as a parameter.
2. Initializes the mapping field of the page descriptor with the address of the swapper_space object; this is done because the sync_page method is executed in Step 4.
3. Invokes rw_swap_page_base( ) to start the I/O swap operation.
4. Waits until the I/O data transfer completes by invoking wait_on_page( ).
5. Unlocks the page.
6. Sets the mapping field of the page descriptor to NULL and returns.
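The sequence above can be sketched in user-space C. This follows the steps as listed here rather than the kernel source, and it makes several assumptions: the toy_ types are hypothetical, the caller passes a descriptor directly (so Step 1 is implicit), the base function only marks the transfer as pending, and the wait helper completes the transfer itself while leaving the lock to be released in Step 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_rw { TOY_READ, TOY_WRITE };

struct toy_address_space { const char *name; };

/* Stands in for the swapper_space object. */
static struct toy_address_space toy_swapper_space = { "swapper_space" };

struct toy_page {
    bool locked;
    bool uptodate;
    bool io_pending;
    struct toy_address_space *mapping;
};

/* Hypothetical stub: starts the transfer and reports success (1). */
static int toy_rw_swap_page_base(enum toy_rw rw, unsigned long entry,
                                 struct toy_page *page)
{
    (void)entry;
    if (rw == TOY_READ)
        page->uptodate = false;  /* set again only when the read succeeds */
    page->io_pending = true;
    return 1;
}

/* Simplified wait: the real wait relies on the sync_page method, which is
   reached through the mapping field, so that field must be set here. */
static void toy_wait_on_page(struct toy_page *page)
{
    assert(page->mapping == &toy_swapper_space);
    if (page->io_pending) {
        page->io_pending = false;
        page->uptodate = true;
    }
}

static void toy_rw_swap_page_nolock(enum toy_rw rw, unsigned long entry,
                                    struct toy_page *page)
{
    assert(page->locked);            /* the caller passes a locked frame */
    assert(page->mapping == NULL);   /* and the frame is not in any cache */

    page->mapping = &toy_swapper_space;               /* Step 2 */
    if (toy_rw_swap_page_base(rw, entry, page))       /* Step 3 */
        toy_wait_on_page(page);                       /* Step 4 */
    page->locked = false;                             /* Step 5 */
    page->mapping = NULL;                             /* Step 6 */
}
```

The page is never inserted into the swap cache; the mapping field is borrowed only for the duration of the transfer and is cleared before the function returns.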