struct address_space

Memory caches are an integral part of modern memory management. In simple words, a cache is a collection of pages used for specific needs. Most operating systems implement a buffer cache, which is a framework that manages a list of memory blocks for caching blocks of persistent storage. The buffer cache allows filesystems to minimize disk I/O operations by grouping writes and deferring the disk sync until an appropriate time.

The Linux kernel implements a page cache as its caching mechanism; in simple words, the page cache is a collection of page frames that are dynamically managed for caching disk files and directories, and that supports virtual memory operations by providing pages for swapping and demand paging. It also handles pages allocated for special files, such as IPC shared memory and message queues. Application file I/O calls such as read and write cause the underlying filesystem to perform the relevant operation on pages in the page cache. A read of file data that is not yet cached causes the requested data to be fetched from disk into pages of the page cache, and write operations update the relevant file data in cached pages, which are then marked dirty and flushed to disk at specific intervals.
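
The read and write behavior described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions: every name in it (toy_disk, toy_cache, toy_read, toy_write, toy_flush) is hypothetical, the "disk" is a plain array, and none of it is kernel code. It shows a read miss fetching data from the backing store, a write touching only the cached copy and marking it dirty, and a later flush writing dirty pages back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 8   /* tiny pages keep the example readable */
#define TOY_NPAGES    4   /* size of the pretend file, in pages */

/* Hypothetical backing store standing in for the disk. */
static char toy_disk[TOY_NPAGES][TOY_PAGE_SIZE];
static int toy_disk_reads;   /* number of fetches from the "disk" */

struct toy_page {
    bool present;            /* page has been fetched into the cache */
    bool dirty;              /* cached copy is newer than the disk copy */
    char data[TOY_PAGE_SIZE];
};

static struct toy_page toy_cache[TOY_NPAGES];

/* Return the cached page, fetching it from the disk on a miss. */
struct toy_page *toy_get_page(size_t index)
{
    assert(index < TOY_NPAGES);
    struct toy_page *page = &toy_cache[index];
    if (!page->present) {
        memcpy(page->data, toy_disk[index], TOY_PAGE_SIZE);
        page->present = true;
        toy_disk_reads++;
    }
    return page;
}

/* Read one byte through the cache. */
char toy_read(size_t index, size_t off)
{
    assert(off < TOY_PAGE_SIZE);
    return toy_get_page(index)->data[off];
}

/* Write one byte: update the cached page only and mark it dirty. */
void toy_write(size_t index, size_t off, char c)
{
    assert(off < TOY_PAGE_SIZE);
    struct toy_page *page = toy_get_page(index);
    page->data[off] = c;
    page->dirty = true;
}

/* Write every dirty page back to the disk; returns how many were flushed. */
int toy_flush(void)
{
    int written = 0;
    for (size_t i = 0; i < TOY_NPAGES; i++) {
        if (toy_cache[i].present && toy_cache[i].dirty) {
            memcpy(toy_disk[i], toy_cache[i].data, TOY_PAGE_SIZE);
            toy_cache[i].dirty = false;
            written++;
        }
    }
    return written;
}
```

In this model, two consecutive reads of the same page reach the disk only once, and a write becomes visible on the disk only after toy_flush() runs, which mirrors the deferred writeback described above.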

Groups of pages in cache that contain data of a specific disk file are represented through a descriptor of type struct address_space, so each address_space instance serves as an abstraction for a set of pages owned by either a file inode or block device file inode:

struct address_space {
    struct inode *host; /* owner: inode, block_device */
    struct radix_tree_root page_tree; /* radix tree of all pages */
    spinlock_t tree_lock; /* and lock protecting it */
    atomic_t i_mmap_writable; /* count VM_SHARED mappings */
    struct rb_root i_mmap; /* tree of private and shared mappings */
    struct rw_semaphore i_mmap_rwsem; /* protect tree, count, list */
    /* Protected by tree_lock together with the radix tree */
    unsigned long nrpages; /* number of total pages */
    /* number of shadow or DAX exceptional entries */
    unsigned long nrexceptional;
    pgoff_t writeback_index; /* writeback starts here */
    const struct address_space_operations *a_ops; /* methods */
    unsigned long flags; /* error bits */
    spinlock_t private_lock; /* for use by the address_space */
    gfp_t gfp_mask; /* implicit gfp mask for allocations */
    struct list_head private_list; /* ditto */
    void *private_data; /* ditto */
} __attribute__((aligned(sizeof(long))));

The host pointer refers to the owner inode whose data is contained in the pages represented by the current address_space object. For instance, if a page in the cache contains data of a file managed by the Ext4 filesystem, the corresponding VFS inode of the file stores the address_space object in its i_data field. The nrpages field contains the count of pages under this address_space.
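
The relationship between an inode and its address space can be sketched in user space as follows. The toy_ types and functions are hypothetical stand-ins, not the kernel definitions, and they keep only the fields discussed here. The i_mapping pointer does not appear in the text above; it is included on the assumption that, as in the kernel's struct inode, a file's inode carries a pointer that normally refers to its own embedded i_data.

```c
#include <assert.h>
#include <stddef.h>

struct toy_inode;

/* Reduced stand-in for struct address_space. */
struct toy_address_space {
    struct toy_inode *host;     /* owner inode */
    unsigned long nrpages;      /* pages currently cached for the owner */
};

/* Reduced stand-in for the VFS inode. */
struct toy_inode {
    unsigned long i_ino;                  /* inode number */
    struct toy_address_space *i_mapping;  /* assumption: points at i_data */
    struct toy_address_space i_data;      /* embedded address space */
};

/* Wire the embedded address space back to its owner. */
void toy_inode_init(struct toy_inode *inode, unsigned long ino)
{
    assert(inode != NULL);
    inode->i_ino = ino;
    inode->i_data.host = inode;
    inode->i_data.nrpages = 0;
    inode->i_mapping = &inode->i_data;
}

/* Given only an address space, recover the owner's inode number. */
unsigned long toy_mapping_ino(const struct toy_address_space *mapping)
{
    assert(mapping != NULL && mapping->host != NULL);
    return mapping->host->i_ino;
}
```

Because host points back at the owner, code that holds only the address space of a cached page can still find the inode the page belongs to.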

For efficient management of file pages in cache, the VM subsystem needs to track all virtual address mappings to regions of the same address_space; for instance, a number of user-mode processes might map pages of a shared library into their address space through vm_area_struct instances. The i_mmap field of the address_space object is the root element of a red-black tree that contains all vm_area_struct instances currently mapped to this address_space; since each vm_area_struct instance refers back to the memory descriptor of the respective process, it is always possible to find the processes that reference a given page.
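
The reverse lookup described above can be sketched in user space. This is a deliberately simplified model: the toy_ names are hypothetical, and a singly linked list replaces the red-black tree that the kernel uses, so the walk is linear rather than logarithmic. The idea it shows is the same: start from the address space, visit every mapping that covers a given file page, and follow each mapping back to its owning process.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the per-process memory descriptor. */
struct toy_mm {
    int pid;
};

/* Stand-in for vm_area_struct: one mapping of a file range. */
struct toy_vma {
    struct toy_mm *vm_mm;     /* process that owns this mapping */
    unsigned long pgoff;      /* first file page covered */
    unsigned long npages;     /* number of file pages covered */
    struct toy_vma *next;     /* list link (the kernel uses a tree) */
};

/* Stand-in for address_space: only the mapping collection is kept. */
struct toy_mapping {
    struct toy_vma *i_mmap;
};

/* Register a mapping with the address space (inserted at the head). */
void toy_mapping_add(struct toy_mapping *m, struct toy_vma *vma)
{
    assert(m != NULL && vma != NULL);
    vma->next = m->i_mmap;
    m->i_mmap = vma;
}

/*
 * Collect the pids of up to max processes that map file page pgoff.
 * Returns the number of pids stored.
 */
size_t toy_find_mappers(const struct toy_mapping *m, unsigned long pgoff,
                        int *pids, size_t max)
{
    size_t n = 0;

    assert(m != NULL && pids != NULL);
    for (const struct toy_vma *v = m->i_mmap; v != NULL; v = v->next) {
        if (n == max)
            break;
        if (pgoff >= v->pgoff && pgoff - v->pgoff < v->npages)
            pids[n++] = v->vm_mm->pid;
    }
    return n;
}
```

With two processes mapping overlapping ranges of the same file, a query for a page in the overlap reports both pids, while a query outside every range reports none.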

All physical pages containing file data under the address_space object are organized through a radix tree for efficient access; the page_tree field is an instance of struct radix_tree_root that serves as the root element for the radix tree of pages. This structure is defined in the kernel header <linux/radix-tree.h>:

struct radix_tree_root {
    gfp_t gfp_mask;
    struct radix_tree_node __rcu *rnode;
};

Each node of the radix tree is of type struct radix_tree_node; the *rnode pointer of the previous structure refers to the first node element of the tree:

struct radix_tree_node {
    unsigned char shift; /* Bits remaining in each slot */
    unsigned char offset; /* Slot offset in parent */
    unsigned int count;
    union {
        struct {
            /* Used when ascending tree */
            struct radix_tree_node *parent;
            /* For tree user */
            void *private_data;
        };
        /* Used when freeing node */
        struct rcu_head rcu_head;
    };
    /* For tree user */
    struct list_head private_list;
    void __rcu *slots[RADIX_TREE_MAP_SIZE];
    unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};

The offset field specifies the node slot offset in the parent, count holds the total count of child nodes, and *parent is a pointer to the parent node. Each node can refer to 64 tree nodes (specified by the macro RADIX_TREE_MAP_SIZE) through the slots array, where unused slot entries are initialized with NULL.
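
How a page index selects slots while descending the tree can be sketched in user space. The following is a simplified model, not the kernel implementation: the toy_ names are hypothetical, the node keeps only shift and slots, and slots hold plain pointers (the kernel additionally encodes whether an entry is an internal node). It assumes 6 index bits per level, which matches the 64 slots per node mentioned above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAP_SHIFT 6                       /* index bits consumed per level */
#define TOY_MAP_SIZE  (1UL << TOY_MAP_SHIFT)  /* 64 slots per node */
#define TOY_MAP_MASK  (TOY_MAP_SIZE - 1)

struct toy_node {
    unsigned char shift;        /* 0 for a leaf node */
    void *slots[TOY_MAP_SIZE];  /* child nodes, or items when shift == 0 */
};

/* Slot selected at a node with the given shift for this index. */
unsigned long toy_slot_offset(unsigned long index, unsigned char shift)
{
    return (index >> shift) & TOY_MAP_MASK;
}

/* Walk from the root to the item stored at index, or NULL if absent. */
void *toy_lookup(struct toy_node *root, unsigned long index)
{
    struct toy_node *node = root;

    /* An index that needs more levels than the tree has is not present. */
    if (node == NULL || (index >> node->shift) >= TOY_MAP_SIZE)
        return NULL;

    while (node != NULL) {
        void *entry = node->slots[toy_slot_offset(index, node->shift)];
        if (node->shift == 0)
            return entry;       /* leaf level: the entry is the item */
        node = entry;           /* interior level: descend to the child */
    }
    return NULL;                /* hit an empty slot on the way down */
}
```

For a two-level tree, index 130 selects slot 2 at the root (130 >> 6) and slot 2 in the leaf (130 & 63), so each level consumes six bits of the index.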

For efficient management of pages under an address space, it is important for the memory manager to draw a clear distinction between clean and dirty pages; this is made possible through tags assigned to the pages of each node of the radix tree. The tagging information is stored in the tags field of the node structure, which is a two-dimensional array. The first dimension of the array distinguishes between the possible tags, and the second contains enough unsigned long elements to provide one bit for each page that can be organized in the node. Following is the list of tags supported:

/*
* Radix-tree tags, for tagging dirty and writeback pages within
* pagecache radix trees
*/
#define PAGECACHE_TAG_DIRTY 0
#define PAGECACHE_TAG_WRITEBACK 1
#define PAGECACHE_TAG_TOWRITE 2
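
The layout of the two-dimensional tags array can be sketched in user space. The toy_ names below are hypothetical and the code models a single node only; it assumes 64 slots per node and three tags, matching the values given in the text. Each tag gets its own row of bits, and a slot's bit is located by dividing the slot number by the number of bits in an unsigned long.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAP_SIZE      64   /* slots per node */
#define TOY_MAX_TAGS      3    /* dirty, writeback, towrite */
#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
/* Number of unsigned longs needed to hold one bit per slot. */
#define TOY_TAG_LONGS \
    ((TOY_MAP_SIZE + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG)

enum { TOY_TAG_DIRTY = 0, TOY_TAG_WRITEBACK = 1, TOY_TAG_TOWRITE = 2 };

/* Tag storage of one node: tags[tag][word], one bit per slot. */
struct toy_tags {
    unsigned long tags[TOY_MAX_TAGS][TOY_TAG_LONGS];
};

void toy_tag_set(struct toy_tags *t, unsigned int tag, unsigned int slot)
{
    assert(tag < TOY_MAX_TAGS && slot < TOY_MAP_SIZE);
    t->tags[tag][slot / TOY_BITS_PER_LONG] |=
        1UL << (slot % TOY_BITS_PER_LONG);
}

void toy_tag_clear(struct toy_tags *t, unsigned int tag, unsigned int slot)
{
    assert(tag < TOY_MAX_TAGS && slot < TOY_MAP_SIZE);
    t->tags[tag][slot / TOY_BITS_PER_LONG] &=
        ~(1UL << (slot % TOY_BITS_PER_LONG));
}

bool toy_tag_get(const struct toy_tags *t, unsigned int tag, unsigned int slot)
{
    assert(tag < TOY_MAX_TAGS && slot < TOY_MAP_SIZE);
    return (t->tags[tag][slot / TOY_BITS_PER_LONG] >>
            (slot % TOY_BITS_PER_LONG)) & 1UL;
}

/* True if any slot of the node carries the tag. */
bool toy_any_tagged(const struct toy_tags *t, unsigned int tag)
{
    assert(tag < TOY_MAX_TAGS);
    for (size_t i = 0; i < TOY_TAG_LONGS; i++)
        if (t->tags[tag][i] != 0)
            return true;
    return false;
}
```

Keeping one bit row per tag makes the question "does this node hold any dirty page?" a check of a few words rather than a scan of all 64 slots, which is what lets a search for dirty pages skip clean subtrees.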

The Linux radix tree API provides various operation interfaces to set, clear, and get tags:

void *radix_tree_tag_set(struct radix_tree_root *root,
                         unsigned long index, unsigned int tag);
void *radix_tree_tag_clear(struct radix_tree_root *root,
                           unsigned long index, unsigned int tag);
int radix_tree_tag_get(struct radix_tree_root *root,
                       unsigned long index, unsigned int tag);

The following diagram depicts the layout of pages under the address_space object:

Each address space object is bound to a set of functions that implement various low-level operations between address space pages and the backing block device. The a_ops pointer of the address_space structure refers to the descriptor containing the address space operations. These operations are invoked by VFS to initiate data transfers between the cached pages associated with an address space and the backing block device:
