SLUB data structures

Having looked at the layout of a cache and the descriptors involved at a generic level, let's push further to view the specific data structures used by the SLUB allocator and explore the management of free lists. SLUB defines its version of the cache descriptor, struct kmem_cache, in the kernel header /include/linux/slub_def.h:

struct kmem_cache {
    struct kmem_cache_cpu __percpu *cpu_slab;
    /* Used for retrieving partial slabs, etc. */
    unsigned long flags;
    unsigned long min_partial;
    int size;              /* The size of an object including meta data */
    int object_size;       /* The size of an object without meta data */
    int offset;            /* Free pointer offset. */
    int cpu_partial;       /* Number of per cpu partial objects to keep around */
    struct kmem_cache_order_objects oo;

    /* Allocation and freeing of slabs */
    struct kmem_cache_order_objects max;
    struct kmem_cache_order_objects min;
    gfp_t allocflags;      /* gfp flags to use on each alloc */
    int refcount;          /* Refcount for slab cache destroy */
    void (*ctor)(void *);
    int inuse;             /* Offset to metadata */
    int align;             /* Alignment */
    int reserved;          /* Reserved bytes at the end of slabs */
    const char *name;      /* Name (only for display!) */
    struct list_head list; /* List of slab caches */
    int red_left_pad;      /* Left redzone padding size */
    ...
    ...
    ...
    struct kmem_cache_node *node[MAX_NUMNODES];
};

The list element refers to a list of slab caches. When a new slab is allocated, it is stored on a list in the cache descriptor and is considered empty, since all its objects are free and available. Upon allocation of an object, the slab moves into the partial state. Partial slabs are the only type of slab that the allocator needs to keep track of, and they are connected in a list inside the kmem_cache structure. The SLUB allocator has no interest in tracking full slabs, whose objects have all been allocated, or empty slabs, whose objects are all free. SLUB tracks partial slabs for each node through an array of pointers to struct kmem_cache_node (node[MAX_NUMNODES]); each kmem_cache_node encapsulates a list of partial slabs for that node:

struct kmem_cache_node {
    spinlock_t list_lock;
    ...
    ...
#ifdef CONFIG_SLUB
    unsigned long nr_partial;
    struct list_head partial;
#ifdef CONFIG_SLUB_DEBUG
    atomic_long_t nr_slabs;
    atomic_long_t total_objects;
    struct list_head full;
#endif
#endif
};
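
Kernel code creates and uses such a cache through the standard slab interfaces kmem_cache_create(), kmem_cache_alloc(), and kmem_cache_free(). The following is a minimal sketch of that usage; the object type, cache name, and helper functions are hypothetical, chosen only for illustration:

#include <linux/slab.h>

/* A hypothetical fixed-size object type, used only for this sketch. */
struct my_object {
    int id;
    char payload[48];
};

static struct kmem_cache *my_cache;

static int my_cache_setup(void)
{
    /* Creating the cache populates a struct kmem_cache descriptor like
       the one shown above and links it into the global list of slab caches. */
    my_cache = kmem_cache_create("my_object_cache",
                                 sizeof(struct my_object), 0,
                                 SLAB_HWCACHE_ALIGN, NULL);
    return my_cache ? 0 : -ENOMEM;
}

static void my_cache_demo(void)
{
    /* Allocation hands out a free object from one of the cache's slabs... */
    struct my_object *obj = kmem_cache_alloc(my_cache, GFP_KERNEL);

    if (obj)
        kmem_cache_free(my_cache, obj); /* ...and freeing returns it to the slab's free list. */
}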

All free objects in a slab form a linked list; when an allocation request arrives, the first free object is removed from the list and its address is returned to the caller. Tracking free objects through a linked list requires significant metadata; while the traditional SLAB allocator maintained this metadata for all pages of a slab within the slab header (causing data alignment issues), SLUB maintains per-page metadata for the pages in a slab by cramming additional fields into the page descriptor structure, thereby eliminating metadata from the slab header. The SLUB metadata elements in the page descriptor are only valid when the corresponding page is part of a slab. Pages engaged for slab allocations have the PG_slab flag set.
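
As a small illustration, a kernel-virtual address can be mapped back to its page descriptor and tested for slab ownership. The helper below is hypothetical, but virt_to_head_page(), the PageSlab() test for PG_slab, and the slab_cache field (visible in the page descriptor listing that follows) are the standard kernel facilities involved:

#include <linux/mm.h>

/* Hypothetical helper: return the cache that owns a slab-backed allocation,
   or NULL if the page holding addr is not part of a slab. */
static struct kmem_cache *owning_cache(const void *addr)
{
    struct page *page = virt_to_head_page(addr);

    if (!PageSlab(page))        /* PG_slab not set: not a slab page */
        return NULL;

    return page->slab_cache;    /* the kmem cache using this slab page */
}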

The following are fields of the page descriptor relevant to SLUB:

struct page {
    ...
    ...
    union {
        pgoff_t index;   /* Our offset within mapping. */
        void *freelist;  /* sl[aou]b first free object */
    };
    ...
    ...
    struct {
        union {
            ...
            struct {     /* SLUB */
                unsigned inuse:16;
                unsigned objects:15;
                unsigned frozen:1;
            };
            ...
        };
        ...
    };
    ...
    ...
    union {
        ...
        ...
        struct kmem_cache *slab_cache; /* SL[AU]B: Pointer to slab */
    };
    ...
    ...
};

The freelist pointer refers to the first free object in the list. Each free object is composed of a metadata area that contains a pointer to the next free object in the list; this free pointer is stored within the object at the offset recorded in the cache descriptor's offset field (the free pointer offset shown earlier). The metadata area of the last free object contains a next-free-object pointer set to NULL. inuse contains the total count of allocated objects, and objects contains the total number of objects. frozen is a flag that is used as a page lock: if a page has been frozen by a CPU core, only that core can retrieve free objects from the page. slab_cache is a pointer to the kmem cache currently using this page.
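
The following is a simplified sketch, modeled on SLUB's internal get_freepointer() helper in mm/slub.c (without its debug and hardening variants), of how the next free object is located from a given free object using the offset field of the cache descriptor:

/* Simplified sketch: each free object stores the address of the next free
   object at byte offset s->offset within itself; NULL marks the end of the list. */
static inline void *get_next_free(struct kmem_cache *s, void *object)
{
    return *(void **)((char *)object + s->offset);
}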

When an allocation request arrives, the first free object is located through the freelist pointer and removed from the list, and its address is returned to the caller. The inuse counter is incremented to indicate an increase in the number of allocated objects, and the freelist pointer is updated with the address of the next free object in the list.
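
Conceptually, that sequence looks like the sketch below. It is illustrative only and non-atomic; the real allocator performs the equivalent update with lockless cmpxchg-based helpers:

/* Illustrative, non-atomic version of taking the first free object
   off a slab page's free list. */
static void *slab_pop_free(struct kmem_cache *s, struct page *page)
{
    void *object = page->freelist;  /* first free object */

    if (!object)
        return NULL;                /* no free objects left in this slab */

    /* Unlink it: the next-free pointer lives at s->offset inside the object. */
    page->freelist = *(void **)((char *)object + s->offset);
    page->inuse++;                  /* one more allocated object */

    return object;
}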

To achieve enhanced allocation efficiency, each CPU is assigned a private active-slab list, which comprises a partial/free slab list for each object type. These slabs are referred to as CPU local slabs and are tracked by struct kmem_cache_cpu:

struct kmem_cache_cpu {
    void **freelist;       /* Pointer to next available object */
    unsigned long tid;     /* Globally unique transaction id */
    struct page *page;     /* The slab from which we are allocating */
    struct page *partial;  /* Partially allocated frozen slabs */
#ifdef CONFIG_SLUB_STATS
    unsigned stat[NR_SLUB_STAT_ITEMS];
#endif
};

When an allocation request arrives, the allocator takes the fast path and looks into the freelist of the per-CPU cache, returning a free object if one is available. This is referred to as the fast path since allocations are carried out through interrupt-safe atomic instructions that do not involve lock contention. When the fast path fails, the allocator takes the slow path and looks through the page and partial lists of the CPU cache sequentially. If no free objects are found, the allocator moves on to the partial lists of the nodes; this operation requires the allocator to contend for the appropriate exclusion lock. On failure, the allocator gets a new slab from the buddy system. Fetching from the node lists or acquiring a new slab from the buddy system are considered very slow paths, since both operations are non-deterministic.
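
The sketch below condenses that decision into C-like form. It is not the kernel's actual slab_alloc(): it omits preemption handling, the transaction id check, and the cmpxchg-based update used by the real fast path, and it stubs out the slow path:

static void *slab_alloc_sketch(struct kmem_cache *s, gfp_t gfpflags)
{
    struct kmem_cache_cpu *c = raw_cpu_ptr(s->cpu_slab);
    void *object = c->freelist;

    if (object) {
        /* Fast path: pop the object off the per-CPU freelist. */
        c->freelist = *(void **)((char *)object + s->offset);
        return object;
    }

    /*
     * Slow path: refill from c->page, then c->partial, then the node's
     * partial lists (taking the node's list_lock), and as a last resort
     * allocate a fresh slab from the buddy system with gfpflags.
     * In the kernel this work is done by __slab_alloc().
     */
    return NULL;    /* placeholder for the slow-path refill in this sketch */
}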

The following diagram depicts the relationship between SLUB data structures and free lists:
