Memory Allocation

The XNU kernel provides a rich set of tools for allocating memory. Kernel memory allocation is not as straightforward as the malloc() / free() interface found in user space libraries. The facilities range from high-level mechanisms analogous to the user space malloc() interface to direct allocation of raw pages, and there are dozens of functions for obtaining memory. Which one to use depends on the subsystem you are working within (for example, Mach, BSD, or the I/O Kit) as well as on the requirements for the memory, such as size or alignment. Memory is arguably one of the most limited resources on a computer system, especially on the iOS platform, which has limited amounts of physical memory compared to most Mac OS X-based computers.

At the fundamental level, the kernel keeps track of physical memory using the structure vm_page. A vm_page structure exists for every physical page of memory. Available pages are part of one of the following page lists:

  • Active List: Contains physical pages that are mapped into at least one virtual address space and have recently been used.
  • Inactive List: Contains pages that are allocated but have not recently been used.
  • Free List: Contains unallocated pages.

Getting a free page from the free list is done with the vm_page_grab() function or its higher-level interface vm_page_alloc(), which, unlike the former, places the page in a vm_object as opposed to merely removing it from the free list. The kernel signals the pageout daemon if it detects that the number of free pages has fallen below a threshold. In this case, the pageout daemon evicts pages from the inactive list in a least recently used (LRU) fashion. Pages that are mapped from an on-disk file are prime candidates and can simply be discarded. The VM page cache and the file system cache are combined on Mac OS X and iOS, which avoids duplication; together they are referred to as the Unified Buffer Cache (UBC). Pages originating from the file system are managed by the vnode pager, while pages in the VM cache are managed by the default pager.

The following sections will provide an overview of the various mechanisms for memory allocation available to kernel developers, as well as of their use and restrictions.

Low-Level Allocation Mechanisms

The kernel has several families of memory allocation routines. Each major subsystem, such as Mach, BSD, or I/O Kit, has its own family of functions. The VM subsystem lives in the Mach portion of the kernel, which implements the fundamental interfaces for allocating memory. These interfaces are in turn used to form higher-level memory allocation mechanisms for use in other subsystems such as BSD and I/O Kit.

For working in the Mach sections of the kernel, the kmem_alloc*() family of functions is used. These functions are fairly low-level and are only a few levels away from the raw vm_page_alloc() function. The following functions are available:

kern_return_t kmem_alloc(vm_map_t map, vm_offset_t* addrp, vm_size_t size);
kern_return_t kmem_alloc_aligned(vm_map_t map, vm_offset_t* addrp, vm_size_t size);
kern_return_t kmem_alloc_wired(vm_map_t map, vm_offset_t* addrp, vm_size_t size);
kern_return_t kmem_alloc_pageable(vm_map_t map, vm_offset_t* addrp, vm_size_t size);
kern_return_t kmem_alloc_contig(vm_map_t map, vm_offset_t* addrp, vm_size_t size,
                                vm_offset_t mask, int flags);
void kmem_free(vm_map_t map, vm_offset_t addr, vm_size_t size);

All the functions require you to specify a VM Map belonging to either a user space task or kernel_map. All the above functions allocate wired memory, which cannot be paged out, with the exception of kmem_alloc_pageable().

The Mach Zone Allocator

The Mach zone allocator is an allocation mechanism that hands out fixed-size blocks of memory from pools called zones. A zone usually represents a commonly used kernel data structure, such as a file descriptor or a task descriptor, but can also point to blocks of memory for more general use. Examples of data structures allocated by the zone allocator include:

  • file descriptors
  • BSD sockets
  • tasks (struct task)
  • virtual memory structures (VM Maps, VM Objects)

As a kernel programmer, you can create your own zones with the zinit() function if you need frequent and fast allocation and deallocation of data objects of the same type. To create a new zone, you tell the allocator the size of the object, the maximum amount of memory the zone may use, and the allocation size, which specifies how much memory is added when the zone is exhausted.
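The idea behind a zone, a pool of equally sized elements recycled through a free list, can be sketched in ordinary user space C. The following is only a toy model under stated assumptions: every name in it (model_zone, model_zone_alloc(), and so on) is hypothetical, it does not call zinit() or any other kernel interface, and unlike a real zone it never grows; it simply returns NULL once its fixed capacity is used up.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_ZONE_CAPACITY 4       /* hypothetical cap on elements */

/* One fixed-size element. While the element is free, its first bytes
 * double as the free-list link. */
struct model_elem {
    struct model_elem *next;
    char payload[56];               /* stand-in for the object's fields */
};

struct model_zone {
    struct model_elem storage[MODEL_ZONE_CAPACITY];
    struct model_elem *free_list;
};

/* Thread every element onto the free list. */
void model_zone_init(struct model_zone *z)
{
    size_t i;

    z->free_list = NULL;
    for (i = 0; i < MODEL_ZONE_CAPACITY; i++) {
        z->storage[i].next = z->free_list;
        z->free_list = &z->storage[i];
    }
}

/* Pop one element in constant time; NULL means the zone is exhausted. */
void *model_zone_alloc(struct model_zone *z)
{
    struct model_elem *e = z->free_list;

    if (e == NULL)
        return NULL;
    z->free_list = e->next;
    return e;
}

/* Push the element back so the next allocation can reuse it. */
void model_zone_free(struct model_zone *z, void *p)
{
    struct model_elem *e = p;

    assert(e != NULL);
    e->next = z->free_list;
    z->free_list = e;
}
```

Because every element has the same size, allocation and deallocation are a single pointer swap each, which is what makes this style of allocator attractive for objects that are created and destroyed frequently.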

The kalloc Family

The kalloc family provides a slightly higher-level interface for fast memory allocation. The API will be familiar to those who have used the malloc() interface in user space. In fact, the kernel also has a malloc() function defined by the libkern kernel library, which in turn uses memory sourced from kalloc().

void* kalloc(vm_size_t size);
void* kalloc_noblock(vm_size_t size);
void* kalloc_canblock(vm_size_t size, boolean_t canblock);
void* krealloc(void** addrp, vm_size_t old_size, vm_size_t new_size);
void kfree(void *data, vm_size_t size);

Memory for the kalloc family of functions is obtained via the Mach zone allocator discussed in the previous section. Larger memory allocations are handled by the kmem_alloc() function. Because memory can come from two sources, the kfree() function needs to know the size of the original allocation to determine its origin and to free the memory in the appropriate place. The kalloc family provides the API upon which fundamental memory functions in I/O Kit and the BSD layer are built. It also provides the memory for the C++ new and new[] operators.

The kalloc functions and variants, except kalloc_noblock(), may block (sleep) to obtain memory. The same is true for the kfree() function. Therefore, you must use kalloc_noblock() if you need memory in an interrupt context or while holding a simple lock.

The available zones can be queried; following is the trimmed output of the zprint command showing the zones used by the kalloc functions.

                      elem   cur      max    cur      max    cur    alloc    alloc
zone name             size   size    size   #elts    #elts   inuse   size    count
-----------------------------------------------------------------------------------
kalloc.16               16    660K    922K   42240   59049   30284    4K     256 C
kalloc.32               32   3356K   4920K  107392  157464   73407    4K     128 C
kalloc.64               64   4792K   6561K   76672  104976   75837    4K      64 C
kalloc.128             128   2732K   3888K   21856   31104   20571    4K      32 C
kalloc.256             256   4248K   5184K   16992   20736   15950    4K      16 C
kalloc.512             512    968K   1152K    1936    2304    1870    4K       8 C
kalloc.1024           1024    784K   1024K     784    1024     735    4K       4 C
kalloc.2048           2048   3396K   4608K    1698    2304    1586    4K       2 C
kalloc.4096           4096   2204K   4096K     551    1024     508    4K       1 C
kalloc.8192           8192   3160K  32768K     395    4096     383    8K       1 C
kalloc.large         41375   5697K   6743K     141     166     141   40K       1 C

There is one zone for each power-of-two size from 16 bytes up to 8 KB. Allocations that fit within 8 KB return an element from the smallest matching zone. It is not possible to partially allocate an element, so, for example, if you need 5000 bytes of memory, you will actually be allocated 8192 bytes (3192 bytes wasted per allocation!). Allocations greater than 8 KB are handled by the appropriate kmem_alloc() function instead of the zone allocator, but are nevertheless recorded in the virtual zone kalloc.large.
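The rounding described above can be worked through with a small user space C model. This is a sketch, not the kernel's code: the function names and constants are hypothetical, the zone sizes are taken from the zprint listing, and the model assumes that a request of exactly 8192 bytes is still served from the kalloc.8192 zone.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_KALLOC_MIN_ZONE 16u    /* smallest zone in the zprint listing */
#define MODEL_KALLOC_MAX_ZONE 8192u  /* largest zone in the zprint listing */

/* Returns the element size of the zone that would serve a request of
 * `size` bytes, or 0 when the request is too large for any zone and
 * would be passed to kmem_alloc() instead. */
size_t model_kalloc_zone_size(size_t size)
{
    size_t zone = MODEL_KALLOC_MIN_ZONE;

    assert(size > 0);
    if (size > MODEL_KALLOC_MAX_ZONE)
        return 0;
    while (zone < size)
        zone <<= 1;                  /* 16, 32, 64, ... 8192 */
    return zone;
}

/* Bytes lost to rounding for a zone-backed request; 0 for large ones. */
size_t model_kalloc_waste(size_t size)
{
    size_t zone = model_kalloc_zone_size(size);

    return zone ? zone - size : 0;
}
```

With this model, model_kalloc_zone_size(5000) yields 8192 and model_kalloc_waste(5000) yields 3192, matching the example in the text. When many objects of an awkward size are needed, a dedicated zone sized for that object avoids this rounding loss.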

Memory Allocation in BSD

Memory allocation in the BSD subsystem is implemented by the following functions and macros:

#define MALLOC(space, cast, size, type, flags)  (space) = (cast)_MALLOC(size, type, flags)
#define FREE(addr, type) _FREE((void *)addr, type)
#define MALLOC_ZONE(space, cast, size, type, flags) \
                   (space) = (cast)_MALLOC_ZONE(size, type, flags)
#define FREE_ZONE(addr, size, type) _FREE_ZONE((void *)addr, size, type)

void* _MALLOC(size_t size, int type, int flags);
void _FREE(void *addr, int type);
void* _MALLOC_ZONE(size_t size, int type, int flags);
void _FREE_ZONE(void *elem, size_t size, int type);

Under the hood, the _MALLOC() function allocates memory using some variant of kalloc(), depending on the flags that are passed; for example, if non-blocking allocation is required (M_NOWAIT), kalloc_noblock() is called. The _MALLOC_ZONE() function invokes the zone allocator directly instead of indirectly through kalloc(). Instead of using the general purpose kalloc.X zones, it allows you to access zones of commonly used object types, such as file descriptors, network sockets, or the mbuf descriptors used by the networking subsystem. The type argument is used to determine which zone to access. Although _MALLOC() also takes a type argument, it is ignored, except to check that the value is less than the maximum allowed. There are over a hundred different types defined. The flags parameter can be a combination of the following:

#define M_WAITOK                0x0000
#define M_NOWAIT                0x0001
#define M_ZERO                  0x0004          /* bzero the allocation */

Tip: The MALLOC family of functions, along with the zone types, is defined in sys/malloc.h.

The M_ZERO flag, if specified, will use the bzero() function to overwrite the memory with zeros before the memory is returned to the caller. If not, the memory will still have the contents written there by the last user or will contain random garbage if never used.
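How the flags are interpreted can be illustrated with a user space C model. This is a hedged sketch only: model_bsd_malloc() and model_may_block() are hypothetical names, the model sits on top of the C library malloc() rather than kalloc(), and the flag values are copied from the definitions above. Note that M_WAITOK is zero, so it cannot be tested with a bitwise AND; blocking is allowed whenever M_NOWAIT is absent.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define M_WAITOK 0x0000
#define M_NOWAIT 0x0001
#define M_ZERO   0x0004             /* zero the allocation */

/* M_WAITOK has no bit of its own: the caller accepts blocking
 * unless M_NOWAIT is set. */
int model_may_block(int flags)
{
    return (flags & M_NOWAIT) == 0;
}

/* Allocate `size` bytes, zero-filling them only when M_ZERO is set.
 * Without M_ZERO the contents are whatever was left in the memory. */
void *model_bsd_malloc(size_t size, int flags)
{
    void *p;

    assert(size > 0);
    p = malloc(size);
    if (p != NULL && (flags & M_ZERO))
        memset(p, 0, size);         /* the kernel uses bzero() here */
    return p;
}
```

A caller that needs zeroed memory without sleeping would pass M_NOWAIT | M_ZERO and must then be prepared for the allocation to fail.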

I/O Kit Memory Allocation

The I/O Kit provides a full set of functions for memory allocation. All the following functions return kernel virtual addresses, which can be accessed directly:

void* IOMalloc(vm_size_t size);
void* IOMallocAligned(vm_size_t size, vm_size_t alignment);
void* IOMallocPageable(vm_size_t size, vm_size_t alignment);

The corresponding functions for freeing memory are as follows.

void IOFree(void* address, vm_size_t size);
void IOFreeAligned(void* address, vm_size_t size);
void IOFreePageable(void* address, vm_size_t size);

The first function, IOMalloc(), is a wrapper for kalloc() and is subject to the same restrictions. Specifically, it cannot be used in an atomic context, such as a primary interrupt handler, because it may block (sleep) to obtain memory. Nor can IOMalloc() be used if aligned memory is required, as no alignment guarantees are made. IOFree() is a wrapper for the kfree() function and may also block (sleep). It is also possible to deadlock the system if you call either IOMalloc() or IOFree() while holding a simple lock, such as OSSpinLock, because the thread may be preempted if either function sleeps; an interrupt handler that then attempted to claim the same lock would deadlock. Furthermore, memory from IOMalloc() is intended for small and fast allocations and is not suitable for mapping into user space. Because the memory reserved for IOMalloc() comes from a small fixed-size pool, excessive use of IOMalloc() can exhaust this pool and panic the kernel.

Caution: It is a bug to free memory allocated by, for example, IOMallocAligned() with IOFree(). Always use the free function corresponding to the original allocation function. Even if it works now (by accident), the mechanism could change in a future update and cause a crash.

IOMallocAligned() is subject to the same restrictions as IOMalloc(), but unlike IOMalloc(), it will return memory addresses aligned to a specific value. For example, if you need page-aligned memory you can pass in 4096 to get an address aligned to the beginning of a page. Following are some reasons for requesting aligned memory.

  • Hardware cannot access memory that is not aligned to a specific boundary, or it does so slowly.
  • Vector computation may be excessively slow when it operates on memory that is not aligned to a specific byte boundary (typically 16 bytes for SSE).
  • Memory will be used for mapping into a user space process. Since mapping is only possible for whole pages, you may wish to ensure the buffer starts on a page boundary.
  • You want a data structure that is friendly to the CPU cache.
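The arithmetic behind aligned allocation can be sketched in user space C. The code below is an illustration under stated assumptions and not the I/O Kit implementation: all names are hypothetical, the C library malloc() stands in for the kernel allocator, and the alignment is assumed to be a power of two. The model over-allocates, rounds the address up, and stores the original pointer just below the aligned address so that the matching free routine can find it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round addr up to the next multiple of alignment (a power of two). */
uintptr_t model_align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

int model_is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Over-allocate, then return an aligned address inside the block. */
void *model_malloc_aligned(size_t size, size_t alignment)
{
    unsigned char *raw;
    uintptr_t aligned;

    raw = malloc(size + alignment + sizeof(raw));
    if (raw == NULL)
        return NULL;
    aligned = model_align_up((uintptr_t)raw + sizeof(raw), alignment);
    /* Remember the real start of the block just below the result. */
    memcpy((unsigned char *)aligned - sizeof(raw), &raw, sizeof(raw));
    return (void *)aligned;
}

/* Recover the original pointer and release the whole block. */
void model_free_aligned(void *p)
{
    unsigned char *raw;

    if (p == NULL)
        return;
    memcpy(&raw, (unsigned char *)p - sizeof(raw), sizeof(raw));
    free(raw);
}
```

In this model, passing the aligned pointer straight to free() would hand the allocator an address it never returned, which illustrates why an allocation must always be released with its matching free function.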

IOMallocPageable() allocates memory that can be paged out, unlike the other variants, which always return memory that is wired. The restrictions that apply to IOMalloc() and IOMallocAligned() also apply to IOMallocPageable(). In addition, unless it is wired down first, memory obtained from IOMallocPageable() cannot be used for device I/O such as DMA, or in a code path that is not able to block (sleep).

There is also a last variant, IOMallocContiguous(), that allocates memory that is physically contiguous. Its use is now deprecated. Apple recommends using IOBufferMemoryDescriptor instead.

Each of the memory allocation functions has a corresponding function to free the memory. It is important to call the free function that matches the function you used for allocating the memory. Each of the variants sources memory from a different low-level mechanism, hence they are not interchangeable. In fact, IOMalloc() may obtain its memory from more than one source. Larger allocations (>8 KB) may be allocated with kmem_alloc(), whereas smaller allocations come from the zone allocator.

This is why you must pass the size of the original allocation to the IOFree*() functions: it is used to determine where the memory came from.

Allocating Memory with the C++ New Operator

The libkern library implements a basic C++ runtime, upon which I/O Kit is built. Memory allocation in C++ is typically done with the new and new[] operators for single objects and arrays, respectively. In libkern, the new operator is implemented internally by calling kalloc() to obtain memory. Because kfree() requires the size of the original allocation, libkern increases the size passed to the new operator to include space for a small structure that holds the size of the allocation. When the delete operator calls kfree(), it retrieves the size from the four bytes preceding the address returned by new.

Memory allocated by new or new[] is always zeroed out, unlike most implementations of these operators in user space.
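The size-prefix technique can be modelled in user space C. This sketch is not the libkern code: the names are hypothetical, calloc() stands in for kalloc() followed by zeroing, and the header is padded to the platform's maximum alignment rather than being the four bytes mentioned above, so that the address returned to the caller stays suitably aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every allocation. The union pads it to the
 * strictest alignment so the caller's pointer stays aligned. */
typedef union {
    size_t      size;               /* size the caller asked for */
    max_align_t pad;
} model_new_header;

/* Allocate header plus payload, record the size, return the payload. */
void *model_operator_new(size_t size)
{
    model_new_header *h;

    assert(size > 0);
    h = calloc(1, sizeof(*h) + size);   /* calloc zero-fills */
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;
}

/* Read back the size stored just before the payload. */
size_t model_allocation_size(const void *p)
{
    return ((const model_new_header *)p - 1)->size;
}

/* Step back to the header and free the whole block. */
void model_operator_delete(void *p)
{
    if (p != NULL)
        free((model_new_header *)p - 1);
}
```

The caller never sees the header; it only receives the address just past it, which is why delete can work from a bare pointer even though the underlying free routine needs a size.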

Tip: The implementation of the new, new[], delete, and delete[] operators can be found in the XNU source distribution under libkern/c++/OSRuntime.cpp.
