424 26.AHighlyOptimizedPortableMemoryManager
wait until the absolute last allocation is used before requesting a new page. Or
use some hybrid approach in which the next three or four block sizes are consid-
ered valid if the correct one is full.
MBAPages
Medium block page management is more complicated only because the lists are
constructed based on the size of the largest available block in each page. Conse-
quently, for nearly every allocation or deallocation, the affected page most likely
has its list pointers adjusted. Unlike SBA, it is entirely possible that no big block
matches the requested allocation size exactly (see Figure 26.5 for an example of a
sparse page list). In fact, a brand-new page would show the big block being the
full size of the page (minus the metadata overhead in the
Tail structure), so re-
questing a new page is also not likely to add a page in the list to which the alloca-
tion first indexes. So, whenever a specific-sized list is empty, the page manager
must scan the larger lists looking for a page, guaranteeing that the allocation
function succeeds. However, since list pointers are four bytes each in a 32-bit
machine, an array of pointers actually requires multiple cache lines and far too
many fetches from memory for good performance. Instead, we implement the
lookup first as a bit mask that uses 0 to indicate a null pointer and 1 to indicate
any non-null pointer. In this way, a single cache line can track the availability of
hundreds of lists with just a few memory fetches.
Figure 26.5. This MBA page table shows two pages that are completely full, one with
384 bytes in a single block, and another with 640 bytes in a single block. It makes no
statement about how many blocks of that size might exist or how many smaller ones
exist.
Ar
r
anged
b
yBigBlockSize
0bytes
128 bytes
256 bytes
384 bytes
512 bytes
640 bytes
26.7OSAPIIdeas 425
26.7OSAPIIdeas
All interactions that the memory manager has with the underlying operating sys-
tem are tucked away in the
OSAPI. The OSAPI has to be able to tell immediately
what kind of page an allocation comes from and be able to return the pointer to
the start of that page. This has to be very fast, or it becomes the bottleneck for all
other memory operations. It also must be able to allocate new pages and delete
pages that are empty and no longer needed.
There are many possible approaches to implementing the
OSAPI. Our re-
search on this has not yet reached a final best solution, and indeed, we believe
that the right solution depends on the project and platform. Even so, there are a
few ways to implement the
OSAPI’s page tracking, so we discuss some here.
The
OSAPIForWindows is implemented such that all SBA and MBA pages
are tracked in a
std::set, which is implemented as a red-black tree. While theo-
retically fast, each node requires its own allocation, which can both fragment the
underlying heap and be slow. Pages are allocated using the
VirtualAlloc()
function, which has some implications for ideal page size, especially on 32-bit
systems where the amount of address space mapped should be equal to the physi-
cal memory being mapped. Performance is not great, but it does work well as a
rudimentary example of how to implement the features required by the interface.
It also suffers due to the many calls to
VirtualAlloc() and VirtualFree(),
causing thunks into the kernel to allocate memory. With this technique, operating
system functions show up on performance profiles—never a good sign.
The
OSAPIFastForWindows is an improved version that preallocates a big
chunk of memory and assigns pages from this chunk. To maximize flexible use
of this space, SBA and MBA pages are set to the same size and are allowed to
change from small- to medium-sized pages while the page is empty. This is ac-
complished simply by tracking the type of page with an array of bits and tracking
whether the page is in use with another array of bits. To limit time spent search-
ing for free pages, a simple start index is stored in the class that remembers the
last index to a freed page. It is not required to be accurate, but it does give a good
place to start searching and usually returns a hit on the first attempt to allocate a
page. However, this implementation has the downside of being rather static, and
if there are times when too many large allocations are made, or too many small or
medium allocations are made, the system could catastrophically fail.
Future systems could be written to have completely different allocation strat-
egies. For instance, it could be smart about allocating pages from lower memory
addresses and allocating large blocks from higher memory addresses until the
two collide somewhere in the middle.
426 26.AHighlyOptimizedPortableMemoryManager
It is also intended that the OSAPI be platform-specific, tuned to the exact
amount of memory that is used in the system. There is no real benefit to having a
generic page handling system if there is no flexibility in the amount of memory
available to the game on a specific piece of hardware. A custom page handler,
specifically one that knows the address range that could possibly exist for alloca-
tions and can reduce the identification of pages to a shift and bit mask operation
to create an index into a hash table is ideal. For instance, the Nintendo Wii can
only return certain addresses for MEM1 and MEM2, so if the LBA is known to
allocate exclusively from MEM2, it is easy to identify. Similarly, if the SBA and
MBA allocate from MEM1, we can create a simple bit array that maps exactly
the memory addresses used on the console and know which kind of page each
pointer belongs to simply by rounding off the address and looking it up.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.138.179.100