Memory Basics

Memory access is so integral to application architecture and performance that a considerable portion of the Windows infrastructure is devoted to managing it and making it accessible to applications. Effective memory management is key to achieving application performance that is both acceptable and consistent. Despite the relatively low prices of today's RAM modules, memory is still a finite resource and is probably the single most important factor affecting application performance and overall system throughput. In many instances, you'll get a better performance boost from adding RAM to a machine than you will by upgrading to a faster CPU.

Key Memory Terms and Concepts

Process address space— the 4GB address space for an application. Addresses in Win32 applications are limited to 4GB because a 32-bit pointer can hold only 4,294,967,296 (2^32) distinct values. Of these 4GB, 2GB are reserved by default for the kernel and 2GB are set aside for user mode access. On some editions of Windows, the user mode address space can be increased to 3GB (at the expense of kernel mode space) via the /3GB BOOT.INI switch for applications that are configured to take advantage of it. All memory allocated by an application comes from this space.

Virtual memory— the facility by which a memory manager provides more memory than physically exists in a machine. The Windows virtual memory manager makes it appear to applications as though 4GB of memory exists in the machine, regardless of how much physical memory there actually is. Windows virtual memory is implemented primarily through the system paging file.

Page size— the memory page size that a given processor architecture requires. On the x86, this is 4K. All Windows memory allocations must occur in multiples of the system page size.

Allocation granularity— the boundary at which virtual memory reservations must be made under Windows. On all current versions of Windows, this is 64K, so user mode virtual memory reservations must be made at 64K boundaries within the process address space.

System paging file— the file (or files) that Windows uses to provide physical storage for virtual memory. Windows uses the paging file to swap physical memory pages to and from disk in a manner that is transparent to the application. The total physical memory storage on a given machine is equal to the size of the physical memory plus the size of all the paging files combined.

Address translation— the process of translating a virtual memory address into a physical one.

Page fault— a condition raised by the memory management unit (MMU) of a processor that causes the Windows fault-handling code to load a page from the system paging file into physical memory if it can be located.

Thrashing— a condition that occurs when the system is pressured for physical memory and continually swaps pages to and from the system paging file, often preventing applications from running in a timely fashion.

NULL pointer assignment partition— the first 64K of the user mode address space; it's marked off limits in order to make NULL pointer references easier to detect.

Large-address-aware application— an application whose executable has the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set in its header. An application that is large address aware will receive a 3GB user mode address space when executed on an appropriate version of Windows that has been booted with the /3GB option.

AWE— Address Windowing Extensions, the facility Windows provides for accessing physical memory above 4GB.

Application memory tuning— the facility whereby a large-address-aware application can use up to 3GB of the process address space.

Key Memory APIs

Table 4.1. Key Memory-Related Win32 API Functions
Function                     Description
GetSystemInfo                Gets system-level information about machine resources such as processors and memory
VirtualAlloc                 Reserves, commits, and resets virtual memory
AllocateUserPhysicalPages    Allocates physical memory for use with Windows' AWE facility
MapUserPhysicalPages         Maps a portion of the AWE physical memory into a virtual memory buffer set aside by VirtualAlloc
ReadProcessMemory            Allows one process to read memory belonging to another
WriteProcessMemory           Allows one process to write memory belonging to another

Key Memory Tools

The best all-around tool for monitoring Windows memory statistics and performance is Perfmon. Task Manager is also surprisingly helpful. Keep in mind that Task Manager's Mem Usage column lists each process's working set size, not its total virtual memory usage. Since this column includes shared pages, you can't total it to get the total physical memory used by all processes. Also, Task Manager's VM Size column actually lists a process's private bytes (its private committed pages), not its total virtual memory size.

Table 4.2. Key Memory-Monitoring Tools
Tool       Reserved Virtual Memory    Paging File Size    Page Faults    Working Set Size    Paged Pool    Nonpaged Pool
Perfmon
Pstat
Pview 
pmon  
TaskMgr 
TList    

Key Perfmon Counters

Table 4.3. Key Memory-Related Perfmon Counters
Counter                            Description
Memory:Committed Bytes             The committed private address space (in both the paging file and physical memory)
Memory:Commit Limit                The amount of memory that can be committed without causing the system paging file to grow
Memory:% Committed Bytes In Use    Memory:Committed Bytes divided by Memory:Commit Limit
Process:Virtual Bytes              The total size of the process address space (shared and private pages)
Process:Private Bytes              The size of the nonshared committed address space
Process:Page File Bytes            Same as Process:Private Bytes
Process:Page File Peak             The peak value of the Process:Page File Bytes counter

Addresses

On 32-bit Windows, all user processes have a flat 4GB address space. This space is limited to 4GB because a 32-bit pointer can have one of 4,294,967,296 (2^32) values. This means that pointer values in Windows applications can range from 0x00000000 to 0xFFFFFFFF.

On 64-bit Windows, processes have a flat 16EB (exabyte) address space. A 64-bit pointer can have one of 18,446,744,073,709,551,616 (2^64) values, ranging from 0x0000000000000000 to 0xFFFFFFFFFFFFFFFF.

The fact that user processes are limited to 4GB of address space on 32-bit Windows doesn't mean that apps can't access more than 4GB of physical memory. As you're probably aware, it's not unusual for server machines to have more than 4GB of RAM installed. Windows' AWE facility allows applications to fully utilize the physical memory available in their host machines. We'll discuss AWE in more detail later in the chapter. For now, just keep in mind that it allows an application to access physical memory beyond 4GB. Windows 2000 Professional and Windows 2000 Server both support up to 4GB of physical memory. Windows 2000 Advanced Server supports up to 8GB, and Windows 2000 Data Center supports up to 64GB. Through AWE, an application can make use of as much physical memory as the operating system supports.

Keep in mind that the 4GB that a 32-bit process has to work with is virtual address space, not physical storage. By virtual, I mean that the address space is simply a range of memory addresses. Physical storage must be mapped to portions of this space before an application can make use of it without causing an access violation.

Basic Memory Management Services

In its bare essence, Windows memory management consists of implementing virtual memory and managing the interchange between virtual memory and physical memory. This involves a couple of fundamental tasks:

  1. Mapping the virtual space for a process into physical memory

  2. Paging memory to and from disk when process threads attempt to use more physical memory than is currently available

Beyond the virtual memory management services it provides, the memory manager also provides core services to Windows' environment subsystems. These include the following:

  • Memory-mapped files

  • Support for apps using sparsely populated address spaces

  • Copy-on-write memory

Granularities

All processor chips define a fixed page size for working with memory. The page size on the x86 family of processors is 4K. Any allocation request an application makes is rounded up to the nearest page boundary. This means, for example, that a 5K allocation request will actually require 8K of memory.
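The rounding described above is easy to check with a little arithmetic. What follows is a minimal sketch, not anything Windows itself exposes: the helper name round_up_to_page and the X86_PAGE_SIZE constant are mine, and the 4K value is simply the x86 figure quoted in the text rather than one queried from the system.

```c
#include <assert.h>
#include <stddef.h>

/* Page size quoted in the text for x86; assumed here, not queried. */
#define X86_PAGE_SIZE ((size_t)4096)

/* Round a request up to the next multiple of page_size.
   page_size must be a nonzero power of two.
   (Hypothetical helper; overflow for huge requests is not handled.) */
size_t round_up_to_page(size_t bytes, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (bytes + page_size - 1) & ~(page_size - 1);
}
```

Calling round_up_to_page(5 * 1024, X86_PAGE_SIZE) yields 8192, which is the 5K-becomes-8K case mentioned above.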

Like most operating systems, 32-bit Windows has a fixed allocation granularity—a boundary on which all application memory reservations must occur. The boundary will always be a multiple of the system page size. In the case of 32-bit Windows, this boundary is 64K, so when an application requests a memory reservation, that reservation must begin on a 64K boundary in the process address space. Though many apps let Windows decide the precise location of the buffers they allocate, some make allocations at specific addresses. For those that do, they must pass a starting reservation address into Windows that aligns with a 64K boundary in the process address space. Windows will round down any starting reservation address that does not correctly align with the allocation granularity.

An app that's not mindful of the system's 64K allocation granularity can cause address space to be wasted. If an application reserves a virtual memory region less than 64K in size, the remainder of the 64K region is unusable by the application thanks to the system-enforced allocation granularity. Because an app cannot then specify a reservation that occupies the remainder of the region without having the system automatically round it down to the start of the 64K region, the unused address space is essentially wasted. So, it's possible to exhaust the address space for a process without actually reserving or allocating much memory. We'll talk more about memory reservation and commitment in the Virtual Memory section below.
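To put numbers on the waste, here is a rough sketch of the two calculations involved: rounding a requested address down to the allocation granularity, and working out how much of a 64K region is left stranded after a small reservation. The function names are hypothetical, and the 4K and 64K constants are the values the text gives for 32-bit x86 Windows, not values obtained from the system.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE    0x1000u   /* 4K, per the text (assumed)  */
#define SKETCH_GRANULARITY  0x10000u  /* 64K, per the text (assumed) */

/* Round a requested reservation address down to a granularity boundary.
   granularity must be a nonzero power of two. */
uint32_t round_down_to_granularity(uint32_t addr, uint32_t granularity)
{
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return addr & ~(granularity - 1);
}

/* Address space that can no longer be reserved after a reservation of
   `size` bytes that starts on a granularity boundary. The size is first
   rounded up to whole pages. Overflow for sizes near 4GB is not handled. */
uint32_t stranded_address_space(uint32_t size, uint32_t page_size,
                                uint32_t granularity)
{
    uint32_t reserved;
    uint32_t tail;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);

    reserved = (size + page_size - 1) & ~(page_size - 1);
    tail = reserved & (granularity - 1);
    return tail == 0 ? 0 : granularity - tail;
}
```

Under these assumptions, a 4K reservation strands 60K of address space, and a request to reserve at 0x00012345 is treated as a request for 0x00010000.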

You can retrieve both the system allocation granularity and the system page size via the Win32 GetSystemInfo API function. It's conceivable that both of these could vary in future versions of Windows, so it's wise not to hard-code references to them. See Exercise 4.4 later in the chapter for an example of how to use GetSystemInfo in a SQL Server extended procedure.
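Outside of an extended procedure, the call itself is short. The sketch below wraps GetSystemInfo in a hypothetical helper of my own, query_memory_granularity, and reads the dwPageSize and dwAllocationGranularity fields of SYSTEM_INFO. So that it still compiles when built somewhere other than Windows, it falls back to the x86 values quoted in this chapter; that fallback is purely an assumption for illustration.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical container for the two values discussed in the text. */
typedef struct {
    unsigned long page_size;
    unsigned long allocation_granularity;
} MemoryGranularity;

MemoryGranularity query_memory_granularity(void)
{
    MemoryGranularity g;
#ifdef _WIN32
    SYSTEM_INFO si;
    GetSystemInfo(&si);                       /* never fails */
    g.page_size = si.dwPageSize;
    g.allocation_granularity = si.dwAllocationGranularity;
#else
    /* Not Windows: use the x86 figures from the text (assumption). */
    g.page_size = 4096UL;
    g.allocation_granularity = 65536UL;
#endif
    return g;
}

/* Print both values instead of hard-coding them. */
void print_memory_granularity(void)
{
    MemoryGranularity g = query_memory_granularity();
    printf("Page size:              %lu bytes\n", g.page_size);
    printf("Allocation granularity: %lu bytes\n", g.allocation_granularity);
}
```

Code that needs either value can call the helper once at startup and keep the result, rather than embedding 4096 or 65536 as literals.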

Process Memory Protection

Windows isolates processes from one another such that no user process can corrupt the address space of another process or of the OS itself. This makes Windows more robust and protects applications from one another. There are four fundamental aspects of this protection.

  1. All processor chips supported by Windows provide some form of hardware-based memory protection.

  2. System-wide data structures and memory areas used by kernel mode components are accessible only while in kernel mode—user mode code can't touch them.

  3. Windows provides each process a private address space. Threads belonging to other processes are prohibited from accessing it.

  4. Shared memory sections have standard Access Control Lists (ACLs) that are checked when processes access them.

These four aspects of the Windows memory management architecture make the operating system far more robust than it otherwise would be. They help prevent intentional and unintentional corruption of one process's address space by another, and they help make Windows itself resilient in the face of catastrophic application errors.

NOTE: As I've mentioned earlier, Windows does provide API functions such as ReadProcessMemory and WriteProcessMemory that allow one process to access another's address space. That said, using these functions requires specific access rights; you cannot accidentally read or modify memory belonging to another process. Typically (but not always), these functions are used by a debugger to access the memory of a process being debugged. Also note that, by default, when one process spawns another via a call to CreateProcess, the parent process has the access permissions required to access the child process's virtual memory. Again, this is typically used to facilitate debugging.


Partitions

At a high level, the 4GB process address space is organized as shown in Table 4.4.

Table 4.4. The Process Address Space and What It Contains
Address Range            Description
0x00000000–0x7FFFFFFF    Application and DLL code, global variables, thread stacks (user mode memory)
0x80000000–0xBFFFFFFF    Kernel and executive, HAL, boot drivers
0xC0000000–0xC07FFFFF    Process page tables, hyperspace
0xC0800000–0xFFFFFFFF    System cache, paged pool, nonpaged pool

Unless the /3GB boot option has been enabled, the user mode portion of this space takes up the first 2GB, and the kernel occupies the remaining 2GB. If /3GB has been enabled, the user mode portion occupies the first 3GB (0x00000000–0xBFFFFFFF) and the kernel is squeezed into the remaining 1GB. See the subsection titled Application Memory Tuning later in the chapter for more information on this option. For purposes of this discussion, we'll assume that /3GB is not enabled.
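If it helps to see the layout as code, the following sketch classifies a 32-bit address according to the ranges in Table 4.4 and shows how the user mode boundary moves when /3GB is in effect. The enum and function names are my own inventions for illustration; the detailed kernel layout under /3GB isn't given here, so the classifier covers only the default 2GB/2GB split.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the rows of Table 4.4 (default layout). */
typedef enum {
    REGION_USER_MODE,
    REGION_KERNEL_EXECUTIVE,
    REGION_PAGE_TABLES_HYPERSPACE,
    REGION_SYSTEM_CACHE_AND_POOLS
} AddressRegion;

/* Classify an address using the default (no /3GB) ranges in Table 4.4. */
AddressRegion classify_default_layout(uint32_t addr)
{
    if (addr <= 0x7FFFFFFFu) return REGION_USER_MODE;
    if (addr <= 0xBFFFFFFFu) return REGION_KERNEL_EXECUTIVE;
    if (addr <= 0xC07FFFFFu) return REGION_PAGE_TABLES_HYPERSPACE;
    return REGION_SYSTEM_CACHE_AND_POOLS;
}

/* Nonzero if addr lies in the user mode portion: the first 2GB by
   default, or the first 3GB when three_gb is nonzero. */
int is_user_mode_address(uint32_t addr, int three_gb)
{
    uint32_t last_user = three_gb ? 0xBFFFFFFFu : 0x7FFFFFFFu;
    return addr <= last_user;
}
```

For example, 0x90000000 is kernel space on a default boot but falls inside the user mode portion once /3GB is enabled.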

Within the user mode portion, there are several smaller partitions (Table 4.5). The following subsections briefly discuss these partitions.

NULL Pointer Assignment Partition

Have you ever wondered why NULL (address 0x00000000) can't be used by an application? After all, isn't it just another address within the process address space (the first address, in fact) just like any other address? No, it isn't. In the interest of helping programmers catch NULL pointer assignments, Windows has marked the first 64K of the process address space as off limits.

Table 4.5. Partitions in the User Mode Portion of a Process's Address Space
Address Range            Size         Description
0x00000000–0x0000FFFF    64K          Off-limits region (prevents NULL pointer assignments).
0x00010000–0x7FFEFFFF    2GB–~192K    Private process address space.
0x7FFDE000–0x7FFDEFFF    4K           TEB for the process's main thread. TEBs for other threads reside at the previous page (0x7FFDD000) and working backward.
0x7FFDF000–0x7FFDFFFF    4K           The process's PEB.
0x7FFE0000–0x7FFE0FFF    4K           Shared user data page.
0x7FFE1000–0x7FFEFFFF    60K          Off-limits region (remainder of 64K containing shared user data page).
0x7FFF0000–0x7FFFFFFF    64K          Off-limits region (prevents buffers from straddling the user mode/kernel mode boundary).

The NULL pointer assignment partition is a very simple yet surprisingly useful feature in the operating system that helps programs catch failed allocations. For example, consider the following C code.

char *pszLastName = (char *)malloc(LAST_NAME_SIZE);
strcpy(pszLastName,"Smith");

This code performs no error checking. If malloc is unable to allocate a buffer of the requested size, it returns NULL. Because Windows has marked the entirety of the first 64K of the process's address space as off limits (including address 0x00000000—NULL), any attempt to access a NULL pointer will result in an access violation. In the code above, if the call to malloc returns NULL, the call to strcpy will cause an access violation to be raised. This isn't because Windows checks every pointer reference to make sure that it doesn't equal NULL; it's because no address within the first 64K of the user mode space—0x00000000 or otherwise—may be used.
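The access violation is a safety net, not a substitute for checking the return value. For contrast, here is one way the fragment above might be written defensively. Treat it as a sketch: the wrapper function make_last_name is hypothetical, and the value given to LAST_NAME_SIZE is an assumption, since the original fragment doesn't say what it is.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define LAST_NAME_SIZE 64  /* assumed value; not specified in the text */

/* Allocate a buffer and copy a name into it, checking for failure.
   Returns NULL if the allocation fails; the caller frees the result. */
char *make_last_name(const char *name)
{
    char *pszLastName;

    assert(name != NULL);

    pszLastName = (char *)malloc(LAST_NAME_SIZE);
    if (pszLastName == NULL)
        return NULL;               /* report failure instead of faulting */

    /* Bounded copy; always leave room for the terminator. */
    strncpy(pszLastName, name, LAST_NAME_SIZE - 1);
    pszLastName[LAST_NAME_SIZE - 1] = '\0';
    return pszLastName;
}
```

A caller would test the result of make_last_name("Smith") against NULL before using it and pass it to free when finished.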

Does this mean that the operating system wastes 64K of the memory in your system? No, not at all. Remember: A process's address space is virtual—those sections marked off limits by the operating system are not backed by physical memory. For such a useful feature as the NULL pointer assignment partition, you give up only a 64K range of memory addresses—no physical memory is wasted.

Why is the NULL assignment partition 64K in size? Why not just make the NULL address, 0x00000000, off limits, or, at most, a single 4K page? Windows makes the entire 64K off limits for two reasons.

  1. Reservations by user mode apps are required to be on allocation granularity (64K) boundaries. So, even if only the first 4K page was marked off limits, you still couldn't reserve memory in the remaining 60K of the first 64K of address space.

  2. NULL pointer references are often buried in pointer arithmetic where a NULL memory address is not actually referenced, but one based on NULL plus an offset of some kind is. This means that your NULL pointer reference may actually end up causing your app to reference a memory location other than 0x00000000. Marking the entire 64K region off limits helps catch many of these situations.
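The second reason is easier to see with a structure in hand. In the sketch below, Person is a made-up record type, and the two helpers (also mine) simply compute which address a member access through a NULL pointer would touch and whether that address still falls inside the 64K off-limits region. Nothing here dereferences a NULL pointer; it's address arithmetic only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_PARTITION_SIZE 0x10000u  /* first 64K, per the text */

/* Hypothetical record used only to produce member offsets. */
typedef struct {
    char first_name[40];
    char last_name[40];
    int  age;
} Person;

/* Nonzero if addr lies in the NULL pointer assignment partition. */
int in_null_partition(uint32_t addr)
{
    return addr < NULL_PARTITION_SIZE;
}

/* Address that p->member would touch if p were NULL: 0 plus the
   member's offset within the structure. */
uint32_t null_deref_address(size_t member_offset)
{
    return (uint32_t)member_offset;
}
```

With a NULL Person pointer, touching last_name means touching address 40, which is still inside the partition, so the fault is caught. An offset of 64K or more would land outside it, which is why the protection catches many of these situations rather than all of them.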

This is best explained by way of example. Exercises 4.1 through 4.3 later in this chapter walk you through building a few test applications that demonstrate NULL pointer references and how Windows helps you detect them.

Process Private Address Space Partition

A process's private address space is where an application's executable and DLLs are loaded. All private memory allocations come from this region, and memory-mapped files are mapped here as well. It's the space within which an application operates.

Kernel Mode Partition

The kernel mode partition is where the code for file system support, thread management, memory management, networking support, and all device drivers resides. Everything residing in the kernel mode partition is shared among all processes.

You may be wondering whether the kernel really needs the top half of the process address space. Unfortunately, the answer is yes, it does. The kernel needs this space for OS code, device I/O cache buffers, process page tables, device driver code, and so forth. To be sure, the kernel could really make good use of much more space. It finally gets all the space it needs in 64-bit Windows.

One thing to keep in mind about kernel mode space: If you boot with the /3GB option (discussed below), the kernel space is reduced to just 1GB. This, in turn, limits the sizes of some of the data structures typically stored in the kernel mode space. For example, when /3GB is enabled, you may access only 16GB of total system memory because the size of the process page table is constricted by the limited kernel mode space.

PEB and TEB Regions

The PEB and TEB areas aren't regions that you'll make direct use of much, but it's instructive to know about them and what they are. As I mentioned in Chapter 3, each process has a process environment block (PEB) that's allocated in the user mode space. As Table 4.5 indicates, the precise address of a process's PEB is 0x7FFDF000. This means that you can dump this region of memory from under a debugger in order to view the PEB for a process. WinDbg has a special command for doing exactly this, !peb. The next time you attach to SQL Server with WinDbg, try the !peb command. You'll see that it returns a number of interesting pieces of data including the modules currently loaded within the process, the command line passed into the process, the address of the default heap, and many others.

As I said in Chapter 3, every thread has an associated thread environment block (TEB). The user mode address space contains a TEB for each thread owned by the process. As with the PEB, these blocks are stored in the user mode space in order to allow the system to access them without having to switch to kernel mode.

As shown in Table 4.5, the address of the TEB for a process's main thread is at 0x7FFDE000. You can list the contents of a TEB using the WinDbg !teb command. If you execute !teb without any parameters, you get the TEB for the current thread. If you pass an address into !teb, you'll get the TEB at that address if there is one.

TEBs for the worker threads in a multithreaded application are stored on the page at address 0x7FFDD000 and the pages immediately preceding it in memory (e.g., 0x7FFDC000, 0x7FFDB000, and so on).
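As a quick illustration of that layout, the sketch below computes the page address for the TEB in a given slot, counting backward from the main thread's TEB. The function name and the idea of a numbered slot are mine, and the addresses are only the ones given in Table 4.5; which slot a particular thread actually receives is up to the system, so use !teb under a debugger rather than arithmetic like this when you need a real address.

```c
#include <assert.h>
#include <stdint.h>

#define MAIN_THREAD_TEB 0x7FFDE000u  /* per Table 4.5 */
#define TEB_PAGE_SIZE   0x1000u      /* one 4K page per TEB */

/* Page address for the TEB in a given slot: slot 0 is the main thread,
   slot 1 is the page before it, and so on (illustrative only). */
uint32_t teb_address_for_slot(uint32_t slot)
{
    assert(slot < (MAIN_THREAD_TEB / TEB_PAGE_SIZE));
    return MAIN_THREAD_TEB - slot * TEB_PAGE_SIZE;
}
```

Slots 1 and 2 come out as 0x7FFDD000 and 0x7FFDC000, matching the addresses listed above.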

Shared User Data Page

The memory page at 0x7FFE0000 is known as the shared user data page. It contains global items such as the clock tick count, the system time, the version number, and various other system-level data elements. It is read-only and is backed by a memory page that actually resides in the kernel address space. It exists in the user mode space in order to allow API routines to access key system data without having to switch to kernel mode.

Boundary Partitions

The last two regions of the user mode address space are off limits to applications. The first is the remainder of the 64K region containing the shared user data page. This 60K region is marked off limits by the operating system; any attempt to access it will result in an access violation. The fact that the remainder of the 64K region containing the shared user data page is marked off limits doesn't really affect user mode applications because that region would be inaccessible to them anyway given that user mode reservations must begin on an allocation granularity boundary.

The second region is the last 64K of the user mode address space. Windows marks it off limits in order to prevent an application from accessing a region of virtual memory that straddles the boundary between user mode and kernel mode. Because routines such as WriteProcessMemory are actually validated by kernel mode code, they can access address regions normally off limits to user mode code. By marking the last 64K of user mode space off limits, Windows protects against memory access that starts in the user mode space and extends into the kernel mode space.

The System Paging File

In order to implement virtual memory—that is, in order to allow applications to access more memory than physically exists in the machine—the Windows memory manager transparently copies pages to and from disk as necessary. The file it uses to store these pages is called the system paging file.

From an application standpoint, the system paging file increases the amount of memory available for use. It makes the system appear to have much more physical memory than it actually does. This is why a machine with, say, 1GB of physical memory can run many apps simultaneously, each having a 4GB process address space that is, perhaps, 50% backed by physical storage.

Conceptually, it's helpful to think of the physical storage behind virtual memory as the system paging file. Even though pages are constantly being copied in and out of physical RAM, the vast majority of the physical storage behind the virtual memory in the system is typically in the system paging file.

Although it is possible to run Windows without a paging file, this isn't usually recommended. In a typical configuration, the system paging file is considerably larger than the physical memory in the machine and provides apps with an efficient mechanism for accessing more memory than the machine actually has.

The paging file size is the most important variable affecting how much storage is available to an application. The amount of RAM has very little impact on the physical storage available to an app, but it does, of course, affect performance very significantly. When physical RAM is too low, the system will constantly copy data pages to and from the paging file (a condition known as thrashing), and, of course, performance will suffer commensurately.

Address Windowing Extensions

Windows' AWE facility exists to allow applications to access more than 4GB of physical memory. As I mentioned earlier, a 32-bit pointer is an integer that is limited to storing values of 0xFFFFFFFF or less—that is, to references within a 4GB memory address space. AWE allows an application to circumvent this limitation and access all the memory supported by the operating system.

At a conceptual level, AWE is nothing new: operating systems and applications have been using similar mechanisms to get around pointer limitations practically since the dawn of computers. For example, back in the DOS days, 32-bit extenders (e.g., Phar Lap, Plink, and others) were commonly used to allow 16-bit apps to access memory outside their normal address space. Special-purpose managers and APIs for extended and expanded memory were common; you may even remember products such as Quarterdeck's QEMM-386, which was commonly used for this sort of thing way back when.

Typically, mechanisms that allow a pointer to access memory at locations beyond its direct reach (i.e., at addresses too large to store in the pointer itself) pull off their magic by providing a window or region within the accessible address space that is used to transfer memory to and from the inaccessible region. This is how AWE works: You provide a region in the process address space—a window—to serve as a kind of staging area for transfers to and from memory above the 4GB mark.

In order to use AWE, an application follows these steps.

  1. Allocate the physical memory to be accessed using the Win32 AllocateUserPhysicalPages API function. This function requires that the caller have the Lock Pages in Memory permission.

  2. Create a region in the process address space to serve as a window for mapping views of this physical memory using the VirtualAlloc API function. We'll discuss VirtualAlloc further in just a moment.

  3. Map a view of the physical memory into the virtual memory window using the MapUserPhysicalPages or MapUserPhysicalPagesScatter Win32 API functions.
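The three steps above can be sketched in C roughly as follows. This is an outline under several assumptions, not production code: the function names awe_demo and pages_needed are mine, the Windows-specific part compiles only when _WIN32 is defined, and the code assumes the Lock Pages in Memory privilege has already been granted to the account and enabled in the process token (that setup is omitted). If the privilege is missing, step 1 simply fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Number of whole pages needed to hold `bytes` bytes. */
size_t pages_needed(size_t bytes, size_t page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}

#ifdef _WIN32
/* Returns 0 on success, -1 on a setup failure, or the number (1-3)
   of the AWE step that failed. */
int awe_demo(SIZE_T bytes)
{
    SYSTEM_INFO si;
    ULONG_PTR page_count;
    ULONG_PTR *frames;
    void *window;
    int failed_step = 0;

    GetSystemInfo(&si);
    page_count = (ULONG_PTR)pages_needed(bytes, si.dwPageSize);
    if (page_count == 0)
        return -1;
    frames = (ULONG_PTR *)malloc(page_count * sizeof(ULONG_PTR));
    if (frames == NULL)
        return -1;

    /* Step 1: allocate physical pages. The call may grant fewer pages
       than requested; page_count is updated to the number granted. */
    if (!AllocateUserPhysicalPages(GetCurrentProcess(), &page_count, frames)) {
        free(frames);
        return 1;
    }

    /* Step 2: reserve the window; PAGE_READWRITE is the only option. */
    window = VirtualAlloc(NULL, page_count * si.dwPageSize,
                          MEM_RESERVE | MEM_PHYSICAL, PAGE_READWRITE);
    if (window == NULL) {
        failed_step = 2;
    } else if (!MapUserPhysicalPages(window, page_count, frames)) {
        failed_step = 3;                                 /* Step 3 failed */
    } else {
        ((char *)window)[0] = 'A';                       /* use the memory */
        MapUserPhysicalPages(window, page_count, NULL);  /* unmap the view */
    }

    if (window != NULL)
        VirtualFree(window, 0, MEM_RELEASE);
    FreeUserPhysicalPages(GetCurrentProcess(), &page_count, frames);
    free(frames);
    return failed_step;
}
#endif
```

A real application would keep the physical pages allocated and remap different sets of them into the window over time; the sketch maps once and then cleans up so that the whole sequence fits in one function.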

While AWE exists on all editions of Windows 2000 and later and can be used even on systems with less than 2GB of physical RAM, it's most typically used on systems with 2GB or more of memory because it's the only way a 32-bit process can access memory beyond 3GB, as I mentioned earlier in the chapter. If you enable AWE support in SQL Server on a system with less than 3GB of physical memory, the system ignores the option and uses conventional virtual memory management instead.

One interesting characteristic of AWE memory is that it is never swapped to disk. You'll notice that the AWE-specific API routines refer to the memory they access as physical memory. This is exactly what AWE memory is: physical memory outside the control of the Windows virtual memory manager.

The virtual memory window used to buffer the physical memory provided by AWE requires read-write access. Hence, the only protection attribute that can be passed into VirtualAlloc when you set up this window is PAGE_READWRITE. Not surprisingly, this also means that you can't use VirtualProtect to protect pages within this region from modification or access.

Application Memory Tuning

The /3GB boot option is available on the Advanced Server and Data Center editions of Windows 2000 (and later). It allows a process's user mode address space to be expanded from 2GB to 3GB at the expense of the kernel mode address space (which is reduced from 2GB to 1GB). In Windows parlance, this facility is known as application memory tuning or 4GB tuning (4GT).

You enable application memory tuning by adding “/3GB” (without the quotes) to the appropriate line in the [operating systems] section of your BOOT.INI. It's common for people to configure their systems to be bootable with and without /3GB by setting up the entries in the [operating systems] section of BOOT.INI such that they can choose either option at startup.

WARNING: You can also boot Windows 2000 Professional, Windows 2000 Server, and Windows XP with the /3GB switch. However, this has the negative consequence of reducing kernel mode space to 1GB without increasing user mode space. In other words, you gain nothing for the kernel mode space you give up.


NOTE: Windows Server 2003 introduced a new boot option to set the user mode process space, /USERVA. You add /USERVA to your BOOT.INI just as you would /3GB. The advantage of /USERVA over /3GB is that it gives you a finer level of control over exactly how much address space to set aside for user mode use versus kernel mode use. For example, /USERVA=2560 configures 2.5GB for user mode space and leaves the remaining 1.5GB for the kernel. The caveats that apply to the /3GB switch apply here as well.


Large-Address-Aware Executables

Before support for /3GB was added to Windows, an application could never access a pointer with the high bit set. Only addresses that could be represented by the first 31 bits of a 32-bit pointer could be accessed by user mode applications. This left 1 bit unused, so some developers, being the clever coders they were and not wanting to waste so much as a bit in the process address space, made use of it for other purposes (e.g., to flag a pointer as referencing a particular type of application-specific allocation). This caused a conundrum when /3GB was introduced because these types of apps would not be able to easily distinguish a legitimate pointer that happened to reference memory above the 2GB boundary from a pointer that referenced memory below 2GB but had its high bit set for other reasons. Basically, booting a machine with /3GB would likely have broken such apps.
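To make the conundrum concrete, here is a small sketch of the kind of high-bit trick such an app might play. The helper names are hypothetical, and addresses are modeled as plain 32-bit integers so that the example behaves the same on any machine.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BIT 0x80000000u  /* the high bit of a 32-bit pointer */

/* Mark an address as "special" by setting its high bit. */
uint32_t tag_pointer(uint32_t addr)
{
    return addr | TAG_BIT;
}

/* The app's test for its own marker. */
int looks_tagged(uint32_t value)
{
    return (value & TAG_BIT) != 0;
}

/* Recover the address the app believes is underneath the marker. */
uint32_t strip_tag(uint32_t value)
{
    return value & ~TAG_BIT;
}
```

With a 2GB user space, this works: tag_pointer(0x00401000) looks tagged and strips back to 0x00401000. With a 3GB user space, a perfectly legitimate address such as 0x90001000 also looks tagged, and stripping the bit turns it into 0x10001000, which is a different location entirely.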

To deal with this, Microsoft added support for a new bit flag in the Characteristics field of the Win32 Portable Executable (PE) file format (the format that defines the layout of executable files, both EXEs and DLLs, under Windows) that indicates whether an application is large address aware. When this flag (IMAGE_FILE_LARGE_ADDRESS_AWARE) is enabled, the 0x0020 bit (decimal value 32) in the Characteristics field in an executable file's header will be set. By having this flag set in its executable header, an application indicates to Windows that it can correctly handle pointers with the high bit set; that is, it doesn't do anything exotic with this bit. When this flag is set and the appropriate version of Windows has been booted with the /3GB option, the system will provide the process with a 3GB private user mode address space. You can check whether an executable has this flag enabled by using utilities such as DumpBin and ImageCfg that can dump the header of an executable file.
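If you'd rather check the flag yourself, the test is a few lines of header parsing. The sketch below works on an in-memory copy of the start of an executable file and is my own illustration, not how DumpBin or ImageCfg are implemented. It relies on the standard PE layout: the offset of the PE signature is stored at byte 0x3C of the DOS header, and the 16-bit Characteristics field sits 22 bytes past the start of that signature. The function name and return convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LARGE_ADDRESS_AWARE_FLAG 0x0020u  /* IMAGE_FILE_LARGE_ADDRESS_AWARE */

/* Read little-endian values without assuming host byte order. */
static uint16_t read_u16_le(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if the flag is set, 0 if it is clear, and -1 if the buffer
   does not look like a PE image. `image` holds the first `size` bytes
   of the file. */
int is_large_address_aware(const unsigned char *image, size_t size)
{
    uint32_t pe_offset;

    if (image == NULL || size < 0x40)
        return -1;
    if (image[0] != 'M' || image[1] != 'Z')        /* DOS header magic */
        return -1;

    pe_offset = read_u32_le(image + 0x3C);          /* e_lfanew */
    if (pe_offset > size || size - pe_offset < 24)  /* signature + COFF header */
        return -1;
    if (memcmp(image + pe_offset, "PE\0\0", 4) != 0)
        return -1;

    /* Characteristics is the last field of the 20-byte COFF header. */
    return (read_u16_le(image + pe_offset + 22) & LARGE_ADDRESS_AWARE_FLAG)
               ? 1 : 0;
}
```

To use it on a real file, read at least the first few hundred bytes of the EXE into a buffer and pass the buffer and the number of bytes read.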

Visual C++ exposes IMAGE_FILE_LARGE_ADDRESS_AWARE via its /LARGEADDRESSAWARE linker switch. (You can also change this flag in an existing executable using ImageCfg.) SQL Server has this flag enabled, so if you boot with the /3GB switch on the appropriate version of Windows, the system will set the size of SQL Server's private process address space to 3GB.

NOTE: The IMAGE_FILE_LARGE_ADDRESS_AWARE flag is checked at process startup and is ignored for DLLs. DLLs must always behave appropriately when presented with a pointer whose high bit is set.


/3GB vs. AWE

The ability to increase the private process address space by 50% is certainly a handy and welcome enhancement to Windows' memory management facilities; however, Windows' AWE facility is far more flexible and scalable. As I said earlier, when you increase the private process address space by a gigabyte, that gigabyte comes from the kernel mode address space, which shrinks from 2GB to 1GB. Since the kernel mode code is already cramped for space even when it has the full 2GB to work with, shrinking this space means that certain internal kernel structures must also shrink. Chief among these is the table Windows uses to manage the physical memory in the machine. When you shrink the kernel mode partition to 1GB, you limit the size of this table such that it can manage a maximum of only 16GB of physical memory. For example, if you're running under Windows 2000 Data Center on a machine with 64GB of physical memory and you boot with the /3GB option, you'll be able to access only 25% of the machine's RAM—the remaining 48GB will not be usable by the operating system or applications.

AWE also allows you to access far more memory than /3GB does. Obviously, you get just one additional gigabyte of private process space via /3GB. This additional space is made available to apps that are large address aware automatically and transparently, but it is limited to just 1GB. AWE, by contrast, can make the entirety of the physical RAM that's available to the operating system available to an application provided it has been coded to make use of the AWE Win32 API functions. So, while AWE is more trouble to use and access, it's far more flexible and open ended.

Address Translation

Address translation refers to the process of translating a virtual address into a physical RAM address. This occurs each time a process attempts to access a block of data using its virtual address. Each time a process tries to access a data block by address, three things can happen.

  1. The address will be valid and the page will already reside in physical memory.

  2. The address will be valid and the page will be stored in the system paging file. In this case, the data will be paged into physical memory so that it can be accessed. This is known as a page fault. (You can track the page faults for a process via Perfmon's Process:Page Faults/sec counter and via Task Manager's Page Faults column.)

  3. The address will be invalid and the system will raise an access violation exception (user mode) or blue screen (kernel mode).

Virtual addresses aren't mapped directly to physical addresses. Instead, each virtual address is composed of three elements: the page directory index, the page table index, and the byte index. These elements establish the mapping between the virtual address and the physical RAM it references.

For each process, the Windows memory manager creates a page directory that it uses to map all the page tables for the process. Windows stores the physical address of this page directory in each process's KPROCESS block (the kernel process block stored within the EPROCESS block mentioned in Chapter 3) and maps it to address 0xC0300000 in the process address space.

The CPU keeps track of the address of a process's page directory table via a special register (CR3, or Control Register 3, on x86; the PDR, or Page Directory Register, on Alpha). Each time a context switch occurs wherein a thread from a different process is scheduled on the CPU, this register is loaded from the KPROCESS block so that the CPU's MMU can determine where the page directory table resides. Context switches among threads in the same process do not require the register to be reloaded because all threads in a process share the same address space.

This special register serves as a bootstrap for the system's memory management facilities. Without it, a process's page directory cannot be located. Without the page directory, the process address space itself cannot be accessed. The register provides the entry point for the CPU's memory management hardware to access an individual process's address space.

Each page directory consists of a series of page directory entries. The first (most significant) 10 bits of a 32-bit virtual address store a page directory entry (PDE) index that tells Windows which page table to use to locate the physical memory associated with the address.

Each page table consists of a series of page table entries. The second 10 bits of a 32-bit virtual address provide an index into this table and indicate which page table entry (PTE) contains the address of the page in physical memory to which the virtual address is mapped.

On x86 processors, the last 12 bits of a 32-bit virtual address contain the byte offset on the physical memory page to which the virtual address refers. The system page size determines the number of bits required to store the offset. Since the system page size on x86 processors is 4K, 12 bits are required to store a page offset (4,096 = 2^12).
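
The 10/10/12 split described above can be sketched in a few lines of C++. This is only an illustration of the bit layout, not code from Windows: the struct and both function names are hypothetical, and the sketch assumes the standard (non-PAE) x86 layout with 4K pages.

```cpp
#include <cassert>
#include <cstdint>

// Components of a 32-bit x86 virtual address (4K pages, no PAE).
struct VirtualAddressParts {
    std::uint32_t pdeIndex;    // bits 31..22: index into the page directory
    std::uint32_t pteIndex;    // bits 21..12: index into the page table
    std::uint32_t byteOffset;  // bits 11..0:  offset within the 4K page
};

// Hypothetical helper (not a Windows API): split an address into its parts.
VirtualAddressParts SplitVirtualAddress(std::uint32_t va)
{
    VirtualAddressParts parts;
    parts.pdeIndex   = va >> 22;
    parts.pteIndex   = (va >> 12) & 0x3FF;
    parts.byteOffset = va & 0xFFF;
    return parts;
}

// Inverse of SplitVirtualAddress: rebuild the address from its parts.
std::uint32_t JoinVirtualAddress(const VirtualAddressParts& parts)
{
    assert(parts.pdeIndex < 1024 && parts.pteIndex < 1024);
    assert(parts.byteOffset < 4096);
    return (parts.pdeIndex << 22) | (parts.pteIndex << 12) | parts.byteOffset;
}
```

For example, the address 0x00401048 splits into page directory index 1, page table index 1, and byte offset 0x048.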

When an address is translated, the following events occur.

1. The CPU's MMU locates the page directory for the process using the special register mentioned above.

2. The page directory index (from the first 10 bits of the virtual address) is used to locate the PDE that identifies the page table needed to map the virtual address to a physical one.

3. The page table index (from the second 10 bits of the virtual address) is used to locate the PTE that maps the physical location of the virtual memory page referenced by the address.

4. The PTE is used to locate the physical page. If the virtual page is mapped to a page that is already in physical memory, the PTE will contain the page frame number (PFN) of the page in physical memory that contains the data in question. (Processors reference memory locations by PFN.) If the page is not in physical memory, the MMU raises a page fault, and the Windows page fault–handling code attempts to locate the page in the system paging file. If the page can be located, it is loaded into physical memory, and the PTE is updated to reflect its location. If it cannot be located and the translation is a user mode translation, an access violation occurs because the virtual address references an invalid physical address. If the page cannot be located and the translation is occurring in kernel mode, a bug check (also called a blue screen) occurs.
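
The four steps can be modeled with a toy simulation. The sketch below is a simplification under stated assumptions: it uses std::map in place of the hardware tables, every type and member name (ToyMmu, ToyPte, Outcome, and so on) is hypothetical, "paging in" just hands out the next free frame number, and only the user mode outcome for an invalid address is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

enum class Outcome { Resident, PagedIn, AccessViolation };

// Toy PTE: either holds a page frame number or records that the page
// currently lives in the paging file.
struct ToyPte {
    bool resident;
    bool inPagingFile;
    std::uint32_t pfn;
};

typedef std::map<std::uint32_t, ToyPte> ToyPageTable;            // PTE index -> PTE
typedef std::map<std::uint32_t, ToyPageTable> ToyPageDirectory;  // PDE index -> page table

struct ToyMmu {
    ToyPageDirectory directory;  // step 1: real hardware finds this via CR3
    std::uint32_t nextFreePfn;   // frame handed out by the toy page fault handler

    Outcome Translate(std::uint32_t va, std::uint32_t* physical)
    {
        assert(physical != nullptr);
        const std::uint32_t pdeIndex = va >> 22;
        const std::uint32_t pteIndex = (va >> 12) & 0x3FF;
        const std::uint32_t offset   = va & 0xFFF;

        // Step 2: the page directory index selects a page table.
        ToyPageDirectory::iterator table = directory.find(pdeIndex);
        if (table == directory.end()) return Outcome::AccessViolation;

        // Step 3: the page table index selects a PTE.
        ToyPageTable::iterator entry = table->second.find(pteIndex);
        if (entry == table->second.end()) return Outcome::AccessViolation;

        // Step 4: use the PFN, or simulate a page fault that pages the data in.
        ToyPte& pte = entry->second;
        Outcome outcome = Outcome::Resident;
        if (!pte.resident) {
            if (!pte.inPagingFile) return Outcome::AccessViolation;
            pte.pfn = nextFreePfn++;
            pte.resident = true;
            pte.inPagingFile = false;
            outcome = Outcome::PagedIn;
        }
        *physical = (pte.pfn << 12) | offset;
        return outcome;
    }
};
```

Translating the same paged-out address twice shows the effect of the PTE update: the first call reports a page-in, and the second finds the page already resident.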

The four-step process required to resolve a virtual address to a physical one may seem inefficient at first glance. It may seem that it would be far simpler and more efficient to compose a virtual address of two basic components: (1) a PTE index that identifies the entry storing the reference to the page in physical storage to which the virtual address maps and (2) a page offset that pinpoints the precise location of the data block referenced by the address. However, the x86 and Alpha processors take the four-step approach in order to conserve memory. If we simplify this process into a basic one-step translation where each virtual address is composed of only two components as I've just described, we end up consuming far more memory to manage the resulting table than we do in the four-step process, especially on systems where the majority of the address space is unallocated. We would need 1,048,576 PTEs to map a 4GB address space (4GB ÷ 4K page size = 1,048,576). With each PTE requiring 32 bits, we would need 4MB of physical memory to map the address space for each process (1,048,576 × 4 bytes = 4MB). Using the four-step process that x86 and Alpha processors employ, only the page directory must be fully defined; memory for the page tables can be allocated as necessary. Given that the address space for many processes is mostly unallocated, the physical memory this approach saves is significant.
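
The arithmetic behind this comparison can be checked with a short sketch. The function and constant names are hypothetical, and the two-level figure assumes the standard x86 layout in which the page directory and each page table hold 1,024 four-byte entries, so each page table maps a 4MB region.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kAddressSpaceBytes = 4ULL * 1024 * 1024 * 1024;  // 4GB
const std::uint32_t kPageSize          = 4096;                       // 4K
const std::uint32_t kEntrySize         = 4;                          // 32-bit entry
const std::uint32_t kEntriesPerTable   = 1024;

// Flat scheme: one PTE for every page in the 4GB address space.
std::uint64_t FlatTableBytes()
{
    return (kAddressSpaceBytes / kPageSize) * kEntrySize;
}

// Two-level scheme: the page directory is always present; a 4K page
// table is needed only for each 4MB region that is actually in use.
std::uint64_t TwoLevelTableBytes(std::uint32_t regionsInUse)
{
    assert(regionsInUse <= kEntriesPerTable);
    const std::uint64_t tableBytes = kEntriesPerTable * kEntrySize;  // 4,096
    return tableBytes + regionsInUse * tableBytes;
}
```

Under these assumptions, a process that touches only three 4MB regions needs 16K of mapping structures rather than 4MB. A process that uses every region pays 4K more than the flat scheme, which is the cost of the page directory itself.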

That said, if this process occurred with every memory access, performance would likely be very poor, so the x86 and Alpha processors cache virtual-to-physical address translation pairs. The cache memory set aside for storing these address pairs is known as a Translation Buffer (TB) or Translation Look-aside Buffer (TLB). When the MMU is presented with a virtual address, it takes the virtual page number and compares it with the virtual page number of every entry in the cache. If it finds a match, it bypasses the four-step process and simply locates the PFN in physical memory from the cache entry. A downside of the Windows scheduler switching from one process to another is that cache entries associated with the process being taken off the scheduler must be cleared. The four-step process then fills the cache with entries from the new process.
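
The caching behavior described above can be illustrated with a small software model. Real TLBs are fixed-size hardware structures; this sketch, with its hypothetical ToyTlb class, uses an unbounded std::map purely to show the hit, miss, and flush-on-process-switch behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Toy TLB: caches virtual page number -> page frame number pairs.
class ToyTlb {
public:
    ToyTlb() : hits_(0), misses_(0) {}

    // Returns true and sets *pfn on a hit; returns false on a miss, in
    // which case the caller would perform the full four-step translation.
    bool Lookup(std::uint32_t va, std::uint32_t* pfn)
    {
        assert(pfn != nullptr);
        std::map<std::uint32_t, std::uint32_t>::const_iterator it =
            entries_.find(va >> 12);  // compare by virtual page number
        if (it == entries_.end()) { ++misses_; return false; }
        *pfn = it->second;
        ++hits_;
        return true;
    }

    // Record a translation produced by the full table walk.
    void Insert(std::uint32_t va, std::uint32_t pfn) { entries_[va >> 12] = pfn; }

    // Called when a thread from a different process is scheduled.
    void Flush() { entries_.clear(); }

    unsigned Hits() const { return hits_; }
    unsigned Misses() const { return misses_; }

private:
    std::map<std::uint32_t, std::uint32_t> entries_;
    unsigned hits_;
    unsigned misses_;
};
```

Two addresses on the same 4K page share one cache entry, so a lookup of 0x00401FFF hits after 0x00401048 has been inserted. After Flush, the same lookup misses again.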

Physical Address Extension

Intel processors beginning with the Pentium Pro include support for a memory-mapping model called Physical Address Extension (PAE). PAE can provide access to up to 64GB of physical memory. In PAE mode, the MMU still implements page directories and page tables, but a new level exists above them: the page directory pointer table. Also, in PAE mode, PDEs and PTEs are 64 bits wide (rather than the standard 32 bits). The system can address more memory than the standard translation allows because PDEs and PTEs are twice their standard width, not because of the page directory pointer table. The page directory pointer table is needed to manage these high-capacity tables and the indexes into them.
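
As a rough illustration of how the extra level changes the address layout, the sketch below splits a 32-bit virtual address the way x86 PAE paging does with 4K pages: 2 bits for the page directory pointer table, 9 bits each for the page directory and page table, and 12 bits for the byte offset. That split comes from the Intel architecture rather than from the text above, and the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Components of a 32-bit virtual address under PAE (4K pages).
struct PaeAddressParts {
    std::uint32_t pdptIndex;   // bits 31..30: page directory pointer table index
    std::uint32_t pdeIndex;    // bits 29..21: page directory index (512 entries)
    std::uint32_t pteIndex;    // bits 20..12: page table index (512 entries)
    std::uint32_t byteOffset;  // bits 11..0:  offset within the 4K page
};

PaeAddressParts SplitPaeVirtualAddress(std::uint32_t va)
{
    PaeAddressParts parts;
    parts.pdptIndex  = va >> 30;
    parts.pdeIndex   = (va >> 21) & 0x1FF;
    parts.pteIndex   = (va >> 12) & 0x1FF;
    parts.byteOffset = va & 0xFFF;
    return parts;
}

// Bytes addressable with a given number of physical address bits.
std::uint64_t MaxPhysicalBytes(unsigned physicalAddressBits)
{
    assert(physicalAddressBits < 64);
    return 1ULL << physicalAddressBits;
}
```

With 36 physical address bits, MaxPhysicalBytes returns 64GB, compared with 4GB for 32 bits. The 64-bit entries have room for the wider page frame numbers; because each entry is twice as large, a 4K table holds 512 entries instead of 1,024, which is why the indexes shrink to 9 bits.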

A special version of the Windows kernel is required to use PAE mode. This kernel ships with every version of Windows 2000 and later and resides in Ntkrnlpa.exe for uniprocessor machines and in Ntkrpamp.exe for multiprocessor machines. You enable PAE use by adding the /PAE switch to your BOOT.INI file, just as you might add /3GB or /USERVA.

Exercises

Earlier in the chapter we discussed NULL pointer references and how Windows helps applications detect them (though it cannot completely prevent them). The next three exercises take you through some sample code that exhibits different types of NULL pointer references and shows how Windows handles each type.

Exercise 4.1 NULL Pointer References

1. Create a console app based on Listing 4.1 by loading and compiling the Visual Studio project in the CH04\memexamp00 subfolder on the CD accompanying this book. I'm assuming that you're working with Visual C++ (VC++) version 6.0 or later in the steps that follow.

Listing 4.1. A NULL Pointer Reference
// memexamp00.cpp : NULL pointer reference example.
//

#include "stdafx.h"
#include "stdlib.h"
#include "string.h"

#define LAST_NAME_SIZE 2147483647

int main(int argc, char* argv[])
{
  char *pszLastName = (char *)malloc(LAST_NAME_SIZE);
  strcpy(pszLastName,"Smith");
  return 0;
}

2. Set a breakpoint on the strcpy line and run the app.

3. When the app stops at the strcpy, place your mouse over pszLastName in the VC++ editor window. A tool-tip hint should display indicating that pszLastName has a value of 0x00000000. Why is this? The pointer is NULL because we requested a larger memory allocation (2GB) than Windows could satisfy.

Because the code does no error checking, strcpy will attempt to copy the string “Smith” into this invalid address.

4. Hit F10 to execute the strcpy line. You should now see an access violation. Windows has intercepted the attempted access of memory address 0x00000000 (NULL) and raised the error you see. Press Shift+F5 to stop debugging.
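
For contrast with Listing 4.1, here is a minimal sketch of the error checking the listing omits. DuplicateName is a hypothetical helper, not part of the book's sample code; it simply refuses to copy when the buffer is too small or when malloc returns NULL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: allocate bufferSize bytes and copy source into
// them. Returns NULL instead of writing through a bad pointer.
char* DuplicateName(const char* source, std::size_t bufferSize)
{
    assert(source != NULL);
    const std::size_t length = std::strlen(source);
    if (bufferSize <= length) {
        return NULL;  // no room for the text plus its terminator
    }
    char* buffer = static_cast<char*>(std::malloc(bufferSize));
    if (buffer == NULL) {
        return NULL;  // allocation failed; the caller must check for this
    }
    std::memcpy(buffer, source, length + 1);  // copies the terminator too
    return buffer;
}
```

The caller checks the result before using it and releases a successful allocation with free.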

Exercise 4.2 An Obscured NULL Pointer Reference

Now let's modify the app to cause a NULL pointer reference that is not so obvious.

1. Change your code to look like Listing 4.2 (or load memexamp01 from the CD).

Listing 4.2. A Less Obvious NULL Pointer Reference
// memexamp01.cpp : NULL pointer reference example.
//

#include "stdafx.h"
#include "stdlib.h"
#include "string.h"

#define LAST_NAME_SIZE 2147483647
char szLastName[]="Smith";

int main(int argc, char* argv[])
{
  char *pszLastName = (char *)malloc(LAST_NAME_SIZE);
  *(pszLastName+strlen(szLastName)+1)='\0';
  strncpy(pszLastName,szLastName,strlen(szLastName));
  return 0;
}

2. In this code, we use strncpy rather than strcpy to fill the address referenced by pszLastName with data. strncpy is often preferred over strcpy because it helps prevent buffer overruns: you can control the number of characters copied. Because we've used strncpy, we have to take care of terminating the string referenced by pszLastName, so we begin by placing an ASCII 0 character at the end of the target buffer for szLastName. To compute the target address for the string terminator, we simply take the string length of szLastName, add it to the address contained in pszLastName, and add 1.

3. Unfortunately, this code also assumes that the malloc call won't fail. When malloc fails, it returns NULL into pszLastName. This address is then used when we compute where to put the string terminator. Since it's 0, we're effectively attempting to place an ASCII 0 at a memory address that's equivalent to the length of the string referenced by szLastName plus 1. So, rather than a plain NULL reference, we are referring to address 0x00000006, that is, 5 (the length of “Smith”) + 1.

4. This is easy to see by looking at the disassembly for our app.

13:       char *pszLastName = (char *)malloc(LAST_NAME_SIZE);
00401028   push        7FFFFFFFh
0040102D   call        malloc (00401220)
00401032   add         esp,4
00401035   mov         dword ptr [ebp-4],eax
14:       *(pszLastName+strlen(szLastName)+1)='\0';
00401038   push        offset szLastName (00421a30)
0040103D   call        strlen (004011a0)
00401042   add         esp,4
00401045   mov         ecx,dword ptr [ebp-4]
00401048   mov         byte ptr [ecx+eax+1],0
								

  1. The call to malloc (Line 13) begins by pushing 0x7FFFFFFF onto the stack. This is the value of our LAST_NAME_SIZE constant: 2,147,483,647, or 2GB minus 1.

  2. Register eax contains the return value from malloc. Because we know the call will fail, we know that this value is NULL or 0x00000000. This value is moved into pszLastName immediately before our attempt to set up the string terminator.

  3. Line 14 computes the string length of szLastName, adds that value plus 1 to the value previously stored in pszLastName, and attempts to treat the result as an address (to dereference it) so that it can assign the string terminator. The actual dereference (and the cause of the ensuing access violation) is the last instruction shown above, mov byte ptr [ecx+eax+1],0.

5. Because address 0x00000006 is within the first 64K of the process address space, an access violation is raised when we attempt to dereference it.
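
The address arithmetic in this exercise can be reproduced without dereferencing anything. In the sketch below, addresses are modeled as plain 32-bit integers, and both function names are hypothetical; the 64K size of the NULL pointer partition is taken from the discussion above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Size of the no-access region at the bottom of the address space (64K).
const std::uint32_t kNullPartitionSize = 0x10000;

// True if the address falls in the first 64K of the process address space.
bool IsInNullPointerPartition(std::uint32_t address)
{
    return address < kNullPartitionSize;
}

// Mirrors the expression pszLastName + strlen(szLastName) + 1 from
// Listing 4.2, using an integer in place of the pointer.
std::uint32_t TerminatorAddress(std::uint32_t base, const char* text)
{
    assert(text != NULL);
    return base + static_cast<std::uint32_t>(std::strlen(text)) + 1;
}
```

With a base of 0 (the NULL returned by the failed malloc) and the text "Smith", TerminatorAddress yields 6, which IsInNullPointerPartition reports as inside the protected region.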

In the next exercise, we'll cause a NULL pointer reference by overwriting a pointer value. This is a common problem in applications, especially those written in languages that feature pointers prominently, such as C and C++.

Exercise 4.3 A NULL Pointer Reference Due to a Memory Overwrite

Here's a fairly contrived example that demonstrates, once again, the usefulness of the NULL pointer access partition.

1. Load the app shown in Listing 4.3 from the CD (CH04\memexamp02) and compile it.

Listing 4.3. A NULL Pointer Reference Caused by Pointer Corruption
// memexamp02.cpp : NULL pointer reference caused by pointer
// corruption.
//

#include "stdafx.h"
#include "stdlib.h"
#include "string.h"

#define MAX_FIRST_NAME_SIZE 10
#define MAX_LAST_NAME_SIZE 30

#pragma pack(1)

struct NAME
{
  char szFirstName[MAX_FIRST_NAME_SIZE];
  char *pszLastName;
} nmEmployee;

int main(int argc, char* argv[])
{
  int dwFirstNameLen=__min(strlen(argv[1]),MAX_FIRST_NAME_SIZE);
  int dwLastNameLen=__min(strlen(argv[2]),MAX_LAST_NAME_SIZE);

  nmEmployee.pszLastName=(char *)malloc(dwLastNameLen);

  strncpy(nmEmployee.szFirstName,argv[1],dwFirstNameLen);
  strncpy(nmEmployee.pszLastName,argv[2],dwLastNameLen);

  nmEmployee.szFirstName[dwFirstNameLen+2]='\0';
  nmEmployee.pszLastName[dwLastNameLen+2]='\0';

  strupr(nmEmployee.pszLastName);

  printf("First Name=%s Last Name=%s\n",
    nmEmployee.szFirstName,nmEmployee.pszLastName);
  return 0;
}

This code will work fine as long as the first argument passed into it is 7 characters or fewer. Thanks to the faulty pointer arithmetic used throughout the app, but especially when the name strings are terminated, a first name of 8 or more characters will cause part of the pszLastName pointer to be overwritten with an ASCII 0.

2. To see how this works, set the command line parameters (Alt+F7 | Debug | Program arguments) to “Wolfgangus Mozart” (without quotes).

3. Set a breakpoint at the line that assigns the string terminator for szFirstName:

nmEmployee.szFirstName[dwFirstNameLen+2]='\0';

4. Now, run the app from inside the VC++ IDE. When the debugger stops at your breakpoint, add nmEmployee to your Watch window, then expand it so that you can see its members as you step through the code.

5. Press F10 to step over the breakpoint line. You should notice in the Watch window that not only was szFirstName changed by the line just executed but pszLastName was changed as well (both members should appear red in the Watch window). This is because the ASCII 0 assigned to the end of szFirstName was actually written 3 bytes past the end of the array. Because szFirstName is 10 characters wide and because arrays in C++ are always zero-based, the valid indexes for szFirstName are 0–9. However, dwFirstNameLen equals 10. Assigning ASCII 0 to szFirstName[dwFirstNameLen] would have also overwritten pszLastName but would have gotten only the first byte of the four-byte pointer. Adding 2 to this offset pushes us into the third byte of the pszLastName pointer. By zeroing this byte, we change the address to one that happens to be in the first 64K of the process address space.

6. Now attempt to step over the next line. Because the previous line corrupted the pszLastName pointer, you should see an access violation. The specific reason for the access violation is that you are referencing an address in the first 64K of memory, and Windows' NULL pointer access partition protection has caught that invalid reference.
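
The byte-level effect of the stray terminator can be reproduced safely by replacing the pointer with a 32-bit integer. The sketch below makes several assumptions: PackedName and PointerAfterTerminator are hypothetical names, the starting value 0x00322F40 is an invented heap address, and the byte positions assume a little-endian machine such as the x86.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
// Same layout as NAME in Listing 4.3, with a 32-bit integer standing in
// for the 32-bit pszLastName pointer.
struct PackedName {
    char szFirstName[10];
    std::uint32_t lastNamePointer;
};
#pragma pack(pop)

// Returns the stand-in pointer value after a zero byte is written at
// szFirstName[index], even when index runs past the end of the array.
std::uint32_t PointerAfterTerminator(std::uint32_t original, std::size_t index)
{
    PackedName name;
    std::memset(&name, 0, sizeof(name));
    name.lastNamePointer = original;

    // Write through a raw byte view so the out-of-range index is
    // well defined within the struct's own storage.
    unsigned char* raw = reinterpret_cast<unsigned char*>(&name);
    assert(index < sizeof(name));
    raw[offsetof(PackedName, szFirstName) + index] = 0;
    return name.lastNamePointer;
}
```

With the invented address 0x00322F40, index 9 leaves the value untouched, index 10 clears only the low byte (0x00322F00), and index 12 clears the third byte, leaving 0x00002F40, an address inside the first 64K.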

I mentioned earlier in the chapter that you can retrieve the system's page size and allocation granularity through a call to the GetSystemInfo Win32 API function. In this next exercise, you'll build and run a SQL Server extended procedure that returns this same information.

Exercise 4.4 A GetSystemInfo Extended Stored Procedure

1. Copy the xp_sysinfo project from the CH04\xp_sysinfo subfolder on the book's CD onto your hard drive and load it into Visual C++. For curious readers, Listing 4.4 shows the complete source code of the xp_sysinfo extended procedure.

Listing 4.4. An Extended Procedure That Returns System Memory Information
RETCODE __declspec(dllexport) xp_sysinfo(SRV_PROC *srvproc)
{

  DBCHAR colname[MAXCOLNAME];
  DBCHAR szProcType[MAX_PATH];
  DBCHAR szMinAddress[MAXCOLNAME];
  DBCHAR szMaxAddress[MAXCOLNAME];
  DBCHAR szAffinityMask[MAXCOLNAME];
  SYSTEM_INFO si;

  GetSystemInfo(&si);

  //Set up the column names
  wsprintf(colname, "PageSize");
  srv_describe(srvproc, 1, colname, SRV_NULLTERM, SRVINT4,
      sizeof(DBINT), SRVINT4, sizeof(DBINT), &si.dwPageSize);

  wsprintf(colname, "AllocationGranularity");
  srv_describe(srvproc, 2, colname, SRV_NULLTERM, SRVINT4,
      sizeof(DBINT), SRVINT4, sizeof(DBINT),
      &si.dwAllocationGranularity);

  wsprintf(colname, "NumberOfProcessors");
  srv_describe(srvproc, 3, colname, SRV_NULLTERM, SRVINT4,
      sizeof(DBINT), SRVINT4, sizeof(DBINT),
      &si.dwNumberOfProcessors);

  wsprintf(colname, "ProcessorType");
  switch (si.wProcessorArchitecture)
  {
    case PROCESSOR_ARCHITECTURE_INTEL :
    {
      strcpy(szProcType,"Intel ");
      switch (si.wProcessorLevel)
      {
      case 3 :
        {
          strcat(szProcType,"386");
          break;
        }
      case 4 :
        {
          strcat(szProcType,"486");
          break;
        }
      case 5 :
        {
          strcat(szProcType,"Pentium");
          break;
        }
      case 6 :
        {
          strcat(szProcType,"Pentium II or Pentium Pro or later");
          break;
        }
      case 7 :
        {
          strcat(szProcType,"Pentium III");
          break;
        }
      case 8 :
        {
          strcat(szProcType,"Pentium 4");
          break;
        }
      default :
        {
          strcat(szProcType,"Unknown");
          break;
        }

      }
      break;
    }
    case PROCESSOR_ARCHITECTURE_MIPS :
      {
      strcpy(szProcType,"MIPS ");
      switch (si.wProcessorLevel)
      {
      case 4:
        {
          strcat(szProcType,"R4000");
          break;
        }
      default:
        {
          strcat(szProcType,"Unknown");
          break;
        }
      }
      break;
      }
    case PROCESSOR_ARCHITECTURE_ALPHA :
      {
      strcpy(szProcType,"Alpha ");
      switch (si.wProcessorLevel)
      {
      case 21064:
        {
          strcat(szProcType,"21064");
          break;
        }
      case 21066:
        {
          strcat(szProcType,"21066");
          break;
        }
      case 21164:
        {
          strcat(szProcType,"21164");
          break;
        }
      default:
        {
          strcat(szProcType,"Unknown");
          break;
        }
      }
      break;
      }
    case PROCESSOR_ARCHITECTURE_PPC :
      {
      strcpy(szProcType,"PPC ");
      switch (si.wProcessorLevel)
      {
      // Append the model with strcat so the "PPC " prefix is kept.
      case 1:
        {
          strcat(szProcType, "601");
          break;
        }
      case 3:
        {
          strcat(szProcType, "603");
          break;
        }
      case 4:
        {
          strcat(szProcType, "604");
          break;
        }
      case 6:
        {
          strcat(szProcType, "603+");
          break;
        }
      case 9:
        {
          strcat(szProcType, "604+");
          break;
        }
      case 20:
        {
          strcat(szProcType, "620");
          break;
        }
      default:
        {
          strcat(szProcType,"Unknown");
          break;
        }
      }
      break;
      }
    default :
      {
      strcpy(szProcType,"Unknown ");
      break;
      }
  }
  srv_describe(srvproc, 4, colname, SRV_NULLTERM, SRVCHAR,
      strlen(szProcType), SRVCHAR, strlen(szProcType),
      &szProcType);

  wsprintf(colname, "ProcessorAffinityMask");
  wsprintf(szAffinityMask,"0x%08X",si.dwActiveProcessorMask);
  srv_describe(srvproc, 5, colname, SRV_NULLTERM, SRVCHAR,
      strlen(szAffinityMask), SRVCHAR, strlen(szAffinityMask),
      &szAffinityMask);

  wsprintf(colname, "MinimumAppAddress");
  wsprintf(szMinAddress,"0x%08X",si.lpMinimumApplicationAddress);
  srv_describe(srvproc, 6, colname, SRV_NULLTERM, SRVCHAR,
      strlen(szMinAddress), SRVCHAR, strlen(szMinAddress),
      &szMinAddress);

  wsprintf(colname, "MaximumAppAddress");
  wsprintf(szMaxAddress,"0x%08X",si.lpMaximumApplicationAddress);
  srv_describe(srvproc, 7, colname, SRV_NULLTERM, SRVCHAR,
      strlen(szMaxAddress), SRVCHAR, strlen(szMaxAddress),
      &szMaxAddress);

  wsprintf(colname, "UserModeAddressSpace");
  DWORD dwUserModeSpace = ((DWORD)si.lpMaximumApplicationAddress -
      (DWORD)si.lpMinimumApplicationAddress);
  srv_describe(srvproc, 8, colname, SRV_NULLTERM, SRVINT4,
      sizeof(DBINT), SRVINT4, sizeof(DBINT), &dwUserModeSpace);

  srv_sendrow(srvproc);

  // Now return the number of rows processed
  srv_senddone(srvproc, SRV_DONE_MORE | SRV_DONE_COUNT,
      (DBUSMALLINT)0, 1);

    return XP_NOERROR;

}

2. Compile the project. This should produce a DLL named xp_sysinfo.dll in the Release subfolder under your root xp_sysinfo folder.

3. Copy xp_sysinfo.dll to the binn folder under your SQL Server installation's root folder. If you've worked through the exercises in previous chapters, you may be asked whether to replace the existing xp_sysinfo. Answer Yes to this prompt.

4. Add the xproc to the master database with this command:

sp_addextendedproc 'xp_sysinfo','xp_sysinfo.dll'

5. Run xp_sysinfo from Query Analyzer. You should see output something like this (results abridged):

PageSize  AllocGranularity Processors ProcessorType    AffinityM
--------- ---------------- ---------- ---------------- ---------
4096      65536            2          Intel Pentium... 0x0000000

As you can see, the system page size is 4K and the allocation granularity is 64K. Note that these numbers may differ on other processors or in future versions of Windows.

Note also the UserModeAddressSpace column. On this machine, the maximum user mode space is roughly 2GB. This tells us that the /3GB boot option is not in effect. Since SQL Server is a large-address-aware application, it would reflect a user mode address space of roughly 3GB if it were running on an appropriate version of Windows and the system had been booted with /3GB.
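
The subtraction that produces this column can be sketched on its own. The two helper names below are hypothetical, and the addresses used in the example that follows (0x00010000 and 0x7FFEFFFF for a standard configuration, 0xBFFEFFFF with /3GB) are assumed typical values for 32-bit Windows rather than output captured from the listing.

```cpp
#include <cassert>
#include <cstdint>

// Same subtraction xp_sysinfo performs on the minimum and maximum
// application addresses reported by GetSystemInfo.
std::uint32_t UserModeSpaceBytes(std::uint32_t minAppAddress,
                                 std::uint32_t maxAppAddress)
{
    assert(maxAppAddress >= minAppAddress);
    return maxAppAddress - minAppAddress;
}

// True if the process has more than the standard 2GB of user mode space.
bool HasOversizedUserSpace(std::uint32_t userModeSpaceBytes)
{
    return userModeSpaceBytes > 0x80000000u;
}
```

With the assumed standard addresses, the result is 0x7FFDFFFF bytes, just under 2GB, so HasOversizedUserSpace returns false. With the assumed /3GB maximum, the result is 0xBFFDFFFF bytes, just under 3GB, and the check returns true.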

Memory Basics Recap

Windows provides a rich set of facilities for making memory available to applications. Even though a machine may have a relatively small amount of physical RAM installed, Windows provides each process a 4GB virtual address space in which to run and transparently handles swapping physical memory to and from disk as necessary.

The x86 family of processors has a memory page size of 4K. This means that all memory allocations under Windows are actually carried out in multiples of 4K. For example, a 5K allocation request actually requires 8K of memory.
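
The rounding in that example is easy to verify. RoundUpToPage below is a hypothetical helper that rounds a request up to the next multiple of the page size; it assumes the page size is a power of two, as the 4K x86 page size is.

```cpp
#include <cassert>
#include <cstddef>

// Round a byte count up to the next multiple of the page size.
std::size_t RoundUpToPage(std::size_t bytes, std::size_t pageSize = 4096)
{
    // The bit trick below works only for power-of-two page sizes.
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0);
    return (bytes + pageSize - 1) & ~(pageSize - 1);
}
```

A 5K request (5,120 bytes) rounds up to 8,192 bytes, while a request of exactly 4,096 bytes stays at 4,096.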

AWE and /3GB provide applications with mechanisms for accessing memory beyond the standard 2GB user mode partition. The /3GB option actually limits the total amount of physical memory that Windows can manage, so it is generally not recommended. AWE is the more flexible of the two and can make all the physical memory that's visible to the operating system available to applications.

Memory Basics Knowledge Measure

  1. What is the system page size on the x86 family of processors?

  2. What is the allocation granularity size on 32-bit Windows?

  3. True or false: A page fault causes an exception to be raised that will crash an application if the application does not trap it with structured exception-handling (SEH) code.

  4. If you enable the /3GB option on Windows 2000 Professional, how much user mode address space will SQL Server be allotted when it starts up?

  5. True or false: Address translation refers to the two-step process in which the two components of a virtual address, the page table index and the page offset, are used to translate a virtual address into a physical one.

  6. True or false: Thrashing is the condition in which physical memory pages are continually swapped to and from the system paging file, often preventing applications from running in a timely fashion.

  7. What address region is set aside by Windows to help applications detect NULL pointer assignments?

  8. How large is the default user mode space in a 32-bit Windows process?

  9. How much total physical memory can Windows 2000 Data Center manage?

  10. True or false: Using the AWE functions causes the kernel mode space to be so compressed that only 16GB of total physical memory can be accessed by Windows.

  11. What VC++ linker switch enables an executable to be large address aware?

  12. Before support for the /3GB boot option was added to Windows, how many bits in a virtual address could a user mode application use to reference virtual memory directly?

  13. True or false: The system paging file can actually consist of several physical files that may reside on different disk drives.

  14. What does Task Manager's Mem Usage column indicate for a process?

  15. When an address translation is attempted on an invalid user mode address, what happens?

  16. What Windows API function covered in this chapter will return both the system page size and the system allocation granularity?

  17. True or false: The PEB is not allocated at a specific address in a process's virtual address space, and its location will almost always vary between processes.

  18. True or false: All processor chips supported by Windows have some form of built-in memory protection.

  19. What's the typical difference between Perfmon's Process:Private Bytes and Process:Page File Bytes counters?

  20. True or false: Because Windows is a 32-bit operating system, all user processes have a flat 4GB address space.

  21. What is the WinDbg command for displaying a process's PEB?

  22. What does Task Manager's VM Size column indicate for a process?

  23. What Win32 API function covered in this chapter can you use to deduce whether a process has an oversized user mode address space?

  24. True or false: Because the largest integer a 32-bit pointer can store is 2^32, the maximum memory that a user mode application may access is 4GB.

  25. True or false: The majority of the physical storage used to implement virtual memory comes from the physical RAM installed in the machine.

  26. What special-purpose register is used to store the location of the page directory on x86 processors?

  27. True or false: If a process needs more than the standard 2GB of virtual memory space, AWE is generally preferred over the /3GB option.

  28. What is a Translation Look-aside Buffer (TLB)?

  29. True or false: The shared user data page is actually backed by a page in kernel mode memory.

  30. True or false: Although an application can specify the size of a memory allocation it wants to make, it cannot specify the precise location for the allocation.
