23. Platform Interoperability and Unsafe Code

C# has great capabilities, especially when you consider that the underlying framework is entirely managed. Sometimes, however, you need to escape out of all the safety that C# provides and step back into the world of memory addresses and pointers. C# supports this action in two significant ways. The first option is to go through Platform Invoke (P/Invoke) and call into APIs exposed by unmanaged dynamic link libraries (DLLs). The second way is through unsafe code, which enables access to memory pointers and addresses.

[Figure: overview of the platform interoperability and unsafe code topics covered in this chapter.]

The majority of the chapter discusses interoperability with unmanaged code and the use of unsafe code. This discussion culminates with a small program that determines the processor ID of a computer. The code requires that you do the following:

  1. Call into an operating system DLL and request allocation of a portion of memory for executing instructions.

  2. Write some assembler instructions into the allocated area.

  3. Inject an address location into the assembler instructions.

  4. Execute the assembler code.

Aside from the P/Invoke and unsafe constructs covered here, the complete listing demonstrates the full power of C# and the fact that the capabilities of unmanaged code are still accessible from C# and managed code.

Platform Invoke

Whether a developer is trying to call a library of existing unmanaged code, accessing unmanaged code in the operating system not exposed in any managed API, or trying to achieve maximum performance for an algorithm by avoiding the runtime overhead of type checking and garbage collection, at some point there must be a call into unmanaged code. The Common Language Infrastructure (CLI) provides this capability through P/Invoke. With P/Invoke, you can make API calls into exported functions of unmanaged DLLs.

The APIs invoked in this section are Windows APIs. Although the same APIs are not available on other platforms, developers can still use P/Invoke for APIs native to their operating systems or for calls into their own DLLs. The guidelines and syntax are the same.

Declaring External Functions

Once the target function is identified, the next step of P/Invoke is to declare the function with managed code. Just as with all regular methods that belong to a class, you need to declare the targeted API within the context of a class, but by using the extern modifier. Listing 23.1 demonstrates how to do this.

Listing 23.1: Declaring an External Method

using System;
using System.Runtime.InteropServices;
class VirtualMemoryManager
{
  [DllImport("kernel32.dll", EntryPoint="GetCurrentProcess")]
  internal static extern IntPtr GetCurrentProcessHandle();
}

In this case, the class is VirtualMemoryManager, because it will contain functions associated with managing memory. (This particular function is available directly from the System.Diagnostics.Process class, so there is no need to declare it in real code.) Note that the method returns an IntPtr; this type is explained in the next section.

The extern methods never include any body and are (almost) always static. Instead of a method body, the DllImport attribute, which accompanies the method declaration, points to the implementation. At a minimum, the attribute needs the name of the DLL that defines the function. The runtime determines the function name from the method name, although you can override this default by using the EntryPoint named parameter to provide the function name. (The .NET framework will automatically attempt calls to the Unicode [...W] or ASCII [...A] API version.)

In this case, the external function, GetCurrentProcess(), retrieves a pseudohandle for the current process that you will use in the call for virtual memory allocation. Here’s the unmanaged declaration:

HANDLE GetCurrentProcess();

Parameter Data Types

Assuming the developer has identified the targeted DLL and exported function, the most difficult step is identifying or creating the managed data types that correspond to the unmanaged types in the external function.1 Listing 23.2 shows a more difficult API.

1. One particularly helpful resource for declaring Win32 APIs is http://www.pinvoke.net. It provides a great starting point for many APIs, helping you avoid some of the subtle problems that can arise when coding an external API call from scratch.

Listing 23.2: The VirtualAllocEx() API

LPVOID VirtualAllocEx(
    HANDLE hProcess,        // The handle to a process. The
                            // function allocates memory within
                            // the virtual address space of this
                            // process.
    LPVOID lpAddress,       // The pointer that specifies a
                            // desired starting address for the
                            // region of pages that you want to
                            // allocate. If lpAddress is NULL,
                            // the function determines where to
                            // allocate the region.
    SIZE_T dwSize,          // The size of the region of memory to
                            // allocate, in bytes. If lpAddress
                            // is NULL, the function rounds dwSize
                            // up to the next page boundary.
    DWORD flAllocationType, // The type of memory allocation
    DWORD flProtect);       // The type of memory protection

VirtualAllocEx() allocates virtual memory that the operating system specifically designates for execution or data. To call it, you need corresponding definitions in managed code for each data type; although common in Win32 programming, HANDLE, LPVOID, SIZE_T, and DWORD are undefined in the CLI managed code. The declaration in C# for VirtualAllocEx(), therefore, is shown in Listing 23.3.

Listing 23.3: Declaring the VirtualAllocEx() API in C#

using System;
using System.Runtime.InteropServices;
class VirtualMemoryManager
{
  [DllImport("kernel32.dll")]
  internal static extern IntPtr GetCurrentProcess();

  [DllImport("kernel32.dll", SetLastError = true)]
  private static extern IntPtr VirtualAllocEx(
      IntPtr hProcess,
      IntPtr lpAddress,
      IntPtr dwSize,
      AllocationType flAllocationType,
      uint flProtect);
}

One distinct characteristic of managed code is that primitive data types such as int do not change their size on the basis of the processor. Whether the processor is 16, 32, or 64 bits, int is always 32 bits. In unmanaged code, however, memory pointers will vary depending on the processor. Therefore, instead of mapping types such as HANDLE and LPVOID simply to ints, you need to map to System.IntPtr, whose size will vary depending on the processor memory layout. This example also uses an AllocationType enum, which we discuss in the section “Simplifying API Calls with Wrappers” later in this chapter.

An interesting point to note about Listing 23.3 is that IntPtr is useful for more than just pointers—that is, it is useful for other things such as quantities. IntPtr does not mean just “pointer stored in an integer”; it also means “integer that is the size of a pointer.” An IntPtr need not contain a pointer but simply needs to contain something the size of a pointer. Lots of things are the size of a pointer but are not actually pointers.
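
The difference between a fixed-size int and a pointer-sized IntPtr can be observed directly. The following is a minimal sketch; the PointerSizes class and its Measure() method are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical helper, not part of the chapter's listings.
static class PointerSizes
{
    // Returns the size, in bytes, of int and IntPtr in the current process.
    public static (int IntSize, int IntPtrSize) Measure()
    {
        // sizeof(int) is a compile-time constant: always 4 bytes.
        // IntPtr.Size depends on the process: 4 bytes in a 32-bit process,
        // 8 bytes in a 64-bit process.
        return (sizeof(int), IntPtr.Size);
    }
}
```

Calling PointerSizes.Measure() in both a 32-bit and a 64-bit build should show the same IntSize but a different IntPtrSize, which is why HANDLE, LPVOID, and SIZE_T map to IntPtr rather than int.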

Using ref Rather Than Pointers

Frequently, unmanaged code uses pointers for pass-by-reference parameters. In these cases, P/Invoke doesn’t require that you map the data type to a pointer in managed code. Instead, you map the corresponding parameters to ref (or out, depending on whether the parameter is in/out or just out). In Listing 23.4, lpflOldProtect, whose data type is PDWORD, returns the “pointer to a variable that receives the previous access protection of the first page in the specified region of pages.”2

2. MSDN documentation.

Listing 23.4: Using ref and out Rather Than Pointers

class VirtualMemoryManager
{
  // ...
  [DllImport("kernel32.dll", SetLastError = true)]
  static extern bool VirtualProtectEx(
      IntPtr hProcess, IntPtr lpAddress,
      IntPtr dwSize, uint flNewProtect,
      ref uint lpflOldProtect);
}

Although lpflOldProtect is documented as [out] (even though the signature doesn’t enforce it), the description also mentions that the parameter must point to a valid variable and not NULL. This inconsistency is confusing but commonly encountered. The guideline is to use ref rather than out for P/Invoke type parameters, since the callee can always ignore the data passed with ref, but the converse will not necessarily succeed.

The other parameters are virtually the same as VirtualAllocEx() except that lpAddress is the address returned from VirtualAllocEx(). In addition, flNewProtect specifies the exact type of memory protection: page execute, page read-only, and so on.

Using StructLayoutAttribute for Sequential Layout

Some APIs involve types that have no corresponding managed type. Calling these APIs requires redeclaring the type in managed code. You declare the unmanaged COLORREF struct, for example, in managed code (see Listing 23.5).

Listing 23.5: Declaring Types from Unmanaged Structs

[StructLayout(LayoutKind.Sequential)]
struct ColorRef
{
  public byte Red;
  public byte Green;
  public byte Blue;
  // Turn off the warning about not accessing Unused
  #pragma warning disable 414
  private byte Unused;
  #pragma warning restore 414

  public ColorRef(byte red, byte green, byte blue)
  {
      Blue = blue;
      Green = green;
      Red = red;
      Unused = 0;
  }
}

Various Microsoft Windows color APIs use COLORREF to represent RGB colors (i.e., levels of red, green, and blue).

The key in the Listing 23.5 declaration is StructLayoutAttribute. By default, managed code can optimize the memory layouts of types, so layouts may not be sequential from one field to the next. To force sequential layouts so that a type maps directly and can be copied bit for bit (blitted) from managed to unmanaged code, and vice versa, you add the StructLayoutAttribute with the LayoutKind.Sequential enum value. (This is also useful when writing data to and from filestreams where a sequential layout may be expected.)

Since the unmanaged (C++) definition for struct does not map to the C# definition, there is no direct mapping of unmanaged struct to managed struct. Instead, developers should follow the usual C# guidelines about whether the type should behave like a value or a reference type, and whether the size is small (approximately less than 16 bytes).
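
One way to check that a sequential declaration matches the unmanaged layout is to ask the marshaler for the struct's size and field offsets. The sketch below assumes a simplified copy of the struct named ColorRefSketch with all fields public; both that name and the LayoutInspector helper are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

// Simplified, hypothetical variant of the ColorRef struct with public fields.
[StructLayout(LayoutKind.Sequential)]
struct ColorRefSketch
{
    public byte Red;
    public byte Green;
    public byte Blue;
    public byte Unused;
}

// Hypothetical helper for inspecting the marshaled layout.
static class LayoutInspector
{
    // Size of the struct as seen by unmanaged code, in bytes.
    public static int MarshaledSize() => Marshal.SizeOf<ColorRefSketch>();

    // Byte offset of the named field within the marshaled struct.
    public static int OffsetOf(string fieldName) =>
        (int)Marshal.OffsetOf<ColorRefSketch>(fieldName);
}
```

With four sequential byte fields, the marshaled size is 4 bytes and the fields sit at offsets 0 through 3 in declaration order, matching the 32-bit value the Windows color APIs expect.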

Error Handling

One inconvenient aspect of Win32 API programming is the fact that the APIs frequently report errors in inconsistent ways. For example, some APIs return a value (0, 1, false, and so on) to indicate an error, whereas others set an out parameter in some way. Furthermore, the details of what went wrong require additional calls to the GetLastError() API and then an additional call to FormatMessage() to retrieve an error message corresponding to the error. In summary, Win32 error reporting in unmanaged code seldom occurs via exceptions.

Fortunately, the P/Invoke designers provided a mechanism for error handling. To enable it, set the SetLastError named parameter of the DllImport attribute to true. You can then instantiate a System.ComponentModel.Win32Exception immediately following the P/Invoke call, and it is automatically initialized with the Win32 error data (see Listing 23.6).

Listing 23.6: Win32 Error Handling

class VirtualMemoryManager
{
  [DllImport("kernel32.dll", SetLastError = true)]
  private static extern IntPtr VirtualAllocEx(
      IntPtr hProcess,
      IntPtr lpAddress,
      IntPtr dwSize,
      AllocationType flAllocationType,
      uint flProtect);

  // ...
  [DllImport("kernel32.dll", SetLastError = true)]
  static extern bool VirtualProtectEx(
      IntPtr hProcess, IntPtr lpAddress,
      IntPtr dwSize, uint flNewProtect,
      ref uint lpflOldProtect);

  [Flags]
  private enum AllocationType : uint
  {
      // ...
  }

  [Flags]
  private enum ProtectionOptions
  {
      // ...
  }

  [Flags]
  private enum MemoryFreeType
  {
      // ...
  }

  public static IntPtr AllocExecutionBlock(
      int size, IntPtr hProcess)
  {
      IntPtr codeBytesPtr;
      codeBytesPtr = VirtualAllocEx(
          hProcess, IntPtr.Zero,
          (IntPtr)size,
          AllocationType.Reserve | AllocationType.Commit,
          (uint)ProtectionOptions.PageExecuteReadWrite);

      if (codeBytesPtr == IntPtr.Zero)
      {
          throw new System.ComponentModel.Win32Exception();                                    
      }

      uint lpflOldProtect = 0;
      if (!VirtualProtectEx(
          hProcess, codeBytesPtr,
          (IntPtr)size,
          (uint)ProtectionOptions.PageExecuteReadWrite,
          ref lpflOldProtect))
      {
          throw new System.ComponentModel.Win32Exception();                                    
      }
      return codeBytesPtr;
  }

  public static IntPtr AllocExecutionBlock(int size)
  {
      return AllocExecutionBlock(
          size, GetCurrentProcessHandle());
  }
}

This code enables developers to provide the custom error checking that each API uses while still reporting the error in a standard manner.
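
The per-API check followed by a standard exception can be factored into a small helper. This is a sketch only; Win32Check and ThrowIfFalse() are hypothetical names. It spells out what the parameterless Win32Exception constructor does implicitly, namely reading the error code that the runtime saved after a call declared with SetLastError = true.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the chapter's listings.
static class Win32Check
{
    // Use with P/Invoke methods that signal failure by returning false.
    public static void ThrowIfFalse(bool succeeded)
    {
        if (!succeeded)
        {
            // Retrieve the error code the runtime captured for the most
            // recent P/Invoke call declared with SetLastError = true.
            int errorCode = Marshal.GetLastWin32Error();
            throw new Win32Exception(errorCode);
        }
    }
}
```

A call site would then read Win32Check.ThrowIfFalse(VirtualProtectEx(...)), keeping the API-specific success test next to the call while the error reporting stays uniform.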

Listings 23.1 and 23.3 declared the P/Invoke methods as internal or private. Except for the simplest of APIs, wrapping methods in public wrappers that reduce the complexity of the P/Invoke API calls is a good guideline that increases API usability and moves toward object-oriented type structure. The AllocExecutionBlock() declaration in Listing 23.6 provides a good example of this approach.

Using SafeHandle

Frequently, P/Invoke involves a resource, such as a handle, that code needs to clean up after using. Instead of requiring developers to remember this step is necessary and manually code it each time, it is helpful to provide a class that implements IDisposable and a finalizer. In Listing 23.7, for example, the address returned after VirtualAllocEx() and VirtualProtectEx() requires a follow-up call to VirtualFreeEx(). To provide built-in support for this process, you define a VirtualMemoryPtr class that derives from System.Runtime.InteropServices.SafeHandle.

Listing 23.7: Managed Resources Using SafeHandle

public class VirtualMemoryPtr :
  System.Runtime.InteropServices.SafeHandle
{
  public VirtualMemoryPtr(int memorySize) :
      base(IntPtr.Zero, true)
  {
      _ProcessHandle =
          VirtualMemoryManager.GetCurrentProcessHandle();
      _MemorySize = (IntPtr)memorySize;
      _AllocatedPointer =
          VirtualMemoryManager.AllocExecutionBlock(
          memorySize, _ProcessHandle);
      _Disposed = false;
  }
  public readonly IntPtr _AllocatedPointer;
  readonly IntPtr _ProcessHandle;
  readonly IntPtr _MemorySize;
  bool _Disposed;

  public static implicit operator IntPtr(
      VirtualMemoryPtr virtualMemoryPointer)
  {
      return virtualMemoryPointer._AllocatedPointer;
  }

  // SafeHandle abstract member
  public override bool IsInvalid
  {
      get
      {
          return _Disposed;
      }
  }

  // SafeHandle abstract member
  protected override bool ReleaseHandle()
  {
      if (!_Disposed)
      {
          _Disposed = true;
          GC.SuppressFinalize(this);
          VirtualMemoryManager.VirtualFreeEx(_ProcessHandle,
              _AllocatedPointer, _MemorySize);
      }
      return true;
  }
}

System.Runtime.InteropServices.SafeHandle includes the abstract members IsInvalid and ReleaseHandle(). You place your cleanup code in the latter; the former indicates whether this code has executed yet.

With VirtualMemoryPtr, you can allocate memory simply by instantiating the type and specifying the needed memory allocation.
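
The same SafeHandle pattern can be tried without the Windows-specific APIs. The sketch below is an assumption-laden stand-in: UnmanagedBuffer is a hypothetical class that wraps Marshal.AllocHGlobal() and Marshal.FreeHGlobal() instead of VirtualAllocEx() and VirtualFreeEx().

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical example type; it stands in for VirtualMemoryPtr.
public sealed class UnmanagedBuffer : SafeHandle
{
    public UnmanagedBuffer(int size) : base(IntPtr.Zero, true)
    {
        // Store the unmanaged allocation in SafeHandle's protected field.
        SetHandle(Marshal.AllocHGlobal(size));
    }

    // SafeHandle abstract member: a zero pointer means nothing to release.
    public override bool IsInvalid => handle == IntPtr.Zero;

    // SafeHandle abstract member: invoked once by Dispose() or the finalizer.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        return true;
    }

    public void WriteByte(int offset, byte value) =>
        Marshal.WriteByte(handle, offset, value);

    public byte ReadByte(int offset) => Marshal.ReadByte(handle, offset);
}
```

Wrapping an instance in a using statement, for example using (var buffer = new UnmanagedBuffer(16)) { ... }, releases the memory deterministically; if the caller forgets, the SafeHandle finalizer releases it eventually.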

Calling External Functions

Once you declare the P/Invoke functions, you invoke them just as you would any other class member. The key, however, is that the imported DLL must be in the path, including the executable directory, so that it can be successfully loaded. Listings 23.6 and 23.7 demonstrate this approach. However, they rely on some constants.

Since flAllocationType and flProtect are flags, it is a good practice to provide constants or enums for each. Instead of expecting the caller to define these constants or enums, encapsulation suggests that you provide them as part of the API declaration, as shown in Listing 23.8.

Listing 23.8: Encapsulating the APIs Together

class VirtualMemoryManager
{
  // ...

  /// <summary>
  /// The type of memory allocation. This parameter must
  /// contain one of the following values.
  /// </summary>
  [Flags]
  private enum AllocationType : uint
  {
      /// <summary>
      /// Allocates physical storage in memory or in the
      /// paging file on disk for the specified reserved
      /// memory pages. The function initializes the memory
      /// to zero.
      /// </summary>
      Commit = 0x1000,
      /// <summary>
      /// Reserves a range of the process's virtual address
      /// space without allocating any actual physical
      /// storage in memory or in the paging file on disk.
      /// </summary>
      Reserve = 0x2000,
      /// <summary>
      /// Indicates that data in the memory range specified by
      /// lpAddress and dwSize is no longer of interest. The
      /// pages should not be read from or written to the
      /// paging file. However, the memory block will be used
      /// again later, so it should not be decommitted. This
      /// value cannot be used with any other value.
      /// </summary>
      Reset = 0x80000,
      /// <summary>
      /// Allocates physical memory with read-write access.
      /// This value is solely for use with Address Windowing
      /// Extensions (AWE) memory.
      /// </summary>
      Physical = 0x400000,
      /// <summary>
      /// Allocates memory at the highest possible address.
      /// </summary>
      TopDown = 0x100000,
  }

  /// <summary>
  /// The memory protection for the region of pages to be
  /// allocated.
  /// </summary>
  [Flags]
  private enum ProtectionOptions : uint
  {

      /// <summary>
      /// Enables execute access to the committed region of
      /// pages. An attempt to read or write to the committed
      /// region results in an access violation.
      /// </summary>
      Execute = 0x10,
      /// <summary>
      /// Enables execute and read access to the committed
      /// region of pages. An attempt to write to the
      /// committed region results in an access violation.
      /// </summary>
      PageExecuteRead = 0x20,
      /// <summary>
      /// Enables execute, read, and write access to the
      /// committed region of pages.
      /// </summary>
      PageExecuteReadWrite = 0x40,
      // ...
  }

  /// <summary>
  /// The type of free operation.
  /// </summary>
  [Flags]
  private enum MemoryFreeType : uint
  {
      /// <summary>
      /// Decommits the specified region of committed pages.
      /// After the operation, the pages are in the reserved
      /// state.
      /// </summary>
      Decommit = 0x4000,
      /// <summary>
      /// Releases the specified region of pages. After this
      /// operation, the pages are in the free state.
      /// </summary>
      Release = 0x8000
  }

  // ...
}

The advantage of enums is that they group the related values together. Furthermore, typing a parameter as the enum limits its expected values to the ones the enum defines.
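
Because the enums are marked with [Flags] and based on uint, values combine with the bitwise OR operator and cast cleanly to the DWORD the API expects. The following sketch uses a trimmed, hypothetical copy of the enum (AllocationTypeSketch) so that it stands alone.

```csharp
using System;

// Trimmed, hypothetical copy of the AllocationType enum.
[Flags]
enum AllocationTypeSketch : uint
{
    Commit = 0x1000,
    Reserve = 0x2000,
}

// Hypothetical helper, not part of the chapter's listings.
static class FlagDemo
{
    // Combines two flags and returns the raw value passed to the API.
    public static uint ReserveAndCommit()
    {
        AllocationTypeSketch flags =
            AllocationTypeSketch.Reserve | AllocationTypeSketch.Commit;
        return (uint)flags;  // 0x2000 | 0x1000 == 0x3000
    }

    // Tests whether the Commit bit is set in a combined value.
    public static bool IncludesCommit(AllocationTypeSketch flags) =>
        (flags & AllocationTypeSketch.Commit) != 0;
}
```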

Simplifying API Calls with Wrappers

Whether they are focused on error handling, structs, or constant values, one goal of effective API developers is to provide a simplified managed API that wraps the underlying Win32 API. For example, Listing 23.9 overloads VirtualFreeEx() with public versions that simplify the call.

Listing 23.9: Wrapping the Underlying API

class VirtualMemoryManager
{
  // ...

  [DllImport("kernel32.dll", SetLastError = true)]
  static extern bool VirtualFreeEx(
      IntPtr hProcess, IntPtr lpAddress,
      IntPtr dwSize, IntPtr dwFreeType);
  public static bool VirtualFreeEx(
      IntPtr hProcess, IntPtr lpAddress,
      IntPtr dwSize)
  {
      bool result = VirtualFreeEx(
          hProcess, lpAddress, dwSize,
          (IntPtr)MemoryFreeType.Decommit);
      if (!result)
      {
          throw new System.ComponentModel.Win32Exception();
      }
      return result;
  }
  public static bool VirtualFreeEx(
      IntPtr lpAddress, IntPtr dwSize)
  {
      return VirtualFreeEx(
          GetCurrentProcessHandle(), lpAddress, dwSize);
  }

  [DllImport("kernel32", SetLastError = true)]
  static extern IntPtr VirtualAllocEx(
      IntPtr hProcess,
      IntPtr lpAddress,
      IntPtr dwSize,
      AllocationType flAllocationType,
      uint flProtect);

  // ...
}

Function Pointers Map to Delegates

One last key point related to P/Invoke is that function pointers in unmanaged code map to delegates in managed code. To set up a timer, for example, you would provide a function pointer that the timer could call back on, once it had expired. Specifically, you would pass a delegate instance that matches the signature of the callback.
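
As an illustration of a callback other than a timer, the Windows EnumWindows() function in user32.dll takes a function pointer that it invokes once per top-level window. The sketch below maps that pointer to a delegate; the WindowCounter class and the delegate's name are hypothetical, and the method simply returns -1 on non-Windows platforms rather than attempting the call.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the chapter's listings.
static class WindowCounter
{
    // Managed signature matching the unmanaged callback:
    // BOOL CALLBACK EnumWindowsProc(HWND hwnd, LPARAM lParam);
    private delegate bool EnumWindowsProc(IntPtr hWnd, IntPtr lParam);

    [DllImport("user32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool EnumWindows(
        EnumWindowsProc lpEnumFunc, IntPtr lParam);

    // Returns the number of top-level windows, or -1 when not on Windows.
    public static int CountTopLevelWindows()
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        {
            return -1;
        }

        int count = 0;
        // The delegate instance is marshaled as a function pointer.
        EnumWindowsProc callback = (hWnd, lParam) =>
        {
            count++;
            return true;  // Returning true continues the enumeration
        };

        if (!EnumWindows(callback, IntPtr.Zero))
        {
            throw new Win32Exception();
        }
        // Keep the delegate reachable until the unmanaged call completes.
        GC.KeepAlive(callback);
        return count;
    }
}
```

When unmanaged code stores the function pointer and calls it later, as a timer would, the delegate must be kept alive (in a field, for instance) for as long as the callback can fire; otherwise the garbage collector may reclaim it.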

Guidelines

Given the idiosyncrasies of P/Invoke, there are several guidelines to aid in the process of writing such code.

Pointers and Addresses

On occasion, developers may want to access and work with memory, and with pointers to memory locations, directly. This is necessary, for example, for certain operating system interactions as well as with certain types of time-critical algorithms. To support this capability, C# requires use of the unsafe code construct.

Unsafe Code

One of C#’s great features is the fact that it is strongly typed and supports type checking throughout the runtime execution. What makes this feature especially beneficial is that it is possible to circumvent this support and manipulate memory and addresses directly. You would do so when working with memory-mapped devices, for example, or if you wanted to implement time-critical algorithms. The key is to designate a portion of the code as unsafe.

Unsafe code is an explicit code block and compilation option, as shown in Listing 23.10. The unsafe modifier has no effect on the generated CIL code itself, but rather is a directive to the compiler to permit pointer and address manipulation within the unsafe block. Furthermore, unsafe does not imply unmanaged.

Listing 23.10: Designating a Method for Unsafe Code

class Program
{
  unsafe static int Main(string[] args)
  {
      // ...
  }
}

You can use unsafe as a modifier to the type or to specific members within the type.

In addition, C# allows unsafe as a statement that flags a code block to allow unsafe code (see Listing 23.11).

Listing 23.11: Designating a Code Block for Unsafe Code

class Program
{
  static int Main(string[] args)
  {
      unsafe
      {
          // ...
      }
  }
}

Code within the unsafe block can include unsafe constructs such as pointers.

Note

It is necessary to explicitly indicate to the compiler that unsafe code is supported.

When you write unsafe code, your code becomes vulnerable to the possibility of buffer overflows and similar outcomes that may potentially expose security holes. For this reason, it is necessary to explicitly notify the compiler that unsafe code occurs. To accomplish this, set AllowUnsafeBlocks to true in your CSPROJ file, as shown in Listing 23.12.

Listing 23.12: Setting AllowUnsafeBlocks in the CSPROJ File

<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp1.0</TargetFramework>
    <ProductName>Chapter20</ProductName>
    <WarningLevel>2</WarningLevel>
    <AllowUnsafeBlocks>True</AllowUnsafeBlocks>                                     
  </PropertyGroup>
  <Import Project="..\Versioning.targets" />
  <ItemGroup>
    <ProjectReference Include="..\SharedCode\SharedCode.csproj" />
  </ItemGroup>
</Project>

Alternatively, you can pass the property on the command line when running dotnet build (see Output 23.1).

Output 23.1

dotnet build /property:AllowUnsafeBlocks=True

Or, if invoking the C# compiler directly, you need the /unsafe switch (see Output 23.2).

Output 23.2

csc.exe /unsafe Program.cs

With Visual Studio, you can activate this feature by checking the Allow Unsafe Code checkbox from the Build tab of the Project Properties window.

The /unsafe switch enables you to directly manipulate memory and execute instructions that are unmanaged. Requiring /unsafe, therefore, makes explicit any exposure to potential security vulnerabilities that such code might introduce. With great power comes great responsibility.

Pointer Declaration

Now that you have marked a code block as unsafe, it is time to look at how to write unsafe code. First, unsafe code allows the declaration of a pointer. Consider the following example:

byte* pData;

Assuming pData is not null, its value points to a location that contains one or more sequential bytes; the value of pData represents the memory address of the bytes. The type specified before the * is the referent type—that is, the type located where the value of the pointer refers. In this example, pData is the pointer and byte is the referent type, as shown in Figure 23.1.

[Figure: a byte* pData pointer variable on the stack holding the address of sequentially stored bytes (the byte referent), shown alongside a byte[] data array.]

Figure 23.1: Pointers contain the address of the data

Because pointers are simply integers that happen to refer to a memory address, they are not subject to garbage collection. C# does not allow referent types other than unmanaged types, which are types that are not reference types, are not generics, and do not contain reference types. Therefore, the following declaration is not valid:

string* pMessage;

Likewise, this declaration is not valid:

ServiceStatus* pStatus;

where ServiceStatus is defined as shown in Listing 23.13. The problem, once again, is that ServiceStatus includes a string field.

Listing 23.13: Invalid Referent Type Example

struct ServiceStatus
{
  int State;
  string Description;  // Description is a reference type
}

In addition to custom structs that contain only unmanaged types, valid referent types include enums, predefined value types (sbyte, byte, short, ushort, int, uint, long, ulong, char, float, double, decimal, and bool), and pointer types (such as byte**). Lastly, valid syntax includes void* pointers, which represent pointers to an unknown type.
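
To make the list of valid referent types concrete, the following sketch declares a pointer to a custom struct, a void* pointer, and a pointer to a pointer. Point2D and ReferentTypes are hypothetical names, and the code must be compiled with unsafe code enabled.

```csharp
// Hypothetical struct containing only unmanaged fields.
struct Point2D
{
    public int X;
    public int Y;
}

// Hypothetical helper; requires the /unsafe switch.
static class ReferentTypes
{
    public static unsafe int SumCoordinates()
    {
        Point2D point = new Point2D { X = 3, Y = 4 };

        Point2D* pPoint = &point;     // Custom struct of unmanaged types
        void* pUnknown = pPoint;      // Pointer to an unknown type
        Point2D** ppPoint = &pPoint;  // Pointer to a pointer

        // A void* must be cast back to a typed pointer before use.
        return (*ppPoint)->X + ((Point2D*)pUnknown)->Y;
    }
}
```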

Assigning a Pointer

Once code defines a pointer, it needs to assign a value before accessing it. Just like reference types, pointers can hold the value null, which is their default value. The value stored by the pointer is the address of a location. Therefore, to assign the pointer, you must first retrieve the address of the data.

You could explicitly cast an int or a long into a pointer, but this rarely occurs without a means of determining the address of a particular data value at execution time. Instead, you need to use the address operator (&) to retrieve the address of the value type:

byte* pData = &bytes[0];  // Compile error

The problem is that in a managed environment, data can move, thereby invalidating the address. The resulting error message will be “You can only take the address of [an] unfixed expression inside a fixed statement initializer.” In this case, the byte referenced appears within an array, and an array is a reference type (a movable type). Reference types appear on the heap and are subject to garbage collection or relocation. A similar problem occurs when referring to a value type field on a movable type:

int* a = &"message".Length;

Either way, assigning an address of some data requires that the following criteria are met:

  • The data must be classified as a variable.

  • The data must be an unmanaged type.

  • The variable needs to be classified as fixed, not movable.

If the data is an unmanaged variable type but is not fixed, use the fixed statement to fix a movable variable.
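
The criteria above can be sketched as follows. A local variable of an unmanaged type is already fixed, so its address can be taken directly; a field on a heap-allocated object is movable and needs the fixed statement. Counter and AddressOfDemo are hypothetical names, and the code requires unsafe code to be enabled.

```csharp
// Hypothetical reference type with an unmanaged field.
class Counter
{
    public int Count;
}

// Hypothetical helper; requires the /unsafe switch.
static class AddressOfDemo
{
    public static unsafe int Run()
    {
        // A local variable lives on the stack: fixed, not movable.
        int local = 1;
        int* pLocal = &local;
        *pLocal += 10;               // local is now 11

        Counter counter = new Counter { Count = 5 };
        // int* pCount = &counter.Count;  // Compile error: movable variable
        fixed (int* pCount = &counter.Count)
        {
            *pCount += 100;          // counter.Count is now 105
        }

        return local + counter.Count;
    }
}
```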

Fixing Data

To retrieve the address of a movable data item, it is necessary to fix, or pin, the data, as demonstrated in Listing 23.14.

Listing 23.14: Fixed Statement

byte[] bytes = new byte[24];
fixed (byte* pData = &bytes[0])  // pData = bytes also allowed
{
  // ...
}

Within the code block of a fixed statement, the assigned data will not move. In this example, bytes will remain at the same address, at least until the end of the fixed statement.

The fixed statement requires the declaration of the pointer variable within its scope. This avoids accessing the variable outside the fixed statement, when the data is no longer fixed. However, as a programmer, you are responsible for ensuring that you do not assign the pointer to another variable that survives beyond the scope of the fixed statement—possibly in an API call, for example. Unsafe code is called “unsafe” for a reason; you must ensure that you use the pointers safely, rather than relying on the runtime to enforce safety on your behalf. Similarly, using ref or out parameters will be problematic for data that will not survive beyond the method call.

Since a string is an invalid referent type, it would appear invalid to define pointers to strings. However, as in C++, internally a string is a pointer to the first character of an array of characters, and it is possible to declare pointers to characters using char*. Therefore, C# allows for declaring a pointer of type char* and assigning it to a string within a fixed statement. The fixed statement prevents the movement of the string during the life of the pointer. Similarly, it allows any movable type that supports an implicit conversion to a pointer of another type, given a fixed statement.
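
A read-only sketch of fixing a string and walking its characters through a char* follows. StringPointerDemo and CountSpaces() are hypothetical names, and the code requires unsafe code to be enabled.

```csharp
// Hypothetical helper; requires the /unsafe switch.
static class StringPointerDemo
{
    // Counts the space characters in text by reading through a char*.
    public static unsafe int CountSpaces(string text)
    {
        int count = 0;
        // The string cannot move while pText is in scope.
        fixed (char* pText = text)
        {
            for (int i = 0; i < text.Length; i++)
            {
                if (pText[i] == ' ')
                {
                    count++;
                }
            }
        }
        return count;
    }
}
```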

You can replace the verbose assignment of &bytes[0] with the abbreviated bytes, as shown in Listing 23.15.

Listing 23.15: A fixed Statement without Address or Array Indexer

byte[] bytes = new byte[24];
fixed (byte* pData = bytes)
{
  // ...
}

Depending on the frequency and time needed for their execution, fixed statements may have the potential to cause fragmentation in the heap because the garbage collector cannot compact fixed objects. To reduce this problem, the best practice is to pin blocks early in the execution and to pin fewer large blocks rather than many small blocks. Unfortunately, this preference must be tempered with the practice of pinning as little as possible for as short a time as possible, so as to minimize the chance that a collection will happen during the time that the data is pinned. To some extent, .NET 2.0 reduces this problem through its inclusion of some additional fragmentation-aware code.

Potentially you might need to fix an object in place in one method body and have it remain fixed until another method is called; this is not possible with the fixed statement. If you are in this unfortunate situation, you can use methods on the GCHandle object to fix an object in place indefinitely. You should do so only if it is absolutely necessary, however; fixing an object for a long time makes it highly likely that the garbage collector will be unable to efficiently compact memory.
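
A minimal sketch of pinning with GCHandle is shown below. PinningDemo is a hypothetical name, and for brevity the handle is allocated and freed within one method; in the scenario described above, the GCHandle would instead be stored and freed by a later call.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of the chapter's listings.
static class PinningDemo
{
    public static byte WriteThroughPinnedAddress()
    {
        byte[] bytes = new byte[24];

        // Pin the array so that its address stays valid until Free().
        GCHandle handle = GCHandle.Alloc(bytes, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // Write to the first element through its raw address.
            Marshal.WriteByte(address, 0, 42);
        }
        finally
        {
            // Always release the handle; otherwise the array stays pinned.
            handle.Free();
        }
        return bytes[0];
    }
}
```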

Allocating on the Stack

You should use the fixed statement on an array to prevent the garbage collector from moving the data. However, an alternative is to allocate the array on the call stack. Stack allocated data is not subject to garbage collection or to the finalizer patterns that accompany it. Like referent types, the requirement is that the stackalloc data is an array of unmanaged types. For example, instead of allocating an array of bytes on the heap, you can place it onto the call stack, as shown in Listing 23.16.

Listing 23.16: Allocating Data on the Call Stack

byte* bytes = stackalloc byte[42];

Because the data type is an array of unmanaged types, the runtime can allocate a fixed buffer size for the array and then release that buffer when the method returns. Specifically, it allocates sizeof(T) * E bytes, where E is the array size and T is the referent type. Given the requirement of using stackalloc only on an array of unmanaged types, the runtime returns the buffer to the system by simply unwinding the stack, thereby eliminating the complexities of iterating over the f-reachable queue (see the “Garbage Collection” section and discussion of finalization in Chapter 10) and compacting reachable data. Consequently, there is no way to explicitly free stackalloc data.

The stack is a precious resource. Although it is small, running out of stack space has a big effect: the program crashes. For this reason, you should make every effort to avoid running out of stack space. If a program does run out of stack space, the best thing that can happen is for the program to shut down/crash immediately. Generally, programs have less than 1MB of stack space (and possibly a lot less). Therefore, take great care to avoid allocating arbitrarily sized buffers on the stack.
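One common way to follow this advice is to cap the size of any stack allocation and fall back to the heap for larger requests. The sketch below assumes C# 7.2 or later, where stackalloc can be assigned to a Span<byte> without an unsafe context; the MaxStackBytes threshold and the method name are hypothetical.

```csharp
using System;

public static class StackBufferSketch
{
    // Hypothetical threshold: anything larger goes on the heap instead.
    private const int MaxStackBytes = 256;

    // Fills a temporary buffer with ones and returns their sum.
    public static int SumOfOnes(int length)
    {
        if (length < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(length));
        }

        // Small buffers live on the stack; large ones are ordinary arrays.
        Span<byte> buffer = length <= MaxStackBytes
            ? stackalloc byte[length]
            : new byte[length];

        // Initialize explicitly rather than relying on the initial contents.
        buffer.Fill(1);

        int sum = 0;
        foreach (byte value in buffer)
        {
            sum += value;
        }
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfOnes(10));     // stack-allocated path
        Console.WriteLine(SumOfOnes(10000));  // heap-allocated path
    }
}
```

The caller never sees which path was taken, so the threshold can be tuned without changing any other code.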

Dereferencing a Pointer

Accessing the data stored in a variable of a type referred to by a pointer requires that you dereference the pointer, placing the indirection operator prior to the expression. For example, byte data = *pData; dereferences the location of the byte referred to by pData and produces a variable of type byte. The variable provides read/write access to the single byte at that location.

Using this principle in unsafe code allows the unorthodox behavior of modifying the “immutable” string, as shown in Listing 23.17. In no way is this strategy recommended, even though it does expose the potential of low-level memory manipulation.

Listing 23.17: Modifying an Immutable String

string text = "S5280ft";
Console.Write("{0} = ", text);
unsafe  // Requires /unsafe switch
{
  fixed (char* pText = text)
  {

      char* p = pText;
      *++p = 'm';
      *++p = 'i';
      *++p = 'l';
      *++p = 'e';
      *++p = ' ';
      *++p = ' ';
  }
}
Console.WriteLine(text);

The results of Listing 23.17 appear in Output 23.3.

Output 23.3

S5280ft = Smile

In this case, you take the original address and increment it by the size of the referent type (sizeof(char)), using the pre-increment operator. Next, you dereference the address using the indirection operator and then assign the location with a different character. Similarly, using the + and - operators on a pointer changes the address by operand * sizeof(T), where T is the referent type.

The comparison operators (==, !=, <, >, <=, and >=) also work to compare pointers. Thus, their use effectively translates to a comparison of address location values.
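The scaling and comparison rules above can be checked with a short sketch. The class and method names are hypothetical, the buffer of eight int values is an arbitrary choice, and the code must be compiled with the /unsafe switch.

```csharp
using System;

public static class PointerArithmeticSketch
{
    // Returns how many bytes the address moves when an int* advances by
    // `elements` (0 through 8, the size of the hypothetical buffer).
    public static unsafe long ByteDistance(int elements)
    {
        int* start = stackalloc int[8];
        int* moved = start + elements;        // address + elements * sizeof(int)
        return (byte*)moved - (byte*)start;   // difference measured in bytes
    }

    // Subtracting two pointers of the same type yields an element count.
    public static unsafe long ElementDistance(int elements)
    {
        int* start = stackalloc int[8];
        int* moved = start + elements;
        return moved - start;
    }

    // Comparison operators compare the address values themselves.
    public static unsafe bool MovedIsHigher(int elements)
    {
        int* start = stackalloc int[8];
        return start + elements > start;
    }

    public static void Main()
    {
        Console.WriteLine(ByteDistance(2));
        Console.WriteLine(ElementDistance(2));
        Console.WriteLine(MovedIsHigher(2));
    }
}
```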

One restriction on the dereferencing operator is the inability to dereference a void*. The void* data type represents a pointer to an unknown type. Since the data type is unknown, it can’t be dereferenced to produce a variable. Instead, to access the data referenced by a void*, you must convert it to another pointer type and then dereference the latter type.
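A minimal sketch of that conversion follows. The names are hypothetical, and the code must be compiled with the /unsafe switch.

```csharp
using System;

public static class VoidPointerSketch
{
    public static unsafe int ReadThroughVoidPointer(int value)
    {
        // Any pointer type converts implicitly to void*.
        void* untyped = &value;

        // int broken = *untyped;   // does not compile: void* cannot be dereferenced

        // Convert to a typed pointer first, then dereference that pointer.
        int* typed = (int*)untyped;
        return *typed;
    }

    public static void Main()
    {
        Console.WriteLine(ReadThroughVoidPointer(5280));
    }
}
```

The cast is unchecked: the compiler trusts that the memory really holds an int, so a wrong cast silently reinterprets whatever bytes are there.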

You can achieve the same behavior as implemented in Listing 23.17 by using the index operator rather than the indirection operator (see Listing 23.18).

Listing 23.18: Modifying an Immutable String with the Index Operator in Unsafe Code

string text;
text = "S5280ft";
Console.Write("{0} = ", text);

unsafe  // Requires /unsafe switch
{
  fixed (char* pText = text)
  {
      pText[1] = 'm';
      pText[2] = 'i';
      pText[3] = 'l';
      pText[4] = 'e';
      pText[5] = ' ';
      pText[6] = ' ';
  }
}
Console.WriteLine(text);

The results of Listing 23.18 appear in Output 23.4.

Output 23.4

S5280ft = Smile

Modifications such as those in Listings 23.17 and 23.18 can lead to unexpected behavior. For example, if you reassigned text to "S5280ft" following the Console.WriteLine() statement and then redisplayed text, the output would still be Smile because equal string literals are interned: only one string object is stored, and every occurrence of the literal refers to it. In spite of the apparent assignment

text = "S5280ft";

after the unsafe code in Listing 23.17, the internals of the string assignment are an address assignment of the modified "S5280ft" location, so text is never set to the intended value.
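The sharing of equal literals can be observed without any unsafe code. The following sketch, with hypothetical names, compares references rather than contents; a string built at runtime is included as a contrast, since it is a separate object even though its characters are the same.

```csharp
using System;

public static class InterningSketch
{
    // Two occurrences of the same literal refer to one string object.
    public static bool LiteralsShareInstance()
    {
        string first = "S5280ft";
        string second = "S5280ft";
        return object.ReferenceEquals(first, second);
    }

    // A string constructed at runtime is a distinct object with equal contents.
    public static bool RuntimeCopyIsDistinct()
    {
        string literal = "S5280ft";
        string copy = new string(literal.ToCharArray());
        return literal == copy && !object.ReferenceEquals(literal, copy);
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareInstance());
        Console.WriteLine(RuntimeCopyIsDistinct());
    }
}
```

Because the literal is shared, overwriting its characters through a pointer changes what every other use of that literal sees.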

Accessing the Member of a Referent Type

Dereferencing a pointer produces a variable of the pointer’s underlying type. You can then access the members of the underlying type using the member access dot operator in the usual way. However, the rules of operator precedence require that *x.y means *(x.y), which is probably not what you intended. If x is a pointer, the correct code is (*x).y, which is an unpleasant syntax. To make it easier to access members of a dereferenced pointer, C# provides a special member access operator: x->y is a shorthand for (*x).y, as shown in Listing 23.19.

Listing 23.19: Directly Accessing a Referent Type’s Members

unsafe
{
  Angle angle = new Angle(30, 18, 0);
  Angle* pAngle = &angle;
  System.Console.WriteLine("{0}° {1}' {2}\"",
      pAngle->Hours, pAngle->Minutes, pAngle->Seconds);
}

The results of Listing 23.19 appear in Output 23.5.

Output 23.5

30° 18' 0"
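For readers without the Angle type at hand, here is a self-contained sketch of the same two syntaxes. The Coordinate struct and the method names are hypothetical, and the code must be compiled with the /unsafe switch.

```csharp
using System;

// Hypothetical unmanaged struct used only for this illustration.
public struct Coordinate
{
    public int X;
    public int Y;
}

public static class MemberAccessSketch
{
    // Reads one member with each syntax; both access the same struct.
    public static unsafe int SumMembers(Coordinate value)
    {
        Coordinate* p = &value;
        int viaArrow = p->X;            // shorthand form
        int viaDereference = (*p).Y;    // explicit dereference, then member access
        return viaArrow + viaDereference;
    }

    // Writing through -> modifies the struct that the pointer refers to.
    public static unsafe int DoubleX(Coordinate value)
    {
        Coordinate* p = &value;
        p->X *= 2;
        return value.X;
    }

    public static void Main()
    {
        Coordinate c = new Coordinate { X = 30, Y = 18 };
        Console.WriteLine(SumMembers(c));
        Console.WriteLine(DoubleX(c));
    }
}
```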

Executing Unsafe Code via a Delegate

As promised at the beginning of this chapter, we finish up with a full working example of what is likely the most “unsafe” thing you can do in C#: obtain a pointer to a block of memory, fill it with the bytes of machine code, make a delegate that refers to the new code, and execute it. In this example, we use assembly code to determine the processor ID. If run on a Windows machine, it prints the processor ID. Listing 23.20 shows how to do it.

Listing 23.20: Designating a Block for Unsafe Code

using System;
using System.Runtime.InteropServices;
using System.Text;

class Program
{
  public unsafe delegate void MethodInvoker(byte* buffer);

  public unsafe static int ChapterMain()
  {
      if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
      {
          unsafe
          {
              // x64 machine code. It relies on the Windows x64 calling
              // convention, in which the first argument (buffer) is in rcx.
              byte[] codeBytes = new byte[] {
                  0x49, 0x89, 0xd8,       // mov    %rbx,%r8        (save rbx)
                  0x49, 0x89, 0xc9,       // mov    %rcx,%r9        (save buffer)
                  0x48, 0x31, 0xc0,       // xor    %rax,%rax       (cpuid leaf 0)
                  0x0f, 0xa2,             // cpuid
                  0x4c, 0x89, 0xc8,       // mov    %r9,%rax
                  0x89, 0x18,             // mov    %ebx,0x0(%rax)
                  0x89, 0x50, 0x04,       // mov    %edx,0x4(%rax)
                  0x89, 0x48, 0x08,       // mov    %ecx,0x8(%rax)
                  0x4c, 0x89, 0xc3,       // mov    %r8,%rbx        (restore rbx)
                  0xc3                    // retq
              };

              byte[] buffer = new byte[12];

              using (VirtualMemoryPtr codeBytesPtr =
                  new VirtualMemoryPtr(codeBytes.Length))
              {
                  Marshal.Copy(
                      codeBytes, 0,
                      codeBytesPtr, codeBytes.Length);

                  MethodInvoker method = Marshal.GetDelegateForFunctionPointer<MethodInvoker>(codeBytesPtr);
                  fixed (byte* newBuffer = &buffer[0])
                  {
                      method(newBuffer);
                  }
              }
              Console.Write("Processor Id: ");
              Console.WriteLine(ASCIIEncoding.ASCII.GetChars(buffer));
          } // unsafe
      }
      else
      {
          Console.WriteLine("This sample is only valid for Windows");
      }
      return 0;
  }
}

The results of Listing 23.20 appear in Output 23.6.

Output 23.6

Processor Id: GenuineIntel
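The machine code in Listing 23.20 stores the ebx, edx, and ecx registers at offsets 0, 4, and 8 of the 12-byte buffer, and the buffer is then read as ASCII. The following safe-code sketch reproduces only that decoding step, so it runs on any platform. The class and method names are hypothetical, and the three register values are the ones that spell the vendor string shown in Output 23.6; other processors return different values.

```csharp
using System;
using System.Text;

public static class VendorStringSketch
{
    // Writes a 32-bit value in little-endian order, as an x86/x64 store does.
    private static void WriteLittleEndian(byte[] buffer, int offset, uint value)
    {
        buffer[offset] = (byte)value;
        buffer[offset + 1] = (byte)(value >> 8);
        buffer[offset + 2] = (byte)(value >> 16);
        buffer[offset + 3] = (byte)(value >> 24);
    }

    // Lays the registers out in the same order as the machine code:
    // ebx at offset 0, edx at offset 4, ecx at offset 8.
    public static string DecodeVendor(uint ebx, uint edx, uint ecx)
    {
        byte[] buffer = new byte[12];
        WriteLittleEndian(buffer, 0, ebx);
        WriteLittleEndian(buffer, 4, edx);
        WriteLittleEndian(buffer, 8, ecx);
        return Encoding.ASCII.GetString(buffer);
    }

    public static void Main()
    {
        Console.WriteLine(
            DecodeVendor(0x756e6547, 0x49656e69, 0x6c65746e));
    }
}
```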

Summary

As demonstrated throughout this book, C# offers great power, flexibility, consistency, and a fantastic structure. This chapter highlighted the ability of C# programs to perform very low-level machine-code operations.

Before we end the book, Chapter 24 briefly describes the underlying execution framework and shifts the focus from the C# language to the broader context in which C# programs execute.
