Chapter 18

Managed and Unmanaged Code Interoperation

There can be no question about the need to provide seamless interoperation between managed and unmanaged code, and I’m not going to waste time discussing this obvious point.

Depending on the kind and the role of the unmanaged code, managed and unmanaged code can interoperate in several scenarios. First, the unmanaged code participating in the interoperation can be either “traditional” code, exposed as a set of functions, or classic COM code, exposed as a set of COM interfaces. Second, the unmanaged code can play the role of either a server, with the managed code initiating the interaction, or a client, with the unmanaged code initiating the interaction. Third, the unmanaged code can reside in a separate executable file, or it can be embedded in the managed module. The embedding option exists only for a “traditional” unmanaged server, and its use is limited to the specifics of the Microsoft Visual C++ compiler implementation.

These three dichotomies result in the classification of the interoperation scenarios shown in Figure 18-1.

Figure 18-1. A classification of interoperation scenarios

There are six basic scenarios here: unmanaged code is acting as

  • An external (separate executable file) COM server, implemented through the COM interoperability subsystem of the common language runtime and runtime callable wrappers (RCWs).
  • An external COM client, implemented through the same subsystem and COM callable wrappers (CCWs).
  • An external “traditional” server, implemented through the platform invocation (P/Invoke) subsystem of the runtime.
  • An embedded “traditional” server, implemented through a special case of P/Invoke known as IJW (“it just works”) or local P/Invoke.
  • An external “traditional” client, implemented through the unmanaged export of the managed methods (inverse P/Invoke).
  • An embedded “traditional” client, implemented through IJW (inverse local P/Invoke). In this case, a managed module contains embedded unmanaged native code, and the entry point of the module is unmanaged, so the unmanaged code “takes the initiative” from the start and subsequently calls the managed methods.
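The classification can be written down as a small lookup table. The following Python sketch is only an illustration of the list above (the tuple keys and the helper name are made up for this example); it also shows which of the eight combinations produced by the three dichotomies are absent.

```python
from itertools import product

# The six scenarios listed above, keyed by (kind, role, location) of the
# unmanaged code; each value names the mechanism given in the text.
SCENARIOS = {
    ("COM", "server", "external"): "RCW",
    ("COM", "client", "external"): "CCW",
    ("traditional", "server", "external"): "P/Invoke",
    ("traditional", "server", "embedded"): "IJW (local P/Invoke)",
    ("traditional", "client", "external"): "unmanaged export (inverse P/Invoke)",
    ("traditional", "client", "embedded"): "IJW (inverse local P/Invoke)",
}


def excluded_combinations():
    """Return the combinations of the three dichotomies that are not scenarios."""
    all_combos = product(("COM", "traditional"),
                         ("server", "client"),
                         ("external", "embedded"))
    return [combo for combo in all_combos if combo not in SCENARIOS]
```

Only the two combinations with embedded COM code are missing, which matches the statement that embedding is available for "traditional" unmanaged code only.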

Thunks and Wrappers

The interoperation between managed and unmanaged code requires the common language runtime to build special interface elements that provide the target identification and necessary data conversion, or marshaling. These runtime-generated interface elements are referred to as thunks, or stubs, in interoperation with “traditional” unmanaged code; in COM interoperation, they are referred to as wrappers.

For details on COM interoperation, which I describe in the next section rather briefly, please see the excellent and exhaustive book .NET and COM: The Complete Interoperability Guide (Sams, 2002), by Adam Nathan. Adam worked for many years on the CLR team in the COM interoperation area. If you cannot get Adam’s book, try COM and .NET Interoperability (Apress, 2002), by Andrew Troelsen; it is a good book too.

P/Invoke Thunks

In order to build a client thunk for managed code to call unmanaged code, the common language runtime needs the following information:

  • The name of the module exporting the unmanaged method—for example, Kernel32.dll
  • The exported method’s name or ordinal in the export table of this unmanaged module
  • Binary flags reflecting specifics of how the unmanaged method is called and how its parameters are marshaled

All these items constitute the metadata item known as an implementation map, discussed in the following section.

In general cases, the referenced unmanaged module must be located somewhere on the path. However, there is a special case when it’s desirable to consider the unmanaged module as part of the managed assembly and deploy them together. In this case, the unmanaged module resides in the application directory (which doesn’t have to be on the path); the prime module of the assembly must carry a File record associated with this unmanaged module.

The binary flag values and the respective ILAsm keywords are as follows:

  • nomangle (0x0001): The exported method’s name must be matched literally.
  • ansi (0x0002): The method parameters of type string must be marshaled as ANSI zero-terminated strings unless explicitly specified otherwise.
  • unicode (0x0004): The method parameters of type string must be marshaled as Unicode strings.
  • autochar (0x0006): The method parameters of type string must be marshaled as ANSI or Unicode strings, depending on the underlying platform.
  • bestfit:on (0x0010): Allow “best fit” guessing when converting the strings.
  • bestfit:off (0x0020): Disallow “best fit” guessing.
  • lasterr (0x0040): The native method supports the last error querying by the Win32 API GetLastError.
  • winapi (0x0100): The native method uses the calling convention standard for the underlying platform.
  • cdecl (0x0200): The native method uses the C/C++-style calling convention; the call stack is cleaned up by the caller.
  • stdcall (0x0300): The native method uses the standard Win32 API calling convention; the call stack is cleaned up by the callee.
  • thiscall (0x0400): The native method uses the C++ member method (non-vararg) calling convention. The call stack is cleaned up by the callee, and the instance pointer (this) is pushed on the stack last.
  • fastcall (0x0500): The native method uses the fastcall calling convention. This is much like stdcall, but the first two parameters are passed in registers if possible.
  • charmaperror:on (0x1000): Throw an exception when an unmappable character is encountered in a string.
  • charmaperror:off (0x2000): Don’t throw an exception when an unmappable character is encountered.

The flags ansi, unicode, and autochar are mutually exclusive, and so are the flags defining the calling convention (winapi, cdecl, stdcall, thiscall, and fastcall).
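The mutual exclusivity follows from the encoding: the string and calling convention flags are values of multi-bit fields rather than independent bits (note that autochar, 0x0006, is ansi and unicode combined). Here is a minimal Python sketch of a decoder, assuming the field masks 0x0006, 0x0030, 0x0700, and 0x3000, which are inferred from the values listed above; the function name is hypothetical.

```python
NOMANGLE = 0x0001
LASTERR = 0x0040

# (mask, {value: keyword}) for each multi-bit field; masks are inferred
# from the flag values listed in the text.
FIELDS = (
    (0x0006, {0x0002: "ansi", 0x0004: "unicode", 0x0006: "autochar"}),
    (0x0030, {0x0010: "bestfit:on", 0x0020: "bestfit:off"}),
    (0x0700, {0x0100: "winapi", 0x0200: "cdecl", 0x0300: "stdcall",
              0x0400: "thiscall", 0x0500: "fastcall"}),
    (0x3000, {0x1000: "charmaperror:on", 0x2000: "charmaperror:off"}),
)


def decode_mapping_flags(flags):
    """Translate a binary flag word into the list of ILAsm keywords."""
    keywords = []
    if flags & NOMANGLE:
        keywords.append("nomangle")
    if flags & LASTERR:
        keywords.append("lasterr")
    for mask, names in FIELDS:
        value = flags & mask
        if value == 0:
            continue  # field not specified
        if value not in names:
            raise ValueError(f"invalid value 0x{value:04X} in field 0x{mask:04X}")
        keywords.append(names[value])
    return keywords
```

For example, decoding 0x0345 yields nomangle, lasterr, unicode, and stdcall, whereas a value such as 0x0600 is rejected because it does not correspond to any listed calling convention.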

The name of the exported method can be replaced with the method’s ordinal in the unmanaged module’s export table. The ordinal is specified as a decimal number, preceded by the # character—for example, #10.

If the specified name is a regular name rather than an ordinal, it is matched to the entries of the Export Name table of the unmanaged module. If the nomangle flag is set, the name is matched literally. Otherwise, things get more interesting.

Let’s suppose, for example, that the name is specified as Hello. If the strings are marshaled to ANSI and the Export Name table does not contain Hello, the P/Invoke mechanism tries to find HelloA. If the strings are marshaled as Unicode, the P/Invoke mechanism looks for HelloW; only if HelloW (or HelloA) is not found does P/Invoke look for Hello. If it still can’t find a match, it tries the mangled name Hello@N, where N is a decimal representation of the total size of the method’s arguments in bytes. For example, if method Hello has two 4-byte parameters (either integer or floating point), the mangled name would be Hello@8. This kind of function name mangling is characteristic only of the stdcall functions, so if the calling convention is different and the name is mangled in some other way, the P/Invoke mechanism will not find the exported method.

You can see that the “name digging” methods employed by the P/Invoke mechanism are intended for Windows API naming conventions and name mangling schemes of the C/C++ compiler.
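The lookup order can be sketched in Python as follows. This is a simplified model, not the runtime's code: it assumes that in the ANSI case the plain name is tried before the A-suffixed one (as the first sentence of the Hello example suggests), it leaves autochar out, and the function names and the arg_bytes parameter are invented for the illustration.

```python
def candidate_export_names(name, charset="ansi", nomangle=False, arg_bytes=None):
    """List the export names to try, in order, for a P/Invoke thunk."""
    if name.startswith("#"):
        return [name]            # ordinal, for example "#10": no name digging
    if nomangle:
        return [name]            # literal match only
    if charset == "unicode":
        names = [name + "W", name]
    else:
        names = [name, name + "A"]
    if arg_bytes is not None:
        # stdcall-style mangling: total size of the arguments in bytes
        names.append(f"{name}@{arg_bytes}")
    return names


def resolve_export(export_names, name, **options):
    """Return the first candidate present in the export name table, or None."""
    for candidate in candidate_export_names(name, **options):
        if candidate in export_names:
            return candidate
    return None
```

With an export table containing only HelloA, a thunk named Hello resolves to HelloA under ANSI marshaling, but fails to resolve if nomangle is set.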

The thunk is perceived by the managed code as simply another method, and hence it must be declared as any method would be. The presence of the pinvokeimpl flag in the respective Method record signals the runtime that this method is indeed a client thunk and not a true managed method. You already encountered the following declaration of a P/Invoke thunk in Chapter 1:

.method public static pinvokeimpl("msvcrt.dll" cdecl)
   vararg int32 sscanf(string, int8*) cil managed { }

The parameters within the parentheses of the pinvokeimpl clause represent the implementation map data. The string marshaling flag is not specified, and the marshaling defaults to ANSI. The method name need not be specified because it is the same as the declared thunk name.

If you want to use sscanf but would rather call it Foo (sscanf is such a reptilian name!), you could declare the thunk as follows:

.method public static pinvokeimpl("msvcrt.dll" as "sscanf" cdecl)
   vararg int32 Foo(string, int8*) cil managed { }

The unmanaged method resides somewhere else and the thunk is generated by the runtime, so the Method record of a “true” P/Invoke thunk has its RVA entry set to 0.

Implementation Map Metadata

The implementation map metadata resides in the ImplMap metadata table. A record in this table has four entries:

  • MappingFlags (unsigned 2-byte integer): Binary flags, which were described in the previous section. The validity mask (bits that can be set) is 0x3777.
  • MemberForwarded (coded token of type MemberForwarded): An index to the Method table, identifying the Method record of the P/Invoke thunk. This must be a valid index. The indexed method must have the pinvokeimpl and static flags set. The token of type MemberForwarded can, in principle, index the Field table as well; but the current releases of the common language runtime do not implement the P/Invoke mechanism for fields, and ILAsm syntax does not permit you to specify pinvokeimpl(...) in field definitions.
  • ImportName (offset in the #Strings stream): The name of the unmanaged method as it is defined in the export table of the unmanaged module. The name must be nonempty and fewer than 1,024 bytes long in UTF-8 encoding.
  • ImportScope (RID in the ModuleRef table): The index of the ModuleRef record containing the name of the unmanaged module. It must be a valid RID.
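The validity rules for a record of a "true" P/Invoke thunk can be collected into a small checker. The Python sketch below is a model under simplifying assumptions: MemberForwarded is represented directly as a row number in the Method table rather than as a coded token, the check of the method's pinvokeimpl and static flags is omitted, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

VALIDITY_MASK = 0x3777  # bits that can be set in MappingFlags


@dataclass
class ImplMapRecord:
    mapping_flags: int      # unsigned 2-byte integer
    member_forwarded: int   # simplified: row number in the Method table
    import_name: str        # name from the #Strings stream
    import_scope: int       # RID in the ModuleRef table


def validation_errors(rec, method_count, moduleref_count):
    """Return a list of rule violations for a complete implementation map."""
    errors = []
    if rec.mapping_flags & ~VALIDITY_MASK:
        errors.append("MappingFlags has bits outside the validity mask")
    if not 1 <= rec.member_forwarded <= method_count:
        errors.append("MemberForwarded is not a valid Method index")
    name_size = len(rec.import_name.encode("utf-8"))
    if not 0 < name_size < 1024:
        errors.append("ImportName must be nonempty and under 1,024 bytes")
    if not 1 <= rec.import_scope <= moduleref_count:
        errors.append("ImportScope is not a valid ModuleRef RID")
    return errors
```

A record with flags 0x0200 (cdecl), a valid method index, the name sscanf, and a valid ModuleRef RID passes with no errors.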

IJW Thunks

IJW thunks, similar in structure and function to “true” P/Invoke thunks, are created without the implementation map information or with an incomplete implementation map. The information regarding the identity of the target unmanaged method is not needed because the method is embedded in the same PE file and can be identified by its RVA. IJW thunks cannot have an RVA value of 0, as opposed to P/Invoke thunks, which must have an RVA value of 0.

The calling convention of the unmanaged method is defined by the thunk signature rather than by the binary flags of the implementation map. The IJW thunk signature usually has the modifier modopt or modreq on the thunk’s return type—for example, modopt([mscorlib]System.Runtime.InteropServices.CallConvCdecl). The string marshaling default is ansi.

If, however, there is a need to specify some implementation flags for an IJW thunk, it may be assigned an incomplete implementation map. Such a map contains zero ImportName entries and either contains zero ImportScope entries or contains ImportScope entries pointing at a no-name ModuleRef. The last case is outright bizarre, but such is life in general in the IJW domain.

To distinguish IJW thunks from P/Invoke thunks, the loader first looks at the implementation flags; IJW thunk declarations should have the flags native and unmanaged set. If the loader doesn’t see these flags, it presumes that this is a “true” P/Invoke thunk and tries to find its implementation map. If the map is not found, or the found map is incomplete, the loader realizes that this is an IJW thunk after all and proceeds accordingly. That’s why I noted that the native and unmanaged flags should be set rather than specified that they must be set. The loader will discover the truth even without these flags, but not before it tries to find the implementation map and analyze it.
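The decision sequence described above can be summarized in a few lines of Python. This is a sketch of the logic only; the function name and the dictionary keys standing for the map entries are hypothetical.

```python
def classify_thunk(native_unmanaged, impl_map):
    """Decide whether a pinvokeimpl method is an IJW or a P/Invoke thunk.

    native_unmanaged: True if both implementation flags are set.
    impl_map: None if no implementation map exists, otherwise a dict with
              the (hypothetical) keys "import_name" and "module_name".
    """
    if native_unmanaged:
        return "IJW"        # fast path: the flags say it all
    if impl_map is None:
        return "IJW"        # no map found
    if not impl_map.get("import_name") or not impl_map.get("module_name"):
        return "IJW"        # incomplete map
    return "P/Invoke"
```

The first test is what makes the native and unmanaged flags worth setting: without them, the loader reaches the same answer only after searching for the map and analyzing it.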

The following is a typical example of an IJW thunk declaration; it is a snippet from a disassembly of a VC++-generated mixed-code PE file:

.method public static pinvokeimpl(/* No map */)
   unsigned int32  _mainCRTStartup() native unmanaged preservesig
{
   .entrypoint
   .custom instance void [mscorlib]
      System.Security.SuppressUnmanagedCodeSecurityAttribute::.ctor()
      = ( 01 00 00 00 )
   // Embedded native code
   // Disassembly of native methods is not supported
   // Managed TargetRVA = 0x106f
}  // End of global method _mainCRTStartup

As you can see, a thunk can be declared as an entry point, and custom attributes and security attributes can be assigned to it. In these respects, a thunk has the same privileges as any other method.

As you can also see, neither the IL disassembler nor ILAsm can handle the embedded native code. The mixed-code PE files, employing the IJW interoperation, cannot be round-tripped (disassembled and reassembled).

COM Callable Wrappers

Classic COM objects are allocated from the standard operating system heap and contain internal reference counters. The COM objects must self-destruct when they are not referenced anymore—in other words, when their reference counters reach 0.

Managed objects are allocated from the common language runtime internal heap, which is controlled by the garbage collection subsystem (the GC heap). Managed objects don’t have internal reference counters. Instead, the runtime traces all the object references, and the GC automatically destroys unreferenced objects. But the references can be traced only if the objects are being referenced by managed code. Hence, it would be a bad idea to allow unmanaged COM clients to access managed objects directly.

Instead, for each managed object, the runtime creates a COM callable wrapper, which serves as a proxy for the object. A CCW is allocated outside the GC heap and is not subject to the GC mechanism, so it can be referenced from unmanaged code without causing any ill effects.

In addition to the lifetime control of the managed object, a CCW provides data marshaling for method calls and handles managed exceptions, converting them to HRESULT returns, which is standard for COM. If, however, a managed method is designed to return HRESULT (in the form of unsigned int32) rather than throw exceptions, it must have the implementation flag preservesig set. In this case, the method signature is exported exactly as defined.

The runtime carefully maintains a one-to-one relationship between a managed object and its CCW in any given application domain, not allowing an alternative CCW to be created. This guarantees that all interfaces of the same object relate to the same IUnknown and that the interface queries are consistent.

Any CCW generated by the runtime implements IDispatch for late binding. For early binding, which is done directly through the native v-table, the runtime must generate the type information in a form consumable by COM clients, namely, in the form of a COM type library. The Microsoft .NET Framework SDK includes the type library exporter utility TlbExp.exe, which generates an accompanying COM type library for any specified assembly. Another tool, RegAsm.exe, also included in the .NET Framework SDK, registers the types exposed by an assembly as COM classes and can generate the type library as well.

When managed classes and their members are exposed to COM, their exposed names might differ from the originals. First, the type library exporters consider all names that differ only in case to be the same—for example, Hello, hello, HELLO, and hElLo are exported as Hello. Second, classes are exported by name only, without the namespace part, except in the case of a name collision. If a collision exists—if, for example, an assembly has classes A.B.IHello and C.D.IHello defined—the classes are exported by their full names, with underscores replacing the dots: A_B_IHello, C_D_IHello.
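The class naming rule can be illustrated with a short Python sketch. It assumes that collisions are detected by comparing the namespace-free names without regard to case (a combination of the two rules above that the text does not spell out), and the function name is hypothetical.

```python
from collections import defaultdict


def exported_names(full_names):
    """Map full managed class names to the names exported to COM."""
    groups = defaultdict(list)
    for full in full_names:
        simple = full.rsplit(".", 1)[-1]
        groups[simple.lower()].append(full)   # case-insensitive grouping

    result = {}
    for full in full_names:
        simple = full.rsplit(".", 1)[-1]
        if len(groups[simple.lower()]) > 1:
            result[full] = full.replace(".", "_")   # collision: full name
        else:
            result[full] = simple                   # no collision: name only
    return result
```

Applied to A.B.IHello, C.D.IHello, and X.Y.Widget, the sketch produces A_B_IHello, C_D_IHello, and Widget.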

Other COM parameters characterizing the CCW for each class are defined by the COM interoperability custom attributes, listed in Chapter 16. All information pertinent to exposing managed classes as COM servers is defined through custom attributes, so ILAsm does not have or need any linguistic constructs specific to this aspect of the interoperation.

Runtime Callable Wrappers

A runtime callable wrapper is created by the common language runtime as a proxy of a classic COM object that the managed code wants to consume. The reasons for creating an RCW are roughly the same as those for creating a CCW: the managed objects know nothing about reference counting and expect their counterparts to belong to the GC heap. An RCW is allocated from the GC heap and caches the reference-counted interface pointers to a single COM object. In short, from the runtime point of view, an RCW is a “normal” managed server; and from the COM point of view, an RCW is a “normal” COM client. So everyone is happy.

An RCW is created when managed code instantiates a COM class exposed to it, for example, by a newobj instruction. There are two approaches to binding to the COM classes: early binding, which requires a so-called interop assembly, and late binding by name, which is performed through Reflection methods.

An interop assembly is a managed assembly either produced from a COM type library by means of running the utility TlbImp.exe (included in the .NET Framework SDK) or, at runtime, produced by calling methods of the class [mscorlib]System.Runtime.InteropServices.TypeLibConverter. From the point of view of the managed code, the interop assembly is simply another assembly, all classes of which happen to carry the import flag. This flag is the signal for the runtime to instantiate an RCW every time it is commanded to instantiate such a class.

Late binding through Reflection works in much the same way as IDispatch does, but it has nothing to do with the interface itself. The COM classes that implement IDispatch can be early-bound as well. And late binding isn’t restricted to imported classes only. “Normal” managed types can also be late-bound by using the same mechanism.

Instantiating a late-bound COM object is achieved by consecutive calls to the [mscorlib]System.Type::GetTypeFromProgID and [mscorlib]System.Activator::CreateInstance methods, followed when necessary by calls to the [mscorlib]System.Type::InvokeMember method. For example, if you want to instantiate a COM class Bar residing in the COM library Foo.dll and then call its Baz method, which takes no arguments and returns an integer, you could write the following code:

...
.locals init(class [mscorlib]System.Type Typ,
             object Obj,
             int32 Ret)
// Typ = Type::GetTypeFromProgID("Foo.Bar");
ldstr "Foo.Bar"
call class [mscorlib]System.Type
     [mscorlib]System.Type::GetTypeFromProgID(string)
stloc Typ

// Obj = Activator::CreateInstance(Typ);
ldloc Typ
// CreateInstance is a static method, so there is no instance keyword
call object [mscorlib]System.Activator::CreateInstance(
     class [mscorlib]System.Type)
stloc Obj
...
// Ret = (int)Typ->InvokeMember("Baz",BindingFlags::InvokeMethod,
//                              NULL,Obj,NULL);
ldloc Typ
ldstr "Baz"
ldc.i4 0x100  // System.Reflection.BindingFlags::InvokeMethod
ldnull        // Reflection.Binder - don't need it
ldloc Obj
ldnull        // Parameter array - don't need it
call instance object [mscorlib]System.Type::InvokeMember(string,
              valuetype [mscorlib]System.Reflection.BindingFlags,
              class [mscorlib]System.Reflection.Binder,
              object,
              object[])
unbox valuetype [mscorlib]System.Int32  // Managed pointer to the boxed value
ldind.i4                                // Load the int32 value itself
stloc Ret
...

An RCW converts the HRESULT returns of COM methods to managed exceptions. The only problem with this is that the RCW throws exceptions only for failing HRESULT values, so subtleties such as S_FALSE go unnoticed. The only way to deal with this situation is to set the implementation flag preservesig on the methods that might return S_FALSE and forgo the automated HRESULT-to-exception transformation.
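The reason S_FALSE slips through is that a HRESULT signals failure through its most significant bit, and S_FALSE (1) is a success code just as S_OK (0) is. The following Python sketch models the two behaviors; ComError and both call_* functions are hypothetical stand-ins, not runtime APIs.

```python
S_OK = 0x00000000
S_FALSE = 0x00000001
E_FAIL = 0x80004005


class ComError(Exception):
    """Stand-in for the managed exception thrown by an RCW."""


def failed(hr):
    """A HRESULT denotes failure when its most significant bit is set."""
    return bool(hr & 0x80000000)


def call_with_translation(hr):
    """Default behavior: failures become exceptions, success codes are lost."""
    if failed(hr):
        raise ComError(f"HRESULT 0x{hr:08X}")
    return None


def call_preserving_signature(hr):
    """With preservesig: the caller sees the raw HRESULT."""
    return hr
```

Under the default translation, S_OK and S_FALSE are indistinguishable to the caller; only the signature-preserving variant lets the caller tell them apart.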

Another problem arises when the COM method has a variable-length array as one parameter and the array length as another. The type library carries no information about which parameter is the length, and the runtime is thus unable to marshal the array correctly. In this case, the signature of the method must be modified to include explicit marshaling information.

Yet another problem requiring manual intervention involves unions with overlapped reference types. Perfectly legal in the unmanaged world, such unions are outlawed in managed code. Therefore, these unions are converted into value types with .pack and .size parameters specified but without the member fields.

The manual intervention mentioned usually involves disassembling the interop assembly, editing the text, and reassembling it. Since the interop assemblies don’t contain embedded native code, this operation can easily be performed.

Data Marshaling

All thunks and wrappers provide data conversions between managed and unmanaged data types, which is referred to as marshaling. Marshaling information is kept in the FieldMarshal metadata table, which is described in Chapter 9. The marshaling information can be associated with Field and Param metadata records.

Blittable Types

One significant subset of managed data types directly corresponds to unmanaged types, requiring no data conversion across managed and unmanaged code boundaries. These types, which are referred to as blittable, include pointers (not references), function pointers, signed and unsigned integer types, and floating-point types. Formatted value types (the value types having sequential or explicit class layout) that contain only blittable elements are also blittable.

The nonblittable managed data types that might require conversion during marshaling because of different or ambiguous unmanaged representation are as follows:

  • bool (1-byte, true = 1, false = 0) can be converted either to native type bool (4-byte, true = 1, false = 0) or to variant bool (2-byte, true = 0xFFFF, false = 0).
  • char (Unicode character, unsigned 2-byte integer) can be converted either to int8 (an ANSI character) or to unsigned int16 (a Unicode character).
  • string (class System.String) can be converted either to an ANSI or a Unicode zero-terminated string (an array of characters) or to bstr (a Unicode Visual Basic–style string).
  • object (class System.Object) can be converted either to a structure or to a COM interface (CCW/RCW) pointer.
  • class can be converted either to a COM interface pointer or, if the class is a delegate, to a function pointer.
  • valuetype (nonblittable) is converted to a structure with a fixed layout.
  • An array and a vector can be converted to a safe array or a C-style array.
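To make the first item of the list concrete, here is a Python sketch of the two unmanaged representations of bool, using the sizes and values given above. Little-endian byte order is an assumption (it holds on x86), and the function names are invented for the example.

```python
import struct


def to_native_bool(value):
    """4-byte native bool: true = 1, false = 0."""
    return struct.pack("<i", 1 if value else 0)


def to_variant_bool(value):
    """2-byte variant bool: true = 0xFFFF, false = 0."""
    return struct.pack("<H", 0xFFFF if value else 0)


def from_native_bool(data):
    """Convert a 4-byte native bool back; any nonzero value counts as true."""
    return struct.unpack("<i", data)[0] != 0
```

The same managed value thus yields different bytes depending on the chosen native type, which is why bool cannot be treated as blittable.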

The references (managed pointers) are marshaled as unmanaged pointers. The managed objects and interfaces are references in principle, so they are marshaled as unmanaged pointers as well. Consequently, references to the objects and interfaces (class IFoo&) are marshaled as double pointers (IFoo**). All object references passed to the unmanaged code must be pinned; otherwise, the GC subsystem might move them during the call to an unmanaged method.

In/Out Parameters

The method parameter flags in and out can be (but are not necessarily) taken into account by the marshaler. When that happens, the marshaler can optimize the process by abandoning the marshaling in one direction. By default, parameters passed by reference (including references to objects but excluding the objects) are presumed to be in/out parameters, whereas parameters passed by value (including the objects, even though managed objects are in principle references) are presumed to be in parameters. The exceptions to this rule are the [mscorlib]System.Text.StringBuilder class, which is always marshaled as in/out, and the classes and arrays containing the blittable types that can be pinned, which, if the in and out flags are explicitly specified, can be two-way marshaled even when passed by value. The StringBuilder class is used to represent a mutable string in the unmanaged world, that is, a string that might be changed within the unmanaged method (in C/C++ notation, char* as opposed to const char*); that’s why StringBuilder is always marshaled as in/out.

Considering that managed objects don’t necessarily stay in one place and can be moved any time the garbage collector does its job, it is vital to ensure that the arguments of an unmanaged call don’t wander around while the call is in progress. This can be accomplished in the following two ways:

  • Pin the object for the duration of the call (see section “Modifiers” in Chapter 8), preventing the garbage collector from moving it. This is done for the instances of formatted, blittable classes that have fixed layout in memory, invariant to managed or unmanaged code.
  • Allocate some unmovable memory, that is, a block of memory outside of the GC heap. If the parameter has an in flag, marshal the data from the argument to this unmovable memory. Call the method, passing this memory as the argument. If the parameter has an out flag, marshal this memory back to the original argument upon completion of the call.
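The second approach, copying through unmovable memory, can be modeled in Python as follows. This is a conceptual sketch: a bytearray stands in for the block allocated outside the GC heap, and shout is a hypothetical "native" function that modifies its buffer in place.

```python
def call_with_copy_marshaling(argument, native_function, is_in=True, is_out=False):
    """Marshal a bytearray argument through a separate buffer."""
    buffer = bytearray(len(argument))   # stands in for unmovable memory
    if is_in:
        buffer[:] = argument            # marshal in: argument -> buffer
    result = native_function(buffer)    # the callee sees only the buffer
    if is_out:
        argument[:] = buffer            # marshal out: buffer -> argument
    return result


def shout(buf):
    """Hypothetical native function: uppercases the buffer in place."""
    buf[:] = bytes(buf).upper()
    return len(buf)
```

With only the in flag set, the callee's changes never reach the original argument; with both in and out set, they are copied back after the call. Dropping one of the two copies is the optimization the in and out flags make possible.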

Chapter 10 describes the ILAsm syntax for the explicit marshaling definition of method parameters. Chapter 8 discusses the native types used in explicit marshaling definitions. Rather than reviewing that information here, I’ll discuss some interesting marshaling cases instead.

String Marshaling

String marshaling is defined in at least three places: in a string conversion flag of a TypeDef (ansi, unicode, or autochar), in a similar flag of a P/Invoke implementation map, and, explicitly, in marshal(...) clauses—for all parameters of all methods of a given class, for all parameters of a given method, and for one concrete parameter, respectively. Lower-level specifications override the higher-level specifications.
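The override order can be expressed as a simple fallback chain. In this Python sketch the function and parameter names are hypothetical, and the final fallback to ansi reflects the default noted earlier for P/Invoke thunks.

```python
def effective_string_marshaling(typedef_flag=None, implmap_flag=None,
                                marshal_clause=None):
    """Pick the most specific string marshaling specification available."""
    # Most specific first: parameter, then method, then class.
    for spec in (marshal_clause, implmap_flag, typedef_flag):
        if spec is not None:
            return spec
    return "ansi"   # nothing specified anywhere
```

For instance, a class declared as unicode with a method whose implementation map says ansi marshals that method's strings as ANSI, unless a particular parameter carries its own marshal(...) clause.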

As method arguments, managed strings (instances of the System.String class) can be marshaled as the following native types:

  • lpstr, a pointer to a zero-terminated ANSI string
  • lpwstr, a pointer to a zero-terminated Unicode string
  • lptstr, a pointer to a zero-terminated ANSI or Unicode string, depending on the platform
  • bstr, a Unicode Visual Basic–style string with a prepended length
  • ansi bstr, an ANSI Visual Basic–style string with a prepended length
  • tbstr, an ANSI or Unicode Visual Basic–style string, depending on the platform

The COM wrappers marshal the string arguments as lpstr, lpwstr, or bstr only. Other unmanaged string types are not COM compatible.
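The difference between the three COM-compatible representations is easiest to see in the bytes. The Python sketch below builds each layout for a given string; the cp1252 code page is an assumption standing in for "ANSI" (the actual code page depends on the system), and the function names are hypothetical.

```python
import struct


def as_lpstr(text, encoding="cp1252"):
    """Zero-terminated ANSI string."""
    return text.encode(encoding) + b"\x00"


def as_lpwstr(text):
    """Zero-terminated Unicode (UTF-16) string."""
    return text.encode("utf-16-le") + b"\x00\x00"


def as_bstr(text):
    """Unicode string with a prepended 4-byte length (in bytes)."""
    data = text.encode("utf-16-le")
    return struct.pack("<I", len(data)) + data + b"\x00\x00"
```

Note that a bstr is also zero-terminated; the pointer handed to the callee addresses the first character, with the length stored just before it.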

At times, a string buffer must be passed to an unmanaged method in order to be filled with some particular contents. Passing a string by value does not work in this case because the called method cannot modify the string contents even if the string is passed as an in/out parameter (in the managed world, strings are immutable—once a string object is created, it cannot be changed). Passing the string by reference does not initialize the buffer to the required length. The solution, then, is to pass not a string (an instance of System.String) but rather an instance of System.Text.StringBuilder, initialized to the required length:

.typedef [mscorlib]System.Text.StringBuilder as StrB
.method public static pinvokeimpl("user32.dll" stdcall)
   int32 GetWindowText(int32 hndl,
                       class StrB s, // Default marshaling: ANSI
                       int32 nMaxLen) { }
.method public static string GetWText(int32 hndl)
{
   .locals init(class StrB sb)
   ldc.i4 1024 // Buffer size
   newobj instance void StrB::.ctor(int32)
   stloc.0
   ldarg.0     // Load hndl on stack
   ldloc.0     // Load StringBuilder instance on stack
   ldc.i4 1024 // Buffer size again
   call int32 GetWindowText(int32,
              class StrB,
              int32)
   pop         // Discard the return of GetWindowText
   ldloc.0     // Load StringBuilder instance (filled in) on stack
   call instance string StrB::ToString()
               // Resulting string has length less than 1024
   ret
}

The string fields of the value types are marshaled as lpstr, lpwstr, lptstr, bstr, or fixed sysstring[<size>], which is a fixed-length array of ANSI or Unicode characters, depending on the string conversion flag of the field’s parent TypeDef and on the marshaling specification of the fields (if specified).

Object Marshaling

By “object” in this section I mean an instance of a reference type held in a parameter or a field or a function return of type System.Object. Objects (these instances of reference types, which in fact are cast to System.Object) are marshaled as struct (converted to a COM-style variant), interface (converted to IDispatch if possible and otherwise to IUnknown), iunknown (converted to IUnknown), or idispatch (converted to IDispatch). The default marshaling is as struct.

When an object is marshaled as struct to a COM variant, the type of the variant can be identified in three ways. First of all, the object being marshaled may or may not belong to the elite group of system objects, listed in Table 18-1 (all listed types belong to the System namespace). If the object does not belong to this high society, this object still may implement the [mscorlib]System.IConvertible interface. If it does, the marshaler calls its GetTypeCode() method, which returns the variant type. And if the object neither belongs nor implements, it is officially declared out of luck, and the variant type is set to VT_UNKNOWN.

Table 18-1. Marshaling of Managed Objects to and from COM Variants

Type of Object Marshaled To...          | ...COM Variant Type...        | ...Marshaled to Type of Object
----------------------------------------+-------------------------------+------------------------------------------------------------
Null reference                          | VT_EMPTY                      | Null reference
DBNull                                  | VT_NULL                       | DBNull
Runtime.InteropServices.ErrorWrapper    | VT_ERROR                      | UInt32
Reflection.Missing                      | VT_ERROR with E_PARAMNOTFOUND | UInt32
Runtime.InteropServices.DispatchWrapper | VT_DISPATCH                   | __ComObject or null reference if the variant value is null
Runtime.InteropServices.UnknownWrapper  | VT_UNKNOWN                    | __ComObject or null reference if the variant value is null
Runtime.InteropServices.CurrencyWrapper | VT_CY                         | Decimal
Boolean                                 | VT_BOOL                       | Boolean
SByte                                   | VT_I1                         | SByte
Byte                                    | VT_UI1                        | Byte
Int16                                   | VT_I2                         | Int16
UInt16                                  | VT_UI2                        | UInt16
Int32                                   | VT_I4                         | Int32
UInt32                                  | VT_UI4                        | UInt32
Int64                                   | VT_I8                         | Int64
UInt64                                  | VT_UI8                        | UInt64
Single                                  | VT_R4                         | Single
Double                                  | VT_R8                         | Double
Decimal                                 | VT_DECIMAL                    | Decimal
DateTime                                | VT_DATE                       | DateTime
String                                  | VT_BSTR                       | String
IntPtr                                  | VT_INT                        | Int32
UIntPtr                                 | VT_UINT                       | UInt32
Array                                   | VT_ARRAY                      | Array

If you wonder why, for example, System.Int16 and System.Boolean should be used instead of int16 and bool, respectively, I should remind you that our discussion concerns the conversion of the objects.
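The three-step identification of the variant type described before Table 18-1 can be sketched in Python as follows. The dictionary holds only an excerpt of the table, types are represented by their names, and the get_type_code callable is a hypothetical stand-in for an object's IConvertible::GetTypeCode() implementation.

```python
# Excerpt of Table 18-1 (types of the System namespace).
SYSTEM_TYPE_TO_VARIANT = {
    "DBNull": "VT_NULL",
    "Boolean": "VT_BOOL",
    "Int32": "VT_I4",
    "Double": "VT_R8",
    "DateTime": "VT_DATE",
    "String": "VT_BSTR",
}


def variant_type_for(type_name, get_type_code=None):
    """Choose the variant type for an object marshaled as struct."""
    if type_name is None:
        return "VT_EMPTY"                        # null reference
    if type_name in SYSTEM_TYPE_TO_VARIANT:
        return SYSTEM_TYPE_TO_VARIANT[type_name]  # step 1: listed system type
    if get_type_code is not None:
        return get_type_code()                   # step 2: IConvertible
    return "VT_UNKNOWN"                          # step 3: out of luck
```

An unlisted type that implements the conversion interface therefore decides its own variant type, and everything else ends up as VT_UNKNOWN.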

When a managed object is passed to unmanaged code by reference, the marshaler creates a new variant and copies the contents of the object reference into this variant. The unmanaged code is free to tinker with the variant contents, and these changes are propagated back to the referenced object when the method call is completed. If the type of the variant has been changed within the unmanaged code, the back propagation of the changes can result in a change of the object type, so you might find yourself with a different type of object after the call. The same story happens (in reverse order) when unmanaged code calls a managed method, passing a variant by reference: the type of the variant can be changed during the call.

The variant can contain a pointer to its value rather than the value itself. (In this case, the variant has its type flag VT_BYREF set.) Such a “reference variant,” passed to the managed code by value, is marshaled to a managed object, and the marshaler automatically dereferences the variant contents and retrieves the actual value. Despite its reference type, the variant is nonetheless passed by value, so any changes made to the object in the managed code are not propagated back to the original variant.

If a “reference variant” is passed to the managed code by reference, it is marshaled to an object reference, with the marshaler dereferencing the variant contents and copying the value into a newly constructed managed object. But in this case, the changes made in the managed code are propagated back to the unmanaged code only if they did not lead to a change in the variant type. If the changes did affect the variant type, the marshaler throws an InvalidCastException.
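The two cases for a "reference variant" can be contrasted in a small Python model. Everything here is illustrative: a one-element list stands in for the storage the variant points to, Python types stand in for variant types, and InvalidCastError is a stand-in for the managed exception.

```python
class InvalidCastError(Exception):
    """Stand-in for the managed InvalidCastException."""


def variant_type_of(value):
    """Map a Python value to a variant type name (illustrative only)."""
    if isinstance(value, bool):      # bool first: it is a subclass of int
        return "VT_BOOL"
    if isinstance(value, int):
        return "VT_I4"
    if isinstance(value, float):
        return "VT_R8"
    if isinstance(value, str):
        return "VT_BSTR"
    raise TypeError("unsupported value")


def pass_by_value(storage, managed_method):
    """The value is dereferenced; changes are not propagated back."""
    managed_method(storage[0])


def pass_by_reference(storage, managed_method):
    """Changes are propagated back only if the variant type is unchanged."""
    original_type = variant_type_of(storage[0])
    new_value = managed_method(storage[0])
    if variant_type_of(new_value) != original_type:
        raise InvalidCastError("variant type changed during the call")
    storage[0] = new_value
```

In this model, a managed method that increments an integer updates the original storage when called by reference, whereas one that replaces the integer with a string triggers the exception and leaves the storage untouched.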

More Object Marshaling

Objects are always marshaled by COM wrappers as COM interfaces. Every managed class can be seen as implementing an implicit interface that contains all nonprivate members of the class.

When a type library is generated from an assembly, a class interface and a coclass are produced for each accessible managed class. The class interface is marked as a default interface for the coclass.

A CCW generated by the common language runtime for each instance of the exposed managed class also implements other interfaces not explicitly implemented by the class. In particular, a CCW automatically implements IUnknown and IDispatch.

When an interop assembly is generated from a type library, the coclasses of the type library are converted to the managed classes. The member sets of these classes are defined by the default interfaces of the coclasses.

An RCW generated by the runtime for a specific instance of a COM class represents this instance and not a specific interface exposed by this instance. Hence, an RCW must implement all interfaces exposed by the COM object. This means that the identity of the COM object itself must be determined by one of its interfaces because COM objects are not passed as method arguments, but their interfaces are. In order to do this, the runtime queries the passed interface for IProvideClassInfo2. If this interface is unavailable, the runtime queries the passed interface for IProvideClassInfo. If either of the interfaces is available, the runtime obtains the class identifier (CLSID) of the COM class exposing the interface (by calling the IProvideClassInfo2::GetGUID() or IProvideClassInfo::GetClassInfo() method) and uses it to retrieve full information about the COM class from the registry. If this action sequence fails, the runtime instantiates a generic wrapper, System.__ComObject.

Array Marshaling

Unmanaged arrays can be either COM-style safe arrays or C-style arrays of fixed or variable length. Both kinds of arrays are marshaled to managed vectors, with the unmanaged element type of the array marshaled to the respective managed element type of the vector. For example, a safe array of BSTR is marshaled to string[].

The rank and bound information carried by a safe array is lost in the transition. If this information is vital for correct interfacing, manual intervention is required again: the interop assembly produced from the COM type library must be disassembled, the array definitions must be manually edited, and the assembly must be reassembled. For example, if a three-dimensional safe array of BSTR is marshaled as string[], the respective type must be manually edited to string[0...,0...,0...] in order to restore the rank of the array.

C-style arrays can have a fixed length or a length specified by another parameter of the method or a combination thereof, the total length being a sum of fixed (base) length and the value of the length parameter. Both values, the base length and the length parameter’s zero-based ordinal, can be specified for the marshaler so that a vector of appropriate size can be allocated. Chapter 8 describes the ILAsm syntax for specifying the array length. For example,

// Array length is fixed (128)
.method public static pinvokeimpl("unmanaged.dll" stdcall)
   void Foo(string[] marshal(bstr[128]) StrArray) {}
 
// Array length is specified by arrLen (parameter #1)
.method public static pinvokeimpl("unmanaged.dll" stdcall)
   void Boo(string[] marshal(bstr[+1]) StrArray, int32 arrLen) {}
 
//  Base length is 128, additional length specified by moreLen
.method public static pinvokeimpl("unmanaged.dll" stdcall)
   void Goo(int32 moreLen, string[] marshal(bstr[128+0]) StrArray) {}
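The three declarations differ only in how the element count is obtained. The arithmetic can be sketched in C as follows; ArrayLengthSpec and sketch_array_length are hypothetical names, and the sketch only illustrates that the number after the plus sign is a parameter ordinal, not a length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of a marshal(<type>[<base>+<ordinal>]) clause. */
typedef struct {
    size_t base_length;       /* fixed part; 0 if none is specified          */
    int    has_length_param;  /* nonzero if a length parameter is designated */
    size_t param_ordinal;     /* zero-based ordinal of that parameter        */
} ArrayLengthSpec;

/* Element count = base length + value of the designated argument. */
size_t sketch_array_length(const ArrayLengthSpec *spec,
                           const size_t *arg_values, size_t arg_count)
{
    size_t total = spec->base_length;
    if (spec->has_length_param) {
        assert(spec->param_ordinal < arg_count);
        total += arg_values[spec->param_ordinal];
    }
    return total;
}
```

For Goo called with moreLen equal to 5, the sketch yields 128 + 5 = 133 elements; for Boo called with arrLen equal to 10, it yields 10.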

Managed vectors and arrays can be marshaled to unmanaged code as safe arrays or as C-style arrays. Marshaling as safe arrays preserves the rank and boundary information of the managed arrays. This information is lost when the managed arrays are marshaled as C-style arrays. Vectors of vectors—for example, int32[][]—cannot be marshaled.

Delegate Marshaling

Delegates are marshaled as interfaces by COM wrappers and as unmanaged function pointers by P/Invoke thunks. The type library Mscorlib.tlb defines the _Delegate interface, which represents delegates in the COM world. This interface exposes the DynamicInvoke method, which allows the COM code to call a delegated managed method.

Marshaling a delegate as an unmanaged function pointer represents a certain risk. The unmanaged code may cache the received callback pointer “for future use.” Such a reference cached on the unmanaged side does not count as a live reference to the delegate, so the garbage collector may destroy the delegate before the unmanaged side is done using it as a callback. The calling managed code must take steps to ensure the delegate’s survival until interaction with the unmanaged code is complete, such as by storing the delegate reference in a field or in a pinned local variable.
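Seen from the unmanaged side, the risky pattern looks roughly like the following C sketch. All names here are hypothetical; the point is only that the cached pointer lives in plain native memory, where the garbage collector cannot see it.

```c
#include <assert.h>
#include <stddef.h>

/* Callback type the hypothetical unmanaged library expects. */
typedef int (*event_callback)(int code);

/* Cached "for future use": a raw pointer in native memory.
   Nothing here keeps the originating delegate alive. */
static event_callback g_cached_callback = NULL;

void sketch_register_callback(event_callback cb)
{
    assert(cb != NULL);
    g_cached_callback = cb;
}

/* May be called long after sketch_register_callback has returned. */
int sketch_raise_event(int code)
{
    if (g_cached_callback == NULL)
        return -1;
    return g_cached_callback(code);
}

/* Stand-in for the thunk a marshaled delegate would provide. */
int sketch_double_it(int code)
{
    return code * 2;
}
```

If sketch_register_callback received a pointer produced from a delegate, and the delegate were collected before sketch_raise_event ran, the call would go through a dangling pointer.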

Interoperation with Windows Runtime

The Windows 8 OS introduced a new programming platform: Windows Runtime, a.k.a. WinRT (not Windows RT!). It is an application architecture for TIFKAM (The Interface Formerly Known As Metro) apps running in a sandboxed environment, supporting both x86 and ARM processor architectures. You know, the apps from the Windows Store. Windows Phone 8 OS has its own implementation of WinRT: Windows Phone Runtime.

WinRT itself is unmanaged and COM-based, and all interactions of its parts, as well as interactions between the apps and WinRT, are conducted via COM objects rather than via the good old Win32 API.

As WinRT is COM-based and hence is object-oriented, it borrows a page from the .NET book; it uses .NET-style metadata to describe the types and their attributes present in a WinRT component or app. The format of WinRT metadata is exactly the same as that of .NET metadata, and WinRT metadata files have the same exact structure as .NET assemblies.

Unmanaged components, including WinRT’s own libraries, receive additional .winmd files (in PE format, just like .exe or .dll files) carrying their metadata and no code; all methods in this metadata have the runtime implementation flag set. In fact, .winmd files of unmanaged components are equivalent to interop assemblies produced by the TlbImp tool mentioned above.

Managed components are compiled into single .dll or .winmd files carrying both metadata and the component’s own code. In this regard, these assemblies are similar to .NET assemblies invoking COM objects from external COM libraries.

Having the same-format metadata, CLR and WinRT interoperate flawlessly. However, same format doesn’t mean same content. WinRT and .NET have different type systems (for example, for WinRT, ELEMENT_TYPE_OBJECT in a signature means interface IInspectable rather than System.Object, and ELEMENT_TYPE_STRING means HSTRING handle rather than System.String), so CLR v4.5 introduced an additional component called “Metadata Adapter” to convert (or “project”) WinRT types to CLR types.

Shawn Farkas, a CLR veteran, published a very nice article in MSDN Magazine describing the type transformations between WinRT and CLR (http://msdn.microsoft.com/en-us/magazine/jj651569.aspx). I recommend that you read it.

I should note, however, that, while WinRT is COM-based and interacts with managed code through the same CCW/RCW mechanism, the metadata of WinRT classes differs from the metadata of “classic” imported COM classes, and the marshaling is done automatically without explicit declarations. It is possible because all types exposed by WinRT are known, so CLR knows how to deal with them. (On the other hand, “classic” COM interoperation is more flexible exactly because you can define marshaling the way you like.)

WinRT assemblies (with or without code, doesn’t matter) have the windowsruntime flag set on their Assembly record. When a WinRT assembly is referenced, the respective AssemblyRef record carries the same flag as well.

WinRT classes and interfaces that need type transformation carry the flag windowsruntime. Only a fraction of the TypeDefs in a WinRT assembly carry this flag. None of the TypeDefs in a WinRT assembly carry the import flag characteristic of COM wrappers.

Classes and methods of WinRT assemblies have no explicit marshaling information associated with them.

The key difference between .NET metadata and WinRT metadata is the version string in the general metadata header (I spoke about it in Chapter 5, but I can’t trust you to remember it still). In .NET metadata, this string starts with “v” followed by a three-component version number, for example, “v4.0.30319”. In WinRT metadata the version string starts with “WindowsRuntime”, for example, “WindowsRuntime 1.2;Native code 1.2”. This is how CLR distinguishes WinRT metadata from “normal” .NET metadata. So if you decide to round-trip a .winmd assembly (disassemble it and reassemble again), don’t forget to use the /MDV option of IL Assembler to set the version string; otherwise you’ll get a “normal” .NET assembly with a weird file extension. Don’t forget that you will need the v4.5 assembler and disassembler for this.
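The distinction boils down to a prefix test on the version string. A minimal C sketch of that rule follows; sketch_is_winrt_metadata is a hypothetical helper, and how the CLR performs the check internally is not something this chapter shows.

```c
#include <assert.h>
#include <string.h>

/* Returns nonzero if the metadata version string marks WinRT metadata,
   using the rule stated in the text: the string starts with "WindowsRuntime". */
int sketch_is_winrt_metadata(const char *version)
{
    static const char prefix[] = "WindowsRuntime";
    assert(version != NULL);
    return strncmp(version, prefix, sizeof prefix - 1) == 0;
}
```

Fed the two example strings from the text, the helper accepts “WindowsRuntime 1.2;Native code 1.2” and rejects “v4.0.30319”.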

Providing Managed Methods as Callbacks for Unmanaged Code

In a P/Invoke interaction, the initiative must come from the managed code’s side. The process starts in managed mode and makes calls to the unmanaged functions. However, the exchange can’t always go in only one direction; that model would be too simplistic to be usable.

Many unmanaged methods require callback functions, and the managed code must have the means to provide those functions. Thus, it’s necessary to have a way to pass a managed method pointer to an unmanaged function, permitting the unmanaged function to call the managed method. The managed callback method might be simply a P/Invoke thunk of another unmanaged method, but that changes nothing—it’s still a managed method.

The way to pass managed methods as callbacks to unmanaged functions involves the use of delegates. The delegates are marshaled by P/Invoke thunks as unmanaged function pointers, which makes them suitable for the task.

Let’s look at a sample to review the way delegates are used for callback specifications. You can find this sample, Callback.il, on the Apress web site. The sample implements a simple program that sorts 15 integer values in ascending order, employing the well-known C function qsort, called through P/Invoke. The difference between the P/Invoke calls you’ve encountered so far and this one is that qsort requires a callback function, which compares the two elements of the array being sorted, thus defining the sorting order.

I’ll let the sample speak for itself:

// I can't pass the managed method pointer to the unmanaged function,
// and even the ldftn instruction will not help me.
// This delegate will serve as an appropriate vehicle.
.class public sealed CompareDelegate
       extends [mscorlib]System.MulticastDelegate
{
   .method public specialname
           void  .ctor(object Object,
                                native uint MethodPtr)
                                 runtime {}
 
   // Note the modopt modifier of the Invoke signature -- it's very
   // important. Without it, the calling convention of the callback
   // function is marshaled as stdcall (callee cleans the stack).
   // But qsort expects the callback function to have the cdecl
   // calling convention (caller clears the stack). If we supply the
   // callback with the stdcall calling convention, qsort blows
   // the stack away and causes a memory access violation. You are
   // welcome to comment out the modopt line and see what happens.
   // Note also that the modopt modifier is placed on the delegate's
   // Invoke signature, not on the signature of the delegated method.
   .method public virtual int32
      modopt([mscorlib]System.Runtime.CompilerServices.CallConvCdecl)
             Invoke(void*, void*) runtime {}
 
   // Well, I don't really need asynchronous invocation here,
   // but, you know, dura lex sed lex.
   .method public newslot virtual
           class [mscorlib]System.IAsyncResult
              BeginInvoke(object,
                          class [mscorlib]System.AsyncCallback,
                          object) runtime {}
 
   .method public newslot virtual
           void  EndInvoke(class [mscorlib]System.IAsyncResult)
                 runtime {}
}
 
// The hero of the occasion: the qsort function.
.method public static pinvokeimpl("msvcrt.dll" ansi cdecl)
   void qsort(void*, int32, int32, class CompareDelegate) preservesig {}
 
// This is the comparison method I'm going to offer as
// a callback to qsort. What can be simpler than comparing
// two integers?
.method public static int32 compInt32(void* arg1, void* arg2)
{
   // return(*arg1 - *arg2);
   ldarg.0
   ldind.i4
   ldarg.1
   ldind.i4
   sub
   ret
}
 
// And now, let's get this show on the road.
.method public static void Exec()
{
   .entrypoint
   .locals init(class CompareDelegate)
 
   // Print the unsorted values.
   ldstr "Before Sorting: "
   call vararg int32 printf(string)
   pop
   ldsflda valuetype SixtyBytes DataToSort
   ldc.i4.s 15
   call void printInt32(void*, int32)
 
   // Create the delegate.
   // Null object ref indicates the global method.
   ldnull
   ldftn int32 compInt32(void*, void*)
   newobj instance void
      CompareDelegate::.ctor(object, native uint)
   stloc.0
 
   // Invoke qsort.
   ldsflda valuetype SixtyBytes DataToSort // Pointer to data
   ldc.i4.s 15 // Number of items to sort
   ldc.i4.4    // Size of an individual item
   ldloc.0     // Callback function pointer (delegate)
   call void qsort(void*, int32, int32, class CompareDelegate)
 
   // Print the sorted values.
   ldstr "After Sorting: "
   call vararg int32 printf(string)
   pop
   ldsflda valuetype SixtyBytes DataToSort
   ldc.i4.s 15
   call void printInt32(void*, int32)
 
   ret
}
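For comparison, here is roughly what the same interaction looks like when both sides are unmanaged. This is a C sketch, not part of Callback.il; comp_int32 and sketch_sort_ints are hypothetical names. The comparison function plays the role that the delegate's Invoke signature plays in the ILAsm sample.

```c
#include <assert.h>
#include <stdlib.h>

/* Comparison callback with the signature qsort expects. It uses the
   platform's default C calling convention, which is what qsort calls
   through (cdecl in the x86 case discussed in the sample). */
int comp_int32(const void *arg1, const void *arg2)
{
    int a = *(const int *)arg1;
    int b = *(const int *)arg2;
    return (a > b) - (a < b);   /* -1, 0, or 1; no subtraction overflow */
}

/* Sorts count integers in ascending order. */
void sketch_sort_ints(int *data, size_t count)
{
    assert(data != NULL || count == 0);
    qsort(data, count, sizeof data[0], comp_int32);
}
```

The ILAsm comparison method returns the plain difference of the two values, which is fine for the small numbers in the sample; the sketch compares instead of subtracting so that extreme values cannot wrap around.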

Managed Methods as Unmanaged Exports

Exposing managed methods as unmanaged exports provides a way for unmanaged, non-COM clients to consume managed services. In fact, this technique opens the managed world in all its glory—with its secure and type-safe computing and with all the wealth of its class libraries—to unmanaged clients.

Of course, the managed methods are not exposed as such. Instead, inverse P/Invoke thunks, automatically created by the common language runtime, are exported. These thunks provide the same marshaling functions as “conventional” P/Invoke thunks, but in the opposite direction.

In order to expose managed methods as unmanaged exports, the IL assembler builds a v-table, a v-table fixup (VTableFixup) table, and a group of unmanaged export tables, which include the Export Address table, the Name Pointer table, the Ordinal table, the Export Name table, and the Export Directory table. Chapter 4 discusses all of these tables, their structures, and their positioning within a managed PE file. Now let’s see how it all is done.

The VTableFixup table is an array of VTableFixup descriptors, with each descriptor carrying the RVA of a v-table entry, the number of slots in the entry, and the binary flags indicating the size of each slot (32-bit or 64-bit) and any special features of the entry. One special feature is the creation of the marshaling thunk to be exposed to the unmanaged client.

The v-table and the VTableFixup table of a managed module serve two purposes. One purpose—relevant only to the VC++ compiler, the only compiler that produces mixed-code modules—is to provide the intramodule managed/unmanaged code interoperation. Another purpose is to provide the means for the unmanaged export of managed methods.

Each slot of a v-table in a PE file carries the token of the managed method the slot represents. At runtime, after the respective methods have been compiled to native code, the v-table fixups are executed, replacing the method tokens with actual addresses of the compiled methods.
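The fixup pass can be modeled in a few lines of C. In this sketch the “image” is just a byte buffer, SketchVTableFixup mirrors the three descriptor fields listed earlier, the flag names and values follow CorHdr.h, and sketch_fake_resolver is a purely hypothetical stand-in for the runtime, which would return the address of the compiled method or of its marshaling thunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    COR_VTABLE_32BIT             = 0x01,
    COR_VTABLE_64BIT             = 0x02,
    COR_VTABLE_FROM_UNMANAGED    = 0x04,
    COR_VTABLE_CALL_MOST_DERIVED = 0x10
};

typedef struct {
    uint32_t rva;    /* RVA of the first slot of the v-table entry */
    uint16_t count;  /* number of slots in the entry               */
    uint16_t type;   /* COR_VTABLE_* flags                         */
} SketchVTableFixup;

typedef uint64_t (*sketch_resolver)(uint32_t token, int from_unmanaged);

/* Hypothetical resolver: invents distinct fake addresses for thunks
   (fromunmanaged) and for directly called methods. */
uint64_t sketch_fake_resolver(uint32_t token, int from_unmanaged)
{
    uint64_t base = from_unmanaged ? 0x70000000u : 0x50000000u;
    return base + (token & 0x00FFFFFFu);
}

/* Replaces the method token in each slot with a resolved address. */
void sketch_apply_fixup(uint8_t *image, size_t image_size,
                        const SketchVTableFixup *fx, sketch_resolver resolve)
{
    size_t slot_size = (fx->type & COR_VTABLE_64BIT) ? 8 : 4;
    int from_unmanaged = (fx->type & COR_VTABLE_FROM_UNMANAGED) != 0;

    assert(image != NULL && resolve != NULL);
    assert((size_t)fx->rva + (size_t)fx->count * slot_size <= image_size);

    for (uint16_t i = 0; i < fx->count; i++) {
        uint8_t *slot = image + fx->rva + (size_t)i * slot_size;
        if (slot_size == 8) {
            uint64_t raw;
            memcpy(&raw, slot, 8);
            raw = resolve((uint32_t)raw, from_unmanaged);
            memcpy(slot, &raw, 8);
        } else {
            uint32_t raw;
            memcpy(&raw, slot, 4);
            raw = (uint32_t)resolve(raw, from_unmanaged);
            memcpy(slot, &raw, 4);
        }
    }
}
```

Each descriptor is processed independently, which is why grouping slots into one entry changes nothing except the number of descriptors.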

The ILAsm syntax for a v-table fixup definition is

.vtfixup[<num_slots>] <flags> at <data_label>

where square brackets are part of the definition and do not mean that <num_slots> is optional. <num_slots> is an integer constant, indicating the number of v-table slots grouped into one entry because their flags are identical. This grouping has no effect other than saving some space—you can emit a single slot per entry, but then you’ll have to emit as many v-table fixups as there are slots.

The flags specified in the definition can be those that are described in the following list:

  • int32: Each slot in this v-table entry is 4 bytes wide (32-bit target platform).
  • int64: Each slot in this v-table entry is 8 bytes wide (64-bit target platform). The int32 and int64 flags are mutually exclusive.
  • fromunmanaged: The entry is to be called from the unmanaged code, so the marshaling thunk must be created by the runtime.
  • callmostderived: This flag is not currently used.

The order of appearance of .vtfixup declarations defines the order of the respective VTableFixup descriptors in the VTableFixup table.

The v-table entries are defined simply as data entries. Note that the v-table must be contiguous—in other words, the data definitions for the v-table entries must immediately follow one another.

For example,

...
.vtfixup[1] int32 fromunmanaged at VT_01
...
.vtfixup[1] int32 at VT_02
...
.data VT_01 = int32(0x0600001A)
.data VT_02 = int32(0x0600001B)
...

The actual data representing the method tokens is automatically generated by the IL assembler and placed in designated v-table slots. To achieve that, it is necessary to indicate which method is represented by which v-table slot. ILAsm provides the .vtentry directive for this purpose, the syntax of which is

.vtentry <entry_number> : <slot_number>

where <entry_number> and <slot_number> are 1-based integer constants. The .vtentry directive is placed within the respective method’s scope, as shown in the following code:

...
.vtfixup[1] int32 fromunmanaged at VT_01
...
.method public static void Foo()
{
   .vtentry 1:1 // Entry 1, slot 1
   ...
}
...
.data VT_01 = int32(0) // The slot will be filled automatically.
...

Export Table Group

The export table group (in managed and unmanaged modules) consists of five tables:

  • The Export Address table (EAT), containing the RVA of the exported unmanaged functions.
  • The Export Name table (ENT), containing the names of the exported functions.
  • The Name Pointer table (NPT) and the Ordinal table (OT), together forming a lookup table that rearranges the exported functions in lexical order of their names. In special cases when an unmanaged module exports its methods exclusively by ordinal, ENT, NPT, and OT may be missing. Managed modules always export their methods by name.
  • The Export Directory table, containing the location and size information about the other four tables.

Location and size information concerning the Export Directory table itself resides in the first of 16 data directories in the PE header. Figure 18-2 shows the structure of the export table group.

9781430267614_Fig18-02.jpg

Figure 18-2. The structure of the export table group
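How the tables cooperate during a lookup can be sketched in C. The sketch works on in-memory arrays rather than on a real PE file; SketchExports and both functions are hypothetical, and the Name Pointer table is represented by already-resolved strings. The Ordinal table is assumed to hold indexes into the EAT, that is, ordinals with the base value already subtracted.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t           ordinal_base; /* base value for export ordinals     */
    uint32_t           eat_count;    /* number of EAT entries              */
    const uint32_t    *eat;          /* Export Address table: RVAs         */
    uint32_t           name_count;   /* number of NPT and OT entries       */
    const char *const *names;        /* NPT targets, in lexical order      */
    const uint16_t    *ordinals;     /* OT: EAT index for each sorted name */
} SketchExports;

/* Lookup by name: binary search over the names, then OT, then EAT.
   Returns the RVA, or 0 if the name is not exported. */
uint32_t sketch_find_by_name(const SketchExports *x, const char *name)
{
    uint32_t lo = 0, hi = x->name_count;
    assert(name != NULL);
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, x->names[mid]);
        if (cmp == 0) {
            uint16_t index = x->ordinals[mid];
            return index < x->eat_count ? x->eat[index] : 0;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}

/* Lookup by ordinal: subtract the base and index the EAT directly. */
uint32_t sketch_find_by_ordinal(const SketchExports *x, uint32_t ordinal)
{
    if (ordinal < x->ordinal_base)
        return 0;
    uint32_t index = ordinal - x->ordinal_base;
    return index < x->eat_count ? x->eat[index] : 0;
}
```

The lookup by ordinal never touches the name tables, which is consistent with the ENT, NPT, and OT being optional for modules that export by ordinal only.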

In an unmanaged PE file, the EAT contains the RVA of the exported unmanaged methods. In a managed PE file, the picture is more complicated. The EAT cannot contain the RVA of the managed methods because it’s not the managed methods that are exported—rather, it’s their marshaling thunks, generated at runtime.

The only way to address a yet-to-be-created thunk is to define a slot in a v-table entry for the exported managed method and a VTableFixup descriptor for this entry, carrying the fromunmanaged flag. In this case, the contents of the v-table slot (a token of the exported method) are replaced at runtime with the address of the marshaling thunk. (If the fromunmanaged flag is not specified, the thunk is not created, and the method token is replaced with this method’s address; but this is outside the scenario being discussed.)

For each exported method, the IL assembler creates a tiny native stub—yes, you’ve caught me: the IL assembler does produce embedded native code after all—consisting of the x86 command jump indirect (0x25FF) followed by the RVA of the v-table slot allocated for the exported method. The native stubs produced by version 2.0 or later of the IL assembler for X64 or Itanium targets look, of course, different but are functionally similar: they execute an indirect jump. The EAT contains the RVA of these tiny stubs.
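The x86 stub is small enough to spell out byte by byte. The following C sketch encodes it into a buffer; sketch_emit_x86_jump_stub is a hypothetical helper, and the sketch deliberately takes the 32-bit operand as a parameter without modeling how the assembler or the loader arrives at its final value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_STUB_SIZE = 6 };

/* Writes the 6-byte x86 "jump indirect" stub: the 16-bit value 0x25FF
   stored little-endian (bytes FF 25), followed by a 32-bit operand
   identifying the v-table slot, also little-endian. */
void sketch_emit_x86_jump_stub(uint8_t *out, size_t out_size,
                               uint32_t slot_operand)
{
    assert(out != NULL && out_size >= SKETCH_STUB_SIZE);
    out[0] = 0xFF;
    out[1] = 0x25;
    out[2] = (uint8_t)(slot_operand);
    out[3] = (uint8_t)(slot_operand >> 8);
    out[4] = (uint8_t)(slot_operand >> 16);
    out[5] = (uint8_t)(slot_operand >> 24);
}
```

The stub contains no knowledge of the target method; all it does is jump to whatever the slot holds at the time of the call.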

The generation of the jump stubs renders the module strictly platform-specific, but you already made your module platform-specific when you chose the width of the v-table slots (4 or 8 bytes).

The tiny stubs are necessary because the EAT must contain solid addresses of the exported methods as soon as the operating system loads the PE file. Otherwise, the unmanaged client won’t be able to match the entries of its Import Address table (IAT) to the entries of the managed module’s EAT. The addresses of the methods or their thunks don’t exist at the moment the file is loaded. But the tiny stubs exist and have solid addresses. It’s true that at that moment they cannot perform any meaningful jumps because the v-table slots they are referencing contain method tokens instead of addresses. But by the time the stubs are called, the methods and thunks will have been generated and the v-table slots will be fixed up, with the method tokens replaced with thunk addresses. Figure 18-3 illustrates this scenario.

9781430267614_Fig18-03.jpg

Figure 18-3. Indirect referencing of v-table entries from the EAT

The unmanaged exports require that relocation fixups are executed at the module load time. When a program runs under the Microsoft Windows XP operating system or later, this requirement can create a problem similar to those encountered with TLS data and data on data. As described in Chapter 4, if the common language runtime header flag COMIMAGE_FLAGS_ILONLY is set, the OS loader ignores the .reloc section, and the fixups are not executed. To avoid this, the IL assembler automatically replaces the COMIMAGE_FLAGS_ILONLY flag with COMIMAGE_FLAGS_32BITREQUIRED whenever the source code specifies TLS data or data on data. Unfortunately, versions 1.0 and 1.1 of the assembler neglected to do this automatically when unmanaged exports were specified in the source code, and it was thus necessary to explicitly set the runtime header flags using the directive .corflags 0x00000002. Versions 2.0 and later of the assembler are free of this deficiency; they automatically remove the ILONLY flag and then, if the target architecture is x86, set the 32BITREQUIRED flag.
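The flag adjustment made by the newer assemblers amounts to two bit operations. A minimal C sketch follows; the two flag values are those of the runtime header, while SketchTarget and sketch_adjust_corflags are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum {
    COMIMAGE_FLAGS_ILONLY        = 0x00000001,
    COMIMAGE_FLAGS_32BITREQUIRED = 0x00000002
};

typedef enum {
    SKETCH_TARGET_X86,
    SKETCH_TARGET_X64,
    SKETCH_TARGET_ITANIUM
} SketchTarget;

/* Models the rule in the text: with unmanaged exports present, drop
   ILONLY, and for x86 targets also set 32BITREQUIRED. */
uint32_t sketch_adjust_corflags(uint32_t flags, int has_unmanaged_exports,
                                SketchTarget target)
{
    assert(target >= SKETCH_TARGET_X86 && target <= SKETCH_TARGET_ITANIUM);
    if (!has_unmanaged_exports)
        return flags;
    flags &= ~(uint32_t)COMIMAGE_FLAGS_ILONLY;
    if (target == SKETCH_TARGET_X86)
        flags |= COMIMAGE_FLAGS_32BITREQUIRED;
    return flags;
}
```

Starting from the default value 0x00000001 for an x86 target, the result is 0x00000002, the same value the explicit .corflags directive had to supply for the older assemblers.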

The ILAsm syntax for declaring a method as an unmanaged export is very simple.

.export [<ordinal>] as <export_name>

<ordinal> is an integer constant. The <export_name> provides an alias for the exported method. In versions 1.0 and 1.1 of ILAsm, it was necessary to specify <export_name> even if the method was exported under its own name. Starting with version 2.0, this is no longer necessary.

The .export directive is placed within the scope of the respective method together with the .vtentry directive, as shown in this example:

...
.corflags 0x00000002
...
.vtfixup[1] int32 fromunmanaged at VT_01
...
.method public static void Foo()
{
   .vtentry 1:1      // Entry 1, slot 1
   .export[1] as Bar // Export #1, Name="Bar"
   ...
}
...
.data VT_01 = int32(0) // The slot will be filled automatically.
...

The source code for the small sample described earlier in Figure 18-2 could look like the following, which was taken from the sample file YDD.il on the Apress web site:

.assembly extern mscorlib { auto }
.assembly YDD { }
.module YDD.dll
.corflags 0x00000002
.vtfixup[1] int32 fromunmanaged at VT_01 // First v-table fixup
.vtfixup[1] int32 fromunmanaged at VT_02 // Second v-table fixup
.vtfixup[1] int32 fromunmanaged at VT_03 // Third v-table fixup
.data VT_01 = int32(0)         // First v-table entry
.data VT_02 = int32(0)         // Second v-table entry
.data VT_03 = int32(0)         // Third v-table entry
.method public static void Yabba()
{
   .vtentry 1:1
   .export[1]
   ldstr "Yabba"
   call void [mscorlib]System.Console::WriteLine(string)
   ret
}
.method public static void Dabba()
{
   .vtentry 2:1
   .export[2]
   ldstr "Dabba"
   call void [mscorlib]System.Console::WriteLine(string)
   ret
}
.method public static void Doo()
{
   .vtentry 3:1
   .export[3]
   ldstr "Doo!"
   call void [mscorlib]System.Console::WriteLine(string)
   ret
}

Now you can compile the sample to a managed DLL, remembering to use the /DLL command-line option of the IL assembler, and then write a small unmanaged program that calls the methods from this DLL. This unmanaged program can be built with any unmanaged compiler—for example, Microsoft Visual C++ 6, if you can find one—but don’t forget that YDD.dll cannot run unless the .NET Framework is installed. It’s still a managed assembly, even if your unmanaged program does not know about it.
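Such a client might look like the following C sketch. It assumes a Windows build, a process whose bitness matches the DLL (32-bit for the x86 version of YDD.dll), and export names equal to the method names; sketch_call_export is a hypothetical helper, and on other platforms it merely reports failure. Because the exported methods take no arguments and return nothing, the sketch does not need to spell out a calling convention.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Signature of the exported methods in the sample: void f(void). */
typedef void (*export_fn)(void);

/* Loads the DLL, looks up one export by name, and calls it.
   Returns 0 on success and a nonzero code otherwise. */
int sketch_call_export(const char *dll_path, const char *export_name)
{
    assert(dll_path != NULL && export_name != NULL);
#ifdef _WIN32
    HMODULE module = LoadLibraryA(dll_path);
    if (module == NULL)
        return 1;                       /* DLL not found or not loadable */
    export_fn fn = (export_fn)GetProcAddress(module, export_name);
    if (fn == NULL) {
        FreeLibrary(module);
        return 2;                       /* no such export */
    }
    fn();   /* jump stub, then v-table slot, then the marshaling thunk */
    return 0;                           /* module stays loaded until exit */
#else
    return 3;                           /* unmanaged exports need Windows */
#endif
}
```

Calling sketch_call_export("YDD.dll", "Yabba"), then the same for "Dabba" and "Doo", would exercise all three exports, provided the DLL can be found on the library search path.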

As you’ve probably noticed, all .vtfixup directives of the sample sport identical flags. This means that three single-slot v-table entries can be grouped into one three-slot entry:

.vtfixup[3] int32 fromunmanaged at VT_01
.data VT_01 = int32(0)[3]

Then the .vtentry directives of the Dabba and Doo methods must be changed to .vtentry 1:2 and .vtentry 1:3, respectively.

It’s worth making a few additional points about the sample. First, it’s good practice to define all VTableFixup and v-table entries in the beginning of the source code, before any methods or other data constants are defined. This ensures that you will not attempt to assign a nonexistent v-table slot to a method and that the v-table will be contiguous.

Second, in the sample, the export ordinals correspond to v-table entry numbers. In fact, no such correspondence is necessary. But if you’re using the v-table only for the purpose of unmanaged export, it might not be a bad idea to maintain this correspondence simply to keep track of your v-table slots. It won’t do you any good to assign the same v-table slot or the same export ordinal to two different methods.

Third, you should remember that the export ordinals are relative. The Export Directory table has a Base entry, which contains the base value for the export ordinals. The IL assembler simply finds the lowest ordinal used in the .export directives throughout the source code and assigns this ordinal to the Base entry. If you start numbering your exports from 5, it does not mean that the first four entries in the EAT will be undefined. The common practice is to use 1-based export ordinals.

At this moment, if you were paying attention, you would say, “Wait a minute! You are talking about the v2.0+ IL assembler targeting different platforms, and at the same time you are suggesting to put the platform-specific details right in the source code?!”

But I’m not sure you were, so I’m saying it myself. Yes, if you look at the code of the sample YDD.il, you will see that the directives .corflags, .vtfixup, and .data are platform-specific (in this case, x86-specific), so in order to generate YDD.DLL for, say, the X64 platform, you would need to change the source code. This is the bad news.

The good news is that the IL assembler v2.0+ does not require these directives at all, as long as the v-table and VTFixup table are used for unmanaged exports only. Just specify the .export directives in the methods you want to export to the unmanaged world, and the flags, the v-table, and its fixups will be generated automatically by the compiler, with the slot size adjusted for the target platform:

.assembly extern mscorlib { auto }
.assembly YDD { }
.module YDD.dll
.method public static void Yabba()
{
   .export[1]
   ldstr "Yabba"
   call void [mscorlib]System.Console::WriteLine(string)
   ret
}
.method public static void Dabba()
{
   .export[2]
   ldstr "Dabba"
   call void [mscorlib]System.Console::WriteLine(string)
   ret
}
.method public static void Doo()
{
   .export[3]
   ldstr "Doo!"
   call void [mscorlib]System.Console::WriteLine(string)
   ret
}

In the case of an embedded “traditional” unmanaged client (that is, when the unmanaged code of a mixed-code module takes the initiative and calls the managed methods), the managed/unmanaged code interoperation is performed along lines similar to the previously described case of an external “traditional” unmanaged client. The embedded case is simpler because there is no need to involve the export tables (the calling code is embedded in this very module), and hence there is no need to generate the jump stubs. So in the case of the embedded unmanaged client, all interoperation is done via the module’s v-table and VTFixup table, with the CLR automatically generating the marshaling thunks for inverse P/Invoke (unmanaged code calling the managed). Just in case, let me remind you that existing versions of the IL assembler cannot generate mixed-code modules.

Summary

In this chapter, I discussed six possible scenarios of managed/unmanaged code interoperation, based on three dichotomies: COM interoperation vs. “traditional” interoperation, unmanaged code as client (calling) vs. unmanaged code as server (being called), and external unmanaged code (residing in a different module) vs. embedded unmanaged code (residing in the same module). With three dichotomies, one would expect eight scenarios, but there are only six, because COM interoperation always involves external unmanaged code.

COM interoperation involves the generation of RCWs for representing the COM objects in the managed world (COM as server) and of CCWs for exposing the managed objects to the COM world (COM as client). Both RCWs and CCWs are generated by the CLR at runtime, and both serve two main purposes: marshaling the parameters across the managed/unmanaged boundaries and coordinating GC reference tracking on the managed side and COM-specific reference counting on the unmanaged side.

“Traditional” interoperation with unmanaged code posing as the server is based on the platform invocation mechanism (P/Invoke) and involves the generation of marshaling thunks. The marshaling thunks are automatically generated by the CLR at runtime according to the implementation map metadata (ImplMap table) and to the called method’s signature.

“Traditional” interoperation with unmanaged code posing as the client is based on the inverse P/Invoke mechanism and also involves the generation of marshaling thunks. This interoperation takes place via the module’s v-table and VTFixup table, and the marshaling thunks are automatically generated by the CLR at runtime according to data stored in these tables (which are not part of the metadata) and to the called method’s signature. In the case of an external “traditional” unmanaged client, the unmanaged export tables and unmanaged jump stubs must be generated by the compiler and persisted in the managed module.

“Traditional” interoperation within mixed-code modules is known as IJW and is (so far) specific to the VC++ compiler because no other compiler (so far) can produce the mixed-code modules.
