An Unmanaged API to Access Assembly Metadata

This chapter discusses two methods for accessing the metadata within an assembly. A third method, the Reflection API, is built on top of these two methods. Reflection will be covered in further detail in Chapter 17, “Reflection.”

Neither of the APIs covered in this chapter requires the .NET CLR. The first method requires only that mscorwks.dll be installed correctly on your system. It is an unmanaged COM API that is relatively easy to use. It is lower level than the Reflection API, but after you understand the basics of how an assembly is laid out, the unmanaged API is not that hard to use.

Jim Miller of Microsoft termed the second method “heroic.” This method takes the assembly specification as submitted to ECMA and uses it to decipher the assembly byte by byte. Of course, it requires some facility to read in the binary assembly, and it demands more intimate knowledge of the physical layout of the assembly. This method is covered in the next section.

One interface in the unmanaged API is the gateway to all the other interfaces. It is appropriately named IMetaDataDispenser, or IMetaDataDispenserEx. As the name implies, this interface dispenses all of the other interfaces. IMetaDataDispenser and IMetaDataDispenserEx are similar enough that either one is fine. The Ex version simply adds a few methods that change the way an assembly is searched or that let you see where the Framework was installed (the system directory path). Both are COM interfaces, so some familiarity with COM is needed to use them effectively. This example uses C++ to access the COM interfaces. If ATL had been used, the implementation would have been marginally simpler. A VB implementation should be simpler still. The full source for this application is in the AssemblyCOM directory. Listing 4.4 shows how to obtain a pointer to the IMetaDataDispenserEx interface.

Listing 4.4. Getting an Instance of the IMetaDataDispenserEx Interface
#include <cor.h>
. . .
HRESULT hr = CoCreateInstance(CLSID_CorMetaDataDispenser,
                              NULL,
                              CLSCTX_INPROC_SERVER,
                              IID_IMetaDataDispenserEx,
                              (void **) &m_pDisp);

Next, you will need to associate an assembly file with the set of metadata APIs. You do this with the OpenScope method of the IMetaDataDispenser interface. Listing 4.5 shows how to call OpenScope.

Listing 4.5. OpenScope in IMetaDataDispenser Interface
#include <cor.h>
. . .
WCHAR szScope[1024];
wcscpy(szScope, L"file:");
wcscat(szScope, lpszPathName);

// Attempt to open scope on given file
HRESULT hr = m_pDisp->OpenScope(szScope,
                                0,
                                IID_IMetaDataImport,
                               (IUnknown**)&m_pImport);
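
Listing 4.5 builds the scope string in a fixed 1024-character buffer, which a very long path could overrun. As a hedged alternative, the sketch below builds the same "file:" prefixed string with std::wstring, which grows as needed. MakeFileScope is a hypothetical helper name, not part of the AssemblyCOM sample or of cor.h.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not from the AssemblyCOM sample): returns the
// "file:<path>" scope string that Listing 4.5 assembles with wcscpy/wcscat.
inline std::wstring MakeFileScope(const std::wstring& path)
{
    return L"file:" + path;
}
```

The result's c_str() can then be passed as the first argument to OpenScope in place of szScope.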

The IMetaDataImport interface provides most of the functionality that is typically required. You might want to query the IMetaDataImport interface for the IMetaDataAssemblyImport interface, but you will find that most of the metadata information is available from methods on the IMetaDataImport interface.

Table 4.2 listed the tables that can be defined. How do you get at those tables? Table 4.3 shows the association between a method call on IMetaDataImport and the tables that are listed in Table 4.2.

Table 4.3. Metadata Tables
Code  Table Name       Token            Method
0x0C  CustomAttribute  mdCustomValue    EnumCustomAttributes
0x14  Event            mdEvent          EnumEvents
0x04  Field            mdFieldDef       EnumFields
0x09  InterfaceImpl    mdInterfaceImpl  EnumInterfaceImpls
0x0A  MemberRef        mdMemberRef      EnumMemberRefs
-     -                mdToken          EnumMembers
0x06  Method           mdMethodDef      EnumMethods
0x1A  ModuleRef        mdModuleRef      EnumModuleRefs
0x08  Param            mdParamDef       EnumParams
0x0E  DeclSecurity     mdPermission     EnumPermissionSets
0x17  Property         mdProperty       EnumProperties
0x02  TypeDef          mdTypeDef        EnumTypeDefs
0x01  TypeRef          mdTypeRef        EnumTypeRefs
-     -                mdString         EnumUserStrings

In addition to these method calls, a general table interface called IMetaDataTables has methods for enumerating through the tables, row by row. To illustrate how to use the unmanaged APIs, a project has been created that allows you to explore the metadata of an assembly. The full source for the application is in the AssemblyCOM subdirectory. When this application is run using the hello.exe assembly that was explored in the previous section, the tool looks like Figure 4.6.

Figure 4.6. AssemblyCOM application.


This application uses many of the unmanaged APIs. A separate property page was built for each view into the assembly metadata. To understand how to use the unmanaged APIs, look at the property page labeled TypeDef. This property page looks at the fields, methods, and parameters that are defined for a type. Listing 4.6 shows how the process is started.

Listing 4.6. Enumerating Types
void DisplayTypeDefs(IMetaDataImport* pImport, CTreeCtrl& treeCtrl)
{
    HCORENUM typeDefEnum = NULL;
    mdTypeDef typeDefs[ENUM_BUFFER_SIZE];
    ULONG count, totalCount = 1;
    HRESULT hr;
    WCHAR lBuffer[256];
    HTREEITEM typedefItem;

    // Fetch TypeDef tokens one buffer at a time until none are left.
    while (SUCCEEDED(hr = pImport->EnumTypeDefs(&typeDefEnum,
                                                typeDefs,
                                                NumItems(typeDefs),
                                                &count)) &&
            count > 0)
    {
        for (ULONG i = 0; i < count; i++, totalCount++)
        {
            wsprintf(lBuffer, _T("TypeDef #%d"), totalCount);
            typedefItem = treeCtrl.InsertItem(lBuffer);
            DisplayTypeDefInfo(pImport, typeDefs[i], treeCtrl, typedefItem);
        }
    }
    // Release the enumerator handle now that the enumeration is finished.
    pImport->CloseEnum(typeDefEnum);
}

An IMetaDataImport interface was already obtained, as described earlier. Now you can step through each of the types defined in this module, using HelloWorld.exe as the example. EnumTypeDefs is called to enumerate all the types. To see how interconnected all these tables are, note that you could just as easily have started at the TypeDef table and iterated through each row in the table. For each of the types defined, DisplayTypeDefInfo is called to list the contents of a single type. A type can contain methods, properties, events, interfaces, permission sets, and custom attributes. One of the more interesting subtrees in the TypeDef tree is the branch that deals with methods. The format of the enumeration is much the same as in Listing 4.6: You start the enumeration and then explicitly close it. Listing 4.7 shows an example of calling EnumMethods.

Listing 4.7. Enumerating the Methods That Are Defined for a Type
void DisplayMethods(IMetaDataImport* pImport,
                    mdTypeDef inTypeDef,
                    CTreeCtrl& treeCtrl,
                    HTREEITEM treeItem)
{
    HCORENUM methodEnum = NULL;
    mdToken methods[ENUM_BUFFER_SIZE];
    DWORD flags;
    ULONG count, totalCount = 1;
    HRESULT hr;
    WCHAR lBuffer[512];
    HTREEITEM subTreeItem;
    // Fetch the method tokens of inTypeDef one buffer at a time.
    while (SUCCEEDED(hr = pImport->EnumMethods( &methodEnum,
                                                inTypeDef,
                                                methods,
                                                NumItems(methods),
                                                &count)) &&
            count > 0)
    {
        for (ULONG i = 0; i < count; i++, totalCount++)
        {
            wsprintf(lBuffer, _T("Method #%d %ls"),
                               totalCount,
                               (methods[i] == g_tkEntryPoint) ?
                               L"[ENTRYPOINT]" : L"");
            subTreeItem = treeCtrl.InsertItem(lBuffer, treeItem);
            DisplayMethodInfo(pImport, methods[i], &flags, treeCtrl, subTreeItem);
            DisplayParams(pImport, methods[i], treeCtrl, subTreeItem);
            //DisplayCustomAttributes(methods[i], "		");
            //DisplayPermissions(methods[i], "	");
            //DisplayMemberRefs(methods[i], "	");
            //// P-invoke data if present.
            //if (IsMdPinvokeImpl(flags))
            //    DisplayPinvokeInfo(methods[i]);
        }
    }
    // Release the enumerator handle now that the enumeration is finished.
    pImport->CloseEnum(methodEnum);
}

You should see a common thread between Listing 4.7 and Listing 4.6. The enumeration is started and an HCORENUM handle is passed back. When you are finished with the enumeration, call CloseEnum to close it. The various EnumXXX methods have different inputs and outputs, but they all work with tokens, as described in Table 4.3. The upper byte of a token identifies the table that is being referenced. The remaining three bytes specify an index into that table. For example, when you first start up the AssemblyCOM application and select the TypeDef property page for the HelloWorld.exe assembly, one of the tokens that you see in the debugger is 0x02000002. This is index 2 into table number 2, which is the TypeDef table. Index 2 refers to the CLRUnleashed.Hello class, of which Main is the only member. The indexes that are part of a token are 1-based; zero indicates that the feature or table entry is not present. This application is not finished, but it is far enough along to provide a good starting point for learning the unmanaged APIs.
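
The token layout just described can be sketched in a few lines of portable C++. This is a minimal illustration rather than the code used by AssemblyCOM: TableOfToken, RowOfToken, IsNilToken, TableName, and DescribeToken are hypothetical names chosen for this sketch (cor.h provides its own macros for the same job), and the table names come straight from Table 4.3.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helpers for illustration; these are not the names used in cor.h.

// The upper byte of a token selects the metadata table.
constexpr std::uint32_t TableOfToken(std::uint32_t token) { return token >> 24; }

// The lower three bytes are the 1-based row index within that table.
constexpr std::uint32_t RowOfToken(std::uint32_t token) { return token & 0x00FFFFFFu; }

// A row index of zero means that the entry is not present.
constexpr bool IsNilToken(std::uint32_t token) { return RowOfToken(token) == 0; }

// Table names keyed by the codes listed in Table 4.3.
inline const char* TableName(std::uint32_t table)
{
    switch (table)
    {
        case 0x01: return "TypeRef";
        case 0x02: return "TypeDef";
        case 0x04: return "Field";
        case 0x06: return "Method";
        case 0x08: return "Param";
        case 0x09: return "InterfaceImpl";
        case 0x0A: return "MemberRef";
        case 0x0C: return "CustomAttribute";
        case 0x0E: return "DeclSecurity";
        case 0x14: return "Event";
        case 0x17: return "Property";
        case 0x1A: return "ModuleRef";
        default:   return "(not in Table 4.3)";
    }
}

// Formats a token as "<table> row <n>", or "<table> (nil)" for row zero.
inline std::string DescribeToken(std::uint32_t token)
{
    const std::string text = TableName(TableOfToken(token));
    if (IsNilToken(token))
        return text + " (nil)";
    return text + " row " + std::to_string(RowOfToken(token));
}
```

With these helpers, DescribeToken(0x02000002) yields "TypeDef row 2", which matches the token seen in the debugger for the CLRUnleashed.Hello class.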

Within the loop that enumerates the TypeDefs are several other EnumXXX method calls. One of those is the EnumMethods loop shown in Listing 4.7. Each of the EnumXXX methods is usually followed by a call to GetXXXProps. The pattern is to open the enumeration, get the details, and close the enumeration. The “get the details” portion of the pattern is supported by the unmanaged API call to GetXXXProps. In the sample code, the call to EnumMethods is followed by a call to GetMethodProps; however, because of the level of detail, it has been split out into an internal helper function, DisplayMethodInfo. If there is a particular OUT parameter in which you are not interested, you can simply pass NULL instead of the address of a variable. For some of the GetXXXProps methods, this saves effort because you don't have to set up a variable that doesn't interest you. If you look at the call to GetTypeDefProps from within the internal TypeDefName function, you see that only the arguments needed to retrieve the name are supplied; the last three arguments are passed as NULL. In a function like TypeDefName, these parameters are not important.

The one roadblock that you might run into in trying to crack the assembly metadata is signatures. Signatures are necessarily complex because they have to generically describe a method, a field, and so on. A method signature needs to describe the return type and each of the arguments (parameters) to the function or method. In general, the signature data cannot be cracked easily. However, if your method signature is simple, then the corresponding metadata is relatively simple. The general format for a signature is as follows:

<calling convention>
<parameter count>
<return type>
<parameter #1 description>
. . .
<parameter #n description>

The first complication is the parameter count. It is not a simple number, but a compressed value that needs to be decompressed for correct interpretation. The return type is encoded to indicate whether the method returns a reference value, returns nothing (void), or returns a complex type. Each of the parameters can be simple or complex. If a parameter is just a simple value, then a simple switch statement allows you to decode the signature values. If the parameter is more complex, then you might end up with recursion. Because of these complexities, the AssemblyCOM application was not built to crack the signature. That's an exercise for you. AssemblyCOM simply displays the hex bytes that represent the signature description in the metadata.
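
As a starting point for that exercise, here is a minimal sketch of decoding a simple method signature. It assumes the compressed-integer encoding and element-type codes of the ECMA specification (one, two, or four bytes selected by the leading bits; 0x01 for void, 0x08 for int32, 0x0E for string, 0x1D for a single-dimensional array). ReadCompressed, ReadType, and DescribeMethodSig are hypothetical names, only a handful of simple types are handled, and this is not how AssemblyCOM works, because AssemblyCOM only dumps the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Reads one compressed unsigned integer at sig[pos] and advances pos past it.
inline std::uint32_t ReadCompressed(const Bytes& sig, std::size_t& pos)
{
    auto at = [&](std::size_t i) -> std::uint32_t {
        if (i >= sig.size()) throw std::out_of_range("truncated signature");
        return sig[i];
    };
    const std::uint32_t b0 = at(pos);
    if ((b0 & 0x80u) == 0)                 // 0xxxxxxx: one byte
    {
        pos += 1;
        return b0;
    }
    if ((b0 & 0xC0u) == 0x80u)             // 10xxxxxx: two bytes
    {
        const std::uint32_t v = ((b0 & 0x3Fu) << 8) | at(pos + 1);
        pos += 2;
        return v;
    }
    if ((b0 & 0xE0u) == 0xC0u)             // 110xxxxx: four bytes
    {
        const std::uint32_t v = ((b0 & 0x1Fu) << 24) | (at(pos + 1) << 16) |
                                (at(pos + 2) << 8) | at(pos + 3);
        pos += 4;
        return v;
    }
    throw std::invalid_argument("invalid compressed integer");
}

// Decodes one type. Simple types need only a switch; an array recurses
// to decode its element type.
inline std::string ReadType(const Bytes& sig, std::size_t& pos)
{
    if (pos >= sig.size()) throw std::out_of_range("truncated signature");
    switch (sig[pos++])
    {
        case 0x01: return "void";
        case 0x02: return "bool";
        case 0x03: return "char";
        case 0x08: return "int32";
        case 0x0E: return "string";
        case 0x1C: return "object";
        case 0x1D: return ReadType(sig, pos) + "[]";
        default:   throw std::runtime_error("element type not handled in this sketch");
    }
}

// Walks <calling convention> <parameter count> <return type> <parameters...>.
inline std::string DescribeMethodSig(const Bytes& sig)
{
    if (sig.empty()) throw std::out_of_range("empty signature");
    std::size_t pos = 0;
    const std::uint8_t callConv = sig[pos++];
    const std::uint32_t paramCount = ReadCompressed(sig, pos);
    std::string text = (callConv & 0x20u) ? "instance " : "";  // 0x20 = has "this"
    text += ReadType(sig, pos) + " (";
    for (std::uint32_t i = 0; i < paramCount; ++i)
    {
        if (i > 0) text += ", ";
        text += ReadType(sig, pos);
    }
    return text + ")";
}
```

For example, the bytes 00 01 01 1D 0E describe a static method that takes one string array and returns void, so DescribeMethodSig produces "void (string[])". Classes, value types, pointers, and by-reference parameters each need their own case and, in several instances, further recursion.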
