Two methods of accessing the metadata within an assembly are discussed here. A third method, the Reflection API, is built on top of these two. Reflection is covered in further detail in Chapter 17, “Reflection.”
Neither of the APIs covered in this chapter requires the .NET CLR. The first method requires only that mscorwks.dll be installed correctly on your system. It is an unmanaged COM API that sits at a lower level than the Reflection API, but after you understand the basics of how an assembly is laid out, it is not hard to use.
Jim Miller of Microsoft termed the second method “heroic.” This method takes the assembly specification as submitted to ECMA and deciphers the assembly byte by byte. It requires some facility for reading in the binary assembly, along with more intimate knowledge of the assembly's physical layout. This method is covered in the next section.
One interface in the unmanaged API is the gateway to all of the others. It is appropriately named IMetaDataDispenser (or IMetaDataDispenserEx); as the name implies, it dispenses all of the other interfaces. The two are similar enough that either will do. The Ex version simply adds a few methods that change the way an assembly is searched for or that let you see where the Framework was installed (the system directory path). Both are COM interfaces, so some familiarity with COM is needed to use them effectively. This example uses C++ to access the COM interfaces. If ATL had been used, the implementation would have been marginally simpler, and a VB implementation should be simpler still. The full source for this application is in the AssemblyCOM directory. Listing 4.4 shows how to obtain a pointer to the IMetaDataDispenserEx interface.
    #include <cor.h>
    . . .
    HRESULT hr = CoCreateInstance(CLSID_CorMetaDataDispenser,
                                  NULL,
                                  CLSCTX_INPROC_SERVER,
                                  IID_IMetaDataDispenserEx,
                                  (void **) &m_pDisp);
Next, you will need to associate an assembly file with the set of metadata APIs. You do this with the OpenScope method of the IMetaDataDispenser interface. Listing 4.5 shows how to call OpenScope.
    #include <cor.h>
    . . .
    WCHAR szScope[1024];
    wcscpy(szScope, L"file:");
    wcscat(szScope, lpszPathName);
    // Attempt to open scope on given file
    HRESULT hr = m_pDisp->OpenScope(szScope,
                                    0,
                                    IID_IMetaDataImport,
                                    (IUnknown**)&m_pImport);
The IMetaDataImport interface provides most of the functionality that is typically required. You might want to query the IMetaDataImport interface for the IMetaDataAssemblyImport interface, but you will find that most of the metadata information is available from methods on the IMetaDataImport interface.
Table 4.2 listed the tables that can be defined. How do you get at those tables? Table 4.3 shows the association between a method call on IMetaDataImport and the tables that are listed in Table 4.2.
| Code | Table Name | Token | Method |
|---|---|---|---|
| 0x0C | CustomAttribute | mdCustomValue | EnumCustomAttributes |
| 0x14 | Event | mdEvent | EnumEvents |
| 0x04 | Field | mdFieldDef | EnumFields |
| 0x09 | InterfaceImpl | mdInterfaceImpl | EnumInterfaceImpls |
| 0x0A | MemberRef | mdMemberRef | EnumMemberRefs |
| | | mdToken | EnumMembers |
| 0x06 | Method | mdMethodDef | EnumMethods |
| 0x1A | ModuleRef | mdModuleRef | EnumModuleRefs |
| 0x08 | Param | mdParamDef | EnumParams |
| 0x0E | DeclSecurity | mdPermission | EnumPermissionSets |
| 0x17 | Property | mdProperty | EnumProperties |
| 0x02 | TypeDef | mdTypeDef | EnumTypeDefs |
| 0x01 | TypeRef | mdTypeRef | EnumTypeRefs |
| | | mdString | EnumUserStrings |
In addition to these method calls, a general table interface called IMetaDataTables has methods for enumerating through the tables, row by row. To illustrate how to use the unmanaged APIs, a project has been created that allows you to explore the metadata of an assembly. The full source for the application is in the AssemblyCOM subdirectory. When this application is run using the hello.exe assembly that was explored in the previous section, the tool looks like Figure 4.6.
This application uses many of the unmanaged APIs. A separate property page was built for each view into the assembly metadata. To understand how to use the unmanaged APIs, look at the property page labeled TypeDef. This property page looks at the fields, methods, and parameters that are defined for a type. Listing 4.6 shows how the process is started.
An IMetaDataImport interface was already obtained, as described earlier. Now you can step through each of the types defined in this module, using HelloWorld.exe as the example. EnumTypeDefs is called to enumerate all the types. Because the tables are so interconnected, you could just as easily have started at the TypeDef table and iterated through each row in the table. For each of the types defined, DisplayTypeDefInfo is called to list the contents of a single type. A type can contain methods, properties, events, interfaces, permission sets, and custom attributes. One of the more interesting subtrees in the TypeDef tree is the branch that deals with methods. The format of the enumeration is much the same as in Listing 4.6: you start the enumeration and then explicitly close it. Listing 4.7 shows an example of calling EnumMethods.
You should see a thread of commonality between Listing 4.7 and Listing 4.6. The enumeration is started and an HCORENUM handle is passed back. When you are finished with the enumeration, call CloseEnum to close it. The various EnumXXX methods have different inputs and outputs, but they all work with tokens as described in Table 4.3. The upper byte of a token identifies the table that is being referenced. The remaining three bytes specify an index into that table. For example, when you first start up the AssemblyCOM application and select the TypeDef property page for the HelloWorld.exe assembly, one of the tokens that you see in the debugger is 0x02000002. This is index 2 into table number 2, which is the TypeDef table. Index 2 refers to the CLRUnleashed.Hello class, of which Main is the only member. The indexes that are part of a token are 1-based; zero indicates that the feature or table entry is not present. This application is not finished, but it is far enough along to provide a good starting point for learning the unmanaged APIs.
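The token layout described above can be illustrated with a few lines of portable C++. This is only a sketch: the helper names TableOf, RowOf, and IsNilToken are made up for this example and are not part of the unmanaged API (cor.h provides comparable macros, RidFromToken and TypeFromToken, although TypeFromToken leaves the table code in the upper byte rather than shifting it down).

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they are not declared in cor.h.

// Upper byte of a metadata token: the code of the table being referenced.
constexpr std::uint32_t TableOf(std::uint32_t token) {
    return token >> 24;
}

// Lower three bytes of a metadata token: the 1-based row index.
constexpr std::uint32_t RowOf(std::uint32_t token) {
    return token & 0x00FFFFFFu;
}

// A row index of zero means "no such entry".
constexpr bool IsNilToken(std::uint32_t token) {
    return RowOf(token) == 0;
}

// The token from the text: row 2 of table 0x02 (TypeDef).
static_assert(TableOf(0x02000002u) == 0x02, "table code is the upper byte");
static_assert(RowOf(0x02000002u) == 2, "row index is the lower three bytes");
static_assert(IsNilToken(0x02000000u), "row 0 marks an absent entry");
```

Because the functions are constexpr, the static_assert lines check the example from the text at compile time; the same helpers can be called at run time on any token handed back by an EnumXXX method.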
Within the loop that enumerates the TypeDefs are several other EnumXXX method calls; one of them is the EnumMethods call shown in Listing 4.7. Each EnumXXX method is usually followed by a call to GetXXXProps. The pattern is to open the enumeration, get the details, and close the enumeration. The “get the details” portion of the pattern is handled by the unmanaged GetXXXProps call. In the sample code, the call to EnumMethods is followed by a call to GetMethodProps; however, because of the level of detail, it has been split out into an internal helper, DisplayMethodInfo. If you are not interested in a particular OUT parameter, you can simply pass NULL instead of the address of a variable. For some of the GetXXXProps methods, this saves effort because you don't have to set up variables that don't interest you. If you look at the call to GetTypeDefProps from within the internal TypeDefName function, you see that only the arguments needed to retrieve the name are supplied; the last three arguments are passed as NULL. In a function like TypeDefName, those parameters are not important.
The one roadblock that you might run into in trying to crack the assembly metadata is signatures. Signatures are necessarily complex because they have to generically describe a method, a field, and so on. A method signature needs to describe the return type and each of the arguments (parameters) to the function or method. The signature data cannot be cracked easily. However, if your method signature is simple, then the corresponding metadata is relatively simple. The general format for a signature is as follows:
<calling convention> <parameter count> <return type> <parameter #1 description> . . . <parameter #n description>
This gets complicated starting with the parameter count. The parameter count is not a simple number, but a compressed value that needs to be decompressed for correct interpretation. The return type is coded to describe returning a reference value, not returning a value (void), or returning a complex type. Each of the parameters can be simple or complex. If it is just a simple value, then a simple switch statement allows you to decode the signature values. If the parameter is more complex, then you might end up with recursion. Because of these complexities, the AssemblyCOM application was not built to crack the signature. That's an exercise for you. AssemblyCOM simply displays the hex bytes that represent the signature description in the metadata.
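As a starting point for that exercise, the following is a minimal sketch of how a simple method signature might be decoded. It is not taken from AssemblyCOM; the function names are hypothetical. It assumes the compressed-integer encoding and element-type codes defined in the ECMA specification (Partition II), and it deliberately handles only the default calling convention, the primitive types, object, and single-dimension arrays. Anything else is reported as unsupported by returning an empty string.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// Reads one compressed unsigned integer at sig[pos] and advances pos.
// Lead byte 0xxxxxxx = 1 byte, 10xxxxxx = 2 bytes, 110xxxxx = 4 bytes;
// the remaining bits are stored most significant byte first.
bool ReadCompressedUInt(const Blob& sig, std::size_t& pos, std::uint32_t& value) {
    if (pos >= sig.size()) return false;
    const std::uint8_t b0 = sig[pos];
    std::size_t len = 0;
    std::uint32_t v = 0;
    if ((b0 & 0x80) == 0x00)      { len = 1; v = b0; }
    else if ((b0 & 0xC0) == 0x80) { len = 2; v = b0 & 0x3F; }
    else if ((b0 & 0xE0) == 0xC0) { len = 4; v = b0 & 0x1F; }
    else return false;                        // invalid lead byte
    if (sig.size() - pos < len) return false; // truncated blob
    for (std::size_t i = 1; i < len; ++i) v = (v << 8) | sig[pos + i];
    value = v;
    pos += len;
    return true;
}

// Decodes one type. Only element types 0x01..0x0E (void through string),
// 0x1C (object), and 0x1D (single-dimension array) are handled.
bool ReadSimpleType(const Blob& sig, std::size_t& pos, std::string& out) {
    static const char* const kNames[] = {
        nullptr, "void", "bool", "char", "sbyte", "byte", "short", "ushort",
        "int", "uint", "long", "ulong", "float", "double", "string"};
    if (pos >= sig.size()) return false;
    const std::uint8_t et = sig[pos++];
    if (et >= 0x01 && et <= 0x0E) { out = kNames[et]; return true; }
    if (et == 0x1C) { out = "object"; return true; }
    if (et == 0x1D) {  // array: the element type follows, so recurse
        std::string elem;
        if (!ReadSimpleType(sig, pos, elem)) return false;
        out = elem + "[]";
        return true;
    }
    return false;  // classes, value types, byref, and so on are not handled
}

// Renders "<return type> (<parameter types>)", or "" if unsupported.
std::string DescribeMethodSig(const Blob& sig) {
    std::size_t pos = 0;
    if (sig.empty()) return "";
    const std::uint8_t callConv = sig[pos++];
    if ((callConv & 0x1F) != 0x00) return "";  // default, non-generic only
    std::uint32_t count = 0;
    std::string ret;
    if (!ReadCompressedUInt(sig, pos, count)) return "";
    if (!ReadSimpleType(sig, pos, ret)) return "";
    std::string text = ret + " (";
    for (std::uint32_t i = 0; i < count; ++i) {
        std::string param;
        if (!ReadSimpleType(sig, pos, param)) return "";
        text += (i ? ", " : "") + param;
    }
    return text + ")";
}
```

For example, the five bytes 00 01 01 1D 0E (default calling convention, one parameter, void return, array of string) decode to "void (string[])". Extending the switch in ReadSimpleType to cover classes, value types, and by-reference parameters is where the recursion mentioned above becomes more involved.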