Chapter 13. Metadata and Reflection

Metadata, often described as data about data, describes the contents of an assembly. It includes a detailed description of each type, the attributes of the assembly, and other particulars of the assembly itself. Metadata is similar to a type library in COM, except that metadata is persisted in the assembly that it describes. For this reason, assemblies are often referred to as self-describing. Because metadata travels with the assembly, it cannot be lost, and versioning problems between code and its description are avoided. Metadata is emitted primarily by managed language compilers and consumed by metadata browsers, other .NET tools, and general managed applications. The Common Language Runtime (CLR) uses metadata extensively. Just-in-time compilation, code access security, garbage collection, and other services of the CLR rely heavily on metadata. Once emitted, metadata is read-only.

Metadata is important to anyone programming in the managed environment. Assembly inspection, late binding, and advanced concepts such as self-generating code require a nontrivial understanding of metadata. Interrogating metadata at run time is called reflection, which facilitates late binding and other uses of metadata. Most importantly, mastery of metadata promotes a better understanding of the managed world, which (one hopes) translates into better-written code.

This chapter introduces some advanced concepts. Further research may be required. However, it is important to introduce some of these concepts in the context of C# programming.

Metadata

Metadata about the overall assembly and modules (macro metadata) is called the manifest. Some of the macro information placed in the manifest includes the simple name, version number, external references, module name, and public key of the assembly. A portion of the manifest is created from the assembly attributes found in the AssemblyInfo.cs file of a Microsoft Visual Studio 2008 C#.NET project. Here is a partial listing of a typical AssemblyInfo.cs file:

using System.Reflection;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// General Information about an assembly is controlled through the following
// set of attributes. Change these attribute values to modify the information
// associated with an assembly.
[assembly: AssemblyTitle("WindowsApplication4")]
[assembly: AssemblyDescription("")]
[assembly: AssemblyConfiguration("")]
[assembly: AssemblyCompany("")]
[assembly: AssemblyProduct("WindowsApplication4")]
[assembly: AssemblyCopyright("Copyright © 2008")]
[assembly: AssemblyTrademark("")]
[assembly: AssemblyCulture("")]
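
The manifest information set by these attributes can be read back at run time. The following is a minimal sketch, not part of the original listing: the ManifestReader class and its method names are hypothetical, and the title may be absent if the assembly being inspected declares no AssemblyTitle attribute.

```csharp
using System;
using System.Reflection;

static class ManifestReader {
    // Returns the simple name recorded in the manifest of the assembly.
    public static string GetSimpleName(Assembly assembly) {
        return assembly.GetName().Name;
    }

    // Returns the AssemblyTitle value, or null if the attribute is not present.
    public static string GetTitle(Assembly assembly) {
        object[] attrs = assembly.GetCustomAttributes(
            typeof(AssemblyTitleAttribute), false);
        return attrs.Length > 0 ? ((AssemblyTitleAttribute)attrs[0]).Title : null;
    }

    static void Main() {
        Assembly current = Assembly.GetExecutingAssembly();
        Console.WriteLine("Simple name: " + GetSimpleName(current));
        Console.WriteLine("Version:     " + current.GetName().Version);
        Console.WriteLine("Title:       " + (GetTitle(current) ?? "(none)"));
    }
}
```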

Metadata also chronicles the microdata of the assembly, such as types, methods, and attributes. Metadata paints a portrait of each type, including the type name, methods of the type, parameters of each method of the type, each field of the type, and further details related to the loading and executing of that type at run time. Types are probably the most important construct in a .NET application, and metadata about types is used throughout the life cycle of a managed application. Here are a couple of examples of this. At startup, metadata is used to identify the entry point method where the program starts executing. During program execution, when a class is first touched, an internal component is built from metadata to represent that type to the just-in-time compiler. This component is an important ingredient in the just-in-time compilation process. This is further discussed in detail in Chapter 16.
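
As a rough illustration of the type metadata described above, the following sketch uses reflection to list the methods of a type together with their parameters. The Sample and TypeInspector types are hypothetical and exist only for this example.

```csharp
using System;
using System.Reflection;

class Sample {   // hypothetical type to inspect
    public int Add(int left, int right) { return left + right; }
}

static class TypeInspector {
    // Builds a one-line description of a method from its metadata:
    // return type, method name, and each parameter type and name.
    public static string Describe(MethodInfo method) {
        ParameterInfo[] parameters = method.GetParameters();
        string[] parts = new string[parameters.Length];
        for (int i = 0; i < parameters.Length; i++) {
            parts[i] = parameters[i].ParameterType.Name + " " + parameters[i].Name;
        }
        return method.ReturnType.Name + " " + method.Name +
               "(" + string.Join(", ", parts) + ")";
    }

    static void Main() {
        // DeclaredOnly skips methods inherited from System.Object.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance |
                             BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in typeof(Sample).GetMethods(flags)) {
            Console.WriteLine(Describe(method));  // Int32 Add(Int32 left, Int32 right)
        }
    }
}
```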

Use attributes to add metadata to the manifest or to other metadata. Attributes are the adjectives of a managed application and extend the description of an assembly, class, method, field, or other target. Attributes are recorded as metadata and extend the axiomatic metadata of an assembly. In addition, the Microsoft .NET Framework Class Library (FCL) offers predefined custom and pseudo-custom attributes. Obsolete is an example of a predefined custom attribute. StructLayout and Serializable are examples of pseudo-custom attributes, which the compiler records in dedicated metadata fields rather than as ordinary custom attribute records. The Obsolete attribute marks an entity as deprecated, whereas the StructLayout attribute stipulates the layout of fields in native memory. Native memory is not on the managed heap and is beyond the scope of the CLR, so the StructLayout attribute is essential when passing a managed type to an unmanaged function or application programming interface (API). You can augment the predefined attributes with programmer-defined custom attributes, limited only by your imagination. Applying a version number to a class, assigning the name of the responsible developer or team to a class, and identifying design documents used to architect an application are potential ways to exploit custom attributes.
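
One of the uses suggested above, assigning a responsible team to a class, might be sketched as follows. The OwnerAttribute, LegacyReport, and AttributeDemo names are hypothetical and chosen only for illustration.

```csharp
using System;

// Hypothetical custom attribute that records who is responsible for a class.
[AttributeUsage(AttributeTargets.Class, AllowMultiple = false)]
class OwnerAttribute : Attribute {
    private readonly string m_Name;

    public OwnerAttribute(string name) {
        m_Name = name;
    }

    public string Name {
        get { return m_Name; }
    }
}

[Owner("Build Team")]
class LegacyReport {
}

static class AttributeDemo {
    // Reads the Owner attribute back from the metadata of the type.
    // Returns null if the type carries no Owner attribute.
    public static string GetOwner(Type type) {
        object[] attrs = type.GetCustomAttributes(typeof(OwnerAttribute), false);
        return attrs.Length > 0 ? ((OwnerAttribute)attrs[0]).Name : null;
    }

    static void Main() {
        Console.WriteLine(GetOwner(typeof(LegacyReport)));  // Build Team
    }
}
```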

Metadata is organized as a nonhierarchical but relational database of cross-referencing tables. The metadata database has many tables that can—and often do—reference each other. However, no parent-child relationship between tables is ever implied. Each categorization of data is maintained in a separate table. Consider for example the TypeDef and MethodDef tables. Types alone are stored in the TypeDef table. Each record of the TypeDef table represents a type. If there were six types in the assembly, there would be six records or rows in the TypeDef table. Methods belonging to all types are stored in the MethodDef table, where each row describes a method. The TypeDef table references the MethodDef table to link a type to its member functions. The MethodList column of the TypeDef table has record indexes (RIDs) into the MethodDef table. Extending this model, the MethodDef table has a ParamList column, which has indexes to the method’s parameters in the Param table.
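
The cross-referencing between TypeDef and MethodDef can be modeled with two arrays. This is a deliberately simplified sketch with made-up rows, not a reader for real metadata. It assumes the layout defined for optimized metadata, in which MethodList holds the RID of the first method owned by a type and the run of methods continues up to the next type's MethodList value or the end of the MethodDef table.

```csharp
using System;
using System.Collections.Generic;

static class TableModel {
    // Simplified TypeDef rows (hypothetical data). Row i is RID i + 1.
    static readonly string[] TypeNames = { "Starter", "ZClass" };
    // MethodList column: RID of the first MethodDef row owned by each type.
    static readonly int[] MethodList = { 1, 2 };

    // Simplified MethodDef rows (hypothetical data). Row i is RID i + 1.
    static readonly string[] MethodNames = { "Main", "DisplayCreateTime", "get_Time" };

    // Returns the names of the methods owned by the type with the given RID.
    public static List<string> MethodsOf(int typeRid) {
        int first = MethodList[typeRid - 1];
        // The run ends where the next type's run begins, or at the table end.
        int end = typeRid < MethodList.Length
            ? MethodList[typeRid]
            : MethodNames.Length + 1;
        List<string> result = new List<string>();
        for (int rid = first; rid < end; rid++) {
            result.Add(MethodNames[rid - 1]);   // RIDs are one-based
        }
        return result;
    }

    static void Main() {
        for (int rid = 1; rid <= TypeNames.Length; rid++) {
            Console.WriteLine(TypeNames[rid - 1] + ": " +
                string.Join(", ", MethodsOf(rid).ToArray()));
        }
    }
}
```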

Metadata tables are assigned unique table identifiers, which are one-byte unsigned integers. For example, the table identifier for the TypeDef table is 2, whereas 6 identifies the MethodDef table. Metadata tables reserved for the run time are not published and not assigned an external table identifier. Table 13-1 lists some of the popular metadata tables.

Table 13-1. Metadata tables

Table        Identifier  Description
-----------  ----------  ------------------------------------------------
Assembly     0x20        Data related to the overall assembly
Field        0x04        Fields (data members) of types
MethodDef    0x06        Methods (member functions) of types
NestedClass  0x29        Type definitions for nested types
Param        0x08        Parameters of methods
Property     0x17        Properties of types
TypeDef      0x02        Definitions of types in the current module
TypeRef      0x01        References to types defined outside this module

Metadata Tokens

Metadata tables are collections of fixed-length records and columns. A metadata table contains a certain type of data, and each record is an instance of that type. Columns represent specific data on each instance, and each column contains either a constant or an index. An index in a metadata column references another table or heap and is also known as a metadata token. (Metadata heaps are explained in the next section.) Metadata tokens are used as metadata pointers, allowing tables to cross-reference each other. Metadata tables can be optimized (compressed) or not optimized. For the purpose of this book, it is assumed that metadata is optimized. Metadata that is not optimized requires intermediate tables for ordered access between tables.

Tokens are four-byte unsigned integers and a combination of the table identifier and RID. As shown in Figure 13-1, the first byte is the table identifier, and the last three bytes are the RID. A token referring to the Field table, for example, might be 0x04000002. This token refers to the second row of the Field table. RIDs start at one and are not zero-based. Because tokens are padded with zeros, the run time might optimize them. Metadata tokens are probably the most public manifestation of metadata. You will see metadata tokens often over the next few chapters of this book.

Figure 13-1. Layout of a metadata token
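
The token layout can be checked with a few lines of bit arithmetic. The following sketch splits a token into its table identifier and RID. The TokenDecoder class is hypothetical, while MetadataToken is an existing property of reflection objects such as Type and MethodInfo.

```csharp
using System;

static class TokenDecoder {
    // The high-order byte of a token is the table identifier.
    public static int TableId(int token) {
        return (int)((uint)token >> 24);
    }

    // The low-order three bytes are the one-based RID.
    public static int Rid(int token) {
        return token & 0x00FFFFFF;
    }

    static void Main() {
        int fieldToken = 0x04000002;   // second row of the Field table
        Console.WriteLine("Table 0x{0:X2}, RID {1}",
            TableId(fieldToken), Rid(fieldToken));

        // Tokens of live reflection objects follow the same layout.
        int typeToken = typeof(TokenDecoder).MetadataToken;
        Console.WriteLine("Table 0x{0:X2}, RID {1}",
            TableId(typeToken), Rid(typeToken));
    }
}
```

The first line printed is Table 0x04, RID 2. For the second, the table identifier should be 0x02 (TypeDef), and the RID depends on how the compiler ordered the types in the module.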

Metadata Heaps

Metadata tables reference metadata heaps and other tables. Records of metadata tables hold fixed-length metadata information. Variable-length data is stored in one of the metadata heaps. Method signatures are an example: they are variable-length and stored on the Blob heap.

The four metadata heaps are as follows: String, Userstring, Blob, and GUID.

  • The String heap is an array of null-terminated strings. Namespace, type, field, and method names, as well as other identifiers, are stored on the String heap.

  • User-defined strings, such as the string literals in source code, reside on the Userstring heap. Unlike the String heap, the Userstring heap holds length-prefixed Unicode (UTF-16) strings.

  • The Blob heap is a binary heap and a composite of length prefix data, such as default values, method signatures, and field signatures. Length prefix data precedes each binary blob with the length.

  • The GUID heap is an array of globally unique identifiers (GUIDs). Yes, this is obvious. You might remember GUIDs from COM as 16-byte unique identifiers assigned to almost everything—most notably, class identifiers (CLSIDs) are assigned to class factories. There are also TYPEIDs, LIBIDs, IIDs, and much more. What kind of GUID is stored on the GUID heap? The GUID heap contains module version identifiers (MVIDs).
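
Two of these heaps can be observed through reflection. The sketch below prints the MVID of the current module and resolves a string literal from the Userstring heap. The HeapDemo class and its methods are hypothetical, and the byte scan for the ldstr opcode (0x72) is a naive shortcut that is adequate only for a trivial method like Greeting. It also assumes a little-endian machine and that the method body is available at run time.

```csharp
using System;
using System.Reflection;

static class HeapDemo {
    // Hypothetical method whose body loads a single string literal.
    public static string Greeting() {
        return "Hello, metadata";
    }

    // Returns the token operand of the first ldstr instruction (opcode 0x72)
    // in the method body, or 0 if none is found.
    public static int FirstStringToken(MethodInfo method) {
        MethodBody body = method.GetMethodBody();
        if (body == null) {
            return 0;
        }
        byte[] il = body.GetILAsByteArray();
        for (int i = 0; i + 4 < il.Length; i++) {
            if (il[i] == 0x72) {
                return BitConverter.ToInt32(il, i + 1);
            }
        }
        return 0;
    }

    static void Main() {
        Module module = typeof(HeapDemo).Module;

        // The MVID is stored on the GUID heap.
        Console.WriteLine("MVID:  " + module.ModuleVersionId);

        // Userstring tokens use 0x70 as the high-order byte.
        int token = FirstStringToken(typeof(HeapDemo).GetMethod("Greeting"));
        Console.WriteLine("Token: 0x" + token.ToString("X8"));
        if (token != 0) {
            Console.WriteLine("Value: " + module.ResolveString(token));
        }
    }
}
```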

Streams

Physically, metadata tables and heaps are persisted in streams. Six possible streams, including streams for each metadata heap, are available in .NET. There are also two mutually exclusive streams, optimized and nonoptimized, which are reserved for metadata tables. Metadata tables are either completely optimized or not optimized—there’s no such thing as partial optimization of metadata tables. If the metadata tables are optimized, the optimized stream is used. Otherwise, the nonoptimized stream is used. Therefore, any particular managed application has at most five streams. Table 13-2 provides a complete list of the metadata streams.

Table 13-2. Metadata streams

Name      Description
--------  ------------------------------------------
#~        Optimized or compressed metadata tables
#-        Nonoptimized metadata tables
#Blob     Physical repository of the Blob heap
#GUID     Physical repository of the GUID heap
#Strings  Physical repository of the String heap
#US       Physical repository of the Userstring heap

Metadata Validation

Successful execution of a managed application depends largely on metadata. Improperly formed metadata could cause a managed application to fail unceremoniously. An assembly with bad metadata is like a house built on quicksand. Loading a class, just-in-time compilation, code access security, and other run-time operations depend on robust metadata. Metadata validation tests the correctness of metadata and is performed preemptively, preventing applications with corrupt metadata from being executed. Preventing application crashes caused by improper metadata enforces code isolation.

Several tests are performed to validate metadata. Here is an abbreviated list of these tests:

  • Cross-references between tables are validated.

  • Offsets into metadata heaps are validated.

  • Metadata tables must have a valid number of rows. For example, the Assembly table must have exactly one row.

  • Metadata tables cannot have duplicate rows.

You can use the PEVerify and Intermediate Language Disassembler (ILDASM) tools to validate metadata. Both tools are included in the .NET Framework software development kit (SDK).

PEVerify submits an assembly for metadata validation and Microsoft Intermediate Language (MSIL) verification and then reports the results. (MSIL verification is discussed in Chapter 14.) Run the PEVerify tool from a command prompt using the following basic syntax:

PEVerify assemblyname

PEVerify first validates the metadata of assemblyname. If metadata validation is successful, MSIL verification is conducted next. If metadata validation fails, the target assembly cannot be executed and MSIL verification is skipped. PEVerify offers a variety of optional arguments, including the option to perform MSIL verification even when the metadata validation fails.

Table 13-3 lists some of the PEVerify options.

Table 13-3. Selected PEVerify options

Argument                                         Description
-----------------------------------------------  ------------------------------------------------------------
/clock                                           Collects data and reports duration of verification and validation tests.
/HRESULT                                         Displays errors in hexadecimal format.
/ignore=errorcode1, errorcode2, ..., errorcoden  Ignores listed error codes.
/il                                              Conducts MSIL verification. With this option, if metadata validation is also desired, it must be requested explicitly.
/md                                              Explicitly conducts metadata validation. If MSIL verification is also required, it must be requested explicitly.
/unique                                          Ignores repeating error codes.

The following is a simple "Hello, World!" application, which is compiled to hello.exe. It is a minimal application, in which not much can go wrong. PEVerify will confirm this:

using System;

class Starter {
    static void Main() {
        Console.WriteLine("Hello, World!");
    }
}

The following output shows the result of running PEVerify on hello.exe with the /il and /clock options. Because the /md option is omitted, metadata validation is skipped:

C:\>peverify /il /clock hello.exe

All Classes and Methods in hello.exe Verified.
Timing: Total run     125 msec
        IL Ver.cycle  125 msec
        IL Ver.pure   93 msec

The elapsed cycle and pure verification times are listed. Pure verification time is the duration of the test, whereas cycle verification time also encompasses the startup and shutdown processes.

ILDASM

ILDASM is a .NET tool that also can perform various validations. In addition, you can use this tool to browse and display the metadata of an assembly—including the manifest. ILDASM inspects an assembly using reflection and can present the results in a window, console, or file.

ILDASM, a .NET disassembler and metadata browser, is a popular tool for developers. It presents an internal representation of an assembly, including its metadata and MSIL code, in a variety of formats. The basic command-line syntax of ILDASM requires only an assembly name, which opens ILDASM and displays the metadata of the assembly:

ildasm assemblyname

The following simple application is a basic .NET application that references a library. The application has a ZClass and a ZStruct type:

using System;

namespace Donis.CSharpBook {

    interface IA {
    }

    struct ZStruct {
    }

    class Starter {

        public static void Main() {
            ZClass obj1 = new ZClass();
            obj1.DisplayCreateTime();
            ZClass obj2 = new ZClass();
            obj2.DisplayCreateTime();
        }
    }

    class ZClass : IA {

        public enum Flag {
            aflag,
            bflag
        }

        public event EventHandler AEvent = null;

        public void DisplayCreateTime() {
            Console.WriteLine("ZClass created at " + m_Time);
        }

        private string m_Time = DateTime.Now.ToLongTimeString();
        public string Time {
            get {
                return m_Time;
            }
        }
    }
}

Figure 13-2 is a view of simple.exe from ILDASM. ILDASM displays a hierarchical object graph with an icon for each element of the application.

Figure 13-2. The simple.exe assembly displayed in ILDASM

Some icons are expandable or collapsible, as indicated by a + or - symbol, if you want to see more or less detail. The Assembly icon expands to show the details of the target assembly, the Namespace icon expands to show the members of the namespace, and so on. You can explore the object graph from the assembly down to the class members. Each icon depicts the category of item. Table 13-4 lists each icon and the action associated with double-clicking the icon.

Table 13-4. Elements of ILDASM

Icon           Action
-------------  -----------------------------------------------
Assembly       Shows elements of the assembly
Class          Shows members of a class
Enum           Shows members of an enum type
Event          Shows metadata and MSIL code of an event
Field          Shows metadata of a field
Interface      Shows members of an interface
Manifest       Shows attributes of an assembly
Method         Shows metadata and MSIL code of a method
Namespace      Shows members of a namespace
Property       Shows metadata and MSIL code of a property
Static Field   Shows metadata of a static field
Static Method  Shows metadata and MSIL code of a static method
Value Type     Shows members of a value type

Some elements are displayed twice. For example, a property is presented as itself and separately as accessor and mutator methods.
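
The same duality is visible through reflection, where a property exposes its accessor as an ordinary method. In this sketch, Clock is a hypothetical stand-in for a class with a read-only property, similar to the Time property of ZClass.

```csharp
using System;
using System.Reflection;

class Clock {   // hypothetical type with a read-only property
    private string m_Time = "12:00";

    public string Time {
        get { return m_Time; }
    }
}

static class AccessorDemo {
    // Returns the name of the public get accessor method behind a property,
    // or null if the property has no public getter.
    public static string GetterName(Type type, string propertyName) {
        PropertyInfo property = type.GetProperty(propertyName);
        MethodInfo getter = property.GetGetMethod();
        return getter == null ? null : getter.Name;
    }

    static void Main() {
        Console.WriteLine(GetterName(typeof(Clock), "Time"));  // get_Time

        // A read-only property has no mutator method.
        PropertyInfo time = typeof(Clock).GetProperty("Time");
        Console.WriteLine(time.GetSetMethod() == null);        // True
    }
}
```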

ILDASM has a variety of command-line options. Table 13-5 lists these parameters.

Table 13-5. ILDASM options

ILDASM option   Description
--------------  ------------------------------------------------------------
/Out            Renders metadata and MSIL to a text file.
/Text           Renders metadata and related MSIL to a console.
/HTML           Combines with the /Out option to persist metadata and MSIL in Hypertext Markup Language (HTML) format.
/RTF            Renders metadata and MSIL in Rich Text Format (RTF).
/Bytes          Shows MSIL code with opcodes and related bytes.
/Raweh          Shows exception handling clauses (try and catch directives) in raw form.
/Tokens         Shows metadata tokens.
/Source         Shows MSIL interlaced with commented source code; for this option, the source code and debug file must be in the current path.
/Linenum        Inserts line directives into an output stream that matches source code to MSIL. This option requires the debug file.
/Visibility     Disassembles only members with the stated visibility: pub (public), pri (private), fam (family), asm (assembly), faa (family and assembly), foa (family or assembly), and psc (private scope).
/Pubonly        Disassembles only public elements; short notation for /Visibility=pub.
/QuoteAllNames  Encloses all identifiers in single quotes.
/NOCA           Excludes custom attributes.
/CAVerbal       Displays blob information of custom attributes in symbolic form rather than binary.
/NOBAR          Suppresses progress bar display.
/UTF8           Renders output file in UTF-8 (8-bit UCS/Unicode Transformation Format). The default is American National Standards Institute (ANSI) format.
/UNICODE        Renders output file in Unicode.
/NOIL           Suppresses MSIL code output in the disassembly.
/TypeList       Displays list of types.
/Headers        Includes DOS, PE, COFF, CLR, and metadata header information.
/Item           Disassembles a particular class or method.
/Stats          Displays statistical information on the assembly file, which is a portable executable.
/ClassList      Provides a list of classes in the target.
/All            Specifies combination of the /Headers, /Bytes, /Stats, /ClassList, and /Tokens options.
/Metadata       Displays specific information related to metadata. This option has its own set of suboptions.
/Objectfile     Shows metadata of a library file.

The user interface and command-line options for ILDASM are similar. The following command line is typical. It disassembles simple.exe and persists the resulting metadata, MSIL, metadata tokens, and source code to the simple.il file:

ildasm /out=simple.il /source /tokens simple.exe

The /source option of the preceding command interlaces MSIL code with source code. The source code is commented. Associating MSIL to source code relates each source statement to the resulting MSIL code, which is invaluable when debugging. The tokens shown per the /Tokens option are also commented.

The disassembly created by ILDASM is a valid MSIL program that can be reassembled. For this reason, the output text file should have an .il extension, as in client.il. The disassembly can be reassembled with the ILASM compiler, which compiles MSIL code. The newly assembled assembly is functionally equivalent to the original assembly.

Some ILDASM options cause the assembly to be partially disassembled. When a partial disassembly occurs, you are presented with a warning. One limitation is that partial disassemblies cannot be reassembled using ILASM. The following command creates a partial disassembly:

ildasm /out=simple.il /item=Donis.CSharpBook.ZClass simple.exe

The preceding command line disassembles only the ZClass of the simple.exe assembly. Because other types are omitted from the disassembly, the result is a partial disassembly. For this reason, a warning is appended to the output. The following is a partial listing of the output file with the embedded warning:

//  Microsoft (R) .NET Framework IL Disassembler.  Version 3.5.21022.8
//  Copyright (c) Microsoft Corporation.  All rights reserved.

// warning : THIS IS A PARTIAL DISASSEMBLY, NOT SUITABLE FOR RE-ASSEMBLING

.class private auto ansi beforefieldinit Donis.CSharpBook.ZClass
       extends [mscorlib]System.Object
       implements Donis.CSharpBook.IA
{
  .class auto ansi sealed nested public Flag
         extends [mscorlib]System.Enum
  {
    .field public specialname rtspecialname int32 value__
    .field public static literal valuetype Donis.CSharpBook.ZClass/Flag
        aflag = int32(0x00000000)
    .field public static literal valuetype Donis.CSharpBook.ZClass/Flag
        bflag = int32(0x00000001)
  } // end of class Flag

  .field private class [mscorlib]System.EventHandler AEvent
  .field private string m_Time
  .method public hidebysig specialname instance void

Here is the final example of ILDASM and command-line options. The following command validates the metadata and persists the results to the simple.txt file:

ildasm /metadata=validate /out=simple.txt simple.exe