Chapter 6

Modules and Assemblies

This chapter discusses the organization, deployment, and execution of assemblies and modules. It also provides a detailed examination of the metadata segment responsible for assembly and module identity and interaction: the manifest. As you might recall from Chapter 1, an assembly can include several modules. Any module of a multimodule assembly can—and does, as a rule—carry its own manifest, but only one module per assembly carries the manifest that contains the assembly’s identity. This module is referred to as the prime module. Thus, each assembly, whether multimodule or single-module, contains only one prime module.

What Is an Assembly?

An assembly is a deployment unit, a building block of a managed application. Assemblies are reusable, allowing different applications to use the same assembly. Assemblies carry a full self-description in their metadata, including version information that allows the common language runtime to use a specific version of an assembly for a particular application.

This arrangement eliminates what’s known as DLL Hell, the situation created when upgrading one application renders another application inoperative because both happen to use identically named DLL(s) of different versions.

Private and Shared Assemblies

Assemblies are classified as either private or shared. Structurally and functionally, these two kinds of assemblies are the same, but they differ in how they are named and deployed and in the level of version checks performed by the loader.

A private assembly is considered part of a particular application, not intended for use by other applications. A private assembly is deployed in the same directory as the application or in a subdirectory of this directory. This kind of deployment shields the private assembly from other applications, which should not have access to it.

Being part of a particular application, a private assembly is usually created by the same author (person, group, or organization) as other components specific to this application and is thus considered to be primarily the author’s responsibility. Consequently, naming and versioning requirements are relaxed for private assemblies, and the common language runtime does not enforce these requirements. The name of a private assembly must be unique within the application.

A shared assembly is not part of a particular application and is designed to be used widely by various applications. Shared assemblies are usually authored by groups or organizations other than those responsible for the applications that use these assemblies. A prominent example of shared assemblies is the set of assemblies constituting the .NET Framework class library.

As a result of such positioning, the naming and versioning requirements for shared assemblies are much stricter than those for private assemblies. Names of shared assemblies must be globally unique. Additional assembly identification is provided by strong names, which use cryptographic public/private key pairs to ensure the strong name’s uniqueness and to prevent name spoofing. The central part of the strong name is the strong name signature (mentioned in Chapter 5)—a hash of the assembly’s prime module encrypted with the publisher’s private key. Assembly metadata carries the publisher’s public key, which is used to verify the strong name signature. A strong name also provides the consumer of the shared assembly with information about the identity of the assembly publisher. If the common language runtime cryptographic checks pass, the consumer can be sure that the assembly comes from the expected publisher, assuming that the publisher’s private encryption key was not compromised.

Shared assemblies are deployed into the machine-wide repository called the global assembly cache (GAC). The GAC stores multiple versions of shared assemblies side by side. The loader looks for the shared assemblies in the GAC.

Under some circumstances, an application might need to deploy a shared assembly in its directory to ensure that the appropriate version is loaded. In such a case, the shared assembly is being used as a private assembly, so it is not in fact shared, whether it is strong named or not.

Application Domains As Logical Units of Execution

Operating systems and runtimes typically provide some form of isolation between applications running on the system. This isolation is necessary to ensure that code running in one application cannot adversely affect other, unrelated applications. In modern operating systems, this isolation is achieved by using hardware-enforced process boundaries, where a process, occupying a unique virtual address space, runs exactly one application and scopes the resources that are available for that process to use.

Managed code execution has similar needs for isolation. Such isolation can be provided at a lower cost in a managed application, however, considering that managed applications run under the control of the common language runtime and are verified to be type-safe.

The runtime allows multiple applications to be run in a single operating system process, using a construct called an application domain to isolate the applications from one another. Since all memory allocation requested by an application is done by the CLR, it is easy for the CLR to give an application access to only those objects that were allocated by the application and to block the application’s attempts to access objects allocated in another application domain. In many respects, application domains are the CLR equivalent of an operating system process.

Specifically, isolation in managed applications means the following:

  • Different security levels can be assigned to each application domain, giving the host a chance to run the applications with varying security requirements in one process.
  • Code running in one application cannot directly access code or resources from another application. (Doing so could introduce a security hole.) An exception to this rule is the base class library assembly of .NET Framework—Mscorlib—which is shared by all application domains within the process. Mscorlib is not shared between the processes.
  • Faults in one application cannot affect other applications by bringing down the entire process.
  • Each application has control over where the code loaded on its behalf comes from and the version of the code being loaded. In addition, configuration information is scoped by the application.

The following examples describe scenarios in which it is useful to run multiple applications in the same process:

  • ASP.NET runs multiple web applications in the same process. In ASP and Internet Information Services (IIS), application isolation was achieved by process boundaries, which proved too expensive to scale appropriately; it’s cheaper to run 20 application domains in one process than to spawn 20 separate processes.
  • Microsoft Internet Explorer runs code from multiple sites in the same process as the browser code itself. Obviously, code from one site should not be able to affect code from another site.
  • Database engines need to run code from multiple user applications in the same process.
  • Application server products might need to run code from multiple applications in a single process.

Hosting environments such as ASP.NET or Internet Explorer need to run managed code on behalf of the user and take advantage of the application isolation features provided by application domains. In fact, it is the host that determines where the application domain boundaries lie and in what domain user code is run, as these examples show:

  • ASP.NET creates application domains to run user code. Domains are created per application as defined by the web server.
  • Internet Explorer by default creates one application domain per site (although developers can customize this behavior).
  • In Shell EXE, each application launched from the command line runs in a separate application domain occupying one process.
  • Microsoft Visual Basic for Applications (VBA) uses the default application domain of the process to run the script code contained in a Microsoft Office document.
  • The Windows Foundation Classes (WFC) Forms Designer creates a separate application domain for each form being built. When a form is edited and rebuilt, the old application domain is shut down, the code is recompiled, and a new application domain is created.

Since isolation demands that the code or resources of one application must not be directly accessible from code running in another application, no direct calls are allowed between objects in different application domains. Cross-domain communications are limited to either copying objects or creating special proxy objects, which are the object’s “representatives” in other domains, giving the code in other domains access to instance fields and methods of the object. In regard to cross-domain communications, the objects fall into one of the following three categories:

  • Unbound objects are marshaled by value across domains. This means that the receiving domain gets a copy of the object to play with instead of the original object.
  • AppDomain-bound objects are marshaled by reference across domains, which means that cross-domain access is always accomplished through proxies.
  • Context-bound objects are also marshaled by reference across domains as well as between contexts within the same domain. A context is a set of usage rules defining an environment where the objects reside. The rules are enforced when an object is entering or leaving the context.

The CLR relies on the verifiable type safety of the code (discussed in Chapter 13) to provide fault isolation between domains at a much lower cost than that incurred by the process isolation used in operating systems. The isolation is based on static type verification, and as a result, the hardware ring transitions or process switches are not necessary.

Manifest

The metadata that describes an assembly and its modules is referred to as a manifest. The manifest carries the following information:

  • Identity, including a simple textual name, an assembly version number, an optional culture (if the assembly contains localized managed resources), and an optional public key if the assembly is strong named. This information is defined in two metadata tables: Module and Assembly (in the prime module only).
  • Contents, including types and managed resources exposed by this assembly for external use and the location of these types and resources. The metadata tables that contain this information are ExportedType (in the prime module only) and ManifestResource.
  • Dependencies, including other (external) assemblies this assembly references and, in the case of a multimodule assembly, other modules of the same assembly. You can find the dependency information in these metadata tables: AssemblyRef, ModuleRef, and File.
  • Requested permissions, specific to the assembly as a whole. More specific requested permissions might also be defined for certain types (classes) and methods. This information is defined in the DeclSecurity metadata table. (Chapter 17 describes requested permissions and the ways to declare them.)
  • Custom attributes, specific to the manifest components. Custom attributes provide additional information used mostly by compilers and other tools. The CLR recognizes a limited number of custom attributes. Custom attributes are defined in the CustomAttribute metadata table. (Refer to Chapter 16 for more information on this topic.)

Figure 6-1 shows the mutual references that take place between the metadata tables constituting the manifest.

Figure 6-1. Mutual references between the manifest’s metadata tables

Assembly Metadata Table and Declaration

The Assembly metadata table contains at most one record, which appears in the prime module’s metadata. The table has the following column structure:

  • HashAlgId (4-byte unsigned integer): The ID of the hash algorithm used in this assembly to hash the files. The value must be one of the CALG_* values defined in the header file Wincrypt.h. The default hash algorithm is CALG_SHA (a.k.a. CALG_SHA1) (0x8004). Ecma International/ISO specifications consider this algorithm to be standard, offering the best widely available technology for file hashing. CLR versions 1.0, 1.1, and 2.0 support only the MD5 (0x8003) and SHA1 (0x8004) algorithms. CLR version 4.0 introduced support for the SHA256 (0x800C), SHA384 (0x800D), and SHA512 (0x800E) algorithms.
  • MajorVersion (2-byte unsigned integer): The major version of the assembly.
  • MinorVersion (2-byte unsigned integer): The minor version of the assembly.
  • BuildNumber (2-byte unsigned integer): The build number of the assembly.
  • RevisionNumber (2-byte unsigned integer): The revision number of the assembly.
  • Flags (4-byte unsigned integer): Assembly flags indicating whether the assembly is strong named (set automatically by the metadata emission API if PublicKey is present), whether the JIT tracking and/or optimization is enabled (set automatically on assembly load), and whether the assembly can be retargeted at run time to an assembly of a different version. JIT tracking is the mapping of IL instruction offsets to addresses of native code produced by the JIT compiler; this mapping is used during the debugging of the managed code. CLR version 4.5 introduced additional assembly flags indicating the assembly platform (cil, x86, ia64, amd64, arm) and a flag indicating whether the assembly is in fact a Windows Runtime metadata stub or a managed app (windowsruntime; see Chapter 18 for details).
  • PublicKey (offset in the #Blob stream): A binary object representing a public encryption key for a strong-named assembly.
  • Name (offset in the #Strings stream): The assembly name, which must be nonempty and must not contain a path or a filename extension (for example, mscorlib, System.Data).
  • Locale (offset in the #Strings stream): The culture (formerly known as locale) name, such as en-US (American English) or fr-CA (Canadian French), identifying the culture of localized managed resources of this assembly. The culture name must match one of hundreds of culture names “known” to the runtime through the .NET Framework class library, but this validity rule is rather meaningless: to use a culture, the specific language support must be installed on the target machine. If the language support is not installed, it doesn’t matter whether the culture is “known” to the runtime.
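To make the HashAlgId values concrete, here is a minimal Python sketch that maps the CALG_* identifiers listed above to the corresponding algorithms in Python's standard hashlib module. The function name hash_file_bytes and the mapping table are illustrative assumptions, not part of any CLR API; the sketch only shows how a tool reading the Assembly table might select a hash algorithm from the stored ID.

```python
import hashlib

# CALG_* identifiers from the HashAlgId description, mapped to hashlib names.
# The table and function names are hypothetical, chosen for illustration only.
CALG_TO_HASHLIB = {
    0x8003: "md5",     # MD5
    0x8004: "sha1",    # SHA1 (the default)
    0x800C: "sha256",  # SHA256, CLR 4.0 and later
    0x800D: "sha384",  # SHA384, CLR 4.0 and later
    0x800E: "sha512",  # SHA512, CLR 4.0 and later
}


def hash_file_bytes(data, hash_alg_id=0x8004):
    """Hash file contents with the algorithm selected by HashAlgId."""
    try:
        name = CALG_TO_HASHLIB[hash_alg_id]
    except KeyError:
        raise ValueError("unsupported HashAlgId 0x%04X" % hash_alg_id) from None
    return hashlib.new(name, data).digest()


# SHA1 is used when no algorithm ID is given.
print(hash_file_bytes(b"abc").hex())
```

An unknown ID raises an error instead of silently falling back to the default, which mirrors the rule that the value must be one of the recognized CALG_* constants.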

In ILAsm, the Assembly is declared in the following way (for example):

.assembly mscorlib
{
  .publickey = (00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 )
  .hash algorithm 0x00008004
  .ver 2:0:0:0
}

The ILAsm syntax of the Assembly declaration is

.assembly <flags> <name> { <assemblyDecl>* }

where <flags> ::=
  <none>           // (0x0000) Assembly cannot be retargeted,
                   //          platform is defined by PE and CLR headers.
| cil              // (0x0010) Assembly is pure MSIL
| x86              // (0x0020) Assembly is x86-specific
| ia64             // (0x0030) Assembly is IA64-specific
| amd64            // (0x0040) Assembly is AMD64-specific
| arm              // (0x0050) Assembly is ARM-specific (introduced in v4.5)
| retargetable     // (0x0100) Assembly can be retargeted
| windowsruntime   // (0x0200) Assembly is a Windows Runtime metadata stub
                   //          or managed app (introduced in v4.5)

and <assemblyDecl> ::=
  .hash algorithm <int32>              // Set hash algorithm ID
| .ver <int32>:<int32>:<int32>:<int32> // Set version numbers
| .publickey = ( <bytes> )             // Set public encryption key
| .locale <quotedString>               // Set assembly culture
| <securityDecl>                       // Set requested permissions
| <customAttrDecl>                     // Define custom attribute(s)

In this declaration, <int32> denotes an integer number, 4 bytes in size. The notation <bytes> represents a sequence of two-digit hexadecimal numbers, each representing 1 byte; this form, bytearray, is often used in ILAsm to represent binary objects of arbitrary size. Finally, <quotedString> denotes, in general, a composite quoted string, such as "ABC"+"DEF"+"GHI". The concatenation with the plus sign is useful for defining very long strings, although in this case we don’t need concatenation for strings such as en-US or nl-BE.
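The flag values in the grammar are not all independent bits: the platform values 0x0010 through 0x0050 occupy one multi-bit field (ia64, 0x0030, is numerically cil | x86), so a decoder has to mask the field and compare the result. The following Python sketch illustrates this under the assumption that the field mask is 0x0070, as the listed values imply; the function name flag_keywords is hypothetical.

```python
# Flag values as listed in the <flags> grammar of this section.
ARCH_MASK = 0x0070  # assumed mask covering the platform values 0x0010..0x0050
ARCH_KEYWORDS = {
    0x0010: "cil",
    0x0020: "x86",
    0x0030: "ia64",
    0x0040: "amd64",
    0x0050: "arm",
}
RETARGETABLE = 0x0100
WINDOWS_RUNTIME = 0x0200


def flag_keywords(flags):
    """Translate an Assembly Flags value into ILAsm flag keywords."""
    keywords = []
    # The platform is a field, so compare the masked value, not single bits.
    arch = flags & ARCH_MASK
    if arch in ARCH_KEYWORDS:
        keywords.append(ARCH_KEYWORDS[arch])
    if flags & RETARGETABLE:
        keywords.append("retargetable")
    if flags & WINDOWS_RUNTIME:
        keywords.append("windowsruntime")
    return keywords


print(flag_keywords(0x0030))  # ['ia64']
print(flag_keywords(0x0150))  # ['arm', 'retargetable']
```

Testing 0x0010 and 0x0020 as separate bits would misreport an IA64 assembly as both cil and x86, which is why the sketch masks first.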

AssemblyRef Metadata Table and Declaration

The AssemblyRef (assembly reference) metadata table defines the external dependencies of an assembly or a module. Both prime and nonprime modules can—and do, as a rule—contain this table. The only assembly that does not depend on any other assembly, and hence has an empty AssemblyRef table, is Mscorlib.dll, the root assembly of the .NET Framework class library.

The column structure of the AssemblyRef table is as follows:

  • MajorVersion (2-byte unsigned integer): The major version of the assembly.
  • MinorVersion (2-byte unsigned integer): The minor version of the assembly.
  • BuildNumber (2-byte unsigned integer): The build number of the assembly.
  • RevisionNumber (2-byte unsigned integer): The revision number of the assembly.
  • Flags (4-byte unsigned integer): Assembly reference flags, which indicate whether the assembly reference holds a full unhashed public key or a “surrogate” (public key token). CLR version 4.5 introduced additional assembly flags indicating the assembly platform (cil, x86, ia64, amd64, arm) and a flag indicating whether the assembly is a Windows Runtime metadata stub or managed app (windowsruntime; see Chapter 18 for details).
  • PublicKeyOrToken (offset in the #Blob stream): A binary object representing the public encryption key for a strong-named assembly or a token of this key. A key token is an 8-byte representation of a hashed public key, and it has nothing to do with metadata tokens.
  • Name (offset in the #Strings stream): The name of the referenced assembly, which must be nonempty and must not contain a path or a filename extension.
  • Locale (offset in the #Strings stream): The culture name.
  • HashValue (offset in the #Blob stream): A binary object representing a hash of the metadata of the referenced assembly’s prime module. This value is ignored by the loader, so it can safely be omitted.
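The relationship between a public key and its 8-byte token can be sketched in a few lines of Python. The sketch assumes the commonly documented derivation: the token is the last 8 bytes of the SHA-1 hash of the public key blob, taken in reverse order. The function name public_key_token is illustrative and not a CLR API.

```python
import hashlib


def public_key_token(public_key):
    """Derive an 8-byte public key token from a public key blob.

    Assumed derivation: SHA-1 over the key, last 8 bytes, byte order reversed.
    """
    digest = hashlib.sha1(public_key).digest()  # 20 bytes
    return digest[-8:][::-1]


# The 16-byte public key from the mscorlib Assembly declaration in this chapter.
key = bytes.fromhex("00000000000000000400000000000000")
print(public_key_token(key).hex())
```

Because the token is a one-way reduction of the key, it is sufficient for identifying the publisher in a reference but cannot be used to verify a strong name signature; that requires the full key carried by the referenced assembly itself.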

In ILAsm, an AssemblyRef is declared in the following way (for example):

.assembly extern mscorlib
{
  .publickeytoken = (B7 7A 5C 56 19 34 E0 89 )
  .ver 2:0:0:0
}

The ILAsm syntax for an AssemblyRef declaration is

.assembly extern <flags> <name> { <assemblyRefDecl>* }

where <flags> ::=
  <none>           // (0x0000) Must be so for versions older than 4.5.
| cil              // (0x0010) Assembly is pure MSIL
| x86              // (0x0020) Assembly is x86-specific
| ia64             // (0x0030) Assembly is IA64-specific
| amd64            // (0x0040) Assembly is AMD64-specific
| arm              // (0x0050) Assembly is ARM-specific (introduced in v4.5)
| windowsruntime   // (0x0200) Assembly is a Windows Runtime metadata stub
                   //          or managed app (introduced in v4.5)

and <assemblyRefDecl> ::=
  .ver <int32>:<int32>:<int32>:<int32> // Set version numbers
| .publickey = ( <bytes> )             // Set public encryption key
| .publickeytoken = ( <bytes> )        // Set public encryption key token
| .locale <quotedString>               // Set assembly locale (culture)
| .hash = ( <bytes> )                  // Set hash value
| <customAttrDecl>                     // Define custom attribute(s)

As you might have noticed, ILAsm does not provide a way to set the flags in the AssemblyRef declaration except flags specific to v4.5 or later. The explanation is simple: in older versions, the only flag relevant to an AssemblyRef is the flag indicating whether the AssemblyRef carries a full unhashed public encryption key, and this flag is set only when the .publickey directive is used.

When referencing a strong-named assembly, you are required to specify .publickeytoken (or .publickey, which is rarely used in AssemblyRefs) and .ver. The only exception to this rule among the strong-named assemblies is Mscorlib.dll.

If .locale is not specified, the referenced assembly is presumed to be “culture neutral.”

An interesting situation arises when you need to use two or more versions of the same assembly side by side. An assembly is identified by its name, version, public key (or public key token), and culture. It would be extremely cumbersome to list all these identifications every time you reference an assembly: “I want to call method Bar of class Foo from assembly SomeOtherAssembly, and I want the version number such-and-such, the culture nl-BE, and....” Of course, if you didn’t need to use different versions side by side, you could simply refer to an assembly by name.

ILAsm provides an AssemblyRef aliasing mechanism to deal with such situations. The AssemblyRef declaration can be extended as shown here:

.assembly extern <flags> <name> as <alias> { <assemblyRefDecl>* }

Whenever you need to reference this assembly, you can use its <alias>, as shown in this example:

.assembly extern SomeOtherAssembly as OldSomeOther
{ .ver 1:1:1:1 }
.assembly extern SomeOtherAssembly as NewSomeOther
{ .ver 1:3:2:1 }
...
call int32 [OldSomeOther]Foo::Bar(string)
...
call int32 [NewSomeOther]Foo::Bar(string)
...

The alias is not part of metadata. Rather, it is simply a language tool, needed to identify a particular AssemblyRef among several identically named AssemblyRefs. The IL disassembler generates aliases for AssemblyRefs whenever it finds identically named AssemblyRefs in the module metadata.

Autodetection of Referenced Assemblies

Version 2.0 of the IL assembler introduced a way to reference the assemblies without specifying their version, public key token, and other attributes:

.assembly extern <name> as <alias> { auto }

When the keyword auto is specified, the ILAsm compiler queries the GAC and tries to find an assembly with the specified name. If it succeeds, it reads the assembly attributes (version, public key, culture) and puts these attributes into the generated AssemblyRef metadata record.

Note that the autodetection feature works only for referenced assemblies installed in the GAC.

The referenced assembly attributes may be partially specified and combined with autodetection, thus narrowing the search; for example:

.assembly extern OtherAssembly { .ver 1:3:*:* auto }

The previous directive will prompt the IL assembler to query the GAC looking for an assembly named OtherAssembly with the major version number equal to 1 and the minor version number equal to 3 and with any build and revision numbers. If such an assembly is found in the GAC, then its missing attributes are retrieved and put into the respective entries of the AssemblyRef record.

If more than one assembly matching the search criteria is found, the one with the highest version is taken.
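The selection rule can be sketched in Python as follows. This is only a model of the behavior described here, not the IL assembler's actual implementation; the function names and the list of installed versions are hypothetical.

```python
def parse_pattern(pattern):
    """Parse 'major:minor:build:revision'; '*' means any value."""
    parts = pattern.split(":")
    if len(parts) != 4:
        raise ValueError("expected four colon-separated components")
    return tuple(None if p == "*" else int(p) for p in parts)


def pick_version(pattern, installed):
    """Return the highest installed version matching the pattern, or None."""
    wanted = parse_pattern(pattern)
    matches = [
        version for version in installed
        if all(w is None or w == c for w, c in zip(wanted, version))
    ]
    # Tuples compare component by component, so max() yields the highest version.
    return max(matches, default=None)


# Hypothetical versions of OtherAssembly found in the GAC.
gac_versions = [(1, 3, 0, 0), (1, 3, 2, 1), (1, 4, 0, 0), (2, 0, 0, 0)]
print(pick_version("1:3:*:*", gac_versions))  # (1, 3, 2, 1)
```

Versions 1.4.0.0 and 2.0.0.0 are higher but do not match the specified major and minor numbers, so they are never considered.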

In this regard, the IL assembler differs from other managed compilers (VB, C#, VC++), as those compilers require the specification of referenced assemblies via the file path instead of querying the GAC. This might play a bad trick on a programmer, because the CLR loader always tries to load the assemblies from the GAC first (as is described in the next section), and in the unlikely event of a mismatch between referenced assemblies installed in the GAC and those specified by the file path, the application will be executed against assemblies different from those it was built against.

The autodetection feature was introduced in version 2.0 of the IL assembler.

The Loader in Search of Assemblies

When you define an AssemblyRef in the metadata, you expect the loader to find exactly this assembly and load it into the application domain. Let’s have a look at the process of finding an external assembly and binding it to the referencing application.

Given an AssemblyRef, the process of binding to that assembly is influenced by these factors:

  • The application base (AppBase), which is a URL to the referencing application location (that is, to the directory in which your application is located). For executables, this is the directory containing the EXE file. For web applications, the AppBase is the root directory of the application as defined by the web server.
  • Version policies specified by the application, by the publisher of the shared assembly being referenced, or by the administrator.
  • Any additional search path information given in the application configuration file.
  • Any code base (CodeBase) locations provided in the configuration files by the application, the publisher, or the administrator. The CodeBase is a URL to the location of the referenced external assembly. There may be as many code bases as there are referenced assemblies.
  • Whether the reference is to a shared assembly with a strong name or to a private assembly. Strong-named assemblies are first sought in the GAC.

As illustrated in Figure 6-2, the loader performs the following steps to locate a referenced assembly.

Figure 6-2. Searching for a referenced assembly

  1. Initiate the binding. Basically, this means taking the relevant AssemblyRef record from the metadata and seeing what it holds—its external assembly name, whether it is strong named, whether culture is specified, and so on.
  2. Apply the version policies, which are statements made by the application, by the publisher of the shared assembly being referenced, or by the administrator. These statements are contained in XML configuration files and simply redirect references to a particular version (or set of versions) of an assembly to a different version.

    The .NET Framework retrieves its configuration from a set of configuration files. Each file represents settings that have different scopes. For example, the configuration file supplied with the installation of the common language runtime has settings that can affect all applications that use that version of the CLR. The configuration file supplied with an application (application configuration file) has settings that affect only that one application; this configuration file resides in the application directory. A publisher policy file is supplied by the publisher of a shared assembly, and it contains information about the assembly compatibility and redirects an assembly reference to a new version of the shared component. A publisher policy file is usually issued when the shared component is updated by its publisher. The publisher policy settings take precedence over the settings of the application configuration file. The administrator policy file, Machine.config, resides in the Configuration subdirectory of the CLR installation directory. This file contains settings defined by the administrator for this machine and takes precedence over any other configuration file. Overrides specified in the Machine.config file affect all applications running on this machine and cannot be in turn overridden.

    Note that starting with v4.0, the machine-wide policies described here are not enforced by the CLR (see Chapter 17 for details).

  3. If the referenced assembly is strong named (in other words, the AssemblyRef contains a non-null public key or public key token), then look up the assembly in the GAC. Otherwise, since weak-named assemblies cannot be installed in the GAC, this step is skipped. If the assembly is found, which is the most common case, the search process is completed.
  4. Check the CodeBase. Now that the common language runtime knows which version of the assembly it is looking for, it begins the process of locating it. If the CodeBase has been supplied (in the same XML configuration file), it points the CLR directly at the executable to load; otherwise, the runtime needs to look in the AppBase (see the next step). If the executable specified by the CodeBase matches the assembly reference, the process of finding the assembly is complete, and the external assembly can be loaded. In fact, even if the executable specified by the CodeBase does not match the reference, the CLR stops searching. In this case, of course, the search is considered a failure, and no assembly load follows.
  5. Probe the AppBase. The probing involves consecutive searching in the directories defined by the AppBase, the private binary path (binpath) from the same XML configuration file, the culture of the referenced assembly, and its name. The AppBase plus directories specified in the binpath form a set of root directories: {<root_k>, k=1...N}. If the AssemblyRef specifies the culture, the search is performed in directories <root_k>/<culture> and then in <root_k>/<culture>/<name>; otherwise, the directories <root_k> and then <root_k>/<name> are searched. When searching for a private assembly, the process ignores the version numbers. If the assembly is not found by probing, the binding fails (see Figure 6-2).
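Two of the steps above lend themselves to a short Python sketch: applying version policies (step 2) and enumerating the probing directories (step 5). The sketch is a simplified model under stated assumptions, not the loader's code: each policy is modeled as a plain dictionary of version redirects, policies are assumed to be applied one after another so that a later policy sees the result of an earlier one, and the roots are assumed to be probed in the order AppBase first, then each binpath entry. All names and paths are hypothetical.

```python
import posixpath


def apply_policies(version, *policies):
    """Apply redirect tables in order (application, publisher, machine).

    Each policy maps an old version tuple to a new one. A later policy
    receives the output of the earlier ones and can redirect it again.
    """
    for policy in policies:
        version = policy.get(version, version)
    return version


def probing_directories(app_base, bin_path, name, culture=None):
    """List the directories probed for a private assembly, root by root."""
    roots = [app_base] + [posixpath.join(app_base, sub) for sub in bin_path]
    directories = []
    for root in roots:
        # With a culture: <root>/<culture>, then <root>/<culture>/<name>.
        # Without one:    <root>,           then <root>/<name>.
        base = posixpath.join(root, culture) if culture else root
        directories.append(base)
        directories.append(posixpath.join(base, name))
    return directories


app_policy = {(1, 0, 0, 0): (1, 1, 0, 0)}
publisher_policy = {(1, 1, 0, 0): (1, 2, 0, 0)}
machine_policy = {}
print(apply_policies((1, 0, 0, 0), app_policy, publisher_policy, machine_policy))
# (1, 2, 0, 0)

print(probing_directories("/app", ["bin"], "Helper", culture="nl-BE"))
# ['/app/nl-BE', '/app/nl-BE/Helper', '/app/bin/nl-BE', '/app/bin/nl-BE/Helper']
```

The sketch lists directories only; it deliberately leaves out the GAC lookup and the CodeBase check, which come before probing and end the search on their own when they apply.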

In version 2.0 or later of the CLR running under a 64-bit operating system, the problems with assembly binding are exacerbated by the possible presence of both 32-bit and 64-bit versions of assemblies. To deal with the problem, the binding mechanism of the v2.0+ assembly loader uses the following classification of the assemblies:

  • Platform-agnostic assemblies can be executed in native unemulated mode on a 32-bit or 64-bit platform; they don’t contain any platform-specific details.
  • 32-bit specific assemblies can be executed natively on 32-bit platforms; on 64-bit platforms such assemblies require 32-bit emulation.
  • Itanium-specific assemblies can be executed natively on the Intel Itanium platform and cannot be executed on any other platform.
  • X64-specific assemblies can be executed natively on an AMD/Intel X64 platform and cannot be executed on any other platform.

This classification is called Processor Architecture and is an additional part of full assembly identity in versions 2.0+. The Processor Architecture is derived from the Machine entry of the COFF header, the type of the Optional NT header, and the two least significant bits (flags ILONLY and 32BITREQUIRED) of the CLR header flags (see Chapter 4 for details):

  • Platform-agnostic assemblies have Machine = I386, 32-bit Optional header, and the two least significant bits of CLR header flags set to ILONLY (0x1).
  • 32-bit specific assemblies have the same Machine and Optional header and the two least significant bits of CLR header flags set to 32BITREQUIRED|ILONLY (0x3), 32BITREQUIRED (0x2), or 0.
  • Itanium-specific assemblies have Machine = IA64 and 64-bit Optional header; CLR header flags play no role.
  • X64-specific assemblies have Machine = AMD64 and 64-bit Optional header; CLR header flags play no role.
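The derivation rules above can be condensed into a small Python function. This is a sketch of the classification as described in this section, not the loader's implementation. The numeric constants are the standard PE/COFF machine codes and CLR header flag values; the function name and the returned labels are illustrative.

```python
# COFF header Machine values.
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_IA64 = 0x0200
IMAGE_FILE_MACHINE_AMD64 = 0x8664

# Two least significant bits of the CLR header flags.
COMIMAGE_FLAGS_ILONLY = 0x1
COMIMAGE_FLAGS_32BITREQUIRED = 0x2


def processor_architecture(machine, is_64bit_optional_header, clr_flags):
    """Classify an assembly from its PE and CLR header fields."""
    if machine == IMAGE_FILE_MACHINE_IA64 and is_64bit_optional_header:
        return "Itanium-specific"  # CLR header flags play no role
    if machine == IMAGE_FILE_MACHINE_AMD64 and is_64bit_optional_header:
        return "X64-specific"      # CLR header flags play no role
    if machine == IMAGE_FILE_MACHINE_I386 and not is_64bit_optional_header:
        low_bits = clr_flags & (COMIMAGE_FLAGS_ILONLY | COMIMAGE_FLAGS_32BITREQUIRED)
        if low_bits == COMIMAGE_FLAGS_ILONLY:
            return "platform-agnostic"
        # 32BITREQUIRED|ILONLY (0x3), 32BITREQUIRED (0x2), or 0
        return "32-bit specific"
    raise ValueError("unrecognized combination of header fields")


print(processor_architecture(IMAGE_FILE_MACHINE_I386, False, 0x1))  # platform-agnostic
print(processor_architecture(IMAGE_FILE_MACHINE_I386, False, 0x3))  # 32-bit specific
```

Note that only the ILONLY flag on its own yields a platform-agnostic assembly; an I386 image with neither flag set is still treated as 32-bit specific.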

You should be careful declaring your assembly platform agnostic. To be truly platform agnostic, the assembly has to have no presumptions of pointer size, no unmanaged exports or imports, no embedded native code, and no thread-local storage (.tls section), and it has to reference no platform-specific assemblies or platform-specific unmanaged DLLs. The last condition is the worst of them all, because it is transitive. Many times developers have written an application (EXE) and declared it platform agnostic, only to discover that it crashed on 64-bit platforms: the application, being platform agnostic, created a 64-bit process and then tried to load a 32-bit specific referenced assembly into the 64-bit process. Kaboom! Or it tried to load a platform-agnostic assembly A, which in turn referenced assembly B, and B just happened to P/Invoke a 32-bit unmanaged DLL (see Chapter 18). Kaboom! The bright side of it is that such problems are usually discovered right away, not after the application has been shipped.

Versions 2.0 and later of the runtime consider all assemblies produced for versions 1.0 and 1.1 as 32-bit specific assemblies. It is only fair: versions 1.0 and 1.1 of the runtime did not support 64-bit platforms. The assemblies produced for versions 1.0 and 1.1 are identified by the metadata stream header (see Chapter 5); the version specified in this header is 1.0 for v1.0 and v1.1 assemblies and is 2.0 for v2.0+ assemblies.

Module Metadata Table and Declaration

The Module metadata table contains a single record that provides the identification of the current module. The column structure of the table is as follows:

  • Generation (2-byte unsigned integer): Used only at run time, in edit-and-continue mode.
  • Name (offset in the #Strings stream): The module name, which is the same as the name of the executable file with its extension but without a path. The length should not exceed 512 bytes in UTF-8 encoding, counting the zero terminator.
  • Mvid (offset in the #GUID stream): A globally unique identifier, assigned to the module as it is generated.
  • EncId (offset in the #GUID stream): Used only at run time, in edit-and-continue mode.
  • EncBaseId (offset in the #GUID stream): Used only at run time, in edit-and-continue mode.

Since only one entry of the Module record can be set explicitly (the Name entry), the module declaration in ILAsm is quite simple:

.module <name>

ModuleRef Metadata Table and Declaration

The ModuleRef metadata table contains descriptors of other modules referenced in the current module. The set of “other modules” includes both managed and unmanaged modules.

The relevant managed modules are the other modules of the current assembly. In ILAsm, they should be declared explicitly, and their declarations should be paired with File declarations (discussed in the following section). IL assembler does not verify whether the referenced modules are present at compile time.

The unmanaged modules described in the ModuleRef table are simply unmanaged DLLs containing methods called from the current module using the platform invocation mechanism—P/Invoke, discussed in Chapter 18. These ModuleRef records usually are not paired with File records. They need not be explicitly declared in ILAsm because in ILAsm the DLL name is part of the P/Invoke specification, so the IL assembler emits respective ModuleRef records automatically.

There is one reason, however, to pair a ModuleRef record referring to an unmanaged module with a File record: you should do that if you want this unmanaged DLL to be part of your deployment. In this case, the unmanaged DLL will reside together with managed modules constituting your assembly, and it does not have to be on the path to be discovered.

A ModuleRef record contains only one entry, the Name entry, which is an offset in the #Strings stream. The ModuleRef declaration in ILAsm is not much more sophisticated than the declaration of Module:

.module extern <name>

As in the case of Module, <name> in ModuleRef is the name of the executable file with its extension but without a path, not exceeding 512 bytes in UTF-8 encoding.

File Metadata Table and Declaration

The File metadata table describes other files of the same assembly that are referenced in the current module. In single-module assemblies, this table is empty (unless you want to specify unmanaged DLLs as part of your deployment, as described earlier). The table has the following column structure:

  • Flags (4-byte wide bitfield): Binary flags characterizing the file. This entry is mostly reserved for future use; the only flag currently defined is ContainsNoMetaData (0x00000001). This flag indicates that the file in question is not a managed PE file but rather a pure resource file.
  • Name (offset in the #Strings stream): The filename, subject to the same rules as the names in Module and ModuleRef. This is the only occurrence of data duplication in the metadata model: the File name matches the name used in the ModuleRef with which this File record is paired. However, since the names in both records are not physical strings but rather offsets in the string heap, the string data might not actually be duplicated; instead, both records might reference the same string in the heap. This doesn’t mean there is no data duplication: the offsets are definitely duplicated.
  • HashValue (offset in the #Blob stream): The blob representing the hash of the file, used to authenticate the files in a multifile assembly. Even in a strong-named assembly, the strong name signature resides only in the prime module and covers only the prime module. Nonprime modules in an assembly are authenticated by their hash values.

The File declaration in ILAsm is

.file <flag> <name> .hash = ( <bytes> )

where <flag> ::=

<none>          // The file is a managed PE file
| nometadata    // The file is a pure resource file

If the hash value is not explicitly specified, the IL assembler finds the named file and computes the hash value using the hash algorithm specified in the Assembly declaration. If the file is not available at compile time, the HashValue entry of the respective File record is set to 0.

The File declaration can also carry the .entrypoint directive, as shown in this example:

.file MainClass.dll
  .hash= (01 02 03 04 05 06  ... )
  .entrypoint

This sort of File declaration can occur only in the prime module of a multimodule assembly and only when the entry point method is defined in a nonprime module of the assembly. This clause of the File declaration does not affect the metadata, but it puts the appropriate file token in the EntryPointToken entry of the common language runtime header. See Chapter 4 for details about EntryPointToken and the CLR header.

The prime module of an assembly, especially a runnable application (EXE), must have a valid token in the EntryPointToken field of the CLR header; and this token must be either a Method token, if the entry point method is defined in the prime module, or a File token. In the latter case, the loader loads the relevant module and inspects its common language runtime header, which must contain a valid Method token in the EntryPointToken field.
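
The rule for the EntryPointToken field can be sketched as a check on the token type. The Python fragment below is a sketch under stated assumptions: the function name entry_point_kind is hypothetical, and it relies on the standard token layout, in which the most significant byte holds the token type (0x06 for Method, 0x26 for File) and the remaining three bytes hold the record index.

```python
MDT_METHOD_DEF = 0x06000000  # token type of a Method token
MDT_FILE = 0x26000000        # token type of a File token


def entry_point_kind(token: int) -> str:
    """Hypothetical helper: classify the EntryPointToken of a CLR header."""
    token_type = token & 0xFF000000
    rid = token & 0x00FFFFFF
    if rid == 0:
        return "invalid"     # a nil token references no record
    if token_type == MDT_METHOD_DEF:
        return "method"      # entry point is defined in this module
    if token_type == MDT_FILE:
        return "file"        # the loader must inspect the referenced module
    return "invalid"
```

When the result is "file", the referenced module's own CLR header is expected to yield "method" for its EntryPointToken.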

Managed Resource Metadata and Declaration

A resource is nonexecutable data that is logically deployed as part of an application. The data can take any number of forms such as strings, images, persisted objects, and so on. As Chapter 4 described, resources can be either managed or unmanaged (platform specific). These two kinds of resources have different formats and are accessed using managed and unmanaged APIs, respectively.

An application often must be customized for different cultures. A culture is a set of preferences based on a user’s language, sublanguage, and cultural conventions. In the .NET Framework, the culture is described by the CultureInfo class from the .NET Framework class library. A culture is used to customize operations such as formatting dates and numbers, sorting strings, and so on.

You might also need to customize an application for different countries or regions. A region defines a set of standards for a particular country or region of the world. In the .NET Framework, the class library describes a region using the RegionInfo class. A region is used to customize operations such as formatting currency symbols.

Localization of an application is the process of connecting the application’s executable code with the application’s resources that have been customized for specific cultures. Although a culture and a region together constitute a locale, localization is not concerned with customizing an application to a specific region. The .NET Framework and the common language runtime do not support the localization of component metadata, instead relying solely on the managed resources for this task.

The .NET Framework uses a hub-and-spoke model for packaging and deploying resources. The hub is the main assembly, which contains the executable code and the resources for a single culture (referred to as the neutral culture). The neutral culture is the fallback culture for the application. Each spoke connects to a satellite assembly that contains the resources for a single culture. Satellite assemblies do not contain code.

The advantages of this model are obvious. First, resources for new cultures can be added incrementally after an application is deployed. Second, an application needs to load only those satellite assemblies that contain the resources needed for a particular run.

The resources used in or exposed by an assembly can reside in one of the following locations:

  • In separate resource file(s) in the same assembly. Each resource file can contain one or more resources. The metadata descriptors of such files carry the nometadata flag.
  • Embedded in managed modules of the same assembly.
  • In another (external) assembly.

The resource data is not directly used or validated by the deployment subsystem or the loader, so it can be of any kind.

All resource data embedded in a managed PE file resides in a contiguous block inside the .text section. The Resources data directory in the CLR header provides the RVA and size of embedded managed resources. Each individual resource is preceded by a 4-byte unsigned integer holding the resource’s length in bytes. Figure 6-3 shows the layout of embedded managed resources.

Figure 6-3. The layout of embedded managed resources
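
To make the layout concrete, here is a minimal Python sketch that extracts one resource from the managed resource segment, given the offset recorded for it. The function name read_resource is hypothetical, the length prefix is assumed to be little-endian (as is usual for PE files), and any padding between resources is ignored.

```python
import struct


def read_resource(segment: bytes, offset: int) -> bytes:
    """Return the resource data that starts at `offset` within the segment.

    `segment` is the block addressed by the Resources data directory;
    `offset` is an offset within that block, not an RVA.
    """
    # Each resource is preceded by a 4-byte unsigned length.
    (length,) = struct.unpack_from("<I", segment, offset)
    start = offset + 4
    end = start + length
    if end > len(segment):
        raise ValueError("resource extends past the end of the segment")
    return segment[start:end]
```

A segment holding a 3-byte resource followed by a 2-byte resource would therefore have its second length prefix at offset 7.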

The ManifestResource metadata table, describing the managed resources, has the following column structure:

  • Offset (4-byte unsigned integer): Location of the resource within the managed resource segment to which the Resources data directory of the CLR header points. This is not an RVA; rather, it is an offset within the managed resource segment.
  • Flags (4-byte wide bitfield): Binary flags indicating whether the managed resource is public (accessible from outside the assembly) or private (accessible from within the current assembly only).
  • Name (offset in the #Strings stream): Nonempty name of the resource, unique within the assembly.
  • Implementation (coded token of type Implementation): Token of the respective AssemblyRef record if the resource resides in another assembly or of the respective File record if the resource resides in another file of the current assembly. If the resource is embedded in the current module, this entry is set to 0. If the resource is imported from another assembly, the offset need not be specified; the loader will ignore it.

ILAsm syntax for the declaration of a managed resource is

.mresource <flag> <name> { <mResourceDecl>* }

where <flag> ::= public | private and <mResourceDecl> ::=

.assembly extern <alias>      // Resource is imported from another
                              // assembly
| .file <name> at <int32>     // Resource resides in another
                              // file of this assembly;
                              // <int32> is the offset
| <customAttrDecl>            // Define custom attribute for this resource

The default flag value is private.

The directives .assembly extern and .file in the context of a managed resource declaration refer to the resource’s Implementation entry and are mutually exclusive. If Implementation references the AssemblyRef or File before it has been declared, the ILAsm compiler will diagnose an error.

If the Implementation entry is empty, the resource is presumed embedded in the current module. In this case, the IL assembler creates the PE file, loads the resource from the file according to the resource’s name, and writes it into the .text section of the PE file, automatically setting the Offset entry of the ManifestResource record. When the IL disassembler disassembles a PE file into a text file, the embedded managed resources are saved into binary files named after these resources, which allows the IL assembler to easily pick them up if the PE file needs to be reassembled.

There is a little catch there: names of managed resources may contain characters inappropriate for filenames. In such cases, the managed resources cannot be saved under their true names; on the other hand, you cannot change the resource names, because the resources are addressed by these names in the application. To deal with this situation, version 2.0 of ILAsm introduced aliasing of the managed resources similar to aliasing of referenced assemblies:

.mresource <flag> <name> as <filename> { <mResourceDecl>* }

The previous directive prompts the IL assembler to load the resource from the file <filename> and create the respective ManifestResource metadata record with the name <name>. When saving the managed resources to files, the IL disassembler v2.0+ analyzes the names of the resources; if it finds colon, semicolon, comma, or backslash characters in a name, it creates an alias for the resource, replacing these characters with an exclamation mark (!), a commercial “at” (@), an ampersand (&), and a currency sign ($), respectively. The resource is then saved in the alias-named file.
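
The character substitution just described is easy to model. The Python sketch below only mimics the mapping as stated in the text; it is not the disassembler's actual code, and the function name resource_alias is hypothetical.

```python
# Colon, semicolon, comma, and backslash map to !, @, &, and $, respectively.
_ALIAS_TABLE = str.maketrans({":": "!", ";": "@", ",": "&", "\\": "$"})


def resource_alias(name: str):
    """Return the alias filename for a resource name, or None if no alias is needed."""
    alias = name.translate(_ALIAS_TABLE)
    return alias if alias != name else None
```

A resource named a:b would thus be saved as a!b and declared with the as clause, whereas a name without any of the four characters needs no alias.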

ILAsm does not offer any language constructs to address the managed resources because IL lacks the means to do so. Managed APIs provided by the .NET Framework class library—specifically, the System.Resources.ResourceManager class—are used to load and manipulate managed resources.

ExportedType Metadata Table and Declaration

The ExportedType metadata table contains information about the public classes (visible outside the assembly) that are declared in nonprime modules of the assembly. Only the prime module’s manifest can carry this table.

This table is needed because the loader expects the prime module of an assembly to hold information about all classes exported by the assembly. The union of the classes defined in the prime module and those in the ExportedType table gives the loader the full picture.

On the other hand, the intersection of the classes defined in the prime module and those in the ExportedType table must be nil. As a result, the ExportedType table can be nonempty only in the prime module of a multimodule assembly: if there are no nonprime modules, then all classes defined by this assembly reside in the prime module itself.

In versions 2.0+, the ExportedType table serves an additional function: it contains so-called class forwarders, which are close conceptually to reexports in the unmanaged world or a postal address forwarding in everyday life. A forwarder indicates to which assembly class such-and-such (which used to reside in this assembly) has been moved. The forwarding mechanism, obviously, allows you to refactor your multiassembly product without the need for all your customers to rebuild their applications.

The ExportedType table has the following column structure:

  • Flags (4-byte wide bitfield): Binary flags indicating whether the exported type is a forwarder (forwarder) and the accessibility of the exported type. The accessibility flags we are interested in are public and nested public; other accessibility flags, identical to the class accessibility flags discussed in Chapter 7, are syntactically admissible but are not used to define true exported types. Other flags can be present in pseudo-ExportedTypes only, which the loader can use to resolve unscoped type references in multimodule assemblies.

    Some explanation is in order. Any time a type (class) is referenced in a module, the resolution scope should be provided to indicate where the referenced class is defined (in the current module, in another module of this assembly, or in another assembly). If the resolution scope is not provided, the referenced type should be declared in the current module. However, if this type cannot be found in the module referencing it and if the manifest of the prime module carries an identically named pseudo-ExportedType record indicating where the type is actually defined, the loader is nevertheless able to resolve the type reference. None of the current Microsoft managed compilers, excluding the IL assembler, uses this rather bizarre technique. The IL assembler has to be able to, for obvious reasons.

  • TypeDefId (4-byte unsigned integer): An uncoded token referring to a record of the TypeDef table of the module where the exported class is defined. This is the only occasion in the entire metadata model in which a module’s metadata contains an explicit value of a metadata token from another module. This token is used as something of a hint for the loader and can be omitted without any ill effects. If the token is supplied, the loader retrieves the specific TypeDef record from the respective module’s metadata and checks the full name of ExportedType against the full name of TypeDef. If the names match, the loader has found the class it was looking for; if the names do not match or if the token was not supplied in the first place, the loader finds the needed TypeDef by its full name. My advice: never specify a TypeDefId token explicitly when programming in ILAsm. This shortcut works only for automatic tools such as the Assembly Linker (AL) and only under certain circumstances.
  • TypeName (offset in the #Strings stream): Exported type’s name; must be nonempty.
  • TypeNamespace (offset in the #Strings stream): Exported type’s namespace; can be empty. Class names and namespaces are discussed in Chapter 7.
  • Implementation (coded token of type Implementation): Token of the File record indicating the file of the assembly where the exported class is defined or the token of another ExportedType, if the current one is nested in another one. The forwarders have AssemblyRef tokens as Implementation, which, in my humble opinion, makes the forwarder flag redundant: the forwarding nature of an exported type can be deduced from its Implementation being an AssemblyRef.

The exported types are declared in ILAsm as

.class extern <flag> <namespace>.<name> { <expTypeDecl>* }

where <flag> ::= public | nested public | forwarder and where <expTypeDecl> ::=

.file <name>                          // File where exported class is defined
| .class extern <namespace>.<name>    // Enclosing exported type
| .class <int32>                      // Set TypeDefId explicitly (don't do that!)
| .assembly extern <name>             // Forwarder
| <customAttrDecl>                    // Define custom attribute for this ExportedType

The directives .assembly extern, .file, and .class extern define the Implementation entry and are mutually exclusive. As in the case of the .mresource declaration, respective AssemblyRef, File, or ExportedType must be declared before being referenced by the Implementation entry.

It is fairly obvious that if Implementation is specified as .class extern, we are dealing with a nested exported type, and Flags must be set to nested public. Conversely, if Implementation is specified as .file, we are dealing with a top-level unnested class, and Flags must be set to public.

Order of Manifest Declarations in ILAsm

The general rule in ILAsm (and not only in ILAsm) is “declare, then reference.” In other words, it’s always safer, and in some cases outright required, to declare a metadata item before referencing it. There are times when you can reference a yet-undeclared item, such as calling a method that is defined later in the source code. But you cannot do this in the manifest declarations.

If you reexamine Figure 6-1, which illustrates the mutual references between the manifest metadata tables, you can discern the following list of dependencies:

  • Exported types reference external assemblies, files, and enclosing exported types.
  • Manifest resources reference files and external assemblies.
  • Every manifest item can have associated custom attributes, and custom attributes reference external assemblies and (rarely) external modules. (See Chapter 16 for details.)

To comply with the “declare, then reference” rule, the following sequence of declarations is recommended for ILAsm programs, with the manifest declarations preceding all other declarations in the source code:

  1. AssemblyRef declarations (.assembly extern), because of the custom attributes. The reference to the assembly Mscorlib should lead the pack because most custom attributes reference this assembly.
  2. ModuleRef declarations (.module extern), again because of the custom attributes.
  3. Assembly declaration (.assembly). The ILAsm compiler takes different paths in compiling Mscorlib.dll and compiling other assemblies, so it is better to let it know which path to take as soon as possible. In versions 2.0 and later, you can also use the special keyword .mscorlib to indicate that you are compiling Mscorlib.dll. This keyword is best placed at the beginning of the program. However, this is less important if you are not compiling Mscorlib.dll; by default the compiler assumes that it is compiling a “conventional” module.
  4. File declarations (.file) because ExportedType and ManifestResource declarations might reference them.
  5. ExportedType declarations (.class extern), with enclosing ExportedType declarations preceding the nested ExportedType declarations.
  6. ManifestResource declarations (.mresource).

Remember that only the manifests of prime modules carry Assembly and ExportedType declarations.
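
The recommended sequence can be expressed as a ranking of the leading keywords. The following Python sketch sorts a list of manifest declarations into that order; the names manifest_rank and sort_manifest are hypothetical, and the sketch does not handle the finer points (putting the Mscorlib reference first, or placing enclosing exported types before nested ones), relying instead on the stable sort to preserve the relative order within each group.

```python
# Rank of each manifest declaration keyword in the recommended order.
_RANK = {
    ".assembly extern": 1,  # AssemblyRef
    ".module extern": 2,    # ModuleRef
    ".assembly": 3,         # Assembly
    ".file": 4,             # File
    ".class extern": 5,     # ExportedType
    ".mresource": 6,        # ManifestResource
}


def manifest_rank(decl: str) -> int:
    """Return the recommended position of a manifest declaration."""
    words = decl.split()
    two_words = " ".join(words[:2])
    if two_words in _RANK:            # try two-word keywords first
        return _RANK[two_words]
    if words and words[0] in _RANK:
        return _RANK[words[0]]
    raise ValueError("not a manifest declaration: " + decl)


def sort_manifest(decls):
    """Sort declarations into the recommended order (stable within a group)."""
    return sorted(decls, key=manifest_rank)
```

Two-word keywords are tested first so that .assembly extern is not mistaken for the Assembly declaration itself.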

Single-Module and Multimodule Assemblies

A single-module assembly consists of a sole prime module. Manifests of single-module assemblies as a rule carry neither File nor ExportedType tables: there are no other files to declare, and all types are defined in the prime module. However, you might want to declare a File record for an unmanaged DLL you want to be part of the deployment, or your single-module assembly might use type forwarding via the ExportedType table.

The advantages of single-module assemblies include lower overhead, easier deployment, and slightly greater security. Overhead is lower because only one set of headers and metadata tables must be read, transmitted, and analyzed. Assembly deployment is simpler because only one PE file must be deployed. And the level of security can be slightly higher because the prime module of the assembly can be protected with a strong name signature, which is extremely difficult to counterfeit and virtually guarantees the authenticity of the prime module. Nonprime modules are authenticated only by their hash values (referenced in File records of the prime module) and are theoretically easier to spoof.

Manifests of the modules of a multimodule assembly carry File tables, and the manifest of the prime module of such an assembly might or might not carry ExportedType tables, depending on whether any public types are defined in nonprime modules.

The advantages of multimodule assemblies include easier development and...lower overhead. (No, I am not pulling your leg.) Both advantages stem from the obvious modularity of the multimodule assemblies.

Multimodule assemblies are easier to develop because if you distribute the functionality among the modules well, you can develop the modules independently and then incrementally add to the assembly. (I didn’t say that a multimodule assembly was easier to design.)

Lower overhead at run time results from the way the loader operates: it loads the modules only when they are referenced. So if only part of your assembly’s functionality is engaged in a certain execution session, only part of the modules constituting your assembly might be loaded. Of course, you cannot count on any such effect if the functionality is spread all over the modules and if classes defined in different modules cross-reference each other.

A well-known technique for building a multimodule assembly from a set of modules is based on a “spokesperson” approach: the modules are analyzed, and an additional prime module is created, carrying nothing but the manifest and (maybe) a strong name signature. Such a prime module carries no functionality or positive definitions of its own whatsoever; it is only a front for functional modules, a “spokesperson” dealing with the loader on behalf of the functional modules. The Assembly Linker tool, distributed with the .NET Framework, uses this technique to build multimodule assemblies from sets of nonprime modules.

Summary of Metadata Validity Rules

In this section, I’ll summarize the validity rules for metadata contained in a manifest. Since some of these rules have a direct bearing on how the loader functions, the respective checks are performed at run time. Other rules describe “well-formed” metadata; violating one of these rules might result in rather peculiar effects during the program execution, but it does not represent a crash or security breach hazard, so the loader does not perform these checks. You can find the complete set of metadata validity rules in Partition II of the ECMA/ISO standard; the sections that follow here review the most important of them.

ILAsm does allow you to generate invalid metadata. Thus, it’s extremely important to carefully check your modules after compilation.

To find out whether any of the metadata in a module is invalid, you can run the PEVerify utility, included in the .NET Framework SDK, using the option /MD (metadata validation). Alternatively, you can invoke the IL disassembler. Choose View, MetaInfo, and Validate, and then press Ctrl+M. Both utilities use the Metadata Validator (MDValidator), which is built into the common language runtime.

Assembly Table Validity Rules

  • The record count of the table must be no more than 1. This is not checked at run time because the loader ignores all Assembly records except the first one. (I will mark all metadata validity rules checked by the loader with a “[run time]” label.)
  • The Flags entry must have bits set only as defined in the CorAssemblyFlags enumeration in CorHdr.h. For version 2.0 of the common language runtime, the valid mask is 0xC101, and only one bit (0x0100, retargetable) can be specified explicitly. For version 4.5, the flag windowsruntime (0x0200) and one of the platform flags, namely cil (0x0010), x86 (0x0020), ia64 (0x0030), amd64 (0x0040), or arm (0x0050), can be specified explicitly; if any of the platform flags is specified, bit 0x0080 is set as well.
  • The Locale entry must be set to 0 or must refer to a nonempty string in the string heap that matches a known culture name. You can obtain a list of known culture names by using a call to the CultureInfo.GetCultures method, from the .NET Framework class library.
  • [run time] If Locale is not set to 0, the referenced string must be no longer than 1,023 characters plus the zero terminator.
  • [run time] The Name entry must refer to a nonempty string in the string heap. The name must be the module filename excluding the extension, the path, and the drive letter.
  • [run time] The PublicKey entry must be set to 0 or must contain a valid offset in the #Blob stream.
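
The flag values quoted in the rules above can be decoded with a few bit masks. The Python sketch below uses only the values given in this section; the constant and function names are hypothetical, and the 0x0070 mask for the platform field is an assumption inferred from the listed platform values.

```python
AF_RETARGETABLE = 0x0100
AF_WINDOWS_RUNTIME = 0x0200
AF_PLATFORM_SPECIFIED = 0x0080  # set whenever a platform flag is specified
AF_PLATFORM_MASK = 0x0070       # assumed mask covering the platform values
VALID_MASK_V2 = 0xC101          # valid mask for version 2.0 of the runtime

_PLATFORMS = {0x0010: "cil", 0x0020: "x86", 0x0030: "ia64",
              0x0040: "amd64", 0x0050: "arm"}


def describe_assembly_flags(flags: int) -> list:
    """Return the ILAsm keywords corresponding to the explicitly settable flags."""
    names = []
    if flags & AF_RETARGETABLE:
        names.append("retargetable")
    if flags & AF_WINDOWS_RUNTIME:
        names.append("windowsruntime")
    platform = _PLATFORMS.get(flags & AF_PLATFORM_MASK)
    if platform is not None and (flags & AF_PLATFORM_SPECIFIED):
        names.append(platform)
    return names


def valid_for_v2(flags: int) -> bool:
    """Check that no bits outside the version 2.0 valid mask are set."""
    return (flags & ~VALID_MASK_V2) == 0
```

For example, a Flags value of 0x00C0 decodes to amd64 and would not pass the version 2.0 mask check.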

AssemblyRef Table Validity Rules

  • For versions older than 4.5, the Flags entry can have only the least significant bit set (corresponding to the afPublicKey value; see the CorAssemblyFlags enumeration in CorHdr.h). For version 4.5, the flag windowsruntime (0x0200) and one of the platform flags, namely cil (0x0010), x86 (0x0020), ia64 (0x0030), amd64 (0x0040), or arm (0x0050), can be specified explicitly; if any of the platform flags is specified, bit 0x0080 is set as well.
  • [run time] The PublicKeyOrToken entry must be set to 0 or must contain a valid offset in the #Blob stream.
  • The Locale entry must comply with the same rules as the Locale entry of the Assembly table (discussed in the preceding section).
  • The table must not have duplicate records with simultaneously matching Name, Locale, PublicKeyOrToken, and all Version entries.
  • [run time] The Name entry must refer to a nonempty string in the string heap. The name must be the prime module filename excluding the extension, the path, and the drive letter.

Module Table Validity Rules

  • The record count of the table must be exactly 1. This is not checked at run time because the loader uses the first Module record and ignores the others.
  • [run time] The Name entry must refer to a nonempty string in the string heap, no longer than 511 characters plus the zero terminator. The name must be the module filename including the extension and excluding the path and the drive letter.
  • The Mvid entry must refer to a nonzero GUID in the #GUID stream. The value of the Mvid entry is generated automatically and cannot be specified explicitly in ILAsm.

ModuleRef Table Validity Rules

  • [run time] The Name entry must refer to a nonempty string in the string heap, no longer than 511 characters plus the zero terminator. The name must be a filename including the extension and excluding the path and the drive letter.

File Table Validity Rules

  • The Flags entry can have only the least significant bit set (corresponding to the ffContainsNoMetaData value; see the CorFileFlags enumeration in CorHdr.h).
  • [run time] The Name entry must refer to a nonempty string in the string heap, no longer than 511 characters plus the zero terminator. The name must be a filename including the extension and excluding the path and the drive letter.
  • [run time] The string referenced by the Name entry must not match S[N][[C]*], where
    S ::= con | aux | lpt | prn | nul | com
    N ::= 0..9
    C ::= $ | :
  • [run time] The HashValue entry must hold a valid offset in the #Blob stream.
  • The table must not contain duplicate records whose Name entries refer to matching strings.
  • The table must not contain duplicate records whose Name entries refer to strings matching this module’s name.
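
The forbidden name pattern S[N][[C]*] translates naturally into a regular expression. The Python sketch below reads the pattern as a device name, an optional digit, and any number of trailing $ or : characters; the function name is hypothetical, and the case-insensitive match is an assumption, since the rule as stated does not mention case.

```python
import re

# S ::= con | aux | lpt | prn | nul | com; N ::= 0..9; C ::= $ | :
_RESERVED_NAME = re.compile(r"(con|aux|lpt|prn|nul|com)[0-9]?[$:]*",
                            re.IGNORECASE)  # case-insensitivity is assumed


def is_reserved_file_name(name: str) -> bool:
    """Return True if the whole name matches the forbidden pattern."""
    return _RESERVED_NAME.fullmatch(name) is not None
```

Because the whole string must match, a name such as console.dll is not rejected, whereas com1 and lpt2: are.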

ManifestResource Table Validity Rules

  • [run time] The Implementation entry must be set to 0 or must hold a valid AssemblyRef or File token.
  • [run time] If the Implementation entry does not hold an AssemblyRef token, the Offset entry must hold a valid offset within limits specified by the Resources data directory of the common language runtime header of the target file (if the target file is not a pure-resource file with no metadata).
  • [run time] The Flags entry must hold either 1 or 2—mrPublic or mrPrivate, respectively.
  • [run time] The Name entry must refer to a nonempty string in the string heap.
  • The table must not contain duplicate records whose Name entries refer to matching strings.

ExportedType Table Validity Rules

  • There must be no rows with TypeName and TypeNamespace matching Name and Namespace, respectively, of any row of the TypeDef table.
  • The Flags entry must hold either one of the visibility flags (0x0–0x7) of the enumeration CorTypeAttr (see CorHdr.h) or a forwarder flag (0x00200000).
  • [run time] The Implementation entry must hold a valid ExportedType or File or AssemblyRef token. In the last case, the forwarder flag must be set.
  • [run time] The Implementation entry must not hold an ExportedType token pointing to this record.
  • If the Implementation entry holds an ExportedType token, the Flags entry must hold a nested visibility value in the range 2–7.
  • If the Implementation entry holds a File token, the Flags entry must hold the tdNonPublic or tdPublic visibility value (0 or 1).
  • [run time] The TypeName entry must refer to a nonempty string in the string heap.
  • [run time] The TypeNamespace entry must be set to 0 or must refer to a nonempty string in the string heap.
  • [run time] The combined length of the strings referenced by TypeName and TypeNamespace must not exceed 1022 bytes in UTF-8 encoding.
  • The table must not contain duplicate records whose Implementation entry holds a File or AssemblyRef token and whose TypeName and TypeNamespace entries refer to matching strings.
  • The table must not contain duplicate records whose Implementation entries hold the same ExportedType token and whose TypeName entries refer to matching strings.
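
The rules tying the Flags entry to the kind of the Implementation token can be summarized in code. The following Python sketch is illustrative only: the function name check_exported_type is hypothetical, and the kind of the Implementation token is passed as a plain string rather than decoded from a coded token.

```python
FORWARDER = 0x00200000   # forwarder flag
VISIBILITY_MASK = 0x7    # visibility values 0x0-0x7 of CorTypeAttr


def check_exported_type(flags: int, impl_kind: str) -> list:
    """Return a list of rule violations for one ExportedType record."""
    problems = []
    visibility = flags & VISIBILITY_MASK
    if impl_kind == "ExportedType":
        # Nested exported type: visibility must be a nested value (2-7).
        if not 2 <= visibility <= 7:
            problems.append("nested type must have nested visibility (2-7)")
    elif impl_kind == "File":
        # Top-level type: visibility must be tdNonPublic (0) or tdPublic (1).
        if visibility not in (0, 1):
            problems.append("top-level type must have visibility 0 or 1")
    elif impl_kind == "AssemblyRef":
        # Implementation in another assembly: the record must be a forwarder.
        if not flags & FORWARDER:
            problems.append("AssemblyRef implementation requires forwarder flag")
    else:
        problems.append("Implementation must be ExportedType, File, or AssemblyRef")
    return problems
```

An empty result means the record passes these particular checks; the name and duplicate rules are not covered.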