PE File Format and Metadata Validation

Every assembly is heavily scrutinized before any of the code it contains is executed. Not only must assemblies have the necessary permissions to access resources, they must also pass a set of rigorous checks before they can run. One set of these checks is referred to as validation. Validation encompasses three major sets of tests:

  • PE file format validation

  • Metadata validation

  • IL validation

These checks ensure that IL verification can be carried out successfully and, ultimately, that assemblies can be run within strictly enforced security constraints. All sets of tests introduced in this chapter, including the PE file format validation checks, must be passed for an assembly to run safely on the Common Language Runtime. PE file format validation and metadata validation form the outer perimeter of the security apparatus protecting your machine from malicious managed code. IL validation will be explained in the “IL Validation and Verification” section later in this chapter.

PE File Format Validation

Compilers emitting managed code generate assemblies in the so-called PE/COFF file format. This is the file format specified for executable code on Windows platforms. Both managed and native code DLL and EXE files compiled to run on Windows follow this standard. The following are a few of the typical ingredients of PE/COFF files:

  • Standard entry point— The entry point invoked by Windows for all native code executables (and for managed PE files on systems before Windows XP).

  • Authenticode signature slot— Standard location for the Authenticode signature if the PE file has been signed with a publisher certificate. Both native and managed PE files can be Authenticode signed.

  • Standardized resource location— The PE file section where resources of the executable are kept.

  • A list of explicit library dependencies— If a PE file depends explicitly on other PE files, that dependency is noted in the dependencies list.

  • PE directory slots— The PE/COFF standard provides a limited number of slots that can be used to customize the PE/COFF file format. Slots are reserved for various types of executables. A slot holds an address pointing into the PE file where custom information (usually a custom header) can be found.

Assemblies are a customized type of PE/COFF file: slot 14 (decimal) in the PE directory has been reserved for managed executables. Thus, whenever that slot entry is not null, you can assume that you are dealing with a managed image. This slot in turn points to a “header” containing information specific to managed executables, such as the location of the metadata in the PE file or the location of the strong name signature slot.
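The slot-14 test can be sketched in a few lines of C#. This is a minimal illustration, not the loader's actual code. It assumes the published PE/COFF layout: the PE header offset stored at 0x3C, a 4-byte "PE\0\0" signature followed by a 20-byte COFF header, and the data directory array at offset 96 of a PE32 optional header or offset 112 of a PE32+ one. The class name ManagedImageCheck is hypothetical, and the code does not go on to validate the managed header it finds.

```csharp
using System;
using System.IO;

// Sketch: treat a PE image as managed when data directory entry 14
// (the slot reserved for managed executables) has a non-zero address and size.
static class ManagedImageCheck
{
    const int ManagedHeaderSlot = 14;

    // PE fields are little-endian; read byte by byte so the result
    // does not depend on the byte order of the machine running this code.
    static uint ReadUInt32(byte[] b, int i)
    {
        return (uint)(b[i] | (b[i + 1] << 8) | (b[i + 2] << 16) | (b[i + 3] << 24));
    }

    static ushort ReadUInt16(byte[] b, int i)
    {
        return (ushort)(b[i] | (b[i + 1] << 8));
    }

    public static bool IsManagedImage(byte[] image)
    {
        // DOS header: "MZ" signature; the PE header offset is stored at 0x3C.
        if (image.Length < 0x40 || image[0] != (byte)'M' || image[1] != (byte)'Z')
            return false;
        int peOffset = (int)ReadUInt32(image, 0x3C);

        // Need room for "PE\0\0", the 20-byte COFF header, and the 2-byte magic.
        if (peOffset < 0 || peOffset > image.Length - 26)
            return false;
        if (image[peOffset] != (byte)'P' || image[peOffset + 1] != (byte)'E' ||
            image[peOffset + 2] != 0 || image[peOffset + 3] != 0)
            return false;

        // The optional header's magic number distinguishes PE32 from PE32+,
        // which keep the data directory array at different offsets.
        int optionalHeader = peOffset + 24;
        ushort magic = ReadUInt16(image, optionalHeader);
        int countOffset, directoryOffset;
        if (magic == 0x10B) { countOffset = 92; directoryOffset = 96; }        // PE32
        else if (magic == 0x20B) { countOffset = 108; directoryOffset = 112; } // PE32+
        else return false;

        if (optionalHeader + countOffset + 4 > image.Length)
            return false;
        uint directoryCount = ReadUInt32(image, optionalHeader + countOffset);
        if (directoryCount <= ManagedHeaderSlot)
            return false;

        // Each directory entry is 8 bytes: a 4-byte RVA and a 4-byte size.
        int entry = optionalHeader + directoryOffset + ManagedHeaderSlot * 8;
        if (entry + 8 > image.Length)
            return false;
        return ReadUInt32(image, entry) != 0 && ReadUInt32(image, entry + 4) != 0;
    }

    static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.WriteLine("usage: ManagedImageCheck <path to .exe or .dll>");
            return;
        }
        byte[] image = File.ReadAllBytes(args[0]);
        Console.WriteLine(IsManagedImage(image)
            ? "Managed image: PE directory slot 14 is set."
            : "Not a managed image.");
    }
}
```

Run against a native DLL, the sketch should report that the image is not managed; run against a C# assembly, it should find slot 14 populated. Every offset read from the file is bounds-checked first, which anticipates the validation concerns discussed later in this section.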

TIP

If you want to find out exactly how the assembly header for managed PE files is defined, look at the CorHdr.h include file.

This file can be found in the Include subdirectory of the FrameworkSDK directory. It ships only with the .NET Framework SDK, so you will not find it on regular redistributable installations of the .NET Framework.

If you have Visual Studio installed on your system, the file can be found at %Install Drive%\Program Files\Microsoft Visual Studio .NET\Framework\Include\corhdr.h.

%Install Drive% here stands for the drive to which you have installed Visual Studio.


All managed PE files include an explicit dependency on the mscoree.dll library, the code responsible for starting up the Common Language Runtime and providing the CLR with the header information of the managed PE file to be run.

You may now wonder how mscoree.dll actually gets started when managed code is run from the operating system shell (that is, when the shell acts as the host of the CLR). Versions of Windows before Windows XP have no notion of managed PE files, so managed PE executables contain a call to mscoree at the standard PE entry point. Windows XP, on the other hand, recognizes PE images representing assemblies by looking for an entry in PE directory slot 14. When Windows XP finds a managed PE executable, it calls mscoree itself, ignoring the jump instruction to mscoree found at the standard PE entry point.

This difference between Windows XP and earlier versions of the operating system is significant. On platforms earlier than Windows XP, a piece of unmanaged code (the stub at the standard PE entry point) is always executed when a managed image is run. Because this jump instruction is what starts up the Common Language Runtime when managed code is hosted in the Windows shell, the CLR is not yet running and therefore cannot check that this very entry point is well formed and actually points to mscoree.

This fact was exploited by the so-called “Donut virus” that reared its head prior to the Visual Studio .NET release. The virus replaced the entry point jump instruction with a jump to its own code. Windows XP machines were not affected because, as previously stated, Windows XP ignores the standard PE entry point when running a managed image in the Windows XP shell. However, earlier versions of Windows, which need the entry stub to spin up the Common Language Runtime, were affected. In a way, the virus had very little to do with CLR security specifically; the virus writer simply replaced a PE file entry point, and the CLR did not even need to be installed on the system. The virus thus differed little from virus corruption of any other type of PE image. However, with the operating system loader working in conjunction with the CLR security infrastructure, managed images can give much stronger security guarantees than native code, ruling out even such PE file corruptions.

CAUTION

If you are planning to write your own host of the Common Language Runtime, be sure to either verify that the standard PE entry point is valid (that is, that it jumps into mscoree) or call mscoree yourself, as the Windows XP shell host does.


It is highly recommended that you upgrade to Windows XP where possible. This will pre-empt any attempts to misuse the standard PE entry point when running managed applications in the operating system shell.

After the Common Language Runtime is spun up, it validates all major parts of the managed PE file before they are used. In particular, the CLR makes sure that addresses recorded in the file, such as the one pointing to the metadata, do not point outside of the PE file. Without such checks, a malicious PE file could cause arbitrary memory locations to be read or written, letting malicious code snoop around in and modify the memory of other components, such as the CLR itself. That would pose a clear danger to security, because assembly isolation would be broken and the security policy currently held in memory could be modified by malicious code.
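The essence of such a bounds check can be illustrated with a small hypothetical helper. This is not the CLR's implementation: it assumes the address has already been translated into a file offset (mapping an RVA through the section table is omitted), and the names RangeCheck, IsWithinImage, and IsWithinImageNaive are invented for the example. What it shows is why the end of a range should be computed in a wider type: a 32-bit sum can wrap around and make an out-of-range address look valid.

```csharp
using System;

// Hypothetical helpers contrasting an overflow-safe range check
// with a naive one that a crafted PE file could defeat.
static class RangeCheck
{
    // True only if [offset, offset + size) lies entirely inside a file of fileLength bytes.
    public static bool IsWithinImage(uint offset, uint size, long fileLength)
    {
        // Widening to 64 bits means offset + size cannot wrap around.
        long end = (long)offset + size;
        return end <= fileLength;
    }

    // Naive version for contrast: the 32-bit sum wraps on overflow.
    public static bool IsWithinImageNaive(uint offset, uint size, uint fileLength)
    {
        return unchecked(offset + size) <= fileLength;
    }

    static void Main()
    {
        const long fileLength = 4096;
        Console.WriteLine(IsWithinImage(0x200, 0x100, fileLength));     // True
        Console.WriteLine(IsWithinImage(0xF00, 0x200, fileLength));     // False: runs past the end
        Console.WriteLine(IsWithinImage(0xFFFFFFF0, 0x20, fileLength)); // False
        Console.WriteLine(IsWithinImageNaive(0xFFFFFFF0, 0x20, 4096));  // True: the sum wrapped to 0x10
    }
}
```

The last line is the dangerous case: an offset near the top of the 32-bit range passes the naive test and would let a reader wander far outside the file.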

Metadata Validation

Metadata can be seen as a thorough description of the identity and type structure that defines an assembly. The following categories of information are contained in an assembly's metadata:

  • Information about the types in an assembly, including the following:

    • Their methods, including method signature, name, return types, calling convention and modifiers (such as instance, virtual, abstract, and so on), and accessibility

    • Their fields, including field name, field type, accessibility, and whether the fields are static or instance fields

    • Their inheritance and dependency in relation to other types, as well as their interface implementation relation

    • Their accessibility (public, private, family access) and other modifiers, such as being sealed or abstract

  • A list of other files belonging to the assembly (assemblies can consist of multiple modules)

  • Identity of the assembly, including the following:

    • The name of the assembly (such as Foo)

    • The version of the assembly

    • The culture of the assembly

    • The strong name digital signature, in the form of the public key and the signature value (if present; not all assemblies need to be signed with a strong name)

  • Any declarative security annotations, such as an assembly's required minimum set of permissions

Metadata therefore offers a rich set of information detailing both how an assembly's types are defined and what the assembly's identity is. Metadata is a mandatory constituent of assemblies, so compilers emitting managed code must create the metadata describing the implementation they are compiling into IL. A managed image will not execute on the CLR if it lacks metadata detailing its identity and implementation.
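The identity part of this metadata can be read back through reflection. The following sketch uses System.Reflection.AssemblyName to print an assembly's name, version, culture, and public key token; the class and helper names (IdentityDump, FormatToken) are illustrative only.

```csharp
using System;
using System.Reflection;

// Sketch: print the identity metadata of an assembly.
class IdentityDump
{
    // Format a public key token as lowercase hex, or note that there is none.
    public static string FormatToken(byte[] token)
    {
        if (token == null || token.Length == 0)
            return "none (not strong-named)";
        return BitConverter.ToString(token).Replace("-", "").ToLowerInvariant();
    }

    static void Main(string[] args)
    {
        // Inspect the assembly named on the command line, or this program itself.
        Assembly asm = args.Length > 0
            ? Assembly.LoadFrom(args[0])
            : typeof(IdentityDump).Assembly;
        AssemblyName name = asm.GetName();

        Console.WriteLine("Name:             " + name.Name);
        Console.WriteLine("Version:          " + name.Version);
        // The neutral culture has an empty culture name.
        string culture = name.CultureInfo == null || name.CultureInfo.Name.Length == 0
            ? "neutral" : name.CultureInfo.Name;
        Console.WriteLine("Culture:          " + culture);
        Console.WriteLine("Public key token: " + FormatToken(name.GetPublicKeyToken()));
    }
}
```

Run without arguments, the program describes itself; since it is not signed, it should report no public key token.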

TIP

You can programmatically browse through most of the metadata of an assembly. The process of programmatically accessing metadata and invoking members is called reflection. The classes provided in the System.Reflection namespace allow for easy access to an assembly's metadata. The following small sample program reads the full name of an assembly from its metadata and then displays the names of all the types in that assembly:

using System;
using System.Reflection;

class ReflectionTest
{
    static void Main()
    {
        Console.Write("Type in a path to a file you want to reflect over: ");
        // Load the assembly whose metadata we want to inspect
        Assembly asmbl = Assembly.LoadFrom(Console.ReadLine());

        // Get and show the full name of the assembly
        Console.WriteLine("Full Name of assembly: " + asmbl.FullName);

        // Get the types defined in the assembly and show their names
        Console.WriteLine("Types in Assembly:");
        foreach (Type type in asmbl.GetTypes())
            Console.WriteLine(type.FullName);
    }
}

To compile and run the sample, save the previous code into a file called reflectiontest.cs and compile it with the C# command line compiler: csc reflectiontest.cs. You can now run the sample program; simply type in reflectiontest at the command line and, when prompted, type in the full path and name of any assembly (such as reflectiontest.exe itself).


Metadata is persisted in assemblies in the form of a set of cross-referenced tables. For example, there is a table that contains all the types in the assembly. Each row in this table describes a specific type and points into further tables, such as the table containing all of that type's methods. Compilers fill in this elaborate network of tables when creating an assembly. Let's look at an example. Suppose a class of the following structure is defined in an assembly foo.exe:

public class bar
{
    int i;
    float j;
    void x() { /* body elided */ }
    int z() { /* body elided */ return 0; }
}

foo's metadata contains a table for the types defined in foo, fields of types, and methods defined in types, among other things. Figure 11.2 roughly shows how bar will show up in foo's metadata:

Figure 11.2. Approximate metadata table entries for class bar in foo's metadata.


As you can see, the structure of the class bar (its fields, scope, methods, and so on) is recorded accurately in interconnected metadata tables. Overall, there are over three dozen different kinds of tables in metadata.
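The way a row in the type table refers to rows in the field and method tables can be modeled in a few lines. This is a simplified, hypothetical in-memory model, not the on-disk format. It follows the ECMA-335 convention that a TypeDef row's FieldList and MethodList columns give the first row of a contiguous run that ends where the next type's run begins. It uses plain strings instead of the string heap, and it omits the compiler-generated constructor that bar would also have. All class names here are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified rows for three metadata tables.
class FieldRow
{
    public string Name, FieldType;
    public FieldRow(string name, string fieldType) { Name = name; FieldType = fieldType; }
}

class MethodRow
{
    public string Name, ReturnType;
    public MethodRow(string name, string returnType) { Name = name; ReturnType = returnType; }
}

class TypeDefRow
{
    public string Name;
    public int FieldList;  // 1-based Field row where this type's fields begin
    public int MethodList; // 1-based MethodDef row where this type's methods begin
    public TypeDefRow(string name, int fieldList, int methodList)
    { Name = name; FieldList = fieldList; MethodList = methodList; }
}

static class MetadataTables
{
    // Row 1 is the <Module> pseudo-type; here it owns no fields or methods.
    public static readonly TypeDefRow[] TypeDef =
    {
        new TypeDefRow("<Module>", 1, 1),
        new TypeDefRow("bar", 1, 1),
    };
    public static readonly FieldRow[] Field = { new FieldRow("i", "int32"), new FieldRow("j", "float32") };
    public static readonly MethodRow[] MethodDef = { new MethodRow("x", "void"), new MethodRow("z", "int32") };

    // A type owns the rows from its own list start up to (not including) the
    // next type's list start, or to the end of the table for the last type.
    static IEnumerable<T> RunOf<T>(T[] table, Func<TypeDefRow, int> listStart, int typeRow)
    {
        int start = listStart(TypeDef[typeRow - 1]);
        int end = typeRow < TypeDef.Length ? listStart(TypeDef[typeRow]) : table.Length + 1;
        for (int row = start; row < end; row++)
            yield return table[row - 1];
    }

    public static IEnumerable<FieldRow> FieldsOf(int typeRow) { return RunOf(Field, t => t.FieldList, typeRow); }
    public static IEnumerable<MethodRow> MethodsOf(int typeRow) { return RunOf(MethodDef, t => t.MethodList, typeRow); }

    static void Main()
    {
        for (int t = 1; t <= TypeDef.Length; t++)
        {
            Console.WriteLine("TypeDef row " + t + ": " + TypeDef[t - 1].Name);
            foreach (FieldRow f in FieldsOf(t))
                Console.WriteLine("    Field:     " + f.FieldType + " " + f.Name);
            foreach (MethodRow m in MethodsOf(t))
                Console.WriteLine("    MethodDef: " + m.ReturnType + " " + m.Name + "()");
        }
    }
}
```

Running it lists no rows under <Module> and the fields i and j plus the methods x and z under bar. Because ownership is implied by where runs begin, a single corrupted FieldList value would silently reassign fields to the wrong type, which is exactly the kind of damage structural validation has to catch.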

NOTE

For a complete and thorough definition of all the metadata tables, their structure, and layout, please see the ECMA standards specification about metadata (Partition II) at http://msdn.microsoft.com/net/ecma.


Metadata is used by the Common Language Runtime to access and load types, find the managed entry point to an assembly, ascertain the strong name identity of an assembly, and many other crucial functions involved in loading and executing managed code. It is information used throughout the Common Language Runtime infrastructure. As part of the CLR process of loading and accessing assembly information, metadata is checked for corruption.

NOTE

There are two types of metadata validation checks that the Common Language Runtime implements—structural metadata validation and semantic metadata validation.

Structural metadata validation refers to checks that ascertain that the metadata table structure is being adhered to. Returning to our previous example, runtime metadata checks make sure that the pointer to the Fields table in the Type Definitions table actually points into the Fields table, not into another table or outside of the PE image itself.

Semantic metadata validation checks are based on rules of the Common Language Runtime type system itself. Certain invariants about types and their interrelationships must not be broken. Such tests are therefore not concerned with how metadata is laid out in tables, but with whether what a specific layout represents still honors certain rules about types. An example of such a rule is the check of the inheritance chain of classes for circularities, which prevents any subclass X of a class A from being A's superclass or a superclass of any class from which A derives.
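Both kinds of check can be sketched against a simplified view of the tables. The code below is hypothetical, not the CLR's validator. It assumes that each type's field run is given by a 1-based starting row, and that a type's base is recorded as 0 for "no base type" or as the 1-based row of the base type; real metadata uses a coded index for the base type instead. The cycle test is a deliberately simple walk that gives up once a chain is longer than the number of types; it was chosen for clarity, not speed.

```csharp
using System;

// Hypothetical sketch of one structural and one semantic metadata check.
static class MetadataChecks
{
    // Structural: fieldList[k] is the 1-based Field row where type row k+1
    // starts its fields. Each value must lie in 1..fieldRowCount + 1 (one past
    // the end means "no fields"), and the values must never decrease,
    // otherwise the runs of different types would overlap.
    public static bool FieldListsValid(int[] fieldList, int fieldRowCount)
    {
        int previous = 1;
        foreach (int start in fieldList)
        {
            if (start < 1 || start > fieldRowCount + 1) return false; // outside the Field table
            if (start < previous) return false;                       // runs would overlap
            previous = start;
        }
        return true;
    }

    // Semantic: extends[k] is 0 when type row k+1 has no base type, or the
    // 1-based row of its base type. Walking up from any type must reach 0;
    // a walk longer than the number of types must have revisited a type.
    public static bool InheritanceAcyclic(int[] extends)
    {
        int n = extends.Length;
        for (int type = 1; type <= n; type++)
        {
            int steps = 0;
            int current = extends[type - 1];
            while (current != 0)
            {
                if (current < 1 || current > n) return false; // dangling reference
                if (++steps > n) return false;                // cycle detected
                current = extends[current - 1];
            }
        }
        return true;
    }

    static void Main()
    {
        // Rows: 1 = <Module>, 2 = A, 3 = X.
        Console.WriteLine(InheritanceAcyclic(new[] { 0, 0, 2 })); // X extends A: True
        Console.WriteLine(InheritanceAcyclic(new[] { 0, 3, 2 })); // A extends X, X extends A: False
        // <Module> and bar both start at row 1 of a 2-row Field table.
        Console.WriteLine(FieldListsValid(new[] { 1, 1 }, 2));    // True
        Console.WriteLine(FieldListsValid(new[] { 1, 7 }, 2));    // False: past the Field table
    }
}
```

Note that the semantic check also rejects a dangling base-type reference; in a real validator that case would already have been caught by a structural check before any type-system rule is evaluated.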


All metadata checks are done at runtime and preemptively, catching erroneous metadata before it is used by the Common Language Runtime infrastructure. In particular, metadata checks occur before IL verification. Let us now look at the verification checks.

NOTE

To review all the metadata tests the CLR implements, please see the ECMA Metadata Standard, Partition II, at http://msdn.microsoft.com/net/ecma.

