MSIL in Depth

Here are some basic facts about MSIL programming. The content of an MSIL program is case-sensitive. MSIL is also a free-form language. Statements can span multiple lines of code, with lines broken at the white space. Statements are not terminated with a semicolon. Comments are the same as in the C# language. Double slashes (//) are used for single-line comments, and "/* comment */" is used for multiline comments. Code labels are colon-terminated and reference the next instruction. Code labels must be unique within the scope of the label in question.

In addition to the evaluation stack, the other important elements of an MSIL application are directives and the actual MSIL source code. Directives are dot-prefixed and are the declarations of the MSIL program. Source code is the executable content and flow control of the application.

Directives

There are several categories of directives. Assembly, class, and method directives are the most prominent. Assembly directives contain information that the compiler emits to the manifest, which is metadata pertaining to the overall assembly. Class directives define classes and the members of the class. This information is emitted as standard metadata, which is data about types. Method directives define the particulars of a method, such as any local variables and the size of the evaluation stack.

Assembly Directives

This section lists common assembly directives.

.assembly

The .assembly directive defines the simple name of the assembly. The simple name does not include the extension. Assembly probing will discover the correct extension. Adding the extension will cause normal probing to fail. A binding exception will occur when the assembly is referenced.

Here is the syntax of the .assembly directive:

.assembly name { block }

The assembly block contains additional directives that further describe the assembly. These directives are optional. You need to provide only enough directives to identify the assembly uniquely. Here is an assembly block with additional details:

.assembly Hello {
    .ver 1:0:0:0
    .locale "en-US"
}

These are some of the directives available in the assembly block:

  • .ver The four-part version number of the assembly

  • .publickey The full public key of the public/private key pair used to sign the assembly (the 8-byte public key token is a shortened form derived from this key)

  • .locale The language and culture of the assembly

  • .custom Custom attributes of the assembly

.assembly extern

The .assembly extern directive references an external assembly. The public types and methods of the referenced assembly are available to the current assembly.

Here is the syntax of the .assembly extern directive:

.assembly extern name as aliasname {block}

The as clause is optional. Use this clause to reference an assembly with the same name that has a different version, public key, or culture.

Add the .ver, .publickey, .locale, and .custom directives to the assembly extern block to refine the identification of that assembly.
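As a minimal sketch of a refined reference, the following block identifies one specific mscorlib by version and key. The version number is an assumption chosen for illustration, and the .publickeytoken directive (not listed above) supplies the key in its 8-byte token form rather than as the full public key:

```
.assembly extern mscorlib
{
    .publickeytoken = (B7 7A 5C 56 19 34 E0 89)   // 8-byte token of the signing key
    .ver 2:0:0:0                                   // assumed version, for illustration only
}
```

With these directives present, a reference such as [mscorlib]System.Console binds only to an assembly that matches the stated version and key.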

Because of the importance of mscorlib.dll, the ILASM compiler automatically includes an external reference to that library. Therefore, adding an .assembly extern mscorlib directive is purely informative.

.file

The .file directive adds a file to the manifest of the assembly. This is useful for associating documents, such as a readme file, with an assembly.

Here is the syntax of the .file directive:

.file nometadata filename .hash = (bytes) .entrypoint

The file name is the sole required element of the declaration. Nometadata is the primary option and stipulates that the file is unmanaged. Here is an example:

.file nometadata documentation.txt

.subsystem

The .subsystem directive indicates the subsystem used by the application, such as the graphical user interface (GUI) or console subsystem. This is distinct from the target type of the application, which is an executable, library, module, or other type. The ILASM compiler inserts this directive based on options specified when the application is compiled. You also can add this directive explicitly.

Here is the syntax of the .subsystem directive:

.subsystem number

Number is a 32-bit integer in which 2 is a GUI application and 3 is a console application.

.corflags

The .corflags directive sets the runtime flag in the Common Language Infrastructure (CLI) header. This defaults to 1, which stipulates an IL-only assembly. The corflags tool, introduced in .NET 2.0, allows the configuration of this flag.

Here is the syntax of the .corflags directive:

.corflags flag

where flag is a 32-bit integer.
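The two preceding directives can be set explicitly at the top of a source file. The following is a small sketch, not compiler output; the assembly name hello and the Starter class are placeholders chosen for this example:

```
.assembly extern mscorlib {}
.assembly hello {}

.subsystem 0x0003        // 3 = console application
.corflags  0x00000001    // 1 = IL-only assembly

.class Starter {
    .method static public void Main() cil managed {
        .entrypoint
        ldstr "Hello"
        call void [mscorlib]System.Console::WriteLine(string)
        ret
    }
}
```

Changing the .subsystem value to 2 would mark the same program as a GUI application.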

.stackreserve

The .stackreserve directive sets the stack size. The default size is 0x00100000. The following code calls MethodA recursively. With the default stack size, MethodA is called recursively more than 110,000 times before exhausting the stack. If you set the stack size to 0x00001000 using the .stackreserve directive, MethodA is called only about 21,000 times before quitting. Although the results may vary on your actual computer, the relative values are consistent:

.assembly recursive {}
.imagebase 0x00800000
.stackreserve 0x00001000

.namespace Donis.CSharpBook {
    .class Starter {
        .method static public void Main() il managed {
            .entrypoint
            ldc.i4.0
            call void Donis.CSharpBook.Starter::MethodA(int32)
            ret
        }

        .method static public void MethodA(int32) il managed {
            ldarg.0
            ldc.i4.1
            add
            dup
            call void [mscorlib] System.Console::WriteLine(int32)
            call void Donis.CSharpBook.Starter::MethodA(int32)
            ret
        }
    }
}

.imagebase

The .imagebase directive sets the base address where the application is loaded. The default is 0x00400000. You can confirm the load address of the application image and the stack size with the dumpbin tool. For example, the following command writes the headers of recursive.exe to a text file:

dumpbin /headers recursive.exe >recursive.txt

Class Directives

This section describes the important class directives.

.class header {members}

The .class directive defines a new reference, value, or interface type.

Here is the syntax of the header portion of the .class directive:

attributes classname extends basetype implements interfaces

There are a variety of attributes. Here is a short list of the most common of these:

  • abstract. The type is abstract, and instances cannot be created.

  • ansi and unicode. Strings can be marshaled in American National Standards Institute (ANSI) format or in Unicode format.

  • auto. The memory layout of fields is controlled by the CLR.

  • beforefieldinit. The type should be initialized before a static field is accessed.

  • private and public. Sets the visibility of the class outside of the assembly.

  • sealed. The class cannot be inherited.

  • serializable. The contents of the class can be serialized.

The extends option is used if the type inherits from another type. .NET supports only single class inheritance. The extends option is optional. If that option is not present, the type inherits implicitly from System.Object.

The implements option lists the interfaces implemented by the type. The implements clause is optional, and there are no default interfaces. The list of interfaces is comma-delimited.

In the members block, members are declared with the appropriate directive: .method, .field, .property, and so on.
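To show how the pieces of the header fit together, here is a sketch of a complete class declaration. The Resource type and the sample assembly name are hypothetical, and the type deliberately has no constructor, so it illustrates the header syntax rather than a usable class:

```
.assembly extern mscorlib {}
.assembly sample {}

.namespace Donis.CSharpBook {
    // attributes, class name, extends clause, and implements clause
    .class public auto ansi sealed beforefieldinit Resource
           extends [mscorlib]System.Object
           implements [mscorlib]System.IDisposable {

        // Member declared with the .method directive.
        // It supplies the Dispose method required by IDisposable.
        .method public hidebysig newslot virtual final
                instance void Dispose() cil managed {
            ret
        }
    }
}
```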

.custom constructorsignature

The .custom directive adds a custom attribute to the type.

.method

The .method directive defines a method. C# does not support global methods. Therefore, in MSIL that is derived from C# source code, the .method directive always is included within a type.

Here is the syntax of the .method directive:

.method attributes callingconv return methodname arguments implattributes {methodbody }

The method attributes are varied, including the accessibility attributes: public, private, family, and others. The default is private. Static methods have the static attribute, whereas instance methods have the instance attribute. The default is an instance method.

Here are additional attributes:

  • final. The method cannot be overridden (it is sealed).

  • virtual. The method is virtual.

  • hidebysig. Hides the base class interface of this method. This flag is used only by the source language compiler.

  • newslot. Creates a new entry in the vtable for this method. This method does not override the same method in the base class. For example, this option is used with the add_Event and remove_Event methods of an event.

  • abstract. The method has no implementation and is assumed to be implemented in a descendant.

  • specialname. The method is special, such as get_Property and set_Property methods. These methods are treated in a special way by tools.

  • rtspecialname. The method has a special name, such as a constructor. These methods are treated in a special way by the CLR.

The calling convention pertains mostly to native code, in which a variety of calling conventions are supported: fastcall, cdecl, and others.

The implementation attributes include (but are not limited to) the following:

  • cil or il. The method contains MSIL code.

  • native. The method contains platform-specific code.

  • runtime. The implementation of the method is provided by the CLR. For example, when defining delegates, the delegate class and methods are generated by the run time.

  • managed. The implementation is managed.

Here is the declaration of a C# method:

virtual public int MethodA(int param1, int param2)

Here is the MSIL code for that same method:

.method public hidebysig newslot virtual instance int32 MethodA(
    int32 param1, int32 param2) cil managed

.field

The .field directive defines a new field, which is state information for a class. Instance fields are data for an object. Static fields are data for a class.

The syntax of the .field directive is as follows:

.field attributes type fieldname fieldinit at datalabel

The accessibility attributes are the same as described with methods. Static fields must be assigned the static attribute. The default is an instance field. The initonly attribute defines a read-only field.

Here is a field defined in a C# class:

private readonly int fielda = 10;

Here is the same field translated to MSIL code. The compiler also adds a no-argument constructor, where fielda is initialized to 10 (not shown here).

.field private initonly int32 fielda
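The constructor that performs the initialization might look like the following sketch. It assumes that the field belongs to a class named Donis.CSharpBook.Starter, as in the other examples, and the assembly name fields is a placeholder; actual compiler output can differ in details such as .maxstack:

```
.assembly extern mscorlib {}
.assembly fields {}

.namespace Donis.CSharpBook {
    .class public auto ansi beforefieldinit Starter
           extends [mscorlib]System.Object {

        .field private initonly int32 fielda

        .method public hidebysig specialname rtspecialname
                instance void .ctor() cil managed {
            .maxstack 2
            ldarg.0                  // push the this reference
            ldc.i4.s 10              // push the constant 10
            stfld int32 Donis.CSharpBook.Starter::fielda
            ldarg.0                  // push the this reference again
            call instance void [mscorlib]System.Object::.ctor()
            ret
        }
    }
}
```

The field is assigned before the base class constructor is called, which is the usual order for C# field initializers.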

.property

The .property directive adds a property member to a class. It also declares the get and set methods for the property.

Here is the syntax of the .property directive:

.property attributes return propertyname parameters default { propertyblock }

The attributes of a property are similar to those of a class and method. Return is the return type of the property. Propertyname and parameters are the signature of the property. The default option sets the default value of the property.

Within propertyblock, the .get directive declares the signature of the get method, whereas the .set directive declares the set method. The propertyblock includes only the method declarations. The get and set methods actually are implemented at the class level, not within the property.

Here is a property defined and implemented in a C# application:

public int propa {
    get {
        return 0;
    }
}

Here is the declaration of the property in the MSIL code (the implementation of the get_propa method is not shown):

.property instance int32 propa()
{
    .get instance int32 Donis.CSharpBook.Starter::get_propa()
}
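For completeness, the following sketch shows where the get method would sit relative to the property declaration. It assumes the containing class is Donis.CSharpBook.Starter; the assembly name props is a placeholder:

```
.assembly extern mscorlib {}
.assembly props {}

.namespace Donis.CSharpBook {
    .class public auto ansi beforefieldinit Starter
           extends [mscorlib]System.Object {

        // The accessor is an ordinary method at the class level.
        .method public hidebysig specialname
                instance int32 get_propa() cil managed {
            .maxstack 1
            ldc.i4.0                 // return 0;
            ret
        }

        // The property only refers to the accessor.
        .property instance int32 propa() {
            .get instance int32 Donis.CSharpBook.Starter::get_propa()
        }
    }
}
```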

.event

The .event directive adds an event to a class.

Here is the syntax of the .event directive:

.event classref eventname { eventbody }

Classref is the underlying type of the event, such as EventHandler.

The eventbody block encapsulates the .addon and .removeon directives. The .addon directive declares the method used to add subscribers. The .removeon directive declares the method for removing subscribers. The add and remove methods are implemented in the class and not within the event.

Here is the C# code that declares an event:

public event EventHandler EventA;

Here is the MSIL code for the event:

.event [mscorlib]System.EventHandler EventA {
    .addon instance void Donis.CSharpBook.Starter::add_EventA(
        class [mscorlib]System.EventHandler)
    .removeon instance void Donis.CSharpBook.Starter::remove_EventA(
        class [mscorlib]System.EventHandler)
}

Method Directives

The .method directive adds a method to a class. MSIL allows for global methods. Global methods break the rules of encapsulation and other tenets of OOP. For this reason, C# does not support global methods. The method block can contain both directives and the implementation code (MSIL).

This section lists the directives that frequently are included in the method block.

.locals

The .locals directive declares local variables that are available by name or index. Local variables form a zero-based array.

Here is the syntax of the .locals directive:

.locals ([index] local1, [index] local2, [index] localn)

.locals init ([index] local1, [index] local2, [index] localn)

The first form defines one or more local variables. Explicit indexes can be set for each local variable. By default, the local variables are indexed sequentially starting at zero.

The second form adds the init keyword, which requests that local variables be initialized to either null or zero. The init keyword is required for the assembly to pass code verification. Therefore, the C# compiler emits only the second form.

Local variables do not have to be declared at the beginning of a method.

.maxstack

The .maxstack directive sets the number of slots available on the evaluation stack, which is the number of items that can exist on the evaluation stack simultaneously. Without this directive, the default is eight slots.

Here is the syntax of the .maxstack directive:

.maxstack slots
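As a brief illustration of counting slots, the method in the following sketch never holds more than three values on the evaluation stack, so three slots are enough. The assembly name maxstack is a placeholder:

```
.assembly extern mscorlib {}
.assembly maxstack {}

.class Starter {
    .method static public void Main() cil managed {
        .entrypoint
        .maxstack 3
        ldc.i4.1        // stack depth: 1
        ldc.i4.2        // stack depth: 2
        ldc.i4.3        // stack depth: 3 (the maximum)
        add             // 2 + 3, stack depth: 2
        add             // 1 + 5, stack depth: 1
        call void [mscorlib]System.Console::WriteLine(int32)   // displays 6
        ret
    }
}
```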

.entrypoint

The .entrypoint directive designates a method as the entry point method of the application. This directive can appear anywhere in the method, but best practice is to put the .entrypoint directive at the start of the method.

In C#, the entry point method is Main. In MSIL, any static method can be given this status.

Method Directive Example

The following program defines MSILFunc as the entry point method. The .entrypoint directive is found at the end of this method, which demonstrates that the directive can be placed anywhere within the static method. The .locals directive defines two local variables and assigns explicit indexes. Our program simply reverses the default indexes, so the instruction stloc.0 updates the second local variable, localb. In the MSILFunc method, local variables are referenced using both the name and index. At the end, the method displays the values of 10 and then 5. The MSILFunc method returns void. In MSIL code, the ret instruction is required even when a function returns nothing. In C#, the return statement is optional for methods returning void.

.assembly extern mscorlib {}
.assembly application {}

.namespace Donis.CSharpBook {

    .class Starter {

        .method static public void MSILFunc() il managed {
            .locals init ([1] int32 locala, [0] int32 localb)
            ldc.i4.5
            stloc.0
            ldc.i4 10
            stloc.1
            ldloc locala
            call void [mscorlib] System.Console::WriteLine(int32)
            ldloc localb
            call void [mscorlib] System.Console::WriteLine(int32)
            .entrypoint
            ret
        }
    }
}

MSIL Instructions

MSIL includes a full complement of instructions, many of which are demonstrated in previous examples. Each instruction is also assigned an opcode, which is commonly 1 or 2 bytes. The 2-byte opcodes always have 0xFE as the first (high-order) byte. Opcodes often are followed by operands. Opcodes, which provide an alternate means of identifying MSIL instructions, are used primarily when emitting code dynamically at run time. The ILGenerator.Emit method emits instructions based on opcodes. This method is found in the System.Reflection.Emit namespace.

The bytes option of ILDASM adds opcodes to a disassembly. The following is a partial listing of the hello.exe disassembly that includes just the Main method. As seen from the disassembly, the opcode for ldstr is 0x72, the opcode for stloc.0 is 0x0A, and the opcode for call is 0x28. The ldloc instruction has the 2-byte opcode 0xFE0C.

.method public static void Main() cil managed
{
    .entrypoint
    .maxstack 2
    .locals init ([0] string name)
    IL_0000: /* 72   | (70)000001 */ ldstr    "Donis"
    IL_0005: /* 0A   |            */ stloc.0
    IL_0006: /* 72   | (70)00000D */ ldstr    "Hello, {0}!"
    IL_000b: /* FE0C | 0000       */ ldloc    name
    IL_000f: /* 28   | (0A)000001 */ call     void [mscorlib]System.Console::WriteLine(
                                                  string, object)
    IL_0014: /* 2A   |            */ ret
}

Short Form

Some MSIL instructions have a normal and a short-form syntax. The short form of the instruction has an .s suffix. The short form of the ldloc instruction is ldloc.s. The short form of the br instruction is br.s. Normal instructions take a wider operand (4 bytes for br, 2 bytes for ldloc), and short-form instructions are limited to 1-byte operands.

When used injudiciously, the short-form syntax can cause unexpected results. Consider the following example:

.assembly extern mscorlib {}
.assembly application {}

.namespace Donis.CSharpBook {

    .class Starter {
        .method static public void Main() il managed {
            .entrypoint
            ldc.i4.s 50000
            call void [mscorlib] System.Console::WriteLine(int32)
            ret
        }
    }
}

In the preceding application, a constant of 50000 is placed on the evaluation stack. However, the ldc instruction is in the short form. You cannot fit 50000 into a single byte, so the constant overflows the byte. In hexadecimal, 50000 is 0xC350, and only the low-order byte, 0x50, is kept. For this reason, the application incorrectly displays 80.
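One way to correct the program, shown here as a sketch, is to use the normal form of the instruction, whose 4-byte operand holds the full constant:

```
.assembly extern mscorlib {}
.assembly application {}

.namespace Donis.CSharpBook {

    .class Starter {
        .method static public void Main() il managed {
            .entrypoint
            ldc.i4 50000     // normal form: the operand is a full int32
            call void [mscorlib]System.Console::WriteLine(int32)
            ret
        }
    }
}
```

This version displays 50000.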

There are categories of MSIL instructions. The next section reviews these categories, such as branch, arithmetic, call, and array groups of instructions. Because of the prevalence of the evaluation stack, load and store instructions are the most frequently used of all MSIL instructions. That is a good place to start.

Load and Store Instructions

Load and store instructions transfer data between the evaluation stack and memory. Load instructions push a value from memory, such as a local variable, onto the evaluation stack. Store instructions move data from the evaluation stack to memory. Information placed on the evaluation stack is consumed by method parameters, arithmetic operations, and other MSIL instructions. The return value from a method also is placed on the evaluation stack after the invocation. Data not otherwise consumed should be removed from the evaluation stack before the current method returns. The pop instruction is the best way to remove extraneous data from the evaluation stack. Information needed for an instruction should be placed on the evaluation stack immediately prior to the execution of that instruction. If not, an InvalidProgramException is triggered.

Table 14-2 lists the basic load instructions.

Table 14-2. Load instructions

Instruction

Description

ldc

The ldc instruction pushes a constant on the evaluation stack. The constant can be an integral or a floating-point value.

Here is the syntax of the ldc instruction:

ldc.type value

ldc.i4.number

ldc.i4.s number

The first form places a constant of the specified type onto the evaluation stack.

The second form is more efficient if you need to transfer an integral value of –1 or an integral value between 0 and 8 to the evaluation stack. The special format for –1 is ldc.i4.m1.

The third form, the short form, loads an integral constant that fits in a single signed byte (–128 through 127).

ldloc

The ldloc instruction copies the value of a local variable to the evaluation stack.

Here is the syntax of the ldloc instruction:

ldloc index

ldloc.s index

ldloc name

ldloc.s name

ldloc.n

The first two forms use an index to identify a local variable, which is then placed on the evaluation stack. The third and fourth forms identify the local variable with the symbolic name. The last form, ldloc.n, is optimized to place local variables from index 0 to index 3 (ldloc.0 through ldloc.3). The short form, ldloc.s, efficiently loads local variables from index 4 to index 255.

ldarg

The ldarg instruction places a method argument on the evaluation stack. The value then can be used in the program, such as in an arithmetic expression, or it can be stored in a local variable.

Here is the syntax of the ldarg instruction, which is identical to the ldloc instruction:

ldarg index

ldarg.s index

ldarg name

ldarg.s name

ldarg.n

ldnull

The ldnull instruction places a null on the evaluation stack. This instruction has no operands.
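The following sketch exercises several of the load forms from Table 14-2 in one method. The assembly name loads is a placeholder, and each loaded value is consumed immediately, either by a call or by pop, so a single stack slot is enough:

```
.assembly extern mscorlib {}
.assembly loads {}

.class Starter {
    .method static public void Main() cil managed {
        .entrypoint
        .maxstack 1
        ldc.i4.m1            // special form for -1
        call void [mscorlib]System.Console::WriteLine(int32)
        ldc.i4.8             // compact form for 0 through 8
        call void [mscorlib]System.Console::WriteLine(int32)
        ldc.i4.s 100         // short form: fits in one signed byte
        call void [mscorlib]System.Console::WriteLine(int32)
        ldc.i4 50000         // normal form: full 32-bit constant
        call void [mscorlib]System.Console::WriteLine(int32)
        ldc.r8 2.5           // floating-point constant
        call void [mscorlib]System.Console::WriteLine(float64)
        ldnull               // push a null reference
        pop                  // remove it so the stack is empty at ret
        ret
    }
}
```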

Table 14-3 lists the basic store instructions.

Table 14-3. Store instructions

Instruction

Description

stloc

The stloc instruction removes a value from the evaluation stack and places it in a local variable.

Here is the syntax of the stloc instruction, which is the same as the syntax of the ldloc and ldarg instructions:

stloc index

stloc.s index

stloc name

stloc.s name

stloc.n

starg

The starg instruction moves a value from the evaluation stack to a method argument. The value also is removed from the evaluation stack.

Here is the syntax of the starg instruction:

starg num

starg.s num

The short form of the starg instruction is efficient for index 0 to index 255.
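As a combined sketch of the load and store instructions, the hypothetical AddFive method below copies an argument to the stack, stores the sum in a local variable, and then writes the result back into the argument. The names loadstore, AddFive, num, and result are invented for this example:

```
.assembly extern mscorlib {}
.assembly loadstore {}

.class Starter {
    .method static public void Main() cil managed {
        .entrypoint
        .maxstack 1
        ldc.i4.s 20
        call int32 Starter::AddFive(int32)
        call void [mscorlib]System.Console::WriteLine(int32)   // displays 25
        ret
    }

    .method static public int32 AddFive(int32 num) cil managed {
        .maxstack 2
        .locals init (int32 result)
        ldarg.0          // push the argument num
        ldc.i4.5         // push the constant 5
        add              // num + 5
        stloc.0          // pop the sum into result
        ldloc.0          // push result
        starg.s num      // pop it into the argument num
        ldarg.0          // push the updated argument as the return value
        ret
    }
}
```

Every store removes the value that it saves, which is why the method reloads the value before returning.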
