8.9. The Intermediate Language

When we compile a C# program, the code the compiler generates is not directly executable; it is not machine assembly language. Rather, it is a machine-independent intermediate language (IL) representation of our program.[4]

[4] For details of the intermediate language beyond the discussion in this section, see The IL Assembly Language Programmers' Reference and The MSIL Instruction Set Specification within the Program Files/Microsoft.NET/FrameworkSDK/Tool Developers Guide/docs folder.

All .NET languages are compiled into the IL. This means that at the IL level, all language code looks pretty much the same. There are two primary benefits to compiling all .NET languages into a common intermediate language. The first, of course, is real language interoperability—not only at the binary level, but at the source level as well. Not only can we combine modules written in different .NET languages, but we can make use of and inherit from classes defined within different .NET languages.

The second benefit is a language-neutral code base at which to target tools. A tool built on top of the IL should work with every .NET language through a common interface.

The one caveat is that not all languages support all the features of the IL. C#, for example, supports unsigned types. VB.NET (Visual Basic) does not. C# distinguishes between upper- and lowercase in identifiers. VB.NET does not. To promote language interoperability, .NET defines a Common Language Specification (CLS). The CLS represents the basic features common to a broad range of languages. To have the compiler check whether your code is CLS compliant, set the assembly-level CLSCompliant attribute to true:

[assembly: CLSCompliant( true )]

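When CLSCompliant is set to true, noncompliant code in the publicly visible portion of the assembly triggers a compile-time error (later releases of the compiler report a warning instead).[5]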

[5] See the Cross-Language Interoperability section of the .NET Framework Developer's Guide for a discussion of the Common Language Specification and writing CLS-compliant code.
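
As a minimal sketch of the compliance check, consider the listing below. The Counter and ClsDemo classes are hypothetical names invented for the illustration. The public Add() method takes a uint, which is not a CLS type, so it is explicitly marked as noncompliant; the private uint field needs no marking because only the publicly visible surface is checked.

```csharp
using System;

// Ask the compiler to check the publicly visible surface of this
// assembly against the Common Language Specification.
[assembly: CLSCompliant(true)]

// Hypothetical class used only for this illustration.
public class Counter
{
    private uint ticks;   // private members are not checked

    // CLS compliant: int belongs to the CLS type set.
    public int Count { get { return (int)ticks; } }

    // uint is not a CLS type. Without the explicit opt-out below,
    // the compiler would flag this public method.
    [CLSCompliant(false)]
    public void Add(uint amount) { ticks += amount; }
}

public static class ClsDemo
{
    public static int Run()
    {
        Counter c = new Counter();
        c.Add(3u);
        c.Add(4u);
        return c.Count;
    }

    public static void Main()
    {
        Console.WriteLine(Run());   // prints 7
    }
}
```

Marking an individual member with CLSCompliant(false) keeps the rest of the assembly under the check while documenting which parts other languages may not be able to use.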

8.9.1. Examining the Intermediate Language

As we've discussed, the primary difference between a value and a reference type is that a value type stores its data directly within its object, while the reference type is separated into a handle/object pair. The handle is allocated locally; the object is allocated on the managed heap. As an illustration of the intermediate language representation, let's look at the code generated to support the definition of a struct value type and a class reference type:

structDef sd = new structDef(1, 2, 3);
classDef  cd = new classDef( 1, 2, 3);
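
Before turning to the IL, it may help to see the value/reference difference at the source level. The sketch below assumes hypothetical definitions of structDef and classDef; the field names x, y, and z are invented for the illustration and are not taken from the text.

```csharp
using System;

// Hypothetical definitions: the fields x, y, z are assumptions.
public struct structDef
{
    public int x, y, z;
    public structDef(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
}

public class classDef
{
    public int x, y, z;
    public classDef(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
}

public static class ValueVsReference
{
    public static int CopyStruct()
    {
        structDef sd = new structDef(1, 2, 3);
        structDef copy = sd;   // copies the data itself
        copy.x = 100;
        return sd.x;           // the original is unaffected
    }

    public static int CopyClass()
    {
        classDef cd = new classDef(1, 2, 3);
        classDef alias = cd;   // copies only the handle
        alias.x = 100;
        return cd.x;           // both handles address the same heap object
    }

    public static void Main()
    {
        Console.WriteLine(CopyStruct());   // prints 1
        Console.WriteLine(CopyClass());    // prints 100
    }
}
```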

The intermediate language uses an evaluation stack for both load and store. Each method maintains a local stack. Upon method entry, the stack is empty.

Load instructions (ld*) copy values from memory to the evaluation stack. Store instructions (st*) copy values from the stack back to memory. Arguments passed to other methods and their return values are also pushed onto and popped from the evaluation stack.

Each method maintains an array to hold locally defined objects. It is named .locals. The struct value type sd, together with the handle portion of the cd reference type, is placed in this array:

.locals (
   [0] value class structDef sd,
   [1] class classDef cd,
   ...
)
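
One way to experiment with the evaluation stack and the local slots is to emit a few instructions at run time. The sketch below is not part of the original discussion: it assumes a later framework release than the one described here, since it relies on System.Reflection.Emit.DynamicMethod and the Func delegate type, and the names StackDemo and BuildAddTwo are hypothetical. The comments trace the contents of the stack after each instruction.

```csharp
using System;
using System.Reflection.Emit;

public static class StackDemo
{
    // Builds a tiny method at run time, roughly equivalent to:
    //     static int AddTwo(int n) { int tmp = n + 2; return tmp; }
    public static Func<int, int> BuildAddTwo()
    {
        DynamicMethod dm = new DynamicMethod(
            "AddTwo", typeof(int), new Type[] { typeof(int) });
        ILGenerator il = dm.GetILGenerator();

        il.DeclareLocal(typeof(int));   // slot [0] of the method's locals

        il.Emit(OpCodes.Ldarg_0);       // push argument n        stack: n
        il.Emit(OpCodes.Ldc_I4_2);      // push the constant 2    stack: n, 2
        il.Emit(OpCodes.Add);           // pop two, push the sum  stack: n+2
        il.Emit(OpCodes.Stloc_0);       // pop into local [0]     stack: (empty)
        il.Emit(OpCodes.Ldloc_0);       // push local [0]         stack: n+2
        il.Emit(OpCodes.Ret);           // return the top of the stack

        return (Func<int, int>)dm.CreateDelegate(typeof(Func<int, int>));
    }

    public static void Main()
    {
        Func<int, int> addTwo = BuildAddTwo();
        Console.WriteLine(addTwo(40));  // prints 42
    }
}
```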

Here is the code sequence to initialize the struct value type sd. It's quite simple: the address of sd is pushed onto the stack, followed by the three constant literals. Then the constructor is invoked. Because the instructions are difficult to read, I've added the source code with comments set off by three slashes (///):

/// source line: structDef sd = new structDef( 1, 2, 3 );

/// the ldloca instruction pushes the address of the local
/// object onto the stack

ldloca.s   sd

/// the ldc instruction pushes a constant number onto the
/// stack; i4 represents the type – in this case, a 4-byte
/// integer; the last digit represents the literal value
///
/// so the next three instructions push 1, 2, 3 onto the
/// stack; these represent the arguments to the constructor

ldc.i4.1
ldc.i4.2
ldc.i4.3

/// an invocation of the three-argument constructor;
/// it pulls the three arguments and the object to
/// initialize from the stack

call instance void structDef::.ctor(int32,int32,int32)

The reference type is more complicated to initialize because it must be allocated on the heap. The newobj instruction allocates memory on the heap and then calls the constructor named in the instruction. newobj leaves a reference to the new heap object on the evaluation stack, and the stloc instruction that follows stores that reference in the local handle:

///  source line: classDef cd = new classDef(1,2,3);

/// load the constant literals on the stack
ldc.i4.1
ldc.i4.2
ldc.i4.3

/// the constructor to be called is specified as part of
/// the newobj instruction; the parameters accepted by the
/// constructor are pulled from the stack
newobj instance void classDef::.ctor(int32, int32, int32)

/// store the heap object reference at index 1 of .locals
stloc.1

This should give you something of the flavor of the intermediate language. It can be quite interesting to explore in order to discover how various constructs are implemented. To help in your navigation, Visual Studio.NET comes with the intermediate language disassembler: ildasm.

Keep in mind that the intermediate language representation is a storage mechanism. It is not interpreted when our programs run. Rather, before it executes, the IL is compiled into native machine code for the target host (by default, method by method, by the just-in-time compiler). That native code is what is executed.

8.9.2. The ildasm Tool

Several tools work with the intermediate language representation. One is ildasm,[6] the intermediate language disassembler. As a learning tool, it allows us to disassemble and inspect the IL code and view the metadata associated with our assembly. For example, Figure 8.1 captures a portion of the ildasm tree view of the core system library assembly mscorlib.dll. To understand the geometric icons, refer to Figure 8.2.

[6] Currently, ildasm.exe and the many other .NET tools are located in the directory %windir%\Microsoft.NET\Framework\v1.0.xxxx, where xxxx is the build number of the .NET framework you are using.

Figure 8.1. Tree View of mscorlib.dll


Figure 8.2. Tree View Icon Help


The tree view provides a hierarchical view of the types within the assembly, organized by namespace. All types and nested namespaces are listed for each namespace. All members of each type are listed. If we double-click on an atomic icon, such as the red triangle property icon of the ICollection member Count, a window pops up with additional information. Figure 8.3 shows the window that is displayed when we double-click on Count. We see that it's of type int, that it provides only a get accessor, and that it is nonstatic. If we double-click on a method, the window that pops up displays the IL instructions. (I'll leave that as an exercise for the reader.)

Figure 8.3. Result of Double-Clicking on Property Member Count
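
The same facts about Count can be cross-checked from code through reflection, which reads the same metadata that ildasm displays. This is only a sketch; the CountInfo class and its Describe() method are hypothetical names.

```csharp
using System;
using System.Collections;
using System.Reflection;

public static class CountInfo
{
    public static string Describe()
    {
        // Look up the Count property declared by the ICollection interface.
        PropertyInfo count = typeof(ICollection).GetProperty("Count");
        MethodInfo getter = count.GetGetMethod();

        return count.PropertyType.FullName
             + " get=" + count.CanRead
             + " set=" + count.CanWrite
             + " static=" + getter.IsStatic;
    }

    public static void Main()
    {
        // prints: System.Int32 get=True set=False static=False
        Console.WriteLine(Describe());
    }
}
```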


ildasm runs in both a default and an advanced mode. The advanced mode provides additional access to metadata information about the assembly.

The following command brings it up in default mode:

C:\WIN\Microsoft.NET\Framework\v1.0.2914\ildasm.exe

Adding /adv following ildasm.exe brings it up in advanced mode:

C:\WIN\Microsoft.NET\Framework\v1.0.2914\ildasm.exe /adv

Either command can be run from a Command Prompt window or from the Run dialog box.

The View menu item allows us to set various display attributes, such as the following:

  • Sort by Name: Sort the items in tree view by name.

  • Show Public: Show the items having public accessibility.

  • Show Private: Show the items having private accessibility.

  • Show Assembly: Show the items having assembly accessibility.

  • Show Source Lines: Show the original source code along with IL.

  • Show Statistics (advanced mode only): Show file statistics.

  • Show MetaInfo (advanced mode only): Show the metadata in a disassembly window.

ildasm is a great learning tool. It offers us the chance to peek under the hood, so to speak. I find myself going to the intermediate language when I am studying a particular language feature and wishing to confirm my understanding.

For example, while I understand the different behaviors of a reference and value type when assigned to an object type or when initialized through a new expression, seeing the actual intermediate implementation serves as icing on the cake.

ildasm provides a good way of confirming that what you think is actually happening is actually happening!
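
As one small experiment of this kind, the sketch below assigns a value type and a reference type to an object. The Point struct and the BoxingDemo class are hypothetical names. Disassembling the compiled result with ildasm should show a box instruction for the first assignment and none for the second.

```csharp
using System;

// Hypothetical value type used only for this illustration.
public struct Point
{
    public int x;
    public Point(int x) { this.x = x; }
}

public static class BoxingDemo
{
    public static int BoxedCopy()
    {
        Point p = new Point(1);
        object o = p;          // box: copies p into a new heap object
        p.x = 2;               // changes only the local value
        return ((Point)o).x;   // unbox: the boxed copy still holds 1
    }

    public static bool SameObject()
    {
        int[] data = new int[] { 1 };   // arrays are reference types
        object o = data;                // no box: o holds the same handle
        data[0] = 2;
        return ((int[])o)[0] == 2;      // the change is visible through o
    }

    public static void Main()
    {
        Console.WriteLine(BoxedCopy());    // prints 1
        Console.WriteLine(SameObject());   // prints True
    }
}
```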
