Chapter 9: Fields and Data Constants

Fields are one of two kinds of typed and named data locations, the second kind being method local variables, which are discussed in Chapter 10. Fields correspond to the data members and global variables of the C++ world. Apart from their own characteristics, fields can have additional information associated with them that defines the way the fields are laid out by the loader, how they are allocated, how they are marshaled to unmanaged code, and whether they have default values. This chapter examines all aspects of member and global fields, and the metadata used to describe these aspects.

Field Metadata

To define a field, you must first provide basic information: the field’s name and signature, and the flags indicating the field’s characteristics, stored in the Field metadata table. Then comes optional information, specific to certain kinds of fields: field marshaling information, found in the FieldMarshal table; field layout information in the FieldLayout table; field mapping information in the FieldRVA table; and a default value in the Constant table.

To reference a field, you must know its owner—TypeRef, TypeDef, or ModuleRef—as well as the field’s name and signature. The references to the fields are kept in the MemberRef table. Figure 9-1 shows the general structure of the field metadata group.

Figure 9-1. Field metadata group

Defining a Field

The central metadata table of the group, the Field table, has the associated token type mdtFieldDef (0x04000000). A record in this table has three entries:

Flags (2-byte unsigned integer): Binary flags indicating the field’s characteristics.
Name (offset in the #Strings stream): The field’s name.
Signature (offset in the #Blob stream): The field’s signature.

As you can see, a Field record does not contain one vital piece of information: which class or value type owns the field. The information about field ownership is furnished by the class descriptor itself: records in the TypeDef table have FieldList entries, which hold the RID in the Field table where the first of the type’s fields can be found.
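The ownership scheme can be sketched in a few lines of Python. This is only an illustration, not loader code: the helper names are hypothetical, and it assumes (as the metadata format specifies, though the text above does not spell it out) that a type's run of fields ends where the next TypeDef's run begins, with the last type's run extending to the end of the Field table.

```python
# Hypothetical helper names; only the table semantics come from the text.
MDT_FIELD_DEF = 0x04000000  # token type of the Field table


def field_token(rid):
    """Compose a FieldDef token from a 1-based record index (RID)."""
    return MDT_FIELD_DEF | rid


def owned_fields(field_lists, field_count):
    """Map each TypeDef RID to the list of Field RIDs it owns.

    field_lists[i] is the FieldList entry of TypeDef record i + 1.
    Assumption: a run ends where the next TypeDef's run begins; the
    last run ends at the end of the Field table.
    """
    owners = {}
    for i, first in enumerate(field_lists):
        if i + 1 < len(field_lists):
            end = field_lists[i + 1]
        else:
            end = field_count + 1
        owners[i + 1] = list(range(first, end))
    return owners


# Three TypeDef records over a Field table of five records.
# TypeDef 2 has the same FieldList as TypeDef 3, so it owns no fields.
owners = owned_fields([1, 3, 3], field_count=5)
token = field_token(3)
```

Here `owners` assigns Field records 1 and 2 to the first type, none to the second, and 3 through 5 to the third.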

In the simplest case, the ILAsm syntax for a field declaration is as follows:

.field <flags> <type> <name>

The owner of a field is the class or value type in the lexical scope of which the field is defined.

A field’s binary flags are defined in the CorHdr.h file in the enumeration CorFieldAttr and can be divided into three groups, as described in the following list. I’m using ILAsm keywords instead of the constant names from CorFieldAttr, as I don’t think the constant names are relevant.

Accessibility flags (mask 0x0007)
- privatescope (0x0000): This is the default accessibility. A private scope field is exempt from the requirement of having a unique triad of owner, name, and signature and hence must always be referenced by a FieldDef token and never by a MemberRef token (0x0A000000), because member references are resolved to the definitions by exactly this triad. The privatescope fields are accessible from anywhere within current module.
- private (0x0001): This field is accessible from its owner and from classes nested in the field’s owner. Global private fields are accessible from anywhere within current module.
- famandassem (0x0002): This field is accessible from types belonging to the owner’s family defined in the current assembly. The term family here means the type itself and all its descendants.
- assembly (0x0003): This field is accessible from types defined in the current assembly.
- family (0x0004): This field is accessible from the owner’s family (defined in this or any other assembly).
- famorassem (0x0005): This field is accessible from the owner’s family (defined in this or any other assembly) and from all types (of the owner’s family or not) defined in the current assembly.
- public (0x0006): This field is accessible from any type.
Contract flags (mask 0x02F0)
- static (0x0010): This field is static, shared by all instances of the type. Global fields must be static.
- initonly (0x0020): This field can be initialized only and cannot be written to later. Initialization takes place in an instance constructor (.ctor) for instance fields and in a class constructor (.cctor) for static fields. This flag is not enforced by the CLR; it exists for the compilers’ reference only.
- literal (0x0040): This field is a compile-time constant. The loader does not lay out this field and does not create an internal handle for it. The field cannot be directly addressed from IL and can be used only as a Reflection reference to retrieve an associated metadata-held constant. If you try to access a literal field directly—for example, through the ldsfld instruction—the JIT compiler throws a MissingField exception and aborts the task.
- notserialized (0x0080): This field is not serialized when the owner is remoted. This flag has meaning only for instance fields of the serializable types.
- specialname (0x0200): This field is special in some way, as defined by the name. An example is field value__ of an enumeration type.
Reserved flags (cannot be set explicitly; mask 0x9500)
- rtspecialname (0x0400): This field has a special name that is reserved for the internal use of the common language runtime. Two field names are reserved: value__, for instance fields in enumerations, and _Deleted*, for fields marked for deletion but not actually removed from metadata. The keyword rtspecialname is ignored by the IL assembler (the flag is actually set automatically by the metadata emission API) and is displayed by the IL disassembler for informational purposes only. This flag must be accompanied in the metadata by a specialname flag.
- marshal(<native_type>) (0x1000): This field has an associated FieldMarshal record specifying how the field must be marshaled when consumed by unmanaged code. The ILAsm construct marshal(<native_type>) defines the marshaling information emitted to the FieldMarshal table but does not set the flag directly. Rather, the flag is set behind the scenes by the metadata emission API when the marshaling information is emitted. Chapter 8 discusses native types.
- [no ILAsm keyword] (0x8000): This field has an associated Constant record. The flag is set by the metadata emission API when the respective Constant record is emitted. See the section “Default Values” later in this chapter.
- [no ILAsm keyword] (0x0100): This field is mapped to data and has an associated FieldRVA record. The flag is set by the metadata emission API when the respective FieldRVA record is emitted. See the section “Mapped Fields” later in this chapter.
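As a rough illustration of how the flag groups combine, the following Python sketch decodes a 2-byte Flags value into the ILAsm keywords listed above. The function name and the labels used for the keyword-less reserved flags ("has Constant" and so on) are my own, chosen for readability; the numeric values are the ones from the list.

```python
ACCESS_MASK = 0x0007

# Accessibility is a 3-bit enumeration, not a set of independent bits.
ACCESS = {
    0x0000: "privatescope",
    0x0001: "private",
    0x0002: "famandassem",
    0x0003: "assembly",
    0x0004: "family",
    0x0005: "famorassem",
    0x0006: "public",
}

# Contract flags are independent bits.
CONTRACT = [
    (0x0010, "static"),
    (0x0020, "initonly"),
    (0x0040, "literal"),
    (0x0080, "notserialized"),
    (0x0200, "specialname"),
]

# Reserved flags; the last three labels are descriptive, not ILAsm keywords.
RESERVED = [
    (0x0400, "rtspecialname"),
    (0x1000, "has FieldMarshal"),
    (0x8000, "has Constant"),
    (0x0100, "has FieldRVA"),
]


def decode_field_flags(flags):
    """Return the accessibility keyword followed by every set flag."""
    names = [ACCESS.get(flags & ACCESS_MASK, "<invalid accessibility>")]
    names += [name for bit, name in CONTRACT + RESERVED if flags & bit]
    return names


# A public static literal field with a Constant record, as in an enumeration.
enum_member = decode_field_flags(0x8056)
```

Note that the accessibility value 0x0007 fits the mask but is not assigned to any keyword, so the sketch reports it as invalid.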

In the field declaration, the type of the field (<type> in the previous syntax formula) is the ILAsm notation of the appropriate single encoded type, which together with the calling convention forms the field’s signature. If you forgot what a field signature looks like, see Chapter 8.

The name of the field (<name> in the previous syntax formula), also included in the declaration, can be a simple name or a composite (dotted) name. ILAsm v1.0 and v1.1 did not allow composite field names, although one could always cheat and put a composite name in single quotation marks, turning it into a simple name.

Examples of field declarations include the following:

.field public static marshal(int) int32 I
.field family string S
.field private int32& pJ   //  ERROR! ByRef in field signature!

Referencing a Field

Field references in ILAsm have the notation of

<field_ref> ::= <type> [<class_ref>::]<name>

where <class_ref>—as you know from Chapter 7—is defined as

<class_ref> ::= [<resolution_scope>]<full_type_name>

where

<resolution_scope> ::= [<assembly_ref_alias>]
                     | [.module <module_ref_name>]

For instance, this example uses the IL instruction ldfld, which loads the field value on the stack:

ldfld int32 [.module Another.dll]Foo.Bar::idx

When it is not possible to infer unambiguously from the context whether the referenced member is a field or a method, <field_ref> is preceded by the keyword field. Note that the keyword does not contain a leading dot. The following example uses the IL instruction ldtoken, which loads an item’s runtime handle on the stack:

ldtoken field int32 [.module Another.dll]Foo.Bar::idx

The field references reside in the MemberRef metadata table, which has associated token type 0x0A000000. A record of this table has only three entries:

Class (coded token of type MemberRefParent): This entry references the TypeRef or ModuleRef table. Method references, residing in the same table, can have their Class entries referencing the Method and TypeSpec tables as well.
Name (offset in the #Strings stream).
Signature (offset in the #Blob stream).

Instance and Static Fields

Instance fields are created every time a type instance is created, and they belong to the type instance. Static fields, which are shared by all instances of the type, are created when the type is loaded. Some of the static fields (literal and mapped fields) are never allocated. The loader simply notes where the mapped fields reside and addresses these locations whenever the fields are to be addressed. And all the references to the literal fields are replaced with the constants at compile time by the high-level compilers (the IL assembler does not do that, leaving it to the programmer).

A field signature contains no indication of whether the field is static or instance. But since the loader keeps separate books for instance fields and for two out of three kinds of static fields—not for literal static fields—the kind of referenced field is easily discerned from the field’s token. When a field token is found in the IL stream, the JIT compiler does not have to dive into the metadata, retrieve the record, and check the field’s flags; by that time, all the fields have been accounted for and duly classified by the loader.

IL has two sets of instructions for field loading and storing. The instructions for instance fields are ldfld, ldflda, and stfld; those for static fields are ldsfld, ldsflda, and stsfld. An attempt to use a static field instruction with an instance field would result in a JIT compilation failure. The inverse combination would work, but it requires loading the instance pointer on the stack, which is, of course, completely redundant for a static field. The good thing about the possibility of using instance field instructions for static fields is that it allows for accessing both static and instance fields in the same way.

Default Values

Default values reside in the Constant metadata table. Three kinds of metadata items can have a default value assigned and therefore can reference the Constant table: fields, method parameters, and properties. A record of the Constant table has three entries:

Type (unsigned 1-byte integer): The type of the constant—one of the ELEMENT_TYPE_* codes. (See Chapter 8.)
Parent (coded token of type HasConstant): A reference to the owner of the constant—a record in the Field, Property, or Param table.
Value (offset in the #Blob stream): A constant value blob.

The current implementation of the common language runtime and ILAsm allows the constant types described in Table 9-1. (As usual, I’ve dropped the ELEMENT_TYPE_ part of the name.)

Table 9-1. Constant Types

Constant Type	ILAsm Notation	Comments
I1	int8	Signed 1-byte integer.
I2	int16	Signed 2-byte integer.
I4	int32	Signed 4-byte integer.
I8	int64	Signed 8-byte integer.
R4	float32	4-byte floating point.
R8	float64	8-byte floating point.
CHAR	char	2-byte Unicode character.
BOOLEAN	bool	1-byte Boolean, true = 1, false = 0.
STRING	<quoted_string>, bytearray	Unicode string.
CLASS	nullref	Null object reference. The value of the constant of this type must be a 4-byte integer containing 0.

The ILAsm syntax for defining the default value of a field is as follows:

<field_def_const> ::= .field <flags> <type> <name>
   = <const_type> [( <value> )]

The value in parentheses is mandatory for all constant types except nullref. For example,

.field public int32 i = int32(123)
.field public static literal bool b = bool(true)
.field private float32 f = float32(1.2345)
.field public static int16 ii = int16(0xFFE0)
.field public object o = nullref

Defining integer and Boolean constants—not to mention nullref—is pretty straightforward, but floating-point constants and strings can present some difficulties.

Floating-point numbers have special cases, such as positive infinity, negative infinity, and not-a-number (NAN), that cannot be presented textually in simple floating-point format. In these special cases, the floating-point constants can alternatively be represented as integer values with a matching byte count. The integer values are not converted to floating-point values; instead, they represent an exact bit image of the floating-point values (in IEEE-754 floating-point format used by the CLR). For example,

.field public float32 fPosInf = float32(0x7F800000)
.field public float32 fNegInf = float32(0xFF800000)
.field public float32 fNAN    = float32(0xFFC00000)
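To check that these integers really are the bit images of the special values, you can reinterpret them outside ILAsm. The Python sketch below does so with the standard struct module; the helper name is hypothetical, and the code only illustrates the IEEE-754 reinterpretation, not anything the assembler itself runs.

```python
import math
import struct


def float32_from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 float32.

    No numeric conversion takes place: the same four bytes are packed
    as an integer and unpacked as a float.
    """
    return struct.unpack("<f", struct.pack("<I", bits))[0]


pos_inf = float32_from_bits(0x7F800000)
neg_inf = float32_from_bits(0xFF800000)
nan = float32_from_bits(0xFFC00000)
```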

Like all other constants, string constants are stored in the #Blob stream. In this regard, they differ from user-defined strings, which are stored in the #US stream. What both kinds of strings have in common is that they are supposed to be Unicode (UTF-16). I say “supposed to be” because the only Unicode-specific restrictions imposed on these strings are that their sizes are reported in Unicode characters and that their byte counts must be even. Otherwise, these strings are simply binary objects and might or might not contain invalid Unicode characters.

Notice that the type of the constant does not need to match the type of the item to which this constant is assigned—in this case, the type of the field. That is, the match is not required by the CLR, which cares nothing about the constants: the constants are provided for compilers’ information only, and the high-level compilers, having encountered a reference to a constant in the source code, emit explicit instructions to assign respective values to fields or parameters.

In ILAsm, a string constant can be defined either as a composite quoted string or as a byte array:

.field public static string str1 = "Isn't" + " it " + "marvelous!"
.field public static string str2 = bytearray(00 01 FF FE 1A 00 00)

When a string constant is defined as a simple or composite quoted string, this string is converted to Unicode before being stored in the #Blob stream. In the case of a bytearray definition, the specified byte sequence is stored “as is” and padded with one 0 byte if necessary to make the byte count even. In the example shown here, the default value for the str2 field will be padded to bring the byte count to eight (four Unicode characters). And if the bytes specified in the bytearray are invalid Unicode characters, it will surely be discovered when you try to print the string, but not before.
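The two ways of producing the constant's bytes can be modeled in Python as follows. This is a sketch under assumptions: the function names are hypothetical, UTF-16 is taken to be little endian, and only the value bytes are produced (the length prefix that precedes every blob in the #Blob stream is not modeled).

```python
def quoted_string_blob(*parts):
    """Concatenate the parts of a composite quoted string and convert
    the result to UTF-16 (little endian assumed)."""
    return "".join(parts).encode("utf-16-le")


def bytearray_blob(hex_bytes):
    """Take the bytes as is, padding with one zero byte if the count is odd."""
    data = bytes.fromhex(hex_bytes)
    if len(data) % 2:
        data += b"\x00"
    return data


str1 = quoted_string_blob("Isn't", " it ", "marvelous!")
str2 = bytearray_blob("00 01 FF FE 1A 00 00")
```

For the two declarations above, `str1` comes out at 38 bytes (19 characters) and `str2` at 8 bytes, the seven specified plus one byte of padding.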

Assigning default values to fields (and parameters) seems to be such a compelling technique that you might wonder why I did not employ it in the simple sample discussed in Chapter 1. Really, defining the default values is a great way to initialize fields—right? Wrong. Here’s a tricky question. Suppose that you define a member field as follows:

.field public static int32 ii = int32(12345)

What will the value of the field be when the class is loaded? Correct answer: 0. Why? Default values specified in the Constant table are not used by the loader to initialize the items to which they are assigned. If you want to initialize a field to its default value, you must explicitly call the respective Reflection method to retrieve the value from metadata and then store this value in the field. This doesn’t sound too nice, and I think that the CLR could probably do a better job with field initialization—and with literal fields as well.

Let me remind you once again that literal fields are not true fields. They are not laid out by the loader, and they cannot be directly accessed from IL. From the point of view of metadata, however, literal fields are nevertheless valid fields having valid tokens, which allow the constant values corresponding to these fields to be retrieved by Reflection methods. The common language runtime does not provide an implicit means of accessing the Constant table, which is a pity. It would certainly be much nicer if the JIT compiler would compile the ldsfld instruction into the retrieval of the respective constant value, instead of failing, when this instruction is applied to a literal field. But such are the facts of life, and I am afraid we cannot do anything about it at the moment.

Given this situation, literal fields without associated Constant records are legal from the loader’s point of view, but they are utterly meaningless. They serve no purpose except to inflate the Field metadata table.

But how do the compilers handle literal fields? If every time a constant from an enumeration—represented, as you know, by a literal field—was used, the compiler emitted a call to the Reflection API to get this constant value, then one could imagine where it would leave the performance. Most compilers are smarter than that and resolve the literal fields at compile time, replacing references to literal fields with explicit constant values of these fields so that the literal fields never come into play at run time.

ILAsm, following common language runtime functionality to the letter, allows the definition of the Constant metadata but does nothing about the symbol-to-value resolution at compile time. From the point of view of ILAsm and the runtime, the enumeration types are real, as distinctive types, but the symbolic constants listed in the enumerations are not. You can reference an enum, but you can never reference its literal fields.

Mapped Fields

It is possible to provide unconditional initialization for static fields by mapping the fields to data defined in the PE file and setting this data to the initializing values. The syntax for mapping a field to data in ILAsm is as follows:

<mapped_field_decl> ::= .field <flags> <type> <name> at <data_label>

Here’s an example:

.field public static int64 ii at data_ii

The nonterminal symbol <data_label> is a simple name labeling the data segment to which the field is mapped. The ILAsm compiler allows a field to be mapped either to the “normal” data section (.sdata) or to the thread local storage section (.tls), depending on the data declaration to which the field mapping refers. A field can be mapped only to data residing in the same module as the field declaration. (For information about data declaration, see the following section, “Data Constants Declaration.”)

Mapping a field results in emitting a record into the FieldRVA table, which contains two entries:

RVA (4-byte unsigned integer): The relative virtual address of the data to which the field is mapped.
Field (RID to the Field table): The index of the Field record being mapped.

Two or more fields can be mapped to the same location, but each field can be mapped to one location only. Duplicate FieldRVA records with the same Field values and different RVA values are therefore considered invalid metadata. The loader is not particular about duplicate FieldRVA records, however; it simply uses the first one available for the field and ignores the rest.
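The "first record wins" behavior is easy to model. The Python sketch below is only an illustration of the rule just described; the function name and the RVA values are made up for the example.

```python
def resolve_field_rvas(field_rva_records):
    """Resolve field mappings the way the text describes the loader doing it.

    field_rva_records is a list of (rva, field_rid) pairs in table order.
    The first record found for a field is used; later duplicates are ignored.
    """
    resolved = {}
    for rva, field_rid in field_rva_records:
        resolved.setdefault(field_rid, rva)
    return resolved


# Fields 1 and 2 share a location (valid); field 1 also has a duplicate
# record with a different RVA (invalid metadata, silently ignored).
records = [(0x4000, 1), (0x4000, 2), (0x4010, 1)]
resolved = resolve_field_rvas(records)
```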

The field mapping technique has some catches. The first catch (well, not much of a catch, actually) is that, obviously, only static fields can be mapped. Even if you could map instance fields, each instance would be mapped to the same physical memory, making the fields de facto static (shared by all instances) anyway. Mapping instance fields is considered invalid metadata, but it has no serious consequences for the loader; if a field is not static, the loader does not even check to see whether the field is mapped. The only real effect of mapping instance fields is a bloated FieldRVA table. The IL assembler treats mapping of an instance field as an error and produces an error message.

The second catch is to an extent a derivative from the first catch: the mapped static fields are “the most static of them all.” When multiple application domains are sharing the same process (as in the case of ASP.NET, for example) and several application domains are sharing a loaded assembly, the mapped fields of this assembly are shared by all application domains, unlike the “normal” static fields, which are individual per application domain.

The third catch is that a field cannot be mapped if its type contains object references (objects or arrays). The data sections are out of the garbage collector’s reach, so the validity of object references placed in the data sections cannot be guaranteed. If the loader finds object references in a mapped field type, it throws a TypeLoad exception and aborts the loading, even if the code is run in full-trust mode from a local drive and all security-related checks are disabled. The loader checks for the presence of object references on all levels of the field type; in other words, it checks the types of all the fields that make up the type, checks the types of fields that make up those types, and so on.

The fourth catch is that in the verifiable code a field cannot be mapped if its type (value type, of course) contains nonpublic instance fields. The reasoning behind this limitation is that if you map a field with a type containing nonpublic members, you can map another field of some all-public type to the same location and, through this second mapping, get unlimited access to nonpublic member fields of the first type. The loader checks for the presence of nonpublic members on all levels of the mapped field type and throws a TypeLoad exception if it finds such members. This check, unlike the check for object references, is performed only when code verification is required; it is disabled when the code is run from the local drive in full-trust mode.

Note, however, that a mapped field itself can be declared nonpublic without ill consequences. This is based on the simple assumption that if developers decide to overlap their own nonpublic field and thus defy the accessibility control mechanism of the common language runtime object model, they probably know what they are doing.

The last catch worth mentioning is that the initialization data is provided “as is,” exactly as it is defined in the PE file. And if you run the code on a platform other than the one on which the PE file was created, you can face some unpleasant consequences. As a trivial example, suppose you map an int32 field to data containing bytes 0xAA, 0xBB, 0xCC, and 0xDD. On a little endian platform (for instance, an Intel platform), the field is initialized to 0xDDCCBBAA, while on a big endian platform . . . well, you get the picture.
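The byte-order dependence can be seen by interpreting the same four bytes both ways. The following Python lines are a minimal sketch; they show the unsigned bit pattern, whereas an int32 field would present the same bits as a signed value.

```python
data = bytes([0xAA, 0xBB, 0xCC, 0xDD])

# The first byte is least significant on a little endian platform
# and most significant on a big endian one.
little = int.from_bytes(data, "little")
big = int.from_bytes(data, "big")
```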

All these catches do not preclude the compilers from using field mapping for initialization.

Version 2.0 or later of the IL assembler provides a means of mapping the fields onto an explicitly specified memory address. In this case, the <data_label> name must have the form @<RVA in decimal format>. This technique can hardly be recommended for general use because of the obvious hazards associated with it (you usually don’t know the target RVA before the program has been compiled), but in certain limited cases (when you do know the RVA beforehand) it can be useful. Consider, for example, the following declaration:

.field public static int16 NTHeaderMagic at @152

Data Constants Declaration

A data constant declaration in ILAsm has the syntax of

<data_decl> ::= .data [tls] [<data_label> =] <data_items>

where <data_label> is a simple name, unique within the module

<data_items> ::= { <data_item> [ , <data_item>* ] } | <data_item>

and where

<data_item> ::= <data_type> [ ( <value> ) ] [ [ <count> ]  ]

Data constants are emitted to the .sdata section or the .tls section, depending on the presence of the tls keyword, in the same sequence in which they were declared in the source code. The unlabeled data declarations can be used for padding between the labeled data declarations and probably for nothing else, since without a label it’s impossible to map a field to this data. Unlabeled—or, more precisely, unreferenced—data might not survive round-tripping (disassembly-reassembly) because the IL disassembler outputs only referenced data.

The nonterminal symbol <data_type> specifies the data type. (See Table 9-2.) The data type is used by the IL assembler exclusively for identifying the size and byte layout of <value> (in order to emit the data correctly) and is not emitted as any part of metadata or the data itself. Having no way to know what the type was intended to be when the data was emitted, the IL disassembler always uses the most generic form, a byte array, for data representation.

Table 9-2. Types Defined for Data Constants

If <value> is not specified, the data is initialized to a value with all bits set to zeros. Thus, it is still “initialized data” in terms of the PE file structure, meaning that this data is part of the PE file disk image.

The optional <count> in square brackets indicates the repetition count of the data item. Here are some examples:

.data tls T_01 = int32(1234)
// 4 bytes in .tls section, value 0x000004D2
.data tls int32
// unnamed 4 bytes padding in .tls section, value doesn't matter
.data D_01 = int32(1234)[32]       // 32 4-byte integers in .sdata section,
                                   // each equal to 0x000004D2
.data D_02 = char*("Helloworld!")  // Unicode string in .sdata section
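To make the sizes concrete, here is a Python sketch of the bytes these declarations would contribute to the data section. The function names are hypothetical, and the sketch assumes little endian byte order and UTF-16 for char*; whether the assembler appends a terminator to the string is not stated here, so none is modeled.

```python
import struct


def emit_int32(value=0, count=1):
    """Bytes for int32(value)[count]; an omitted value means all-zero bits."""
    return struct.pack("<i", value) * count


def emit_char_ptr(text):
    """Bytes for char*(text): the string as UTF-16, no terminator modeled."""
    return text.encode("utf-16-le")


t_01 = emit_int32(1234)        # .data tls T_01 = int32(1234)
padding = emit_int32()         # .data tls int32
d_01 = emit_int32(1234, 32)    # .data D_01 = int32(1234)[32]
d_02 = emit_char_ptr("Helloworld!")
```

With these inputs, `t_01` is the four bytes D2 04 00 00, `d_01` occupies 128 bytes, and `d_02` occupies 22 bytes.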

Explicit Layouts and Union Declaration

Although instance fields cannot be mapped to data, it is possible to specify the positioning of these fields directly. As you might remember from Chapter 7, a class or a value type can have an explicit flag—a special flag indicating that the metadata contains an exact recipe for the loader regarding the layout of this class. This information is kept in the FieldLayout metadata table, whose records contain these two entries:

OffSet (4-byte unsigned integer): The relative offset of the field in the class layout (not an RVA) or the field’s ordinal in case of sequential layout. The offset is relative to the start of the class instance’s data.
Field (RID to the Field table): The index of the field for which the offset is specified.

In ILAsm, the field offset is specified by putting the offset value in square brackets immediately after the .field keyword, as shown here:

.class public value sealed explicit MyStruct
{
   .field [0] public int32 ii
   .field [4] public float64 dd
   .field [12] public bool bb
}

Only instance fields can have offsets specified. Since static fields are not part of the class instance layout, specifying explicit offsets for them is meaningless and is considered a metadata error. If an offset is specified for a static field, the loader behaves the same way it does with mapped instance fields: if the field is static, the loader does not check to see whether the field has an offset specified. Consequently, FieldLayout records referencing the static fields are nothing more than a waste of memory.

In a class that has an explicit layout, all the instance fields must have specified offsets. If one of the instance fields does not have an associated FieldLayout record, the loader throws a TypeLoad exception and aborts the loading. Obviously, a field can have only one offset, so duplicate FieldLayout records that have the same Field entry are illegal. This is not checked at run time because this metadata invalidity is not critical: the loader takes the first available FieldLayout record for the current field and ignores any duplicates. It’s worth remembering, though, that while supplying wrong metadata doesn’t always lead to aborted program, it almost certainly leads to unexpected (by the programmer) behavior of the application.

The placement of object references (classes, arrays) is subject to a general limitation: the fields of object reference types must be aligned on pointer size—either 4 or 8 bytes, depending on the platform.

.class public value sealed explicit MyStruct
{
   .field [0] public int16 ii
   .field [2] public string str    // Illegal on 32-bit and 64-bit
   .field [6] public int16 jj
   .field [8] public int32 kk
   .field [12] public object oo    // Illegal on 64-bit platform
   .field [16] public int32[] iArr // Legal on both platforms
}

This alignment requirement may cause platform dependence, unless you decide to always align the fields of object reference types on 8 bytes, which would suit both 32-bit and 64-bit platforms.
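The alignment rule amounts to a simple modulo test. The Python sketch below applies it to the MyStruct example above; the function name and the tuple layout are my own, and the check covers only the pointer-size alignment rule, not the other layout requirements.

```python
def misaligned_refs(fields, pointer_size):
    """Return the names of object reference fields whose offsets are not
    multiples of the pointer size.

    fields is a list of (name, offset, is_object_reference) tuples.
    """
    return [
        name
        for name, offset, is_ref in fields
        if is_ref and offset % pointer_size != 0
    ]


MY_STRUCT = [
    ("ii", 0, False),
    ("str", 2, True),
    ("jj", 6, False),
    ("kk", 8, False),
    ("oo", 12, True),
    ("iArr", 16, True),
]

bad_on_32_bit = misaligned_refs(MY_STRUCT, pointer_size=4)
bad_on_64_bit = misaligned_refs(MY_STRUCT, pointer_size=8)
```

The results match the comments in the listing: only str is rejected with 4-byte pointers, while both str and oo are rejected with 8-byte pointers.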

Value types with an explicit layout containing object references must have a total size equal to a multiple of the pointer size. The reason is pretty obvious: imagine what happens if you declare an array of such value types.

Explicit layout is a standard way to implement unions in IL. By explicitly specifying field offsets, you can make fields overlap however you want. Let’s suppose, for example, that you want to treat a 4-byte unsigned integer as such, as a pair of 2-byte words, or as 4 bytes. In C++ notation, the respective constructs look like this:

union MultiDword {
   DWORD dw;
   union {
      struct {
         WORD w1;
         WORD w2;
      };
      struct {
         BYTE b1;
         BYTE b2;
         BYTE b3;
         BYTE b4;
      };
   };
};

In ILAsm, the same union will be written like so:

.class public value sealed explicit MultiDword
{
   .field [0] public uint32 dw

   .field [0] public uint16 w1
   .field [2] public uint16 w2

   .field [0] public uint8 b1
   .field [1] public uint8 b2
   .field [2] public uint8 b3
   .field [3] public uint8 b4
}
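What the overlapping fields see can be imitated by reading the same four bytes through different views. The Python sketch below assumes a little endian platform; the variable names simply mirror the field names of the union.

```python
import struct

# The four bytes that the dw field would occupy in memory.
raw = struct.pack("<I", 0x11223344)

# The same bytes viewed as two 2-byte words and as four single bytes.
w1, w2 = struct.unpack("<2H", raw)
b1, b2, b3, b4 = struct.unpack("<4B", raw)
```

On such a platform w1 holds the low word, 0x3344, and b1 holds the lowest byte, 0x44.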

The only limitation imposed on the explicit-layout unions is that if the overlapping fields contain object references, these object references must not overlap with any other field.

.class public value sealed explicit StrAndIndex
{
   .field [0] public string Str   // Reference, size 4 bytes
                                  // on 32-bit platform
   .field [4] public uint32 Index
}
.class public value sealed explicit MyUnion
{
   .field [0] public valuetype StrAndIndex str_and_index
   .field [0] public uint64 whole_thing    // Illegal!
   .field [0] public string str            // Legal, but unverifiable
   .field [2] public uint32 half_and_half  // Illegal!
   .field [4] public uint32 index          // Legal, object reference
                                           // not overlapped
}

Such “unionizing” of the object references would provide the means for directly modifying these references, which could thoroughly disrupt the functioning of the garbage collector. The loader checks explicit layouts for object reference overlap; if any is found, it throws a TypeLoad exception and aborts the loading.

This rule has an interesting exception, though: the object references can be overlapped with other object references (only full overlapping is allowed; partial overlapping is forbidden). This looks to me like a gaping hole in the type safety. On the other hand, this overlapping is allowed only in full-trust mode, and in full-trust mode you can do even worse things (run native unmanaged code, for example).
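The overlap rules, including the reference-on-reference exception, can be expressed as a small check. The Python sketch below is my own model of the rules as stated in this section, not the loader's algorithm: each field is flattened into (start, end, is_object_reference) segments for a 32-bit platform, and each candidate field is judged against str_and_index.

```python
def overlaps(a, b):
    """True if the half-open byte ranges of two segments intersect."""
    return a[0] < b[1] and b[0] < a[1]


def classify(base, candidate):
    """Judge a candidate field against an already placed base field.

    Both arguments are lists of (start, end, is_object_reference) segments.
    """
    verdict = "legal"
    for b in base:
        for c in candidate:
            if not overlaps(b, c) or not (b[2] or c[2]):
                continue  # disjoint, or no object reference involved
            if b[2] and c[2] and b[:2] == c[:2]:
                verdict = "legal, reference on reference"
            else:
                return "illegal"
    return verdict


# StrAndIndex on a 32-bit platform: a reference at 0..4, a uint32 at 4..8.
STR_AND_INDEX = [(0, 4, True), (4, 8, False)]

whole_thing = classify(STR_AND_INDEX, [(0, 8, False)])
str_field = classify(STR_AND_INDEX, [(0, 4, True)])
half_and_half = classify(STR_AND_INDEX, [(2, 6, False)])
index = classify(STR_AND_INDEX, [(4, 8, False)])
```

The verdicts agree with the comments in the MyUnion listing; the "reference on reference" case is the one the listing marks as legal but unverifiable.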

A field can also have an associated FieldLayout record if the type that owns this field has a sequential layout. In this case, the OffSet entry of the FieldLayout record holds a field ordinal rather than an offset. The fields belonging to a sequential-layout class needn’t have associated FieldLayout records, but if one of the class’s fields has such an associated record, all the rest must have one too. The ILAsm syntax for field declaration in types with sequential layout is similar to the case of the explicit layout, except the integer value in square brackets represents the field’s ordinal rather than the offset:

.class public value sealed sequential OneTwoThreeFour
{
   .field [0] public uint8 one
   .field [1] public uint8 two
   .field [2] public uint8 three
   .field [3] public uint8 four
}

Global Fields

Fields declared outside the scope of any class are known as global fields. They don’t belong to a class but instead belong to the module in which they are declared. A module is represented by a special TypeDef record with RID=1 under the name <Module>, so all the formalities that govern how field records are identified by reference from their parent TypeDef records are observed.

Global fields must be static. Since only one instance of the module exists when the assembly is loaded and because it is impossible to create alternative instances of the module, this limitation seems obvious.

Global fields can have public, private, or privatescope accessibility flags—at least that’s what the metadata validity rules say. As you saw in Chapter 1, however, a global item (a field or a method) can have any accessibility flag, and the loader interprets this flag only as assembly, private, or privatescope. The public, assembly, and famorassem flags are all interpreted as assembly, while the family, famandassem, and private flags are all interpreted as private. The global fields cannot be accessed from outside the assembly, so they don’t have true public accessibility. And as no type can be derived from <Module>, the question about family-related accessibility is moot.
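The loader's interpretation of accessibility flags on global items amounts to a simple lookup. The Python table below merely restates the mapping given in the paragraph above; the names GLOBAL_ACCESS and effective_global_access are invented for this illustration.

```python
# Declared accessibility flag -> how the loader treats it on a global item,
# following the mapping stated in the text.
GLOBAL_ACCESS = {
    "public":       "assembly",
    "assembly":     "assembly",
    "famorassem":   "assembly",
    "family":       "private",
    "famandassem":  "private",
    "private":      "private",
    "privatescope": "privatescope",
}

def effective_global_access(declared):
    """Return the accessibility effectively applied to a global field or method."""
    return GLOBAL_ACCESS[declared]

print(effective_global_access("public"))  # assembly
print(effective_global_access("family"))  # private
```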

Global fields can be accessed from anywhere within the module, regardless of their declared accessibility. In this regard, the classes that are declared within a module and use the global fields have the same access rights as if they were nested in the module. The metadata contains no indications of such nesting, of course.

A reference to a global field declared in the same module has no <class_ref>:: part:

<global_field_ref> ::= [field] <field_type> <field_name>

The keyword field is used in particular cases when the nature of the reference cannot be inferred from the context, for example in the ldtoken instruction.

A reference to a global field declared in a different module of the assembly also lacks the class name but has a resolution scope:

<global_field_ref> ::= [field] <field_type> [.module <mod_name>]::<field_name>

The following are two examples of such references:

ldsfld int32 globalInt
// field globalInt from this module
ldtoken field int32 [.module supporting.dll]::globalInt
// globalInt from other module

Since the global fields are static, you cannot explicitly specify their layout except by mapping them to data. Thus, your 4-2-1-byte union MultiDword would look like this if you implemented it with global fields:

.field public static uint32 dw at D_00
.field public static uint16 w1 at D_00
.field public static uint16 w2 at D_02
.field public static uint8 b1 at D_00
.field public static uint8 b2 at D_01
.field public static uint8 b3 at D_02
.field public static uint8 b4 at D_03
.data D_00 = int8(0)
.data D_01 = int8(0)
.data D_02 = int8(0)
.data D_03 = int8(0)
...
ldc.i4.1
stsfld uint8 b3 // Set value of third byte

Fortunately, you don’t have to do that every time you need a global union. Instead, you can declare the value type MultiDword exactly as before and then declare a global field of this type:

.field public static valuetype MultiDword multi_dword
...
ldsflda valuetype MultiDword multi_dword
// Load reference to the field
// as instance of MultiDword
ldc.i4.1
stfld uint8 MultiDword::b3 // Set value of third byte
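What the overlapping fields achieve can be modeled outside the runtime with a shared byte buffer. The Python sketch below is an illustration only: the helper names store and load and the FIELDS table are invented for it, and little-endian byte order is assumed, so it also hints at why values read through mapped fields depend on the platform.

```python
import struct

data = bytearray(4)  # stands in for the four bytes at D_00..D_03

# (struct format, byte offset) for each field; little-endian is assumed.
FIELDS = {
    "dw": ("<I", 0),
    "w1": ("<H", 0), "w2": ("<H", 2),
    "b1": ("<B", 0), "b2": ("<B", 1), "b3": ("<B", 2), "b4": ("<B", 3),
}

def store(name, value):
    """Write a value through one of the overlapping fields."""
    fmt, offset = FIELDS[name]
    struct.pack_into(fmt, data, offset, value)

def load(name):
    """Read the shared bytes back through one of the overlapping fields."""
    fmt, offset = FIELDS[name]
    return struct.unpack_from(fmt, data, offset)[0]

store("b3", 1)          # models setting the third byte to 1
print(hex(load("dw")))  # 0x10000
print(load("w2"))       # 1
```

Writing through b3 changes what dw and w2 read back, because all three names refer to the same storage.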

Constructors vs. Data Constants

You’ve already taken a look at field mapping as a technique of field initialization, and I’ve listed the drawbacks and limitations of this technique. Field mapping has this distinct “unmanaged” scent about it, but the compilers routinely use it for field initialization nevertheless. Is there a way to get the fields initialized without mapping them? Yes, there is.

The common language runtime object model provides two special methods, the instance constructor (.ctor) and the class constructor (.cctor), a.k.a. the type initializer. We’re getting ahead of ourselves a bit here; Chapter 10 discusses methods in general and constructors in particular, so I won’t concentrate on the details here. For now, all you need to know about .ctor and .cctor is that .ctor is executed when a new instance of a type is created, and .cctor is executed after the type is loaded and before any one of the type members is accessed. The class constructors are static and can deal with static members of the type only, so you have a perfect setup for field initialization: .cctors take care of static fields, and .ctors take care of instance fields.

But how about global fields? The good news is that you can define a global .cctor. Field initialization by constructors is vastly superior to field mapping, with none of its limitations, as described earlier in the section “Mapped Fields.” The catch? Unfortunately, initialization by constructors must be executed at run time, burning processor cycles, whereas mapped fields simply “are there” after the module has been loaded. The mapped fields don’t require additional operations for the initialization. Whether this price is worth the increased freedom and safety regarding field initialization depends on the concrete situation, but in general I think it is.

Let me illustrate the point by building an alternative enumeration. Since all the values of an enumeration are stored in literal fields, which are inaccessible from IL directly, the compilers replace references to these fields with the respective values at compile time. You can use a very simple enum as a model, like so:

.class public enum sealed MagicNumber
{
   .field private specialname int32 value__
   .field public static literal valuetype
      MagicNumber MagicOne = int32(123)
   .field public static literal valuetype
      MagicNumber MagicTwo = int32(456)
   .field public static literal valuetype
      MagicNumber MagicThree = int32(789)
}

Let’s suppose that your code uses the symbolic constants of an enumeration declared in a third-party assembly. You compile the code, and the symbolic constants are replaced with their values. Forget for a moment that you must have that third-party assembly available at compile time. But you will need to recompile the code every time the enumeration changes, and you have no control over the enumeration because it is defined outside your jurisdiction. In another scenario, when you declare an enumeration in one of your own modules, you must recompile all the modules that reference this enumeration once it is changed.

Let’s suppose also, for the sake of argument, that you don’t like this situation, so you decide to devise your own enumeration, like so:

.class public value sealed MagicNumber
{
   .field public int32 _value_ // Specialname value__ is
                               // reserved for enums
   .field public static valuetype MagicNumber MagicOne at D_00
   .field public static valuetype MagicNumber MagicTwo at D_04
   .field public static valuetype MagicNumber MagicThree at D_08
}
.data D_00 = int32(123)
.data D_04 = int32(456)
.data D_08 = int32(789)

This solution looks good, except in the platform-independence department. You conquered the recompilation problem and can at last address the symbolic constants by their symbols (names), through field access instructions. This approach presents a problem, though: the fields representing the symbolic constants can be written to.

Let’s try again with a class constructor; refer to the sample MyEnums.il on the Apress web site.

.class public value sealed MagicNumber
{
   .field private int32 _value_ // Specialname value__ is
                                // reserved for enums
   .field public static initonly valuetype MagicNumber MagicOne
   .field public static initonly valuetype MagicNumber MagicTwo
   .field public static initonly valuetype MagicNumber MagicThree
   .method public static specialname void .cctor()
   {
      ldsflda valuetype MagicNumber MagicNumber::MagicOne
      ldc.i4 123
      stfld int32 MagicNumber::_value_

      ldsflda valuetype MagicNumber MagicNumber::MagicTwo
      ldc.i4 456
      stfld int32 MagicNumber::_value_

      ldsflda valuetype MagicNumber MagicNumber::MagicThree
      ldc.i4 789
      stfld int32 MagicNumber::_value_

      ret
   }
   .method public int32 ToBase()
   {
      ldarg.0 // Instance pointer
      ldfld int32 MagicNumber::_value_
      ret
   }
}

This seems to solve all the remaining problems. The initonly flag on the static fields protects them from being overwritten outside the class constructor. Embedding the numeric values of symbolic constants in the IL stream takes care of platform dependence. You are not mapping the fields, so you are free to use any type as the underlying type of your enumeration. And, of course, declaring the _value_ field private protects it from having arbitrary values assigned to it, while public method ToBase() allows everybody interested to query the value of, er, _value_.

Alas, this solution does have a hidden problem: the initonly flag does not provide full protection against arbitrary field overwriting. The operations ldflda (ldsflda) and stfld (stsfld) on initonly fields are unverifiable outside the constructors. They’re unverifiable but not impossible, which means that if the verification procedures are disabled, the initonly fields can be overwritten in any method.

It looks like my attempts to devise a “nice” equivalent of an enum failed. If you have any fresh ideas in this regard, let me know.

Summary of Metadata Validity Rules

The field-related metadata tables are Field, FieldLayout, FieldRVA, FieldMarshal, Constant, and MemberRef. The records of these tables have the following entries:

The Field table contains the Flags, Name, and Signature entries.
The FieldLayout table contains the OffSet and Field entries.
The FieldRVA table contains the RVA and Field entries.
The FieldMarshal table contains the Parent and NativeType (native signature) entries.
The Constant table contains the Type, Parent, and Value entries.
The MemberRef table contains the Class, Name, and Signature entries.

Field Table Validity Rules

The Flags entry can have only those bits set that are defined in the enumeration CorFieldAttrEnum in CorHdr.h (validity mask 0xB7F7).
[run time] The accessibility flag (mask 0x0007) must be one of the following: privatescope, private, famandassem, assembly, family, famorassem, or public.
The literal and initonly flags are mutually exclusive.
If the literal flag is set, the static flag must also be set.
If the rtspecialname flag is set, the specialname flag must also be set.
[run time] If the flag 0x1000 (fdHasFieldMarshal) is set, the FieldMarshal table must contain a record referencing this Field record, and vice versa.
[run time] If the flag 0x8000 (fdHasDefault) is set, the Constant table must contain a record referencing this Field record, and vice versa.
[run time] If the flag 0x0100 (fdHasFieldRVA) is set, the FieldRVA table must contain a record referencing this Field record, and vice versa.
[run time] Global fields, owned by the TypeDef <Module>, must have the static flag set.
[run time] The Name entry must hold a valid reference to the #Strings stream, indexing a nonempty string no more than 1,023 bytes long in UTF-8 encoding.
[run time] The Signature entry must hold a valid reference to the #Blob stream, indexing a valid field signature. Chapter 8 discusses validity rules for field signatures.
No duplicate records—attributed to the same TypeDef and having the same Name and Signature values—can exist unless the accessibility flag is privatescope.
Fields attributed to enumerations must comply with additional rules, described in Chapter 7.
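The flag-related rules in this list can be checked mechanically. The Python sketch below covers only the rules that depend on the Flags entry alone; the function name field_flag_errors and its error strings are invented for illustration. The individual bit values are given as defined in CorFieldAttrEnum and should be checked against CorHdr.h before being relied on.

```python
FD_ACCESS_MASK     = 0x0007
FD_STATIC          = 0x0010
FD_INIT_ONLY       = 0x0020
FD_LITERAL         = 0x0040
FD_SPECIAL_NAME    = 0x0200
FD_RT_SPECIAL_NAME = 0x0400
VALID_MASK         = 0xB7F7

def field_flag_errors(flags, is_global=False):
    """Return a list of violated flag rules (empty if none are violated)."""
    errors = []
    if flags & ~VALID_MASK:
        errors.append("undefined bits set")
    # Accessibility values 0 (privatescope) through 6 (public) are defined.
    if (flags & FD_ACCESS_MASK) > 0x0006:
        errors.append("invalid accessibility value")
    if (flags & FD_LITERAL) and (flags & FD_INIT_ONLY):
        errors.append("literal and initonly are mutually exclusive")
    if (flags & FD_LITERAL) and not (flags & FD_STATIC):
        errors.append("literal requires static")
    if (flags & FD_RT_SPECIAL_NAME) and not (flags & FD_SPECIAL_NAME):
        errors.append("rtspecialname requires specialname")
    if is_global and not (flags & FD_STATIC):
        errors.append("global field must be static")
    return errors

print(field_flag_errors(0x0056))  # public static literal: []
print(field_flag_errors(0x0046))  # ['literal requires static']
```

Rules that involve other tables, such as the presence of matching FieldMarshal, Constant, or FieldRVA records, need access to the whole metadata and are outside the scope of this sketch.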

FieldLayout Table Validity Rules

The Field entry must hold a valid reference to the Field table.
The field referenced in the Field entry must not have the static flag set.
[run time] If the referenced field is an object reference type and belongs to TypeDefs that have an explicit layout, the OffSet entry must hold a value that is a multiple of sizeof(void*).
[run time] If the referenced field is an object reference type and belongs to TypeDefs that have an explicit layout, this field must not overlap with any other field.

FieldRVA Table Validity Rules

[run time] The RVA entry must hold a valid nonzero relative virtual address.
The Field entry must hold a valid index to the Field table.
No duplicate records referencing the same field can exist.

FieldMarshal Table Validity Rules

The Parent entry must hold a valid reference to the Field or Param table.
No duplicate records that contain the same Parent value can exist.
The NativeType entry must hold a valid reference to the #Blob stream, indexing a valid marshaling signature. Chapter 8 describes native types that make up the marshaling signatures.

Constant Table Validity Rules

The Type entry must hold a valid ELEMENT_TYPE_* code, one of the following: bool, char, a signed or unsigned integer of 1 to 8 bytes, a floating-point number of 4 or 8 bytes, string, or object.
The Value entry must hold a valid offset in the #Blob stream.
The Parent entry must hold a valid reference to the Field, Property, or Param table.
No duplicate records that contain the same Parent value can exist.

MemberRef Table Validity Rules

[run time] The Class entry must hold a valid reference to one of the following tables: TypeRef, TypeSpec, ModuleRef, MemberRef, or Method.
[run time] The Class entry of a MemberRef record referencing a field must hold a valid reference to the TypeRef or ModuleRef table.
[run time] The Name entry must hold a valid offset in the #Strings stream, indexing a nonempty string no longer than 1,023 bytes in UTF-8 encoding.
[run time] The name defined by the Name entry must not match the common language runtime reserved names _Deleted* or _VtblGap*.
[run time] The Signature entry must hold a valid offset in the #Blob stream, indexing a valid MemberRef signature. Chapter 8 discusses validity rules for MemberRef signatures.
No duplicate records with all three entries matching can exist.
An item (field or method) that a MemberRef record references must not have the accessibility flag privatescope.
