Fields and Data Constants
Fields are one of two kinds of typed and named data locations, the second kind being method local variables, which are discussed in Chapter 10. Fields correspond to the data members and global variables of the C++ world. Apart from their own characteristics, fields can have additional information associated with them that defines the way the fields are laid out by the loader, how they are allocated, how they are marshaled to unmanaged code, and whether they have default values. This chapter examines all aspects of member and global fields, and the metadata used to describe these aspects.
To define a field, you must first provide basic information: the field’s name and signature, and the flags indicating the field’s characteristics, stored in the Field metadata table. Then comes optional information, specific to certain kinds of fields: field marshaling information, found in the FieldMarshal table; field layout information in the FieldLayout table; field mapping information in the FieldRVA table; and a default value in the Constant table.
To reference a field, you must know its owner—TypeRef, TypeDef, or ModuleRef—as well as the field’s name and signature. The references to the fields are kept in the MemberRef table. Figure 9-1 shows the general structure of the field metadata group.
Figure 9-1. Field metadata group
Defining a Field
The central metadata table of the group, the Field table, has the associated token type mdtFieldDef (0x04000000). A record in this table has three entries:
As you can see, a Field record does not contain one vital piece of information: which class or value type owns the field. The information about field ownership is furnished by the class descriptor itself: records in the TypeDef table have FieldList entries, which hold the RID in the Field table where the first of the type’s fields can be found.
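The ownership rule can be made concrete with a small model. The Python fragment below is only a sketch and not part of any metadata API: the helper names `field_token` and `fields_owned_by`, the sample FieldList values, and the field count are invented for illustration. It assumes the usual metadata convention that a type's run of fields ends where the next TypeDef record's run begins, or at the end of the Field table for the last record, and it forms a token by combining the mdtFieldDef type with a RID.

```python
MDT_FIELD_DEF = 0x04000000  # token type of the Field table, as given in the text


def field_token(rid):
    """Combine the mdtFieldDef token type with a 1-based RID."""
    if not 0 < rid <= 0x00FFFFFF:
        raise ValueError("a RID must be nonzero and fit in the low 24 bits")
    return MDT_FIELD_DEF | rid


def fields_owned_by(typedef_index, field_lists, field_count):
    """Return the Field RIDs owned by one TypeDef record.

    field_lists[i] is the FieldList entry of the i-th TypeDef record
    (a plain 0-based Python index here); field_count is the number of
    records in the Field table.
    """
    start = field_lists[typedef_index]
    if typedef_index + 1 < len(field_lists):
        end = field_lists[typedef_index + 1]   # next type's first field
    else:
        end = field_count + 1                  # run extends to the table end
    return list(range(start, end))


# Hypothetical module with three TypeDef records and seven Field records.
FIELD_LISTS = [1, 3, 6]
FIELD_COUNT = 7

second_type_fields = fields_owned_by(1, FIELD_LISTS, FIELD_COUNT)
second_type_tokens = [hex(field_token(rid)) for rid in second_type_fields]
```

In this made-up layout the second type owns the Field records with RIDs 3 through 5, so a type with no fields would simply have the same FieldList value as its successor.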
In the simplest case, the ILAsm syntax for a field declaration is as follows:
.field <flags> <type> <name>
The owner of a field is the class or value type in the lexical scope of which the field is defined.
A field’s binary flags are defined in the CorHdr.h file in the enumeration CorFieldAttr and can be divided into four groups, as described in the following list. I’m using ILAsm keywords instead of the constant names from CorFieldAttr, as I don’t think the constant names are relevant.
In the field declaration, the type of the field (<type> in the previous syntax formula) is the ILAsm notation of the appropriate single encoded type, which together with the calling convention forms the field’s signature. If you forgot what a field signature looks like, see Chapter 8.
The name of the field (<name> in the previous syntax formula), also included in the declaration, can be a simple name or a composite (dotted) name. ILAsm v1.0 and v1.1 did not allow composite field names, although one could always cheat and put a composite name in single quotation marks, turning it into a simple name.
Examples of field declarations include the following:
.field public static marshal(int) int32 I
.field family string S
.field private int32& pJ // ERROR! ByRef in field signature!
Referencing a Field
Field references in ILAsm have the notation of
<field_ref> ::= <type> [<class_ref>::]<name>
where <class_ref>—as you know from Chapter 7—is defined as
<class_ref> ::= [<resolution_scope>]<full_type_name>
where
<resolution_scope> ::= [<assembly_ref_alias>]
| [.module <module_ref_name>]
For instance, this example uses the IL instruction ldfld, which loads the field value on the stack:
ldfld int32 [.module Another.dll]Foo.Bar::idx
When it is not possible to infer unambiguously from the context whether the referenced member is a field or a method, <field_ref> is preceded by the keyword field. Note that the keyword does not contain a leading dot. The following example uses the IL instruction ldtoken, which loads an item’s runtime handle on the stack:
ldtoken field int32 [.module Another.dll]Foo.Bar::idx
The field references reside in the MemberRef metadata table, which has associated token type 0x0A000000. A record of this table has only three entries:
Instance and Static Fields
Instance fields are created every time a type instance is created, and they belong to the type instance. Static fields, which are shared by all instances of the type, are created when the type is loaded. Some of the static fields (literal and mapped fields) are never allocated. The loader simply notes where the mapped fields reside and addresses these locations whenever the fields are to be addressed. And all the references to the literal fields are replaced with the constants at compile time by the high-level compilers (the IL assembler does not do that, leaving it to the programmer).
A field signature contains no indication of whether the field is static or instance. But since the loader keeps separate books for instance fields and for two out of three kinds of static fields—not for literal static fields—the kind of referenced field is easily discerned from the field’s token. When a field token is found in the IL stream, the JIT compiler does not have to dive into the metadata, retrieve the record, and check the field’s flags; by that time, all the fields have been accounted for and duly classified by the loader.
IL has two sets of instructions for field loading and storing. The instructions for instance fields are ldfld, ldflda, and stfld; those for static fields are ldsfld, ldsflda, and stsfld. An attempt to use a static field instruction with an instance field would result in a JIT compilation failure. The inverse combination would work, but it requires loading the instance pointer on the stack, which is, of course, completely redundant for a static field. The good thing about the possibility of using instance field instructions for static fields is that it allows for accessing both static and instance fields in the same way.
Default Values
Default values reside in the Constant metadata table. Three kinds of metadata items can have a default value assigned and therefore can reference the Constant table: fields, method parameters, and properties. A record of the Constant table has three entries:
The current implementation of the common language runtime and ILAsm allows the constant types described in Table 9-1. (As usual, I’ve dropped the ELEMENT_TYPE_ part of the name.)
Table 9-1. Constant Types
Constant Type | ILAsm Notation | Comments |
---|---|---|
I1 | int8 | Signed 1-byte integer. |
I2 | int16 | Signed 2-byte integer. |
I4 | int32 | Signed 4-byte integer. |
I8 | int64 | Signed 8-byte integer. |
R4 | float32 | 4-byte floating point. |
R8 | float64 | 8-byte floating point. |
CHAR | char | 2-byte Unicode character. |
BOOLEAN | bool | 1-byte Boolean, true = 1, false = 0. |
STRING | <quoted_string>, bytearray | Unicode string. |
CLASS | nullref | Null object reference. The value of the constant of this type must be a 4-byte integer containing 0. |
The ILAsm syntax for defining the default value of a field is as follows:
<field_def_const> ::= .field <flags> <type> <name>
= <const_type> [( <value> )]
The value in parentheses is mandatory for all constant types except nullref. For example,
.field public int32 i = int32(123)
.field public static literal bool b = bool(true)
.field private float32 f = float32(1.2345)
.field public static int16 ii = int16(0xFFE0)
.field public object o = nullref
Defining integer and Boolean constants—not to mention nullref—is pretty straightforward, but floating-point constants and strings can present some difficulties.
Floating-point numbers have special cases, such as positive infinity, negative infinity, and not-a-number (NaN), that cannot be represented textually in simple floating-point format. In these special cases, the floating-point constants can alternatively be represented as integer values with a matching byte count. The integer values are not converted to floating-point values; instead, they represent an exact bit image of the floating-point values (in the IEEE-754 floating-point format used by the CLR). For example,
.field public float32 fPosInf = float32(0x7F800000)
.field public float32 fNegInf = float32(0xFF800000)
.field public float32 fNAN = float32(0xFFC00000)
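The bit images used in these declarations can be checked outside the runtime. The following Python sketch is purely illustrative (the helper names `float32_from_bits` and `bits_from_float32` are invented here); it reinterprets a 32-bit integer as an IEEE-754 single-precision value without any numeric conversion, which is the same idea as writing `float32(0x7F800000)`.

```python
import math
import struct


def float32_from_bits(bits):
    """Reinterpret a 32-bit unsigned integer as an IEEE-754 single."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bits_from_float32(value):
    """Return the 32-bit image of a value stored as an IEEE-754 single."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


pos_inf = float32_from_bits(0x7F800000)
neg_inf = float32_from_bits(0xFF800000)
nan = float32_from_bits(0xFFC00000)

print(pos_inf, neg_inf, math.isnan(nan))   # inf -inf True
print(hex(bits_from_float32(1.0)))         # 0x3f800000
```

The second helper goes the other way and can be used to find the integer image of any value you want to spell as a bit pattern.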
Like all other constants, string constants are stored in the #Blob stream. In this regard, they differ from user-defined strings, which are stored in the #US stream. What both kinds of strings have in common is that they are supposed to be Unicode (UTF-16). I say “supposed to be” because the only Unicode-specific restrictions imposed on these strings are that their sizes are reported in Unicode characters and that their byte counts must be even. Otherwise, these strings are simply binary objects and might or might not contain invalid Unicode characters.
Notice that the type of the constant does not need to match the type of the item to which this constant is assigned—in this case, the type of the field. That is, the match is not required by the CLR, which cares nothing about the constants: the constants are provided for compilers’ information only, and the high-level compilers, having encountered a reference to a constant in the source code, emit explicit instructions to assign respective values to fields or parameters.
In ILAsm, a string constant can be defined either as a composite quoted string or as a byte array:
.field public static string str1 = "Isn't" + " it " + "marvelous!"
.field public static string str2 = bytearray(00 01 FF FE 1A 00 00)
When a string constant is defined as a simple or composite quoted string, this string is converted to Unicode before being stored in the #Blob stream. In the case of a bytearray definition, the specified byte sequence is stored “as is” and padded with one 0 byte if necessary to make the byte count even. In the example shown here, the default value for the str2 field will be padded to bring the byte count to eight (four Unicode characters). And if the bytes specified in the bytearray are invalid Unicode characters, it will surely be discovered when you try to print the string, but not before.
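The padding rule is easy to verify by hand. The Python sketch below is only a model of the description above, with invented helper names (`pad_to_even`, `utf16_code_units`); it assumes the bytes are read as little-endian 16-bit units, which is an assumption about the platform rather than something the declaration itself states.

```python
import struct


def pad_to_even(raw):
    """Append one zero byte when the byte count is odd."""
    return raw + b"\x00" if len(raw) % 2 else raw


def utf16_code_units(raw):
    """Split the padded bytes into 16-bit units (little-endian assumed)."""
    padded = pad_to_even(raw)
    return list(struct.unpack("<%dH" % (len(padded) // 2), padded))


# The seven bytes given in the bytearray example.
str2_bytes = bytes.fromhex("00 01 FF FE 1A 00 00")

padded = pad_to_even(str2_bytes)
units = utf16_code_units(str2_bytes)
print(len(padded), [hex(u) for u in units])
```

Seven bytes become eight, that is, four 16-bit units, and nothing in this process checks whether those units form meaningful Unicode text.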
Assigning default values to fields (and parameters) seems to be such a compelling technique that you might wonder why I did not employ it in the simple sample discussed in Chapter 1. Really, defining the default values is a great way to initialize fields—right? Wrong. Here’s a tricky question. Suppose that you define a member field as follows:
.field public static int32 ii = int32(12345)
What will the value of the field be when the class is loaded? Correct answer: 0. Why? Default values specified in the Constant table are not used by the loader to initialize the items to which they are assigned. If you want to initialize a field to its default value, you must explicitly call the respective Reflection method to retrieve the value from metadata and then store this value in the field. This doesn’t sound too nice, and I think that the CLR could probably do a better job with field initialization—and with literal fields as well.
Let me remind you once again that literal fields are not true fields. They are not laid out by the loader, and they cannot be directly accessed from IL. From the point of view of metadata, however, literal fields are nevertheless valid fields having valid tokens, which allow the constant values corresponding to these fields to be retrieved by Reflection methods. The common language runtime does not provide an implicit means of accessing the Constant table, which is a pity. It would certainly be much nicer if the JIT compiler would compile the ldsfld instruction into the retrieval of the respective constant value, instead of failing, when this instruction is applied to a literal field. But such are the facts of life, and I am afraid we cannot do anything about it at the moment.
Given this situation, literal fields without associated Constant records are legal from the loader’s point of view, but they are utterly meaningless. They serve no purpose except to inflate the Field metadata table.
But how do the compilers handle literal fields? If every time a constant from an enumeration—represented, as you know, by a literal field—was used, the compiler emitted a call to the Reflection API to get this constant value, then one could imagine where it would leave the performance. Most compilers are smarter than that and resolve the literal fields at compile time, replacing references to literal fields with explicit constant values of these fields so that the literal fields never come into play at run time.
ILAsm, following common language runtime functionality to the letter, allows the definition of the Constant metadata but does nothing about the symbol-to-value resolution at compile time. From the point of view of ILAsm and the runtime, the enumeration types are real, as distinctive types, but the symbolic constants listed in the enumerations are not. You can reference an enum, but you can never reference its literal fields.
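To make the idea of compile-time resolution tangible, here is a toy Python model. It is not how any real compiler is implemented: the instruction tuples, the `Color` names, and the function `resolve_literals` are all hypothetical. It only shows the substitution step, in which a load of a literal field is replaced by a load of the constant recorded for that field.

```python
# Constant values a compiler would have read from metadata (made-up names).
LITERALS = {
    "Color::Red": 0,
    "Color::Green": 1,
}


def resolve_literals(instructions, literals):
    """Replace loads of literal fields with loads of their constant values."""
    resolved = []
    for opcode, operand in instructions:
        if opcode == "ldsfld" and operand in literals:
            resolved.append(("ldc.i4", literals[operand]))
        else:
            resolved.append((opcode, operand))
    return resolved


source = [
    ("ldsfld", "Color::Green"),     # symbolic constant: resolved
    ("ldsfld", "Settings::count"),  # ordinary static field: left alone
    ("add", None),
]

emitted = resolve_literals(source, LITERALS)
```

After the substitution no instruction refers to the literal field any longer, which is why such fields never come into play at run time.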
Mapped Fields
It is possible to provide unconditional initialization for static fields by mapping the fields to data defined in the PE file and setting this data to the initializing values. The syntax for mapping a field to data in ILAsm is as follows:
<mapped_field_decl> ::= .field <flags> <type> <name> at <data_label>
Here’s an example:
.field public static int64 ii at data_ii
The nonterminal symbol <data_label> is a simple name labeling the data segment to which the field is mapped. The ILAsm compiler allows a field to be mapped either to the “normal” data section (.sdata) or to the thread local storage section (.tls), depending on the data declaration to which the field mapping refers. A field can be mapped only to data residing in the same module as the field declaration. (For information about data declaration, see the following section, “Data Constants Declaration.”)
Mapping a field results in emitting a record into the FieldRVA table, which contains two entries:
Two or more fields can be mapped to the same location, but each field can be mapped to one location only. Duplicate FieldRVA records with the same Field values and different RVA values are therefore considered invalid metadata. The loader is not particular about duplicate FieldRVA records, however; it simply uses the first one available for the field and ignores the rest.
The field mapping technique has some catches. The first catch (well, not much of a catch, actually) is that, obviously, only static fields can be mapped. Even if you could map instance fields, each instance would be mapped to the same physical memory, making the fields de facto static (shared by all instances) anyway. Mapping instance fields is considered invalid metadata, but it has no serious consequences for the loader; if a field is not static, the loader does not even check to see whether the field is mapped. The only real effect of mapping instance fields is a bloated FieldRVA table. The IL assembler treats mapping of an instance field as an error and produces an error message.
The second catch is to an extent a derivative from the first catch: the mapped static fields are “the most static of them all.” When multiple application domains are sharing the same process (as in the case of ASP.NET, for example) and several application domains are sharing a loaded assembly, the mapped fields of this assembly are shared by all application domains, unlike the “normal” static fields, which are individual per application domain.
The third catch is that a field cannot be mapped if its type contains object references (objects or arrays). The data sections are out of the garbage collector’s reach, so the validity of object references placed in the data sections cannot be guaranteed. If the loader finds object references in a mapped field type, it throws a TypeLoad exception and aborts the loading, even if the code is run in full-trust mode from a local drive and all security-related checks are disabled. The loader checks for the presence of object references on all levels of the field type; in other words, it checks the types of all the fields that make up the type, checks the types of fields that make up those types, and so on.
The fourth catch is that in the verifiable code a field cannot be mapped if its type (value type, of course) contains nonpublic instance fields. The reasoning behind this limitation is that if you map a field with a type containing nonpublic members, you can map another field of some all-public type to the same location and, through this second mapping, get unlimited access to nonpublic member fields of the first type. The loader checks for the presence of nonpublic members on all levels of the mapped field type and throws a TypeLoad exception if it finds such members. This check, unlike the check for object references, is performed only when code verification is required; it is disabled when the code is run from the local drive in full-trust mode.
Note, however, that a mapped field itself can be declared nonpublic without ill consequences. This is based on the simple assumption that if developers decide to overlap their own nonpublic field and thus defy the accessibility control mechanism of the common language runtime object model, they probably know what they are doing.
The last catch worth mentioning is that the initialization data is provided “as is,” exactly as it is defined in the PE file. And if you run the code on a platform other than the one on which the PE file was created, you can face some unpleasant consequences. As a trivial example, suppose you map an int32 field to data containing bytes 0xAA, 0xBB, 0xCC, and 0xDD. On a little endian platform (for instance, an Intel platform), the field is initialized to 0xDDCCBBAA, while on a big endian platform . . . well, you get the picture.
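The byte-order effect can be reproduced with a few lines of Python. This is only an illustration of the arithmetic; the helper name `read_uint32` is invented, and the four bytes are the ones from the example above.

```python
def read_uint32(raw, byte_order):
    """Interpret four raw bytes as an unsigned 32-bit integer."""
    return int.from_bytes(raw, byte_order)


data = bytes([0xAA, 0xBB, 0xCC, 0xDD])   # bytes exactly as stored in the PE file

little = read_uint32(data, "little")
big = read_uint32(data, "big")

print(hex(little))   # 0xddccbbaa
print(hex(big))      # 0xaabbccdd
```

The same four bytes therefore initialize the field to two different values, depending only on the platform that reads them.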
All these catches do not preclude the compilers from using field mapping for initialization.
Version 2.0 or later of the IL assembler provides a means of mapping the fields onto an explicitly specified memory address. In this case, the <data_label> name must have the form @<RVA in decimal format>. This technique can hardly be recommended for general use because of the obvious hazards associated with it (you usually don’t know the target RVA before the program has been compiled), but in certain limited cases (when you do know the RVA beforehand) it can be useful. Consider, for example, the following declaration:
.field public static int16 NTHeaderMagic at @152
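One plausible way to arrive at the value 152 is sketched below in Python. It rests on assumptions that the declaration itself does not state: that the PE signature sits at file offset 0x80 (the value ECMA-335 prescribes for the lfanew field of managed PE files), that header bytes are mapped at RVAs equal to their file offsets, and that the field is meant to overlay the 2-byte Magic value that opens the PE optional header. The constant and function names are invented for the sketch.

```python
PE_SIGNATURE_SIZE = 4      # the "PE\0\0" signature
COFF_HEADER_SIZE = 20      # size of the COFF file header
ASSUMED_E_LFANEW = 0x80    # assumption: offset of the PE signature


def optional_header_rva(e_lfanew=ASSUMED_E_LFANEW):
    """RVA of the first byte of the optional header under the assumptions above."""
    return e_lfanew + PE_SIGNATURE_SIZE + COFF_HEADER_SIZE


def explicit_data_label(rva):
    """Format an RVA the way the explicit-address form expects: @<decimal RVA>."""
    return "@%d" % rva


label = explicit_data_label(optional_header_rva())
print(label)   # @152
```

If the PE signature were located elsewhere, the RVA would change accordingly, which illustrates why this form of mapping is fragile.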
Data Constants Declaration
A data constant declaration in ILAsm has the syntax of
<data_decl> ::= .data [tls] [<data_label> =] <data_items>
where <data_label> is a simple name, unique within the module
<data_items> ::= { <data_item> [ , <data_item>* ] } | <data_item>
and where
<data_item> ::= <data_type> [ ( <value> ) ] [ [ <count> ] ]
Data constants are emitted to the .sdata section or the .tls section, depending on the presence of the tls keyword, in the same sequence in which they were declared in the source code. The unlabeled data declarations can be used for padding between the labeled data declarations and probably for nothing else, since without a label it’s impossible to map a field to this data. Unlabeled—or, more precisely, unreferenced—data might not survive round-tripping (disassembly-reassembly) because the IL disassembler outputs only referenced data.
The nonterminal symbol <data_type> specifies the data type. (See Table 9-2.) The data type is used by the IL assembler exclusively for identifying the size and byte layout of <value> (in order to emit the data correctly) and is not emitted as any part of metadata or the data itself. Having no way to know what the type was intended to be when the data was emitted, the IL disassembler always uses the most generic form, a byte array, for data representation.
Table 9-2. Types Defined for Data Constants
If <value> is not specified, the data is initialized to a value with all bits set to zeros. Thus, it is still “initialized data” in terms of the PE file structure, meaning that this data is part of the PE file disk image.
The optional <count> in square brackets indicates the repetition count of the data item. Here are some examples:
.data tls T_01 = int32(1234)
// 4 bytes in .tls section, value 0x000004D2
.data tls int32
// unnamed 4 bytes padding in .tls section, value doesn't matter
.data D_01 = int32(1234)[32] // 32 4-byte integers in .sdata section,
// Each equal to 0x000004D2
.data D_02 = char*("Hello world!") // Unicode string in .sdata section
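The byte images produced by the integer declarations can be modeled in a few lines of Python. The sketch is illustrative only: the helper name `int32_data` is invented, and little-endian emission is assumed.

```python
import struct


def int32_data(value=0, count=1):
    """Byte image of int32(<value>)[<count>]; an omitted value means all zeros."""
    return struct.pack("<i", value) * count


t_01 = int32_data(1234)        # .data tls T_01 = int32(1234)
padding = int32_data()         # .data tls int32
d_01 = int32_data(1234, 32)    # .data D_01 = int32(1234)[32]

print(t_01.hex())    # d2040000
print(len(d_01))     # 128
```

The repetition count multiplies the size of the item, so D_01 occupies 32 × 4 = 128 bytes of the section.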
Explicit Layouts and Union Declaration
Although instance fields cannot be mapped to data, it is possible to specify the positioning of these fields directly. As you might remember from Chapter 7, a class or a value type can have an explicit flag—a special flag indicating that the metadata contains an exact recipe for the loader regarding the layout of this class. This information is kept in the FieldLayout metadata table, whose records contain these two entries:
In ILAsm, the field offset is specified by putting the offset value in square brackets immediately after the .field keyword, as shown here:
.class public value sealed explicit MyStruct
{
  .field [0] public int32 ii
  .field [4] public float64 dd
  .field [12] public bool bb
}
Only instance fields can have offsets specified. Since static fields are not part of the class instance layout, specifying explicit offsets for them is meaningless and is considered a metadata error. If an offset is specified for a static field, the loader behaves the same way it does with mapped instance fields: if the field is static, the loader does not check to see whether the field has an offset specified. Consequently, FieldLayout records referencing the static fields are nothing more than a waste of memory.
In a class that has an explicit layout, all the instance fields must have specified offsets. If one of the instance fields does not have an associated FieldLayout record, the loader throws a TypeLoad exception and aborts the loading. Obviously, a field can have only one offset, so duplicate FieldLayout records that have the same Field entry are illegal. This is not checked at run time because this metadata invalidity is not critical: the loader takes the first available FieldLayout record for the current field and ignores any duplicates. It’s worth remembering, though, that while supplying wrong metadata doesn’t always lead to an aborted program, it almost certainly leads to behavior the programmer did not expect.
The placement of object references (classes, arrays) is subject to a general limitation: the fields of object reference types must be aligned on pointer size—either 4 or 8 bytes, depending on the platform.
.class public value sealed explicit MyStruct
{
  .field [0] public int16 ii
  .field [2] public string str    // Illegal on 32-bit and 64-bit
  .field [6] public int16 jj
  .field [8] public int32 kk
  .field [12] public object oo    // Illegal on 64-bit platform
  .field [16] public int32[] iArr // Legal on both platforms
}
This alignment requirement may cause platform dependence, unless you decide to always align the fields of object reference types on 8 bytes, which would suit both 32-bit and 64-bit platforms.
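The alignment rule for object references can be expressed as a one-line check. The Python sketch below is a model of the rule as stated here, not of the loader's actual code; the tuple layout and the function name `misaligned_reference_fields` are invented, and the field list mirrors the MyStruct example.

```python
# (name, offset, is_object_reference) for each field of the example type.
MY_STRUCT = [
    ("ii", 0, False),
    ("str", 2, True),
    ("jj", 6, False),
    ("kk", 8, False),
    ("oo", 12, True),
    ("iArr", 16, True),
]


def misaligned_reference_fields(fields, pointer_size):
    """Names of object-reference fields whose offset is not pointer-aligned."""
    return [
        name
        for name, offset, is_ref in fields
        if is_ref and offset % pointer_size != 0
    ]


on_32_bit = misaligned_reference_fields(MY_STRUCT, 4)
on_64_bit = misaligned_reference_fields(MY_STRUCT, 8)
print(on_32_bit, on_64_bit)   # ['str'] ['str', 'oo']
```

Only the field at offset 16 passes with both pointer sizes, which matches the advice to align object references on 8 bytes whenever platform independence matters.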
Value types with an explicit layout containing object references must have a total size equal to a multiple of the pointer size. The reason is pretty obvious: imagine what happens if you declare an array of such value types.
Explicit layout is a standard way to implement unions in IL. By explicitly specifying field offsets, you can make fields overlap however you want. Let’s suppose, for example, that you want to treat a 4-byte unsigned integer as such, as a pair of 2-byte words, or as 4 bytes. In C++ notation, the respective constructs look like this:
union MultiDword {
    DWORD dw;
    union {
        struct {
            WORD w1;
            WORD w2;
        };
        struct {
            BYTE b1;
            BYTE b2;
            BYTE b3;
            BYTE b4;
        };
    };
};
In ILAsm, the same union will be written like so:
.class public value sealed explicit MultiDword
{
  .field [0] public uint32 dw
  .field [0] public uint16 w1
  .field [2] public uint16 w2
  .field [0] public uint8 b1
  .field [1] public uint8 b2
  .field [2] public uint8 b3
  .field [3] public uint8 b4
}
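What the overlapping fields see can be simulated by reading the same four bytes in three ways. The Python sketch below is a model only, with invented helper names (`multi_dword_views`, `with_b3`); it assumes a little-endian platform, so on a big-endian platform the words and bytes would come out in the opposite order.

```python
import struct


def multi_dword_views(dw):
    """Read one 4-byte value as a dword, two words, and four bytes."""
    raw = struct.pack("<I", dw)                  # the shared storage
    w1, w2 = struct.unpack_from("<HH", raw, 0)   # offsets 0 and 2
    b1, b2, b3, b4 = raw                         # offsets 0, 1, 2, 3
    return {"dw": dw, "w1": w1, "w2": w2,
            "b1": b1, "b2": b2, "b3": b3, "b4": b4}


def with_b3(dw, value):
    """Return the dword seen after the byte at offset 2 is overwritten."""
    raw = bytearray(struct.pack("<I", dw))
    raw[2] = value
    return struct.unpack("<I", bytes(raw))[0]


views = multi_dword_views(0x11223344)
print(hex(views["w1"]), hex(views["w2"]))   # 0x3344 0x1122
print(hex(with_b3(0, 1)))                   # 0x10000
```

Writing through any one of the overlapping fields changes what all the others report, which is exactly the behavior a union is meant to provide.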
The only limitation imposed on the explicit-layout unions is that if the overlapping fields contain object references, these object references must not overlap with any other field.
.class public value sealed explicit StrAndIndex
{
  .field [0] public string Str // Reference, size 4 bytes
                               // on 32-bit platform
  .field [4] public uint32 Index
}
.class public value sealed explicit MyUnion
{
  .field [0] public valuetype StrAndIndex str_and_index
  .field [0] public uint64 whole_thing   // Illegal!
  .field [0] public string str           // Legal, but unverifiable
  .field [2] public uint32 half_and_half // Illegal!
  .field [4] public uint32 index         // Legal, object reference
                                         // not overlapped
}
Such “unionizing” of the object references would provide the means for directly modifying these references, which could thoroughly disrupt the functioning of the garbage collector. The loader checks explicit layouts for object reference overlap; if any is found, it throws a TypeLoad exception and aborts the loading.
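The overlap rule can be modeled as a comparison of byte ranges. The Python sketch below is not the loader's algorithm; it is a simplified check written for this example, with invented names, a 32-bit pointer size assumed, and each field of MyUnion flattened by hand into (offset, size, is-reference) pieces. Two references that coincide exactly are tolerated, any other overlap that touches a reference is reported.

```python
POINTER_SIZE = 4  # assumption: 32-bit platform, as in the StrAndIndex comment

# Each field flattened into (offset, size, is_object_reference) pieces.
MY_UNION = {
    "str_and_index": [(0, POINTER_SIZE, True), (4, 4, False)],
    "whole_thing": [(0, 8, False)],
    "str": [(0, POINTER_SIZE, True)],
    "half_and_half": [(2, 4, False)],
    "index": [(4, 4, False)],
}


def overlaps(a, b):
    """True when two pieces share at least one byte."""
    return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]


def illegal_fields(layout):
    """Names of fields that clash with an object reference of another field."""
    bad = set()
    names = list(layout)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            for a in layout[first]:
                for b in layout[second]:
                    if not overlaps(a, b):
                        continue
                    if a[2] and b[2]:
                        if a[:2] != b[:2]:   # references overlap only partially
                            bad.update((first, second))
                    elif a[2]:
                        bad.add(second)      # plain data covers a reference
                    elif b[2]:
                        bad.add(first)
    return sorted(bad)


offenders = illegal_fields(MY_UNION)
print(offenders)   # ['half_and_half', 'whole_thing']
```

The result agrees with the comments in the MyUnion declaration: the fields str and index pass, while whole_thing and half_and_half each cover bytes that belong to an object reference.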
This rule has an interesting exception, though: the object references can be overlapped with other object references (only full overlapping is allowed; partial overlapping is forbidden). This looks to me like a gaping hole in the type safety. On the other hand, this overlapping is allowed only in full-trust mode, and in full-trust mode you can do even worse things (run native unmanaged code, for example).
A field can also have an associated FieldLayout record if the type that owns this field has a sequential layout. In this case, the OffSet entry of the FieldLayout record holds a field ordinal rather than an offset. The fields belonging to a sequential-layout class needn’t have associated FieldLayout records, but if one of the class’s fields has such an associated record, all the rest must have one too. The ILAsm syntax for field declaration in types with sequential layout is similar to the case of the explicit layout, except the integer value in square brackets represents the field’s ordinal rather than the offset:
.class public value sealed sequential OneTwoThreeFour
{
  .field [0] public uint8 one
  .field [1] public uint8 two
  .field [2] public uint8 three
  .field [3] public uint8 four
}
Global Fields
Fields declared outside the scope of any class are known as global fields. They don’t belong to a class but instead belong to the module in which they are declared. A module is represented by a special TypeDef record with RID=1 under the name <Module>, so all the formalities that govern how field records are identified by reference from their parent TypeDef records are observed.
Global fields must be static. Since only one instance of the module exists when the assembly is loaded and because it is impossible to create alternative instances of the module, this limitation seems obvious.
Global fields can have public, private, or privatescope accessibility flags—at least that’s what the metadata validity rules say. As you saw in Chapter 1, however, a global item (a field or a method) can have any accessibility flag, and the loader interprets this flag only as assembly, private, or privatescope. The public, assembly, and famorassem flags are all interpreted as assembly, while the family, famandassem, and private flags are all interpreted as private. The global fields cannot be accessed from outside the assembly, so they don’t have true public accessibility. And as no type can be derived from <Module>, the question about family-related accessibility is moot.
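The loader's interpretation of accessibility flags on global items amounts to a small lookup table. The Python sketch below simply restates the mapping given in the preceding paragraph; the dictionary and function names are invented, and the keys are the ILAsm keywords.

```python
# Declared flag on a global field or method -> flag the loader effectively applies.
EFFECTIVE_GLOBAL_ACCESS = {
    "public": "assembly",
    "assembly": "assembly",
    "famorassem": "assembly",
    "family": "private",
    "famandassem": "private",
    "private": "private",
    "privatescope": "privatescope",
}


def effective_access(declared):
    """Return the accessibility the loader applies to a global item."""
    return EFFECTIVE_GLOBAL_ACCESS[declared]


distinct_levels = sorted(set(EFFECTIVE_GLOBAL_ACCESS.values()))
print(distinct_levels)   # ['assembly', 'private', 'privatescope']
```

Seven possible declarations collapse into three effective levels, none of which makes a global item visible outside its assembly.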
Global fields can be accessed from anywhere within the module, regardless of their declared accessibility. In this regard, the classes that are declared within a module and use the global fields have the same access rights as if they were nested in the module. The metadata contains no indications of such nesting, of course.
A reference to a global field declared in the same module has no <class_ref>:: part.
<global_field_ref> ::= [field] <field_type> <field_name>
The keyword field is used in particular cases when the nature of the reference cannot be inferred from the context, for example in the ldtoken instruction.
A reference to a global field declared in a different module of the assembly also lacks the class name but has resolution scope.
<global_field_ref> ::= [field] <field_type> [.module <mod_name>]::<field_name>
The following are two examples of such references:
ldsfld int32 globalInt
// field globalInt from this module
ldtoken field int32 [.module supporting.dll]::globalInt
// globalInt from other module
Since the global fields are static, you cannot explicitly specify their layout except by mapping them to data. Thus, your 4-2-1-byte union MultiDword would look like this if you implemented it with global fields:
.field public static uint32 dw at D_00
.field public static uint16 w1 at D_00
.field public static uint16 w2 at D_02
.field public static uint8 b1 at D_00
.field public static uint8 b2 at D_01
.field public static uint8 b3 at D_02
.field public static uint8 b4 at D_03
.data D_00 = int8(0)
.data D_01 = int8(0)
.data D_02 = int8(0)
.data D_03 = int8(0)
...
ldc.i4.1
stsfld uint8 b3 // Set value of third byte
Fortunately, you don’t have to do that every time you need a global union. Instead, you can declare the value type MultiDword exactly as before and then declare a global field of this type:
.field public static valuetype MultiDword multi_dword
...
ldsflda valuetype MultiDword multi_dword
// Load reference to the field
// As instance of MultiDword
ldc.i4.1 // Value goes on the stack after the instance reference
stfld uint8 MultiDword::b3 // Set value of third byte
Constructors vs. Data Constants
You’ve already taken a look at field mapping as a technique of field initialization, and I’ve listed the drawbacks and limitations of this technique. Field mapping has this distinct “unmanaged” scent about it, but the compilers routinely use it for field initialization nevertheless. Is there a way to get the fields initialized without mapping them? Yes, there is.
The common language runtime object model provides two special methods, the instance constructor (.ctor) and the class constructor (.cctor), a.k.a. the type initializer. We’re getting ahead of ourselves a bit here; Chapter 10 discusses methods in general and constructors in particular, so I won’t concentrate on the details here. For now, all you need to know about .ctor and .cctor is that .ctor is executed when a new instance of a type is created, and .cctor is executed after the type is loaded and before any one of the type members is accessed. The class constructors are static and can deal with static members of the type only, so you have a perfect setup for field initialization: .cctors take care of static fields, and .ctors take care of instance fields.
But how about global fields? The good news is that you can define a global .cctor. Field initialization by constructors is vastly superior to field mapping, with none of its limitations, as described earlier in the section “Mapped Fields.” The catch? Unfortunately, initialization by constructors must be executed at run time, burning processor cycles, whereas mapped fields simply “are there” after the module has been loaded. The mapped fields don’t require additional operations for the initialization. Whether this price is worth the increased freedom and safety regarding field initialization depends on the concrete situation, but in general I think it is.
Let me illustrate the point by building an alternative enumeration. Since all the values of an enumeration are stored in literal fields, which are inaccessible from IL directly, the compilers replace references to these fields with the respective values at compile time. You can use a very simple enum as a model, like so:
.class public enum sealed MagicNumber
{
  .field private specialname int32 value__
  .field public static literal valuetype
         MagicNumber MagicOne = int32(123)
  .field public static literal valuetype
         MagicNumber MagicTwo = int32(456)
  .field public static literal valuetype
         MagicNumber MagicThree = int32(789)
}
Let’s suppose that your code uses the symbolic constants of an enumeration declared in a third-party assembly. You compile the code, and the symbolic constants are replaced with their values. Forget for a moment that you must have that third-party assembly available at compile time. But you will need to recompile the code every time the enumeration changes, and you have no control over the enumeration because it is defined outside your jurisdiction. In another scenario, when you declare an enumeration in one of your own modules, you must recompile all the modules that reference this enumeration once it is changed.
Let’s suppose also—for the sake of an argument—that you don’t like this situation, so you decide to devise your own enumeration, like so:
.class public value sealed MagicNumber
{
  .field public int32 _value_ // Specialname value__ is
                              // reserved for enums
  .field public static valuetype MagicNumber MagicOne at D_00
  .field public static valuetype MagicNumber MagicTwo at D_04
  .field public static valuetype MagicNumber MagicThree at D_08
}
.data D_00 = int32(123)
.data D_04 = int32(456)
.data D_08 = int32(789)
This solution looks good, except in the platform-independence department. You conquered the recompilation problem and can at last address the symbolic constants by their symbols (names), through field access instructions. This approach presents a problem, though: the fields representing the symbolic constants can be written to.
Let’s try again with a class constructor; refer to the sample MyEnums.il on the Apress web site.
.class public value sealed MagicNumber
{
  .field private int32 _value_ // Specialname value__ is
                               // reserved for enums
  .field public static initonly valuetype MagicNumber MagicOne
  .field public static initonly valuetype MagicNumber MagicTwo
  .field public static initonly valuetype MagicNumber MagicThree
  .method public static specialname void .cctor()
  {
    ldsflda valuetype MagicNumber MagicNumber::MagicOne
    ldc.i4 123
    stfld int32 MagicNumber::_value_
    ldsflda valuetype MagicNumber MagicNumber::MagicTwo
    ldc.i4 456
    stfld int32 MagicNumber::_value_
    ldsflda valuetype MagicNumber MagicNumber::MagicThree
    ldc.i4 789
    stfld int32 MagicNumber::_value_
    ret
  }
  .method public int32 ToBase()
  {
    ldarg.0 // Instance pointer
    ldfld int32 MagicNumber::_value_
    ret
  }
}
This seems to solve all the remaining problems. The initonly flag on the static fields protects them from being overwritten outside the class constructor. Embedding the numeric values of symbolic constants in the IL stream takes care of platform dependence. You are not mapping the fields, so you are free to use any type as the underlying type of your enumeration. And, of course, declaring the _value_ field private protects it from having arbitrary values assigned to it, while public method ToBase() allows everybody interested to query the value of, er, _value_.
Alas, this solution does have a hidden problem: the initonly flag does not provide full protection against arbitrary field overwriting. The operations ldflda (ldsflda) and stfld (stsfld) on initonly fields are unverifiable outside the constructors. They’re unverifiable but not impossible, which means that if the verification procedures are disabled, the initonly fields can be overwritten in any method.
It looks like my attempts to devise a “nice” equivalent of an enum failed. If you have any fresh ideas in this regard, let me know.
Summary of Metadata Validity Rules
The field-related metadata tables are Field, FieldLayout, FieldRVA, FieldMarshal, Constant, and MemberRef. The records of these tables have the following entries:
Field Table Validity Rules
FieldLayout Table Validity Rules
FieldRVA Table Validity Rules
FieldMarshal Table Validity Rules
Constant Table Validity Rules
MemberRef Table Validity Rules