Chapter 4. Classes, Structs, and Objects

Everything is an object! At least, that is the view from inside the CLR and the C# programming language. This is no surprise, because C# is, after all, an object-oriented language. The objects that you create through class definitions in C# have all the same capabilities as the other predefined objects in the system. In fact, keywords in the C# language such as int and bool are merely aliases to predefined value types within the System namespace, in this case System.Int32 and System.Boolean, respectively.

Note

This chapter is rather long, but don't allow it to be intimidating. In order to cater to a wider audience, this chapter covers as much C# base material as reasonably possible. If you're proficient with either C++ or Java, you may find yourself skimming this chapter and referencing it as you read subsequent chapters. Some of the topics touched upon in this chapter are covered in more detail in later chapters.

The first section of this chapter covers class (reference type) definitions, which is followed by a section discussing structure (value type) definitions. These are the two most fundamental classifications of types in the .NET runtime. Then you'll learn about System.Object (the base type of all types), the nuances of creating and destroying instances of objects, expressions for initializing objects, and the topic of boxing and unboxing. I then cover newer C# features such as anonymous types and named and optional arguments. Finally, I cover inheritance and polymorphism, and the differences between inheritance and containment with regard to code reuse.

The ability to invent your own types is paramount to object-oriented systems. The cool thing is that because even the built-in types of the language are plain-old CLR objects, the objects you create are on a level playing field with the built-in types. In other words, the built-in types don't have special powers that you cannot muster in user-defined types. The cornerstone for creating these types is the class definition. Class definitions, using the C# class keyword, define the internal state and the behaviors associated with the objects of that class's type. The internal state of an object is represented by the fields that you declare within the class, which can consist of references to other objects, or values. Sometimes, but rarely, you will hear people describe this as the "shape" of the object, because the instance field definitions within the class define the memory footprint of the object on the heap.

The objects created from a class encapsulate the data fields that represent the internal state of the objects, and the objects can tightly control access to those fields. The behavior of the objects is defined by implementing methods, which you declare and define within the class definition. By calling one of the methods on an object instance, you initiate a unit of work on the object. That work can possibly modify the internal state of the object, inspect the state of the object, or anything else for that matter.

You can define constructors, which the system executes whenever a new object is created. You can also define a method called a finalizer, which runs when the object is garbage-collected. As you'll see in Chapter 13, you should avoid finalizers if at all possible. This chapter covers construction and destruction in detail, including the detailed sequence of events that occur during the creation of an object.

Objects support the concept of inheritance, whereby a derived class inherits the fields and methods of a base class. Inheritance also allows you to treat objects of a derived type as objects of its base type. For example, a design in which an object of type Dog derives from type Animal is said to model an is-a relationship (i.e., Dog is-a(n) Animal). Therefore, you can implicitly convert references of type Dog to references of type Animal. Here, implicit means that the conversion takes the form of a simple assignment expression. Conversely, you can explicitly convert references of type Animal, through a cast operation, to references of type Dog if the particular object referenced through the Animal type is, in fact, an object created from the Dog class. This concept, called polymorphism, whereby you can manipulate objects of related types as though they were of a common type, should be familiar to you. Computer wonks always try to come up with fancy five-dollar words for things such as this, and polymorphism is no exception, when all it means is that an object can take on multiple type identities. This chapter discusses inheritance as well as its traps and pitfalls.
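The implicit and explicit reference conversions described above can be sketched as follows. This is only a minimal illustration: the Animal and Dog classes are empty placeholders, and ConversionDemo is a hypothetical name chosen for the example.

```csharp
public class Animal
{
}

public class Dog : Animal
{
}

public class ConversionDemo
{
   static void Main()
   {
      Dog dog = new Dog();

      // Implicit conversion: a Dog is-an Animal, so simple assignment works.
      Animal animal = dog;

      // Explicit conversion: succeeds because the object really is a Dog.
      Dog sameDog = (Dog) animal;
      System.Console.WriteLine( object.ReferenceEquals( dog, sameDog ) );   // True

      // The as operator yields null instead of throwing when the
      // referenced object is not of the requested type.
      Animal plainAnimal = new Animal();
      Dog notADog = plainAnimal as Dog;
      System.Console.WriteLine( notADog == null );   // True
   }
}
```

Had the cast in the second step been applied to plainAnimal instead, the CLR would have thrown an InvalidCastException at run time.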

The CLR tracks object references. This means each variable of reference type actually contains a reference to an object on the heap (or is null, if it doesn't currently refer to an object). When you copy the value of a reference-type variable into another reference-type variable, another reference to the same object is created—in other words, the reference is copied. Thus, you end up with two variables that reference the same object. In the CLR, you have to do extra work to create copies of objects—e.g., you must implement the ICloneable interface or a similar pattern.
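To make the reference-copying behavior concrete, consider this small sketch. The Counter class and its public Count field are hypothetical and exist only for the demonstration.

```csharp
public class Counter
{
   public int Count;
}

public class ReferenceCopyDemo
{
   static void Main()
   {
      Counter first = new Counter();
      Counter second = first;      // copies the reference, not the object

      // Both variables refer to the same object on the heap, so a
      // change made through one is visible through the other.
      second.Count = 42;
      System.Console.WriteLine( first.Count );   // 42
   }
}
```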

All objects created from C# class definitions reside on the system heap, which the CLR garbage collector manages. The GC relieves you from the task of cleaning up your objects' memory. You can allocate them all day long without worrying about who will free the memory associated with them. The GC is smart enough to track all of an object's references, and when it notices that an object is no longer referenced, it marks the object for deletion. Then, the next time the GC compacts the heap, it destroys the object and reclaims the memory.

Note

In reality, the process is much more complex than this. There are many hidden nuances to how the GC reclaims the memory of unused objects. I talk about this in the section titled "Destroying Objects" later in this chapter. Consider this: The GC removes some complexity in one area, but introduces a whole new set of complexities elsewhere.

Along with classes, the C# language supports the definition of new value types through the struct keyword. Value types are lightweight objects that typically don't live on the heap, but instead live on the stack. To be completely accurate, a value type can live on the heap, for example when it is a field inside an object on the heap or when it has been boxed. Value types cannot be defined to inherit from another class or value type, nor can another value type or class inherit from them.

Value types can have constructors, but they cannot have a finalizer. By default, when you pass value types into methods as parameters, the method receives a copy of the value. I cover the many details of value types, along with their differences from reference types, in this chapter and in Chapter 13.
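The copy semantics of value-type parameters can be seen in a short sketch like the following. The Point struct and the Move method are hypothetical names used only for illustration.

```csharp
public struct Point
{
   public int X;
   public int Y;
}

public class ValueCopyDemo
{
   public static void Move( Point p )
   {
      // p is a copy of the caller's value, so this assignment
      // is not visible to the caller.
      p.X = 100;
   }

   static void Main()
   {
      Point origin = new Point();
      Move( origin );
      System.Console.WriteLine( origin.X );   // 0
   }
}
```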

That said, let's dive in and get to the details. Don't be afraid if the details seem a little overwhelming at first. The fact is you can start to put together reasonable applications with C# without knowing every single detailed behavior of the language. That's a good thing, because C#, along with the Visual Studio IDE, is meant to facilitate rapid application development. However, the more details you know about the language and the CLR, the more effective you'll be at developing and designing robust C# applications.

Class Definitions

Class definitions in C# look similar to class definitions in C++ and Java. Let's look at a simple class now, so you can get a feel for things. In the following code, I've shown the basic pieces for creating a class definition:

//NOTE: This code is not meant to be compiled as-is
[Serializable]
public class Derived : Base, ICloneable
{
    private Derived( Derived other ) {
        this.x = other.x;
    }

    public object Clone() {  //implement the ICloneable.Clone method
        return new Derived( this );
    }

    private int x;
}

This class declaration defines a class Derived, which derives from the class Base and also implements the ICloneable interface.

Note

If this is the first time you've encountered the interface concept, don't worry. Chapter 5 is devoted entirely to interfaces and contract-based programming.

The access modifier in front of the class keyword controls the visibility of the type from outside the assembly (I describe assemblies in Chapter 2). The class Derived is publicly accessible, which means that consumers of the assembly that contains this class can create instances of it. This type contains a private constructor that is used by the public method Clone, which implements the ICloneable interface. When a class implements an interface, you are required to implement all of the methods of the interface.

You can apply attributes to just about any nameable entity within the CLR type system. In this case, I've attached the Serializable attribute to the class to show an example of attribute usage syntax. These attributes become part of the metadata that describes the type to consumers. In addition, you can create custom attributes to attach to various entities, such as classes, parameters, return values, and fields, which easily exercise the capabilities of Aspect Oriented Programming (AOP).

Fields

Fields are the bread and butter that make up the state of objects. Typically, you declare a new class only if you need to model some new type of object with its own custom internal state, represented by its instance fields.

You declare fields with a type, just like all other variables in C#. The possible field modifiers are as follows:

new
public
protected
internal
private
static
readonly
volatile

Many of these are mutually exclusive. Those that are mutually exclusive control the accessibility of the field and consist of the modifiers public, protected, internal, and private. I discuss these in more detail in the "Accessibility" section. However, for now, I'll detail the remaining modifiers.

The static modifier controls whether a field is a member of the class type or a member of objects instantiated from the type. In the absence of the static modifier, a field is an instance field, and thus each object created from the class has its own copy of the field. This is the default. When decorated with the static modifier, the field is shared among all objects of a class on a per-application-domain basis.

Note that static fields are not included in the memory footprint of the object instances. In other words, objects don't encapsulate the static fields; rather, types encapsulate the static fields. It would be inefficient for all instances of the object to contain a copy of the same static variable in their memory footprint. And worse than that, the compiler would have to generate some sort of code under the hood to make sure that when the static field is changed for one instance, it would change the field in all instances. For this reason, the static fields actually belong to the class and not to the object instances. In fact, when a static field is publicly accessible outside the class, you use the class name and not the object instance variable to access the field.
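As a rough illustration of the difference between static and instance fields, the sketch below counts how many objects have been created. The Widget class and its members are hypothetical, and the fields are public only to keep the example short.

```csharp
public class Widget
{
   public static int InstanceCount;   // one copy, owned by the type
   public int Id;                     // one copy per object

   public Widget()
   {
      // Static fields are accessed through the class name.
      Widget.InstanceCount++;
      this.Id = Widget.InstanceCount;
   }
}

public class StaticFieldDemo
{
   static void Main()
   {
      Widget a = new Widget();
      Widget b = new Widget();

      System.Console.WriteLine( "{0} {1} {2}",
                                a.Id, b.Id, Widget.InstanceCount );   // 1 2 2
   }
}
```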

Note

Static fields have another important quality: They are global to the application domain within which their containing types are loaded. Application domains are an abstraction that is an isolation mechanism similar to the process abstraction within an operating system, but they are more lightweight. You can have multiple application domains in one operating system process. If your CLR process contains multiple application domains, each will have a copy of the class's static fields. A static field's value in one application domain can be different from the same static field in another application domain. Unless you create extra application domains yourself, your application will have only one application domain that your code runs in: the default application domain. However, it's important to note this distinction when working in environments such as ASP.NET, where the concept of the application domain is used as the isolation mechanism between two ASP.NET applications. In fact, you can easily jump to the conclusion that ASP.NET was the driving motivation behind the application domain notion.

You can initialize fields during object creation in various ways. One straightforward way of initializing fields is through initializers. You use these initializers at the point where the field is defined, and they can be used for either static or instance fields—for example:

   private int x = 789;
   private int y;
   private int z = A.InitZ();

The field x is initialized using an initializer. The notation is rather convenient. Note that this initialization occurs at run time and not at compile time. Therefore, this initialization statement could have used something other than a constant. For example, the variable z is initialized by calling a method, A.InitZ. At first, this field initialization notation may seem like a great shortcut, saving you from having to initialize all of the fields inside the body of the constructor. However, I suggest that you initialize instance fields within the instance constructor body for complex type definitions. I cover static and instance initialization in all of its gory detail in the "Creating Objects" section later in this chapter, and you'll see why initializing fields in the constructor can facilitate code that's easier to maintain and debug.

Another field modifier that comes in handy from time to time is the readonly modifier. As you can guess, it defines the field so that you can only read from it. You can write to it only during object creation. You can emulate the same behavior with greater flexibility using a read-only property, which I discuss in the section titled "Properties." Static readonly fields are initialized in a static constructor, while instance readonly fields are initialized in an instance constructor. Alternatively, you can initialize readonly fields using initializers at the point of their declaration in the class definition, just as you can do with other fields. Within the constructor, you can assign to the readonly field as many times as necessary. Only within the constructor can you pass the readonly field as a ref or out parameter to another function. Consider the following example:

public class A
{
   public A()
   {
      this.y = 456;

      // We can even set y again.
      this.y = 654;

      // We can use y as a ref param.
      SetField( ref this.y );
   }

   private void SetField( ref int val )
   {
      val = 888;
   }

   private readonly int x = 123;
   private readonly int y;
   public const     int z = 555;

   static void Main()
   {
      A obj = new A();
      System.Console.WriteLine( "x = {0}, y = {1}, z = {2}",
                                obj.x, obj.y, A.z );
   }
}

You should note one important nuance here: The z field is declared using the const keyword. At first, it may seem that it has the same effect as a readonly field, but it does not. First, a const field such as this is known and used at compile time. This means that the code generated by the compiler in the Main routine can be optimized to replace all uses of this variable with its immediate const value. The compiler is free to use this performance trick, simply because the value of the field is known at compile time. Also, note that you access the const field using the class name rather than the instance name. This is because const values are implicitly static and don't affect the memory footprint, or shape, of the object instances. Again, this makes sense because the compiler would optimize away access to that memory slot in the object instance anyway, because it would be the same for all instances of this object.

But one more detail is lurking here with regard to the difference between readonly and const fields. readonly fields are guaranteed to be computed at run time. Therefore, suppose you have one class with both a readonly field and a const field that lives in assembly A, and code in assembly B creates and uses an instance of that class in assembly A. Now, suppose you rebuild assembly A at a later date, and you modify the field initializers for the readonly field and the const field. The consumer in assembly B would see the change in the const field only after you recompile the code in assembly B. This behavior is expected, because when assembly B was built referencing the initial incarnation of assembly A, the compiler optimized the use of the const values by inserting the literal value into the generated IL code. Because of this, you need to be careful when deciding whether to use a readonly field or a const value and, if you choose to use a readonly field, you need to choose carefully between using a readonly field or a read-only property, which I introduce in a later section titled "Properties." Properties provide greater design-time and maintenance-time flexibility over readonly fields.

Lastly, the volatile modifier indicates, as its name implies, that the field is sensitive to read and write timing. Technically, the modifier indicates to the compiler that the field may be accessed or modified by the operating system or hardware running on that system, or more likely, by another thread at any time. The latter case is the most typical. Normally, access to a field by multiple threads only becomes a problem when you don't use any synchronization techniques, such as when not using the C# lock statement or OS synchronization objects. This is typically called lock-free programming. When a field is marked as volatile, it tells the implementation (and by that, I mean the CLR JIT compiler) that it must not apply optimizations to that field's access. The fact is, you'll rarely ever need the volatile modifier or come into contact with it.

I've already covered some of the ways that field initialization can occur within an object instance during class initialization. I cover many more nuances of field initialization in the "Field Initialization" section. However, note that C# has rules about default field initialization that are applied before any field initialization code that occurs in the constructor method's code block. C#, by default, creates verifiably type-safe code, which is guaranteed not to use uninitialized variables and fields. The compiler goes to great lengths to ensure that this requirement is satisfied. For example, it initializes all fields, whether they're instance or static fields, to a default value before any of your variable initializers execute. The default value for just about anything can easily be represented by either the value 0 or null. For example, you can initialize an integer or any other similar value type by setting all of the bits in its storage space to 0. For reference types, you set the initial default value to null. Again, this is usually the result of the implementation setting all of the bits of the reference to 0. These default initializations occur before any code executes on the instance or class. Therefore, it's impossible to inspect the uninitialized values of an object or a class during initial construction.
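A quick way to observe these default values is to read fields that have no initializers at all. The following sketch assumes a hypothetical class named Defaults; the compiler may warn that the fields are never assigned, which is exactly the point of the example.

```csharp
public class Defaults
{
   public int number;            // defaults to 0
   public bool flag;             // defaults to false
   public string text;           // defaults to null
   public static double ratio;   // static fields are zeroed as well
}

public class DefaultsDemo
{
   static void Main()
   {
      Defaults d = new Defaults();

      System.Console.WriteLine( "{0} {1} {2} {3}",
                                d.number, d.flag,
                                d.text == null, Defaults.ratio );   // 0 False True 0
   }
}
```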

Constructors

Constructors are called when a class is first loaded by the CLR or an object is created. There are two types of constructors: static constructors and instance constructors.

A class can have only one static constructor, which is called when the type is loaded by the CLR, and it cannot have parameters. The name of the static constructor must match the name of the class it belongs to. As with any other class member, you can attach metadata attributes to the static constructor.
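For reference, a static constructor looks something like the following sketch. The Settings class, its Name field, and the value assigned to it are hypothetical placeholders.

```csharp
public class Settings
{
   public static readonly string Name;

   // The static constructor has the same name as the class,
   // no access modifier, and no parameters.
   static Settings()
   {
      Name = "default";
   }
}

public class StaticConstructorDemo
{
   static void Main()
   {
      // The static constructor has run by the time Name is read.
      System.Console.WriteLine( Settings.Name );   // default
   }
}
```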

Instance constructors, on the other hand, are called when an instance of a class is created. They typically set up the state of the object by initializing the fields to a desired predefined state. You can also do any other type of initialization work, such as connecting to a database and opening a file. A class can have multiple instance constructors that can be overloaded (i.e., have different parameter types). As with the static constructor, instance constructor names must match the name of the defining class. One notable capability of an instance constructor is that of the optional constructor initializer clause. Using the initializer, which follows a colon after the parameter list, you can call a base class constructor or another constructor in the same class through the keywords base and this, respectively. I have more to say about the base keyword in the section titled "base Keyword." Consider the following sample code and the two comments:

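class Base
{
   public int x = InitX();

   public Base( int x )
   {
      this.x = x;    // disambiguates the parameter and the instance variable
   }

   // Placeholder helper so the field initializer above compiles;
   // the value it returns is arbitrary.
   private static int InitX()
   {
      return 0;
   }
}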

class Derived : Base
{
   public Derived( int a )
      :base( a )    // calls the base class constructor
   {
   }
}
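The previous listing shows the base form of the constructor initializer. The this form, which calls another constructor in the same class, might look like the following sketch. The Rectangle class is a hypothetical example and is not used elsewhere in this chapter.

```csharp
public class Rectangle
{
   public int Width;
   public int Height;

   public Rectangle( int side )
      :this( side, side )    // calls the two-parameter constructor below
   {
   }

   public Rectangle( int width, int height )
   {
      this.Width = width;
      this.Height = height;
   }
}

public class ConstructorChainDemo
{
   static void Main()
   {
      Rectangle square = new Rectangle( 3 );
      Rectangle wide = new Rectangle( 5, 2 );

      System.Console.WriteLine( "{0} x {1}", square.Width, square.Height );   // 3 x 3
      System.Console.WriteLine( "{0} x {1}", wide.Width, wide.Height );       // 5 x 2
   }
}
```

Chaining constructors this way keeps the field assignments in one place, so the overloads cannot drift apart.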

Methods

A method defines a procedure that you can perform on an object or a class. If the method is an instance method, you can call it on an object. If the method is a static method, you can call it only on the class. The difference is that instance methods have access to both the instance fields of the object instance and the static fields of the class, whereas static methods don't have access to instance fields or methods. Static methods can only access static class members.

Methods can have metadata attributes attached to them, and they can also have optional modifiers attached. I discuss them throughout this chapter. These modifiers control the accessibility of the methods, as well as facets of the methods that are germane to inheritance. Every method either returns a value or does not. If a method doesn't return a value, you must declare its return type as void. Methods may or may not have parameters.

Static Methods

You call static methods on the class rather than on instances of the class. Static methods only have access to the static members of the class. You declare a method as static by using the static modifier, as in the following example:

public class A
{
   public static void SomeFunction()
   {
      System.Console.WriteLine( "SomeFunction() called" );
   }

   static void Main()
   {
      A.SomeFunction();
      SomeFunction();
   }
}

Notice that both methods in this example are static. In the Main method, I first access the SomeFunction method using the class name. I then call the static method without qualifying it. This is because the Main and SomeFunction methods are both defined in the same class and are both static methods. Had SomeFunction been in another class, say class B, I would have had no choice but to reference the method as B.SomeFunction.

Instance Methods

Instance methods operate on objects. In order to call an instance method, you need a reference to an instance of the class that defines the method. The following example shows the use of an instance method:

public class A
{
   private void SomeOperation()
   {
      x = 1;
      this.y = 2;
      z = 3;

      // assigning this in objects is an error.
      // A newinstance = new A();
      // this = newinstance;
   }

   private int x;
   private int y;
   private static int z;

   static void Main()
   {
      A obj = new A();

      obj.SomeOperation();

      System.Console.WriteLine( "x = {0}, y = {1}, z= {2}",
                                obj.x, obj.y, A.z );
   }
}

In the Main method, you can see that I create a new instance of the A class and then call the SomeOperation method through the instance of that class. Within the method body of SomeOperation, I have access to the instance and static fields of the class, and I can assign to them simply by using their identifiers. Even though the SomeOperation method can assign the static field z without qualifying it, as I mentioned before, I believe it makes for more readable code if the assignment of static fields is qualified by the class name even in the methods of the same class. Doing so is helpful for whoever comes after you and has to maintain your code, and that someone could even be you!

Notice that when I assign to y, I do so through the this identifier. You should note a few important things about this when used within an instance method body. It is treated as a read-only reference whose type is that of the class. Using this, you can access the fields of the instance, as I did when assigning the value of y in the previous code example. Because the this value is read-only, you may not assign it, which would make it reference a different instance. If you try to do so, you'll hear about it when the compiler complains to you and fails to compile your code.

Properties

Properties are one of the nicest mechanisms within C# and the CLR that enable you to enforce encapsulation better. In short, you use properties for strict control of access to the internal state of an object.

A property, from the point of view of the object's client, looks, smells, and behaves just like a public field. The notation to access a property is the same as that used to access a public field on the instance. However, a property doesn't have any associated storage space within the object, as a field does. Rather, a property is a shorthand notation for defining accessors used to read and write fields. The typical pattern is to provide access to a private field in a class through a public property. C# 3.0 made this even easier with its introduction of auto-implemented properties.

Properties significantly enhance your flexibility as a class designer. For example, if a property represents the number of table rows in a database table object, the table object can defer the computation of the value until the point where it is queried through a property. It knows when to compute the value, because the client will call an accessor when it accesses the property.

Declaring Properties

The syntax for declaring properties is straightforward. As with most class members, you can attach metadata attributes to a property. Various modifiers that are valid for properties are similar to ones for methods. Other modifiers include the ability to declare a property as virtual, sealed, override, abstract, and so on. I also cover these in detail in the section titled "Inheritance and Virtual Methods" later in this chapter.

The following code defines a property, Temperature, in class A:

public class A
{
   private int temperature;

   public int Temperature
   {
      get
      {
         System.Console.WriteLine( "Getting value for temperature" );
         return temperature;
      }

      set
      {
         System.Console.WriteLine( "Setting value for temperature" );
         temperature = value;
      }
   }
}

public class MainClass
{
   static void Main()
   {
      A obj = new A();

      obj.Temperature = 1;
      System.Console.WriteLine( "obj.Temperature = {0}",
                                obj.Temperature );
   }
}

First I defined a property named Temperature, which has a type of int. Each property declaration must define the type that the property represents. That type should be visible to the compiler at the point where it is declared in the class, and it should have at least the same accessibility as the property being defined. By that, I mean that if a property is public, the type of the value that the property represents must at least be declared public in the assembly within which it is defined. In the example, the int type is an alias for Int32. That type is defined in the System namespace, and it is public. So, you can use int as a property type in this public class A.

The Temperature property merely returns the private field temperature from the internal state of the object instance. This follows a widely used convention: you name the private field with a leading lowercase character, while naming the property with a leading uppercase character. Of course, you're not obligated to follow this convention, but there is no good reason not to and C# programmers expect it.

Note

If it looks like a lot of typing simply to expose a field value as a property, don't worry. The C# team recognized this and added auto-implemented properties to the language in C# 3.0, which I cover shortly in the section titled "Auto-Implemented Properties."

Accessors

In the previous example, you can see that there are two blocks of code within the property block. These are the accessors for the property, and within the blocks of the accessors, you put the code that reads and writes the property. As you can see, one is named get and the other is named set. It should be obvious from their names what each one does.

The get block is called when the client of the object reads the property. As you would expect, this accessor must return a value or an object reference that matches the type of the property declaration. It can also return an object that is implicitly convertible to the type of the property declaration. For example, if the property type is a long and the getter returns an int, the int will be implicitly converted to a long without losing precision. Otherwise, the code in this block is just like a parameterless method that returns a value or reference of the same type as the property.

The set accessor is called when the client attempts to write to the property. Note that there is no return value. Note also that a special variable named value is available to the code within this block, and it's the same type as that of the property declaration. When you write to the property, the value variable will have been set to the value or object reference that the client has attempted to assign to the property. If you attempt to declare a local variable named value in the set accessor, you'll receive a compiler error. The set accessor is like a method that takes one parameter of the same type as the property and returns void.
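Both accessor behaviors can be sketched in one small class. In this hypothetical Thermostat example, the property type is long while the backing field is an int, so the get accessor relies on the implicit conversion, and the set accessor uses the special value variable, which has the property's type. The range check is my own assumption about what a sensible setter might do.

```csharp
public class Thermostat
{
   private int temperature;

   public long Temperature
   {
      get
      {
         // The int field is implicitly converted to long.
         return temperature;
      }

      set
      {
         // value is of type long here, matching the property type.
         if( value < int.MinValue || value > int.MaxValue )
         {
            throw new System.ArgumentOutOfRangeException( "value" );
         }

         temperature = (int) value;
      }
   }
}

public class AccessorDemo
{
   static void Main()
   {
      Thermostat t = new Thermostat();
      t.Temperature = 21;

      long reading = t.Temperature;
      System.Console.WriteLine( reading );   // 21
   }
}
```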

Read-Only and Write-Only Properties

If you define a property with only a get accessor, that property will be read-only. Likewise, if you define a property with only a set accessor, you'll end up with a write-only property. And lastly, a property with both accessors is a read-write property.
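The following sketch shows one property of each restricted kind. The Account class, its members, and the initial balance are hypothetical and serve only to illustrate the syntax.

```csharp
public class Account
{
   private int balance = 100;
   private string password;

   // Read-only property: only a get accessor.
   public int Balance
   {
      get { return balance; }
   }

   // Write-only property: only a set accessor.
   public string Password
   {
      set { password = value; }
   }

   public bool CheckPassword( string candidate )
   {
      return password == candidate;
   }
}

public class ReadWriteDemo
{
   static void Main()
   {
      Account account = new Account();
      account.Password = "secret";

      System.Console.WriteLine( account.Balance );                     // 100
      System.Console.WriteLine( account.CheckPassword( "secret" ) );   // True

      // account.Balance = 5;            // error: no set accessor
      // string p = account.Password;    // error: no get accessor
   }
}
```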

You may be wondering why a read-only property is any better or worse than a readonly public field. At first thought, it may seem that a read-only property is less efficient than a readonly public field. However, given the fact that the CLR can inline the code to access the property during JIT compilation, in the case where the property simply returns a private field, this argument of inefficiency does not hold. Now, of course, writing the code is not as efficient. However, because programmers aren't lazy and auto-implemented properties make it so simple, that's really no argument either.

The fact is, in 99% of all cases, a read-only property is more flexible than a readonly public field. One reason is that you can defer a read-only property's computation until the point where you need it (a technique known as lazy evaluation, or deferred execution). So, in reality, it could provide for more efficient code, when the property is meant to represent something that takes significant time to compute. If you're using a readonly public field for this purpose, the computation would have to happen in the block of the constructor. All the necessary data to make the computation may not even be available at that point. Or, you may waste time in the constructor computing the value, when the user of the object may not ever access the value.
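Lazy evaluation through a read-only property might be sketched as follows. The Report class is hypothetical, the loop merely stands in for some expensive computation, and the ComputeCount field exists only so that the example can show the work happening once.

```csharp
public class Report
{
   private int total;
   private bool totalComputed;

   public int ComputeCount;   // how many times the work was actually done

   public int Total
   {
      get
      {
         // Defer the computation until the first read, then cache it.
         if( !totalComputed )
         {
            total = ComputeTotal();
            totalComputed = true;
         }

         return total;
      }
   }

   private int ComputeTotal()
   {
      ComputeCount++;

      int sum = 0;
      for( int i = 1; i <= 100; ++i )
      {
         sum += i;
      }

      return sum;
   }
}

public class LazyDemo
{
   static void Main()
   {
      Report report = new Report();

      System.Console.WriteLine( "{0} {1} {2}",
                                report.Total, report.Total,
                                report.ComputeCount );   // 5050 5050 1
   }
}
```

Note that this sketch is not thread-safe; if multiple threads could read Total concurrently, the check and assignment would need synchronization.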

Also, read-only properties help enforce encapsulation. If you originally had a choice between a read-only property and a readonly public field, and you chose the read-only property, you would have had greater flexibility in future versions of the class to do extra work at the point where the property is accessed without affecting the client. For example, imagine if you wanted to do some sort of logging in debug builds each time the property is accessed. The client would effectively be calling a method implicitly, albeit one of the special property methods, to access the data. The flexibility of things that you can do in that method is almost limitless. Had you accessed the value as a public readonly field, you wouldn't call a method or be able to do anything without switching it over to a property and forcing the client code to recompile. This discussion leads directly into the discussion regarding encapsulation in the later section titled "Encapsulation."

Auto-Implemented Properties

Many times, you need a type, say a class, which contains a few fields that are treated as a cohesive unit. For example, imagine an Employee type that contains a full name and an identification number but, for the sake of example, manages this data using strings as shown below:

public class Employee
{
    string fullName;
    string id;
}

As written, this class is essentially useless. The two fields are private and must be made accessible. For the sake of encapsulation, we don't want to just make the fields public. However, for such a simple little type, it sure is painful to code up basic property accessors as the following code shows:

public class Employee
{
    public string FullName {
        get { return fullName; }
        set { fullName = value; }
    }

    public string Id {
        get { return id; }
        set { id = value; }
    }

    string fullName;
    string id;
}

What a lot of code just to get a type with a couple of read/write properties!

Note

I'd be willing to bet that there are many developers out there who have simply avoided properties and used public fields in these kinds of helper types simply because of the typing overhead alone. The problem with that short-sighted approach is that you cannot do any sort of validation upon setting the field or perform any lazy evaluation during property access if that requirement becomes necessary as your application evolves.

Thankfully, C# 3.0 added a new feature called auto-implemented properties that reduces this burden significantly. Look how the previous Employee type changes after using auto-implemented properties:

public class Employee
{
    public string FullName { get; set; }
    public string Id { get; set; }
}

That's it! Basically, what you're telling the compiler is, "I want a string property named FullName and I want it to support get and set." Behind the scenes the compiler generates a private field in the class for the storage and implements the accessors for you. The beauty of this is that it's just a little more typing than declaring public fields, but at the same time because they are properties you can change the underlying implementation without having to modify the public contract of the type. That is, if you later decided you wanted to customize the accessors for Id, you could do so without forcing the clients of Employee to recompile.
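To make the last point concrete, here is one way the Id property might later be customized. This is only a sketch, and the validation rule (rejecting null or empty strings) is an assumption chosen for illustration. Clients still see a public string property named Id with get and set accessors:

```csharp
using System;

public class Employee
{
    private string id;   // explicit backing field, now that Id is customized

    public string FullName { get; set; }

    // Formerly "public string Id { get; set; }". The public contract
    // is unchanged; only the implementation differs.
    public string Id {
        get { return id; }
        set {
            if( String.IsNullOrEmpty( value ) ) {
                throw new ArgumentException( "Id must not be empty." );
            }
            id = value;
        }
    }
}

public class EntryPoint
{
    static void Main() {
        Employee emp = new Employee();
        emp.FullName = "John Doe";
        emp.Id = "111-11-1111";
        Console.WriteLine( emp.FullName + ": " + emp.Id );
    }
}
```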

Note

If you're curious about the private field that the compiler declares in your type for auto-implemented properties, you can always look at the field using ILDASM. Using my current implementation, the private field providing storage for FullName in the Employee class is named <>k__AutomaticallyGeneratedPropertyField0 and is of type string. Notice that the field name is "unspeakable," meaning that you cannot type it into code and compile without getting syntax errors. The C# compiler implementers do this on purpose so we don't use the field name directly. After all, the name of the field is a compiler implementation detail that is subject to change in the future.

You can also create an auto-implemented property that is read-only to clients by applying the private keyword to the set accessor, as shown below:

public class Employee
{
    public string FullName { get; private set; }
    public string Id { get; set; }
}

At this point, you may be wondering how the FullName property ever gets set. After all, it's read-only to clients, and the private field representing the underlying storage has a compiler-generated name that we cannot use in a constructor to assign to it. The solution is to assign through the property itself from inside the class, where the private set accessor is accessible, typically in a conventional constructor or a factory method. The example below shows the use of a conventional instance constructor:

using System;

public class Employee
{
    public Employee( string fullName, string id ) {
        FullName = fullName;
        Id = id;
    }

    public string FullName { get; private set; }
    public string Id { get; set; }
}

public class AutoProps
{
    static void Main() {
        Employee emp = new Employee(
            "John Doe",
            "111-11-1111" );
    }
}
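The factory method alternative mentioned above could look something like the following sketch. The method name Create and the private constructor are assumptions made for this example rather than anything the Employee type requires:

```csharp
using System;

public class Employee
{
    // Private constructor: instances come only from the factory method.
    private Employee() { }

    // Hypothetical factory method; the name Create is an assumption.
    public static Employee Create( string fullName, string id ) {
        Employee emp = new Employee();
        emp.FullName = fullName;   // the private setter is accessible inside Employee
        emp.Id = id;
        return emp;
    }

    public string FullName { get; private set; }
    public string Id { get; set; }
}

public class AutoProps
{
    static void Main() {
        Employee emp = Employee.Create( "John Doe", "111-11-1111" );
        Console.WriteLine( emp.FullName );   // prints John Doe
    }
}
```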

Encapsulation

Arguably, one of the most important concepts in object-oriented programming is that of encapsulation. Encapsulation is the discipline of tightly controlling access to internal object data and procedures. It would be impossible to consider any language that does not support encapsulation as belonging to the set of object-oriented languages.

You always want to follow this basic concept: Never define the data fields of your objects as publicly accessible. It's as simple as that. However, you would be surprised how many programmers still declare their data fields as public. Typically, this happens when a small utility object is defined and the creators are either lazy or think they are in too much of a hurry. There are some things, though, you should just not do, and cutting corners like this is one of them.

You want the clients of your object to speak to it only through controlled means. This normally means controlling communication to your object via methods on the object (or properties which, under the covers, are method calls). In this way, you treat the internals of the object as if they are inside a black box. No internals are visible to the outside world, and all communications that could modify those internals are done through controlled channels. Through encapsulation, you can engineer a design whereby the integrity of the object's state is never compromised.

A simple example of what I'm talking about is in order. In this example, I create a dummy helper object to represent a rectangle. The example itself is a tad contrived, but it's a good one for the sake of argument because of its minimal complexity:

class MyRectangle
{
   public uint width;
   public uint height;
}

You can see a crude example of a custom rectangle class. Currently, I'm only interested in the width and the height of the rectangle. Of course, a useful rectangle class for a graphics engine would contain an origin as well, but for the sake of this example, I'll only be interested in the width and height. So, I declare the two fields for width and height as public. Maybe I did that because I was in a hurry as I was designing this basic little class. But as you'll soon see, just a little bit more work up front will provide much greater flexibility.

Now, let's say that time has passed, and I have merrily used my little rectangle class for many uses. Never mind the fact that my little rectangle class is not very useful in and of itself, but let's say I have come up with a desire to make it a little more useful. Suppose I have some client code that uses my rectangle class and needs to compute the area of the rectangle. Back in the days of ANSI C and other purely procedural imperative programming languages, you would have created a function named something like ComputeArea, which would take, as a parameter, a pointer to an instance of MyRectangle. Good object-oriented principles guide me to consider that the best way to do this is to let the instances of MyRectangle tell the client what their area values are. So, let's do it:

class MyRectangle
{
   public uint width;
   public uint height;

   public uint GetArea()
   {
      return width * height;
   }
}

As you can see, I've added a new member: the GetArea method. When called on an instance, the trusty MyRectangle will compute the area of itself and return the result. Now, I've still just got a basic little rectangle class that has one helper function defined on it to make clients' lives a little bit easier if they need to know the area of the rectangle. But let's suppose I have some reason to precompute the value of the area, so that I don't have to recompute it each time the GetArea method is called. Maybe I want to do this because I know, for some reason, that GetArea will be called many times on the same instance during its lifetime. Ignoring the fact that premature optimization is foolish, let's say that I decide to do it. Now, my new MyRectangle class could look something like this:

class MyRectangle
{
   public uint width;
   public uint height;

   public uint area;

   public uint GetArea()
   {
      return area;
   }
}

If you look closely, you can start to see my errors. Notice that all of the fields are public. This allows the consumer of my MyRectangle instances to access the internals of my rectangle directly. What would be the point of providing the GetArea method if the consumer can simply access the area field directly? Well, you say, maybe I should make the area field private. That way, clients are forced to call GetArea to get the area of the rectangle. This is definitely a step in the right direction, as shown in the following code:

class MyRectangle
{
   public uint width;
   public uint height;

   private uint area;

   public uint GetArea()
   {
      if( area == 0 ) {
         area = width * height;
      }

      return area;
   }
}

I've made the area field private, forcing the consumer to call GetArea in order to obtain the area. However, in the process, I realized that I have to compute the area of the rectangle at some point. So, because I'm lazy to begin with, I decide to check the value of the area field before returning it, and if it's 0, I assume that I need to compute the area before I return it. This is a crude attempt at an optimization. But now, I only compute the area if it is needed. Suppose a consumer of my rectangle instance never needed to know the area of the rectangle. Then, given the previous code, that consumer wouldn't have to lose the time it takes to compute the area. Of course, in my contrived example, the benefit of this optimization will most likely be negligible. But if you think for just a little bit, I'm sure you can come up with an example where it may be beneficial to use this lazy evaluation technique. Think about database access across a slow network where only certain fields in a table may be needed at run time. Or, for the same database access object, it may be expensive to compute the number of rows in the table. You should only use this technique when necessary.

A glaring problem still exists with my rectangle class. The width and height fields are public, so what happens if consumers change one of the values after they've called GetArea on the instance? Well, then I'll have a really bad case of inconsistent internals. The integrity of the state of my object would be compromised. This is definitely not a good situation to be in. So, now you see the error of my ways yet again. I must make the width and height fields of my rectangle private as well:

class MyRectangle
{
   private uint width;
   private uint height;
   private uint area;

   public uint Width
   {
      get
      {
         return width;
      }

      set
      {
         width = value;
         ComputeArea();
      }
   }

   public uint Height
   {
      get
      {
         return height;
      }

      set
      {
         height = value;
         ComputeArea();
      }
   }

   public uint Area
   {
      get
      {
         return area;
      }
   }

   private void ComputeArea()
   {
      area = width * height;
   }
}

Now, in my latest incarnation of MyRectangle, I have become really wise. After making the width and height fields private, I realized that the consumer of the objects needs some way to get and set the values of the width and the height. That's where I use C# properties. Internally, I now handle the changes to the internal state through a method body, and the methods called belong to the set of specially named methods on the class. I have more to say about special—sometimes called reserved—member names in the section titled "Reserved Member Names." Now, I have tight control over access to the internals, and along with that control comes the most essential value of encapsulation. I can effectively manage the state of the internals so that they never become inconsistent. It's impossible to guarantee the integrity of the object's state when foreign entities have access to the state through back-door means.

In this example, my object knows exactly when the width and height fields change. Therefore, it can take the necessary action to compute the new area. If the object had used the approach of lazy evaluation, such that it contained a cached value of the area computed during the first call of the Area property getter, then I would know to invalidate that cache value as soon as either of the setters on the Width or Height properties is called.
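The lazy variant described in the previous paragraph might be sketched as follows. Using a nullable uint for the cache is my own choice for this illustration. It distinguishes "not computed yet" from a legitimate area of 0, which the earlier area == 0 check could not do:

```csharp
using System;

class MyRectangle
{
    private uint width;
    private uint height;
    private uint? area;   // null means "not computed yet" or "invalidated"

    public uint Width
    {
        get { return width; }
        set { width = value; area = null; }    // invalidate the cache
    }

    public uint Height
    {
        get { return height; }
        set { height = value; area = null; }   // invalidate the cache
    }

    public uint Area
    {
        get
        {
            if( !area.HasValue ) {
                area = width * height;         // computed only when needed
            }
            return area.Value;
        }
    }
}

public class EntryPoint
{
    static void Main()
    {
        MyRectangle rect = new MyRectangle();
        rect.Width = 3;
        rect.Height = 4;
        Console.WriteLine( rect.Area );   // prints 12

        rect.Width = 5;                   // invalidates the cached area
        Console.WriteLine( rect.Area );   // prints 20
    }
}
```

The public surface (Width, Height, and Area) is identical to the eager version, so consumers cannot tell which strategy is in use.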

The moral of the story is, a little bit of extra work up front to foster encapsulation goes a long way as time goes on. One of the greatest properties of encapsulation that you need to burn into your head and take to the bank is that, when used properly, the object's internals can change to support a slightly different algorithm without affecting the consumers. In other words, the interface visible to the consumer (also known as the contract) does not change. For example, in the final incarnation of the MyRectangle class, the area is computed up front as soon as either of the Width or Height properties is set. Maybe once my software is nearing completion, I'll run a profiler and determine that computing the area early is really sapping the life out of the processor as my program runs. No problem. I can change the model to use a cached area value that is only computed when first needed, and because I followed the tenets of encapsulation, the consumers of my objects don't even need to know about it. They don't even know a change internal to the object occurred. That's the power of encapsulation. When the internal implementation of an object can change, and the clients that use it don't have to change, then you know encapsulation is working as it should.

Note

Encapsulation helps you achieve the age-old guideline of strong cohesion of objects with weak coupling between objects.

Accessibility

I've mentioned access modifiers several times up to this point. Their use may seem intuitive to you if you have any experience with any other object-oriented language, such as C++ or Java. However, certain nuances of C# and CLI member access modifiers bear mentioning. Before I discuss the various types of modifiers, let's talk a little bit about where you can apply them.

Essentially, you can use access modifiers on just about any defined entity in a C# program, including classes and any member within the class. Access modifiers applied to a class affect its visibility from outside the containing assembly. Access modifiers applied to class members, including methods, fields, properties, events, and indexers, affect the visibility of the member from outside of the class. Table 4-1 describes the various access modifiers available in C#.

Table 4-1. Access Modifiers in C#

| Access Modifier | Meaning |
| --- | --- |
| public | Member is completely visible outside both the defining scope and the internal scope. In other words, access to a public member is not restricted at all. |
| protected | Member is visible only to the defining class and any class that derives from the defining class. |
| internal | Member is visible anywhere inside the containing assembly. This includes the defining class and any scope within the assembly that is outside the defining class. |
| protected internal | Member is visible within the defining class and anywhere else inside the assembly. This modifier combines protected and internal using a Boolean OR operation. The member is also visible to any class that derives from the defining class, whether it's in the same assembly or not. |
| private | Member is visible only within the defining class, with no exceptions. This is the strictest form of access and is the default access for class members. |

Note that the CLR supports one more form of accessibility that the C# language designers felt strongly was unnecessary to implement. Within the CLR, it is known as family-and-assembly accessibility. In C# parlance, that equates to protected and internal. If, for some reason, you absolutely must use this accessibility modifier, then you need to use a different language, such as C++/CLI or raw IL.

Now, let's examine the allowed usage of these modifiers on various defined entities within C#. Class members can use all five variants of the C# access modifiers. The default access of the class members, in the absence of any modifiers at all, is private. Types defined either within or outside a namespace can only have one of two access modifiers; they can either be public or internal. By default, they are internal.
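The following sketch puts the five modifiers and the two defaults side by side. The type names Widget, Gadget, and Helper are hypothetical, and the commented-out lines mark accesses that the compiler would reject:

```csharp
using System;

public class Widget
{
    public int a;               // visible everywhere
    protected int b;            // Widget and classes derived from it
    internal int c;             // anywhere in the containing assembly
    protected internal int d;   // the containing assembly OR derived classes
    private int e;              // Widget only
    int f;                      // no modifier: private by default

    public int Sum() { return a + b + c + d + e + f; }
}

public class Gadget : Widget
{
    public void Touch()
    {
        a = 1; b = 2; c = 3; d = 4;   // all accessible from a derived class in this assembly
        // e = 5;   // would not compile: e is private to Widget
        // f = 6;   // would not compile: f is private by default
    }
}

class Helper { }   // no modifier on a top-level type: internal by default

public class EntryPoint
{
    static void Main()
    {
        Gadget g = new Gadget();
        g.Touch();
        Console.WriteLine( g.Sum() );   // prints 10
    }
}
```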

You can apply only public, private, and internal to struct member definitions. I cover struct definitions in greater detail later in the chapter in the section titled "Value Type Definitions." Notice the absence of protected and protected internal. They aren't needed, because structs are implicitly sealed, meaning they cannot be base classes. I cover the sealed modifier in more detail in the section titled "Sealed Classes."

Note

One more important note is in order for those used to coding in C++: struct members are private by default, just like in class definitions, whereas they are public by default in C++.
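A short sketch of that default follows, using a hypothetical Point struct. The fields carry no modifier, so they are private, and clients must go through the public properties:

```csharp
using System;

public struct Point
{
    int x;   // private by default, unlike a C++ struct member
    int y;

    public Point( int x, int y )
    {
        this.x = x;
        this.y = y;
    }

    public int X { get { return x; } }
    public int Y { get { return y; } }
}

public class EntryPoint
{
    static void Main()
    {
        Point p = new Point( 3, 4 );
        // Console.WriteLine( p.x );   // would not compile: x is private
        Console.WriteLine( p.X + ", " + p.Y );   // prints 3, 4
    }
}
```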

Lastly, members of interfaces, which I describe fully in Chapter 5, and enums, which I covered in Chapter 3, are implicitly public by their very nature. Interfaces are meant to define a set of operations, or a contract, that a class can implement. It makes no sense for an interface to have any restricted access members, because restricted access members are normally associated with a class implementation, and interfaces, by their definition, contain no implementation. Enumerations, on the other hand, are normally used as a named collection of constants. Enumerations have no internal implementation either, so it makes no sense for enumeration members to have any restricted access. In fact, you get an error if you specify an access modifier, even public, on an interface member or an enumeration member.

As you can see, access for just about anything defaults to the strictest form of access that makes sense for that entity. In other words, you have to do work to allow others access to classes or class members. The only exception is the access for a namespace, which is implicitly public and cannot have any access modifiers applied to it.

Interfaces

Even though I devote much of Chapter 5 to the topic of interfaces, it is worth introducing interfaces at this point for the purpose of discussion in the rest of this chapter. Generally speaking, an interface is a definition of a contract. Classes can choose to implement various interfaces, and by doing so, they guarantee to adhere to the rules of the contract. When a class inherits from an interface, it is required to implement the members of that interface. A class can implement as many interfaces as it wants by listing them in the base class list of the class definition.

In general terms, an interface's syntax closely resembles that of a class. However, each member is implicitly public. In fact, you'll get a compile-time error if you declare any interface member with any modifiers. Interfaces can only contain instance methods; therefore, you can't include any static methods in the definition. Interfaces don't include an implementation; therefore, they are semantically abstract in nature. If you're familiar with C++, you know that you can create a similar sort of construct by creating a class that contains all public, pure virtual methods that have no default implementations.

The members of an interface can only consist of members that ultimately boil down to methods in the CLR. This includes methods, properties, events, and indexers. I cover indexers in the "Indexers" section, and I cover events in Chapter 10.
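As an illustration, the sketch below declares one member of each kind in a single interface and implements it. IPlaylist and Playlist are made-up names, not types from any library:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface showing each kind of member an interface can declare.
public interface IPlaylist
{
    void Add( string title );            // method
    int Count { get; }                   // property
    string this[ int index ] { get; }    // indexer
    event EventHandler Changed;          // event
}

public class Playlist : IPlaylist
{
    private readonly List<string> titles = new List<string>();

    public event EventHandler Changed;

    public void Add( string title )
    {
        titles.Add( title );
        EventHandler handler = Changed;   // copy to a local before the null check
        if( handler != null ) {
            handler( this, EventArgs.Empty );
        }
    }

    public int Count { get { return titles.Count; } }

    public string this[ int index ] { get { return titles[index]; } }
}

public class EntryPoint
{
    static void Main()
    {
        IPlaylist list = new Playlist();
        list.Changed += delegate { Console.WriteLine( "Playlist changed" ); };
        list.Add( "Blue in Green" );
        Console.WriteLine( list.Count + ": " + list[0] );
    }
}
```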

Note

If you're a stickler for terminology, the C# specification actually calls properties, events, indexers, operators, constructors, and destructors function members. It's actually a misnomer to call them methods. Methods contain executable code, so they're also considered function members.

The following code shows an example of an interface and a class that implements the interface:

// Note: By convention, interface names are prefixed with a capital "I".
public interface IMusician
{
   void PlayMusic();
}

public class TalentedPerson : IMusician
{
   public void PlayMusic() {}
   public void DoALittleDance() {}
}

public class EntryPoint
{
   static void Main()
   {
      TalentedPerson dude = new TalentedPerson();
      IMusician musician = dude;

      musician.PlayMusic();
      dude.PlayMusic();
      dude.DoALittleDance();
   }
}

In this example, I've defined an interface named IMusician. A class, TalentedPerson, indicates that it wants to support the IMusician interface. The class declaration is basically saying, "I would like to enter into a contract to support the IMusician interface, and I guarantee to support all the methods of that interface." The requirement of that interface is merely to support the PlayMusic method, which the TalentedPerson class does faithfully. As a final note, it is customary to name an interface type with a leading uppercase I. When reading code, this stands as a marker to indicate that the type in question is, in fact, an interface.

Now, clients can access the PlayMusic method in one of two ways. They can either call it through the object instance directly, or they can obtain an interface reference onto the object instance and call the method through it. Because the TalentedPerson class supports the IMusician interface, references to objects of that class are implicitly convertible to references of IMusician. The code inside the Main method in the previous example shows how to call the method both ways.

The topic of interfaces is broad enough to justify devoting an entire chapter to them, which I do in Chapter 5. However, the information regarding interfaces that I've covered in this section is enough to facilitate the discussions in the rest of this chapter.

Inheritance

If you ask around, many developers will tell you that inheritance is the backbone of object-oriented programming. Although inheritance is a really slick concept to those who first encounter it, I beg to differ that inheritance is the backbone. I'm a firm believer that encapsulation is the strongest feature of object-oriented programming. Inheritance is an important concept and a useful tool. However, like many powerful tools, it can be dangerous when misused. My goal in this section is to introduce you to inheritance in a way that makes you respect its power and that helps you to avoid abusing it.

Earlier, I covered the syntax for defining a class. You specify the base class after a colon that follows the class name. In C#, a class can have only one base class. (Some other languages, such as C++, support multiple inheritance.)

Accessibility of Members

Accessibility of members plays an important role in inheritance, specifically with respect to accessing members of the base class from the derived class. Any public members of the base class become public members of the derived class.

Any members marked as protected are only accessible internally to the declaring class and to the classes that inherit from it. Protected members are never accessible publicly from outside the defining class or any class deriving from the defining class. Private members are never accessible to anything except the defining class. So even though a derived class inherits all the members of the base class, including the private ones, the code in the derived class cannot access the private members inherited from the base class. In addition, protected internal members are visible to all types that are defined within the containing assembly and to classes that derive from the class defining the member. The reality is that the derived class inherits every member of a base class, except instance constructors, static constructors, and destructors.
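These rules can be sketched with a pair of hypothetical classes, Account and SavingsAccount. The derived class inherits the private balance field but cannot name it, it can use the protected owner field, and it must declare its own constructor because constructors are not inherited:

```csharp
using System;

public class Account
{
    private decimal balance;    // inherited, but not accessible in derived classes
    protected string owner;     // accessible in derived classes

    public Account( string owner, decimal balance )
    {
        this.owner = owner;
        this.balance = balance;
    }

    public decimal Balance { get { return balance; } }
}

public class SavingsAccount : Account
{
    // Constructors are not inherited, so one is declared here
    // and forwards to the base class constructor.
    public SavingsAccount( string owner, decimal balance )
        : base( owner, balance ) { }

    public string Describe()
    {
        // return owner + ": " + balance;   // would not compile: balance is private to Account
        return owner + ": " + Balance;      // goes through the public property instead
    }
}

public class EntryPoint
{
    static void Main()
    {
        SavingsAccount acct = new SavingsAccount( "Ada", 100m );
        Console.WriteLine( acct.Describe() );
        Console.WriteLine( acct.Balance );   // public on Account, so public on SavingsAccount
    }
}
```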

As you've seen, you can control the accessibility of the entire class itself when you define it. The only possibilities for the class type's accessibility are internal and public. When using inheritance, the rule is that the base class type must be at least as accessible as the deriving class. Consider the following code:

class A
{
   protected int x;
}

public class B : A
{
}

This code doesn't compile, because the A class is internal and is not at least as accessible as the deriving class B. Remember that in the absence of an access modifier, class definitions default to internal access—hence, the reason class A is internal. In order for the code to compile, you must either promote class A to public access or demote class B to internal access. Also note that it is legal for class A to be public and class B to be internal.
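For reference, here is a sketch of a corrected version that promotes class A to public. The extra class C is a hypothetical addition showing that an internal class may derive from a public one:

```csharp
using System;

// A is promoted to public, so it is at least as accessible as B.
public class A
{
    protected int x;
}

public class B : A
{
}

// Also legal: the base class is more accessible than the deriving class.
internal class C : A
{
}

public class EntryPoint
{
    static void Main()
    {
        B b = new B();
        C c = new C();
        Console.WriteLine( b is A );   // prints True
        Console.WriteLine( c is A );   // prints True
    }
}
```

The other fix, leaving A internal and demoting B to internal, works equally well when neither type needs to be visible outside the assembly.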

Implicit Conversion and a Taste of Polymorphism

You can view inheritance and what it does for you in several ways. First and most obvious, inheritance allows you to borrow an implementation. In other words, you can inherit class D from class A and reuse the implementation of class A in class D. It potentially saves you from having to do some work when defining class D. Another use of inheritance is specialization, where class D becomes a specialized form of class A. For example, consider the class hierarchy, as shown in Figure 4-1.

Figure 4-1. Inheritance specialization

As you can see, classes Rectangle and Circle derive from class GeometricShape. In other words, they are specializing the GeometricShape class. Specialization is meaningless without polymorphism and virtual methods. I cover the topic of polymorphism in more detail in the "Inheritance and Virtual Methods" section of this chapter. For the moment, I'll define basically what it means for the purpose of this conversation.

Polymorphism describes a situation in which an object referenced through a variable of a particular type can behave like, and actually be, an instance of a different (more specialized) type. Chapter 5 examines the differences and similarities between interfaces and contracts. Figure 4-1 shows a method in GeometricShape named Draw. This same method appears in both Rectangle and Circle. You can implement the model with the following code:

public class GeometricShape
{
   public virtual void Draw()
   {
      // Do some default drawing stuff.
   }
}

public class Rectangle : GeometricShape
{
   public override void Draw()
   {
      // Draw a rectangle
   }
}

public class Circle : GeometricShape
{
   public override void Draw()
   {
      // Draw a circle
   }
}

public class EntryPoint
{
   private static void DrawShape( GeometricShape shape )
   {
      shape.Draw();
   }

   static void Main()
   {
      Circle circle = new Circle();
      GeometricShape shape = circle;

      DrawShape( shape );
      DrawShape( circle );
   }
}

You create a new instance of Circle in the Main method. Right after that, you obtain a GeometricShape reference on the same object. This is an important step to note. The compiler has implicitly converted the reference into a GeometricShape type reference by allowing you to use a simple assignment expression. Underneath the covers, however, it's really still referencing the same Circle object. This is the gist of type specialization and the automatic conversion that goes along with it.

Now let's consider the rest of the code in the Main method. After you get a GeometricShape reference on the Circle instance, you can pass it to the DrawShape method, which does nothing but call the Draw method on the shape. However, the shape object reference really points to a Circle, the Draw method is defined as virtual, and the Circle class overrides the virtual method, so calling Draw on the GeometricShape reference actually calls Circle.Draw. That is polymorphism in action. The DrawShape method doesn't need to care at all about what specific type of shape the object is. All it cares about is whether it is, in fact, a GeometricShape. And Circle is a GeometricShape. This is why inheritance is often referred to as an is-a relationship. In the given example, Rectangle is-a GeometricShape, and Circle is-a GeometricShape. The key to determining whether inheritance makes sense or not is to apply the is-a relationship, along with some good old common sense, to your design. If a class D inherits from a class B, and class D semantically is-not-a class B, then inheritance is not the correct tool for that relationship.

One last important note about inheritance and convertibility is in order. I've said that the compiler implicitly converts the Circle instance reference into a GeometricShape instance reference. Implicit, in this case, means that the code doesn't have to do anything special to do the conversion, and by something special, I typically mean a cast operation. Because the compiler has the ability to do this based upon its knowledge of the inheritance hierarchy, it would seem to make sense that you don't have to get a GeometricShape reference before you can call DrawShape with the Circle object instance. In fact, this is exactly true. The last line of the Main method demonstrates this. You can simply pass the Circle instance reference directly into the DrawShape method, and because the compiler can implicitly convert the type to a GeometricShape reference based upon the inheritance, it does all of the work for you. Again, you can see the power of this mechanism.

Now, you can pass any object instance that derives from GeometricShape. After the software is shrink-wrapped and labeled version 1, someone can come along later in version 2 and define new shapes that derive from GeometricShape, and the code for DrawShape does not need to change. It doesn't even need to know what the new specializations are. They could be Trapezoid, Square (a specialization of Rectangle), or Ellipse. It does not matter, as long as they derive from GeometricShape.
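The following sketch plays out that scenario. Square and Ellipse stand in for shapes added in a hypothetical version 2, the DrawShape method is exactly what it was before, and the Console.WriteLine calls are placeholders for real drawing code:

```csharp
using System;

public class GeometricShape
{
    public virtual void Draw() { Console.WriteLine( "GeometricShape.Draw" ); }
}

public class Rectangle : GeometricShape
{
    public override void Draw() { Console.WriteLine( "Rectangle.Draw" ); }
}

// Added in a hypothetical version 2: a specialization of Rectangle.
public class Square : Rectangle
{
    public override void Draw() { Console.WriteLine( "Square.Draw" ); }
}

// Added in a hypothetical version 2: a new kind of GeometricShape.
public class Ellipse : GeometricShape
{
    public override void Draw() { Console.WriteLine( "Ellipse.Draw" ); }
}

public class EntryPoint
{
    // Unchanged from version 1; it knows nothing about Square or Ellipse.
    private static void DrawShape( GeometricShape shape )
    {
        shape.Draw();
    }

    static void Main()
    {
        DrawShape( new Rectangle() );   // prints Rectangle.Draw
        DrawShape( new Square() );      // prints Square.Draw
        DrawShape( new Ellipse() );     // prints Ellipse.Draw
    }
}
```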

Member Hiding

From the previous section's discussion, you can see how the concept of inheritance, although a powerful one, can be overused. When programmers are first introduced to inheritance, they have a tendency to use it too much, creating designs and hierarchical structures that are hard to maintain. It's important to note that there are alternatives to using inheritance that in many cases make more sense. Among the various types of associations between classes in a software system design, inheritance is the strongest bond of them all. I uncover many more issues with regard to inheritance near the end of the chapter. However, let's go ahead and cover some basic effects of inheritance here.

Note that inheritance extends functionality but cannot remove functionality. For example, the public methods available on a base class are available through instances of the derived class and classes derived from that class. You cannot remove these capabilities from the derived class. Consider the following code:

public class A
{
   public void DoSomething()
   {
      System.Console.WriteLine( "A.DoSomething" );
   }
}

public class B : A
{
   public void DoSomethingElse()
   {
      System.Console.WriteLine( "B.DoSomethingElse" );
   }
}

public class EntryPoint
{
   static void Main()
   {
      B b = new B();

      b.DoSomething();
      b.DoSomethingElse();
   }
}

In Main, you create a new instance of class B, which derives from class A. Because class B inherits from class A, it gets a union of the members of both class A and class B. That is why you can call both DoSomething and DoSomethingElse on the instance of class B. This is pretty obvious, because inheritance extends functionality.

But what if you want to inherit from class A but hide the DoSomething method? In other words, what if you just want to extend part of A's functionality? This is impossible with inheritance. However, you have the option of member hiding, as shown in the following code, which is a modified form of the previous example:

public class A
{
   public void DoSomething()
   {
      System.Console.WriteLine( "A.DoSomething" );
   }
}

public class B : A
{
   public void DoSomethingElse()
   {
      System.Console.WriteLine( "B.DoSomethingElse" );
   }

   public new void DoSomething()
   {
      System.Console.WriteLine( "B.DoSomething" );
   }
}

public class EntryPoint
{
   static void Main()
   {
      B b = new B();

      b.DoSomething();
      b.DoSomethingElse();

      A a = b;
      a.DoSomething();
   }
}

You can see that in this version I've introduced a new method on class B named DoSomething. Also notice the addition of the new keyword to the declaration of B.DoSomething. If you don't add this keyword, the compiler will complain with a warning. This is the compiler's way of telling you that you need to be more explicit about the fact that you're hiding a method in the base class. Arguably, the compiler does this because hiding members this way is generally considered bad design. Let's see why. The output from the previous code is as follows:

B.DoSomething
B.DoSomethingElse
A.DoSomething

First notice that which DoSomething method gets called depends on the type of reference it is being called through. This is rather nonintuitive, because B is-an A, and you know that inheritance models an is-a relationship. If that's the case, shouldn't the entire public interface for A be available to consumers of the instance of class B? The short answer is yes. If you really want the method to behave differently in subclasses, then at the point class A is defined, you would declare the DoSomething method as virtual. That way, you could utilize polymorphism to do the right thing. Then, the most derived DoSomething would get called no matter which type of reference it is called through.

I have more to say about virtual methods later on, but think about this for a moment. In order to declare DoSomething as virtual, you need to think about the future at the point you define it. That is, you have to anticipate the possibility that someone could inherit from your class and possibly may want to override this functionality. This is just one reason why inheritance can be more complicated during the design process than it initially seems. As soon as you employ inheritance, you have to start thinking about a lot more things like this. And we all know that nobody can predict the future.

Even though class B now hides class A's implementation of DoSomething, remember, it does not remove it. It hides it when calling the method through a B reference on the object. However, in the Main method, you can see that you can easily get around this by using implicit conversion to convert the B instance reference into an A instance reference and then calling the A.DoSomething implementation through the A reference. So, A.DoSomething is not gone—it's just hidden. You have to do a little more work to get to it.

Suppose you passed the B instance reference to a method that accepted an A instance reference, similar to the DrawShape example. The B instance reference would be implicitly converted to an A instance reference, and if that method called DoSomething on that A instance reference passed to it, it would get to A.DoSomething rather than B.DoSomething. That's probably not what the caller of the method would expect.
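To make the contrast concrete, here is a minimal sketch that passes a derived instance to a method taking a base reference, once with member hiding and once with a virtual method. The type names HidingBase, HidingDerived, VirtualBase, VirtualDerived, and Caller are hypothetical, and the methods return strings rather than writing to the console so that the two results are easy to compare.

```csharp
public class HidingBase
{
   public string DoSomething()
   {
      return "HidingBase.DoSomething";
   }
}

public class HidingDerived : HidingBase
{
   // Hides, but does not replace, the base implementation.
   public new string DoSomething()
   {
      return "HidingDerived.DoSomething";
   }
}

public class VirtualBase
{
   public virtual string DoSomething()
   {
      return "VirtualBase.DoSomething";
   }
}

public class VirtualDerived : VirtualBase
{
   // Replaces the base implementation for every kind of reference.
   public override string DoSomething()
   {
      return "VirtualDerived.DoSomething";
   }
}

public static class Caller
{
   // The compile-time type of the parameter selects the hidden method.
   public static string Call( HidingBase obj )
   {
      return obj.DoSomething();
   }

   // The run-time type of the object selects the virtual method.
   public static string Call( VirtualBase obj )
   {
      return obj.DoSomething();
   }
}
```

Calling Caller.Call( new HidingDerived() ) yields the HidingBase result, whereas Caller.Call( new VirtualDerived() ) yields the VirtualDerived result, which is usually what the caller expects.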

This is a classic demonstration that just because the language allows you to do something like this doesn't mean that doing so fosters good design. Just about any language available out there, including C++, has features in the backwaters of its spec that, when used (or used improperly), really just add unnecessary complexity and result in bad designs.

The base Keyword

When you derive from a class, often you need to call a method or access a field, a property, or an indexer on the base class from within a method on the derived class. The base keyword exists for this purpose. You can use the base keyword just like any other instance variable, but you can use it only within the block of an instance constructor, instance method, or instance property accessor. You cannot use it in static methods. This makes complete sense, because base allows access to base class implementations of an instance, much like this allows access to the instance owning the method. Let's look at the following code block:

public class A
{
   public A( int var )
   {
      this.x = var;
   }

   public virtual void DoSomething()
   {
      System.Console.WriteLine( "A.DoSomething" );
   }

   private int x;
}

public class B : A
{
   public B()
      : base( 123 )
   {
   }

   public override void DoSomething()
   {
      System.Console.WriteLine( "B.DoSomething" );
      base.DoSomething();
   }

}

public class EntryPoint
{
   static void Main()
   {
      B b = new B();

      b.DoSomething();
   }
}

In this example, you can see two uses of the base keyword. The first is in the constructor for class B. Remember that a derived class doesn't inherit instance constructors from its base class. However, when initializing the object, it is sometimes necessary to call one of the base class constructors explicitly during initialization of the derived class. This explains the notation in the class B instance constructor. The base class initialization occurs after the declaration of the derived class constructor's parameter list, but before the constructor code block. I discuss the ordering of constructor calls and object initialization in greater detail later, in the section titled "Creating Objects."

The second use of the base keyword is in the B.DoSomething implementation. I have decided that, in my implementation of class B, I want to borrow the DoSomething implementation in class A while implementing B.DoSomething. I can call the A.DoSomething implementation directly from within the B.DoSomething implementation by going through the base keyword.

If you're familiar with virtual methods, you may have raised an eyebrow at this point. If the DoSomething method is virtual, and the base keyword acts like an instance variable on the base class, wouldn't the call to base.DoSomething actually end up calling B.DoSomething? After all, that's how polymorphism works, and base.DoSomething looks equivalent to ((A)this).DoSomething, which just casts the this reference to a class A reference and then makes a virtual call to DoSomething, doesn't it? Well, if that were the case, then the code in B.DoSomething would introduce an infinite loop.

The answer to the question is that no infinite loop has been introduced. The base keyword is treated specially when used inside an instance member to call a virtual method. Normally, calling a virtual method on an instance calls the most derived implementation of the virtual method, which in this case is B.DoSomething. However, when it's called through the base keyword, the most derived method with respect to the base class is called. Because A is the base class and A.DoSomething is the most derived version of DoSomething with respect to class A, then base.DoSomething calls A.DoSomething. Thus, this is how you can implement an override method while borrowing the implementation of the base class. If you're curious about the details, the fact is that the generated IL code calls through the base reference using the call instruction rather than callvirt.
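A three-level hierarchy makes the phrase "most derived with respect to the base class" easier to see. In the following sketch, the names Top, Middle, and Bottom are hypothetical. Middle does not override Describe, so the implementation that Bottom reaches through base is still Top.Describe.

```csharp
public class Top
{
   public virtual string Describe()
   {
      return "Top";
   }
}

// Middle inherits Describe without overriding it.
public class Middle : Top
{
}

public class Bottom : Middle
{
   public override string Describe()
   {
      // Non-virtual call to the implementation that Middle inherits,
      // which is Top.Describe, so there is no recursion.
      return "Bottom -> " + base.Describe();
   }
}
```

Calling Describe on a Bottom instance produces "Bottom -> Top", whether the call goes through a Bottom, Middle, or Top reference.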

sealed Classes

I hinted previously that inheritance is such a powerful tool that it's easily abused. In fact, this is so true that I devote an entire discussion to the pitfalls of inheritance in the section titled "Inheritance, Containment, and Delegation" later in this chapter. When you create a new class, sometimes you create it with the express intent for it to serve as a base class or to allow for specialization. Often, though, classes are designed with no knowledge or foresight about whether they will be used as base classes or not. In fact, it's likely that a class you design today will be used as a base class tomorrow, even though you never intended for it to be used as a base class.

C# offers the sealed keyword for the occasions when you never want a client to derive from a class. When applied to the entire class, the sealed keyword indicates that this class is a leaf class. By that, I mean that nothing can inherit from this class. If you visualize your inheritance diagrams in your design as trees, then it makes sense to call sealed classes leaf classes. At first, you might think that you should rarely use the sealed keyword. However, I believe that the contrary is true. You should use the sealed keyword as often as possible when designing new classes. In fact, use it by default.

Inheritance is such a tricky beast that, in order for a class to serve as a good base class, you must design it with that goal in mind. If not, you should mark it as sealed. It's as simple as that. Now, you may be thinking, "Shouldn't I leave it unsealed so that someone can possibly derive from it in the future, thus retaining maximum flexibility?" The answer is no, in a good design. Again, a class that is meant to serve as a base class must be designed with that in mind from the start. If it is not, then it's likely that you'll hit pitfalls while trying to derive from the class effectively.
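As a small illustration of sealing by default, consider the following sketch. The Temperature and SealedCheck types are hypothetical and exist only for this example. The commented-out class shows the kind of declaration that the compiler rejects.

```csharp
// Not designed for inheritance, so it is sealed.
public sealed class Temperature
{
   private readonly double celsius;

   public Temperature( double celsius )
   {
      this.celsius = celsius;
   }

   public double Celsius
   {
      get { return celsius; }
   }
}

// Uncommenting the following class produces a compile-time error,
// because a sealed class cannot be used as a base class.
// public class PreciseTemperature : Temperature { }

public static class SealedCheck
{
   // Reflection reports the sealed modifier at run time.
   public static bool IsTemperatureSealed()
   {
      return typeof( Temperature ).IsSealed;
   }
}
```

If you later decide that Temperature should be a base class, removing the sealed modifier is an easy change to make. Going the other way after clients have derived from the class is not.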

Note

In many cases, classes that are meant to serve as extendable base classes are contained in consumable libraries. Creating libraries is a detail-oriented business that you must focus lots of time on for your library to be maximally useful. Additionally, once you publish a library, you may be stuck with supporting it for a long time; therefore, you want to get it right the first time. I suggest you reference Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries by Krzysztof Cwalina and Brad Abrams (Addison-Wesley Professional, 2005) if you're planning to create libraries; the book originated from the internal design guidelines that the .NET Base Class Library team used while developing the framework.

abstract Classes

At the opposite end of the spectrum from sealed classes are abstract classes. Sometimes, you need to design a class whose only purpose is to serve as a base class. You should mark classes such as these with the abstract keyword.

The abstract keyword tells the compiler that this class is meant to be used only as a base class, and therefore it does not allow code to create instances of that class. Let's revisit the GeometricShape example from earlier in the chapter:

public abstract class GeometricShape
{
   public abstract void Draw();
}

public class Circle : GeometricShape
{
   public override void Draw()
   {
      // Do some drawing.
   }
}

public class EntryPoint
{
   static void Main()
   {
      Circle shape = new Circle();

      // This won't work!
      // GeometricShape shape2 = new GeometricShape();

      shape.Draw();
   }
}

It makes no sense to create a GeometricShape object all by itself, so I've marked the GeometricShape class as abstract. Therefore, if the code in Main attempts to create an instance of GeometricShape, a compiler error will be emitted. You may have also noted the use of the abstract keyword on the GeometricShape.Draw method. I cover this usage of the keyword in more detail in the "Virtual and Abstract Methods" section. In short, using the abstract keyword is a way of saying to the compiler that the deriving classes must override the method. The method must be overridden by the derived classes, thus it makes no sense for GeometricShape.Draw to have an implementation when you can't ever create an instance of GeometricShape anyway. Therefore, abstract methods don't need to have an implementation. If you come from the C++ world, you may be exclaiming that C++ allows an abstract method to have an implementation. This is true, but the designers of C# considered the idea unnecessary. In my experience, I've rarely found the need to use a default implementation of an abstract method except in debug builds.

As you can see, there can be times in a design when you use a base class to define a sort of template of behavior by providing an implementation to inherit. The leaf classes can inherit from this base template of an implementation and flesh out the details.
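The following sketch shows one way such a template might look. The Report and SalesReport types are hypothetical. The abstract base class fixes the order of the steps in Render, and the derived class fills in the details by overriding the abstract methods.

```csharp
using System.Text;

public abstract class Report
{
   // The base class fixes the overall sequence of steps...
   public string Render()
   {
      StringBuilder sb = new StringBuilder();
      sb.Append( "[" );
      sb.Append( Title() );
      sb.Append( "] " );
      sb.Append( Body() );
      return sb.ToString();
   }

   // ...and derived classes must supply the details.
   protected abstract string Title();
   protected abstract string Body();
}

public class SalesReport : Report
{
   protected override string Title()
   {
      return "Sales";
   }

   protected override string Body()
   {
      return "Nothing sold yet";
   }
}
```

Calling Render on a SalesReport instance returns "[Sales] Nothing sold yet". Render is deliberately not virtual in this sketch, so that derived classes can customize the pieces but not the overall sequence.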

Nested Classes

You define nested classes within the scope of another class definition. Classes that are defined within the scope of a namespace, or outside the scope of a namespace but not inside the scope of another class, are called non-nested classes. Nested classes have some special capabilities and lend themselves well to situations where you need a helper class that works on behalf of the containing class.

For example, a container class might maintain a collection of objects. Imagine that you need some facility to iterate over those contained objects and also allow external users who are doing the iteration to maintain a marker, or a cursor of sorts, representing their place during the iteration. This is a common design technique. Preventing the users from holding on to direct references to the contained objects gives you much greater flexibility to change the internal behavior of the container class without breaking code that uses the container class. Nested classes provide a great solution to this problem for several reasons.

First, nested classes have access to all of the members that are visible to the containing class, even if they're private. Consider the following code, which represents a container class that contains instances of GeometricShape:

using System.Collections;

public abstract class GeometricShape
{
   public abstract void Draw();
}

public class Rectangle : GeometricShape
{
   public override void Draw()
   {
      System.Console.WriteLine( "Rectangle.Draw" );
   }
}

public class Circle : GeometricShape
{
   public override void Draw()
   {
      System.Console.WriteLine( "Circle.Draw" );
   }
}

public class Drawing : IEnumerable
{
   private ArrayList shapes;

   private class Iterator : IEnumerator
   {
      public Iterator( Drawing drawing )
      {
         this.drawing = drawing;
         this.current = -1;
      }

      public void Reset()
      {
         current = -1;
      }

      public bool MoveNext()
      {
         ++current;
         if( current < drawing.shapes.Count ) {
            return true;
         } else {
            return false;
         }
      }

      public object Current
      {
         get
         {
            return drawing.shapes[ current ];
         }
      }

      private Drawing   drawing;
      private int       current;
   }

   public Drawing()
   {
      shapes = new ArrayList();
   }

   public IEnumerator GetEnumerator()
   {
      return new Iterator( this );
   }

   public void Add( GeometricShape shape )
   {
      shapes.Add( shape );
   }
}

public class EntryPoint
{
   static void Main()
   {
      Rectangle rectangle = new Rectangle();
      Circle circle = new Circle();

      Drawing drawing = new Drawing();
      drawing.Add( rectangle );
      drawing.Add( circle );

      foreach( GeometricShape shape in drawing ) {
         shape.Draw();
      }
   }
}

This example introduces a few new concepts, such as the IEnumerable and IEnumerator interfaces, which I detail in Chapter 9. For now, let's focus primarily on the nested class usage. As you can see, the Drawing class supports a method called GetEnumerator, which is part of the IEnumerable implementation. It creates an instance of the nested Iterator class and returns it.

Here's where it gets interesting. The Iterator class takes a reference to an instance of the containing class, Drawing, as a parameter to its constructor. It then stores away this instance for later use so that it can get at the shapes collection within the drawing object. However, notice that the shapes collection in the Drawing class is private. It doesn't matter, because nested classes have access to the containing class's private members.

Also, notice that the Iterator class itself is declared private. Non-nested classes can only be declared as either public or internal, and they default to internal. You can apply the same access modifiers to nested classes as you can to any other member of the class. In this case, you declare the Iterator class as private so that external code, such as in the Main routine, cannot create instances of the Iterator directly. Only the Drawing class itself can create instances of Iterator. It doesn't make sense for anyone other than Drawing.GetEnumerator to be able to create an Iterator instance.

Nested classes that are declared public can be instantiated by code external to the containing class. The notation for addressing the nested class is similar to that of namespace qualification. In the following example, you can see how to create an instance of a nested class:

public class A
{
   public class B
   {
   }
}

public class EntryPoint
{
   static void Main()
   {
      A.B b = new A.B();
   }
}

Sometimes when you introduce a nested class, its name may hide a member name within a base class using the new keyword, similar to the way method hiding works. This is extremely rare, and can, for the most part, be avoided. Let's take a look at an example:

public class A
{
   public void Foo()
   {
   }
}

public class B : A
{
   public new class Foo
   {
   }
}

In this case, you define a nested class Foo inside the class B definition. The name is the same as the Foo method in class A, therefore you must use the new keyword, or else the compiler will let you know about the collision in names. Again, if you get into a situation like this, it's probably time to rethink your design or simply rename the nested class unless you really meant to hide the base member. Hiding base members like this is questionable design and not something you should generally do just because the language allows it.

Indexers

Indexers allow you to treat an object instance as if it were an array or a collection. This allows for a more natural usage of objects that are meant to behave as a collection, such as instances of the Drawing class from the previous section.

Generally, indexers look a little bit like a method whose name is this. As with just about every entity in the C# type system, you can apply metadata attributes to indexers. You can also apply the same modifiers to them that just about every other class member can have, except one: Indexers may not be static. Indexers are, therefore, always instance-based and work on a specific instance of an object of the defining class. Following the modifiers in the declaration is the type of the indexer. The indexer will return this type of the object to the caller. Then you put the this keyword, followed by the parameter list in square brackets, which I show in the next example.

Essentially, an indexer behaves a lot like a hybrid between a property and a method. After all, under the covers, it is one of the special methods defined by the compiler when you define an indexer. Conceptually, an indexer is similar to a method, in that it can take a set of parameters when used. However, it also behaves like a property, as you define the accessors with a similar syntax. You can apply many of the same modifiers to indexers as you can to a method. For example, indexers can be virtual, they can be an override of a base class indexer, or they can be overloaded based on the parameter list, just as methods can. Following the parameter list is the code block for the indexer, which is just like a property code block in its syntax. The main difference is that the accessors for the indexer can access the parameter list variables, whereas the accessors of a property don't have user-defined parameters. Let's add an indexer to the Drawing object and see how you can use it:

using System.Collections;

public abstract class GeometricShape
{
   public abstract void Draw();
}

public class Rectangle : GeometricShape
{
   public override void Draw()
   {
      System.Console.WriteLine( "Rectangle.Draw" );
   }
}

public class Circle : GeometricShape
{
   public override void Draw()
   {
      System.Console.WriteLine( "Circle.Draw" );
   }
}

public class Drawing
{
   private ArrayList shapes;

   public Drawing()
   {
      shapes = new ArrayList();
   }

   public int Count
   {
      get
      {
         return shapes.Count;
      }
   }

   public GeometricShape this[ int index ]
   {
      get
      {
         return (GeometricShape) shapes[index];
      }
   }

   public void Add( GeometricShape shape )
   {
      shapes.Add( shape );
   }
}

public class EntryPoint
{
   static void Main()
   {
      Rectangle rectangle = new Rectangle();
      Circle circle = new Circle();

      Drawing drawing = new Drawing();
      drawing.Add( rectangle );
      drawing.Add( circle );

      for( int i = 0; i < drawing.Count; ++i ) {
         GeometricShape shape = drawing[i];
         shape.Draw();
      }
   }
}

As you can see, you can access the elements of the Drawing object in the Main method as if they were inside a normal array. Most collection types support some type of indexer such as this. Also, because this indexer only has a get accessor, it is read-only. However, keep in mind that if the collection maintains references to objects, the client code can still change the state of the contained object through that reference. But because the indexer is read-only, the client code cannot swap out the object reference at a specific index with a reference to a completely different object.
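For comparison, here is a sketch of a read-write indexer together with an overload that takes a different parameter type. The NamedValues class is hypothetical, and it uses List<T> from System.Collections.Generic rather than ArrayList purely for brevity.

```csharp
using System.Collections.Generic;

public class NamedValues
{
   private List<string> names = new List<string>();
   private List<int> values = new List<int>();

   public int Count
   {
      get { return values.Count; }
   }

   public void Add( string name, int value )
   {
      names.Add( name );
      values.Add( value );
   }

   // Read-write indexer keyed by position.
   public int this[ int index ]
   {
      get { return values[index]; }
      set { values[index] = value; }
   }

   // Overload keyed by name. It is read-only because it has no set
   // accessor. An unknown name makes IndexOf return -1, which causes
   // the list to throw an ArgumentOutOfRangeException.
   public int this[ string name ]
   {
      get { return values[ names.IndexOf( name ) ]; }
   }
}
```

With this class, an assignment such as nv[0] = 10 goes through the set accessor, while nv["a"] = 10 does not compile, because the string overload only provides a get accessor.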

One difference is worth noting between a real array and an object that provides an indexer. You cannot pass the results of calling an indexer on an object as an out or ref parameter to a method as you can do with a real array. A similar restriction is placed on properties.
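The following sketch illustrates the restriction and the usual workaround of copying through a local variable. The Counters and RefDemo types are hypothetical names chosen for this example.

```csharp
public class Counters
{
   private int[] slots = new int[ 4 ];

   public int this[ int index ]
   {
      get { return slots[index]; }
      set { slots[index] = value; }
   }
}

public static class RefDemo
{
   public static void Increment( ref int value )
   {
      ++value;
   }

   public static int IncrementArrayElement()
   {
      int[] raw = new int[ 4 ];
      Increment( ref raw[0] );      // Fine: an array element is a variable.
      return raw[0];
   }

   public static int IncrementThroughIndexer()
   {
      Counters counters = new Counters();

      // Increment( ref counters[0] );   // Compile-time error.

      int temp = counters[0];       // Copy out through the get accessor,
      Increment( ref temp );        // pass the local by reference,
      counters[0] = temp;           // and write back through the set accessor.
      return counters[0];
   }
}
```

Both methods return 1, but only the array version hands the callee direct access to the storage. The indexer version works on a copy and then writes the result back.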

partial Classes

Classes defined as partial were a new addition to C# 2.0. So far, I've shown you how to define classes in one single file. This was a requirement in C# 1.0. It was impossible to split the definition of a class across multiple files.

At first, such a convenience may not seem worthwhile. After all, if a class has become so large that the file is hard to manage, that may be an indication of poor design. But arguably, the main reason partial classes were introduced is to support code-generation tools.

Normally, when you work within the confines of the IDE, the IDE tries to help you out by generating some code for you. For example, a wizard generates helpful DataSet-derived classes when using ADO.NET facilities. The classic problem has always been editing the resulting code generated by the tool. It was always a dangerous proposition to edit the output from the tool, because any time the parameters to the tool change, the tool regenerates the code, thus overwriting any changes made. This is definitely not desired. Previously, the only way to work around this was to use some form of reuse, such as inheritance or containment, thus inheriting a class from the class produced by the code-generation tool. Many times these were not natural solutions to the problem. And many times, the code generated by these tools was not designed to take inheritance into consideration.

Now, you can slip the partial keyword into the class definition right before the class keyword, and voilà—you can split the class definition across multiple files. One requirement is that each file that contains part of the partial class must use the partial keyword, and all of the partial pieces must be defined within the same namespace, if you declare them in a namespace at all. Now, with the addition of the partial keyword, the code generated from the code-generation tool can live in a separate file from the additions to that generated class, and when the tool regenerates the code, you don't lose your changes.

You should know some things about the process the compiler goes through to assemble partial classes into a whole class. You must compile all the partial pieces of a class together at once so the compiler can find all of the pieces. For the most part, all of the members and aspects of the class are merged together using a union operation. Therefore, they must coexist together as if you had declared and defined them all in the same file. Base interface lists are unioned together. However, because a class can have one base class at most, if the partial pieces list a base class, they must all list the same base class. Other than those obvious restrictions, I think you'll agree that partial classes are a welcome addition to the C# language.
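Here is a brief sketch of the merging behavior. The Customer class is hypothetical, and both pieces appear in one listing only to keep the example short. In practice, each piece would normally live in its own file.

```csharp
using System;

// Piece one, for example the output of a code generator.
public partial class Customer : IComparable
{
   private string name;

   public Customer( string name )
   {
      this.name = name;
   }

   public int CompareTo( object other )
   {
      return string.CompareOrdinal( name, ((Customer) other).name );
   }
}

// Piece two, normally in a separate, hand-maintained file.
public partial class Customer : ICloneable
{
   public object Clone()
   {
      // Private members declared in the other piece are in scope here.
      return new Customer( name );
   }
}
```

The assembled Customer class implements both IComparable and ICloneable, even though each piece lists only one of the interfaces.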

partial Methods

C# 3.0 introduced the partial keyword for methods to complement partial classes. A partial method is simply a method whose signature is declared without a body in one piece of the partial class and defined in another piece of the partial class. Just like partial classes, partial methods come in really handy when you are consuming code created by wizards and code generators. But the beauty of partial methods is that if a generator creates a declaration for a partial method in one part of the class declaration and you don't implement it in your part, then the method is not included as part of the final assembled class. Moreover, any code in the generated piece that calls the partial method will not break. It will simply not call the partial method at all. There are several restrictions on partial methods necessary to provide this behavior.

  • Partial methods must have a return type of void.

  • Partial methods may not accept out parameters but may accept ref parameters.

  • Partial methods may not be extern.

  • Partial methods cannot be marked virtual and may not be decorated with access modifiers because they are implicitly private.

  • Partial methods can be marked either static or unsafe[10].

  • Partial methods can be generic and may be decorated with constraints, although repeating the constraints in the declaration of the implementation is optional.

  • Delegates may not be wired up to call partial methods because they are not guaranteed to exist in the final compiled code.

With all of that in mind, let's look at a short example of partial methods. Imagine one partial class that is, for the sake of this example, a result of some code generator and is shown below:

public partial class DataSource
{
    // Some useful methods
    // ...

    partial void ResetSource();
}

Let's pretend that this DataSource class, which the generator created, represents some sort of back-end data store that, in order to satisfy some design requirement, needs to be able to be reset from time to time. Moreover, let's assume that the steps required to reset the data source are known only to the one who completes and consumes this partial class and implements the partial method. With that in mind, a possible completion of this partial class by the consumer could look like the following:

using System;

public partial class DataSource
{
    partial void ResetSource() {
        Console.WriteLine( "Source was reset" );
    }

    public void Reset() {
        ResetSource();
    }
}

public class PartialMethods
{
    static void Main() {
        DataSource ds = new DataSource();

        ds.Reset();
    }
}

You can see that I had to add a public method named Reset in order for Main to be able to reset instances of DataSource. That's because the ResetSource method is implicitly private. If you inspect the resultant executable with ILDASM, you will see the private method DataSource.ResetSource and if you inspect the IL generated for DataSource.Reset, you will see it calling through to ResetSource. If you were to comment out, or remove, the partial implementation of ResetSource and recompile, ILDASM would show that the DataSource.ResetSource method does not exist and the call to ResetSource within the Reset method is simply removed.
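One consequence of removing the call is that the argument expressions are not evaluated either. The following sketch demonstrates this with a hypothetical AuditedCounter class whose partial method OnIncremented is declared but never implemented.

```csharp
public partial class AuditedCounter
{
   private int count;
   private int messagesBuilt;

   // Declared but never implemented in this sketch.
   partial void OnIncremented( string message );

   private string BuildMessage()
   {
      ++messagesBuilt;
      return "count is now " + count;
   }

   public void Increment()
   {
      ++count;

      // With no implementation, the compiler drops this whole statement,
      // including the BuildMessage call in the argument list.
      OnIncremented( BuildMessage() );
   }

   public int Count
   {
      get { return count; }
   }

   public int MessagesBuilt
   {
      get { return messagesBuilt; }
   }
}
```

After any number of calls to Increment, MessagesBuilt remains zero. If another piece of the class supplied a body for OnIncremented, BuildMessage would run once per call.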

Static Classes

C# 2.0 introduced a new class modifier that allows you to designate that a class is nothing more than a collection of static members and cannot have objects instantiated from it. The way you do this is by decorating the class declaration with the static modifier. Once you do that, several restrictions are placed upon the class, as follows:

  • The class may not derive from anything other than System.Object, and if you don't specify any base type, derivation from System.Object is implied.

  • The class may not be used as a base class of another class.

  • The class can only contain static members, which can be public, internal, or private. However, they cannot be marked protected or protected internal, because the class cannot be used as a base class.

  • The class may not have any operators, because defining them would make no sense if you cannot create instances of the class.

Even though the entire class is marked static, you still must mark each individual member as static as well. Although it would be nice for the compiler to just assume that any member within a static class is static itself, it would add unnecessary complexity to an already complex compiler. If you put the static modifier on a nested class, it too will be a static class just as the containing class is, but you'll be able to instantiate nested classes not decorated with static.

Note

In essence, declaring a class static is just the same as declaring it sealed and abstract at the same time, but the compiler won't let you do such a thing. However, if you look at the IL code generated for a static class, you'll see that this is exactly what the compiler is doing—that is, the class is decorated with the abstract and sealed modifiers in the IL.

The following code shows an example of a static class:

using System;

public static class StaticClass
{
    public static void DoWork() {
        ++callCount;
        Console.WriteLine( "StaticClass.DoWork()" );
    }

    public class NestedClass {
        public NestedClass() {
            Console.WriteLine( "NestedClass.NestedClass()" );
        }
    }

    private static long callCount = 0;

    public static long CallCount {
        get {
            return callCount;
        }
    }
}

public static class EntryPoint
{
    static void Main() {
        StaticClass.DoWork();

        // OOPS! Cannot do this!
        // StaticClass obj = new StaticClass();

        StaticClass.NestedClass nested =
            new StaticClass.NestedClass();

        Console.WriteLine( "CallCount = {0}",
                           StaticClass.CallCount );
    }
}

The StaticClass type contains one method, a field, a property, and a nested class. Notice that because the NestedClass is not declared static, you can instantiate it just like any other class. Also, because the EntryPoint class merely contains the static Main method, it too is marked as static to prevent anyone from instantiating it inadvertently.
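If you don't have ILDASM handy, you can observe the abstract and sealed flags mentioned in the earlier note by using reflection instead. The following is a minimal sketch, and the MathHelpers and StaticClassCheck types are hypothetical.

```csharp
using System;

public static class MathHelpers
{
   public static int Square( int x )
   {
      return x * x;
   }
}

public static class StaticClassCheck
{
   // A static class shows up in metadata as both abstract and sealed.
   public static bool IsAbstractAndSealed()
   {
      Type type = typeof( MathHelpers );
      return type.IsAbstract && type.IsSealed;
   }
}
```

IsAbstractAndSealed returns true for MathHelpers, which matches what you would see for the class in the IL.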

Static classes are useful when you need a logical mechanism to partition a collection of methods. An example of a static class within the Base Class Library is the venerable System.Console class. It contains static methods, properties, and events, which are all static because only one console can be attached to the process at a single time.

Reserved Member Names

Several of the capabilities provided by the C# language are really just syntactic sugar that boils down to methods and method calls in the IL code that you never see, unless you open the generated assembly with a tool such as ILDASM. It's important to be aware of this, just in case you attempt to declare a method whose name conflicts with one of these underlying reserved method names. These syntactic shortcuts include properties, events, and indexers. If you try to declare a method with one of these special internal names and you also have a property, an event, or an indexer already defined that declares the same method names internally, the compiler will complain about duplicate symbols.

Note

If you follow the conventions in Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries by Krzysztof Cwalina and Brad Abrams (Addison-Wesley Professional, 2005) or you use FxCop to regularly analyze your code, you should never encounter a name conflict between one of your class members and a reserved member name.

Reserved Names for Properties

For a property named Prop of type T, the following signatures are reserved for the implementation of the property:

T get_Prop();
void set_Prop( T value );

Reserved Names for Indexers

If the class contains an indexer that is of type T and takes a parameter list represented by Params, it will contain the following reserved method names:

T get_Item( Params );
void set_Item( Params, T value );

Reserved Names for Destructors

If the class is defined with a finalizer (using the destructor syntax), it will contain a definition of the following method:

void Finalize();

I have a lot more to say about destructors and the Finalize method later in this chapter and in Chapter 13.

Reserved Names for Events

If the class contains an event definition of type T that is named Event, the following methods are reserved on the class:

void add_Event( T callback );
void remove_Event( T callback );

I discuss events in Chapter 10, when I cover delegates and anonymous methods.
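If you'd rather not open ILDASM, reflection gives you a quick way to confirm that these accessor methods really exist. The following is only a minimal sketch; the Widget type, its members, and the ReservedNameProbe helper are hypothetical names invented for illustration:

```csharp
using System;

public class Widget
{
    // Property: the compiler emits get_Size and set_Size.
    public int Size { get; set; }

    // Indexer: the compiler emits get_Item and set_Item.
    public string this[ int index ]
    {
        get { return index.ToString(); }
        set { /* value ignored in this sketch */ }
    }

    // Event: the compiler emits add_Resized and remove_Resized.
    public event EventHandler Resized;

    public void RaiseResized()
    {
        EventHandler handler = Resized;
        if( handler != null ) {
            handler( this, EventArgs.Empty );
        }
    }
}

public static class ReservedNameProbe
{
    // Returns true if the type exposes a public method with the given name.
    public static bool HasMethod( Type type, string name )
    {
        return type.GetMethod( name ) != null;
    }
}

public class EntryPoint
{
    static void Main()
    {
        string[] names = { "get_Size", "set_Size",
                           "get_Item", "set_Item",
                           "add_Resized", "remove_Resized" };

        foreach( string name in names ) {
            Console.WriteLine( "{0}: {1}", name,
                ReservedNameProbe.HasMethod( typeof(Widget), name ) );
        }
    }
}
```

Each name in the list should be reported as present. If you were to add a method named get_Size to Widget yourself, the compiler would reject it as a duplicate of the property accessor.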

Value Type Definitions

A value type is a lightweight type that you typically don't create on the heap. The one notable exception to this rule is a value type that is a field in a reference object that lives on the heap. A value type is a type that behaves with value semantics. That is, when you assign a value-type variable to another value-type variable, the contents of the source are copied into the destination and a full copy of the instance is made. This is in contrast to reference types, or object instances, where the result of copying one reference-type variable to another is that there is now a new reference to the same object. Also, when you pass a value type as a parameter to a method, the method body receives a local copy of the value, unless the parameter was declared as a ref or an out parameter. All of the C# built-in types except object, string, arrays, and delegates are value types. In C#, you declare a value type using the struct keyword rather than the class keyword.

On the whole, the syntax of defining a struct is the same as for a class—with some notable exceptions, as you'll soon see. A struct cannot declare a base class. Also, a struct is implicitly sealed. That means that nothing else can derive from a struct. Internally, a struct derives from System.ValueType, which in turn extends System.Object. This is so that ValueType can provide implementations of Object.Equals and Object.GetHashCode, among others, which are meaningful for value types. In the section titled "System.Object," I cover the nuances involved with implementing the methods inherited from System.Object for a value type. Like classes, structs can be declared in partial pieces, and the same rules for partial pieces apply to structs as they do to classes.
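The difference between value semantics and reference semantics is easiest to see side by side. Here is a small sketch; PointValue, PointRef, and CopyDemo are hypothetical names used only for this illustration, and the public fields are there purely to keep the example short:

```csharp
using System;

public struct PointValue
{
    public int X;
}

public class PointRef
{
    public int X;
}

public static class CopyDemo
{
    // Assigning a struct copies its contents, so changing the copy
    // leaves the source untouched.
    public static int CopyThenModifyValue()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // full copy of the instance
        b.X = 99;
        return a.X;
    }

    // Assigning a class reference copies only the reference, so both
    // variables refer to the same object.
    public static int CopyThenModifyReference()
    {
        PointRef a = new PointRef();
        a.X = 1;
        PointRef b = a;     // second reference to the same object
        b.X = 99;
        return a.X;
    }
}

public class EntryPoint
{
    static void Main()
    {
        Console.WriteLine( "struct source after copy is modified: {0}",
                           CopyDemo.CopyThenModifyValue() );
        Console.WriteLine( "class source after copy is modified:  {0}",
                           CopyDemo.CopyThenModifyReference() );
    }
}
```

The struct version reports 1 because b is an independent copy, whereas the class version reports 99 because a and b refer to the same heap object.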

Constructors

Types defined as structs can have static constructors just like classes. Structs can also have instance constructors, with one notable exception. They cannot have a user-defined default, parameterless constructor, nor can they have instance field initializers in the struct definition. Static field initializers are permitted, though. Parameterless constructors are not necessary for value types, because the system provides one, which simply sets the fields of the value to their default values. In all cases, that amounts to setting the bits of the field's storage to 0. So, if a struct contains an int, the default value will be 0. If a struct contains a reference type field, the default value will be null. Each struct gets this implicit, parameterless constructor, which takes care of this initialization. It's all part of the language's endeavor to create verifiably type-safe code. However, it's completely possible for a user to declare a variable of a value type without calling a constructor on it at all—that is also without using the new keyword. If that happens, the coder is responsible for setting up the struct appropriately before any methods on it can be called. Consider the following code:

using System;

public struct Square
{
   // Not a good idea to have public fields, but I use them
   // here only for the sake of example.  Prefer to expose
   // these with properties instead.
   public int width;
   public int height;
}

public class EntryPoint
{
   static void Main()
   {
      Square sq;
      sq.width = 1;

      // Can't do this yet.
      // Console.WriteLine( "{0} x {1}", sq.width, sq.height );

      sq.height = 2;

      Console.WriteLine( "{0} x {1}", sq.width, sq.height );
   }
}

In Main, I've allocated space on the stack for a Square object. However, immediately after, I assign only to the width field. I've commented out the call to Console.WriteLine that follows because it won't compile: the compiler's definite assignment rules don't let you read a field such as height before it has been assigned. For the same reason, you can't call methods on a struct before it is fully initialized, and properties are really method calls under the covers. After I initialize the height field, I can successfully use the Square instance to send the width and height to the console. Can you spot the problem in the following code?

using System;

public struct Square
{
   public int Width
   {
      get
      {
         return width;
      }

      set
      {
         width = value;
      }
   }

   public int Height
   {
      get
      {
         return height;
      }

      set
      {
         height = value;
      }
   }

   private int width;
   private int height;
}

public class EntryPoint
{
   static void Main()
   {
      Square sq;
      sq.Width = 1;
      sq.Height = 1;
   }
}

The problem is in the Main method. If you try to compile this code, the compiler will fail with an error. You cannot initialize the fields because they're now private. Also, you cannot initialize them with the properties, because properties are really methods, and it's illegal to call methods on a value that is not fully initialized. One way to get out of this pickle is to use the new keyword when you declare the new Square instance. You can either call one of the constructors on the struct or the default constructor. In this case, I'll call the default constructor so the Main method will change to the following:

public class EntryPoint
{
   static void Main()
   {
      Square sq = new Square();
      sq.Width = 1;
      sq.Height = 1;
   }
}

Because a struct cannot derive from another struct or class, it is not permitted to call any base constructor through the base keyword while inside the constructor block. Even though you know that a struct derives from System.ValueType internally, you may not invoke the constructor of the base type explicitly.

The Meaning of this

Previously, I said that the this keyword within class methods behaves as a constant, read-only value that contains a reference to the current object instance. In other words, it's a read-only object reference in class methods. However, with value types, this behaves like a regular ref parameter. In instance constructors that don't have an initializer clause, the this value behaves as an out parameter. That means that you can actually assign a value to this, as in the following example:

public struct ComplexNumber
{
   public ComplexNumber( double real, double imaginary )
   {
      this.real = real;
      this.imaginary = imaginary;
   }

   public ComplexNumber( ComplexNumber other )
   {
      this = other;
   }

   private double real;
   private double imaginary;
}

public class EntryPoint
{
   static void Main()
   {
      ComplexNumber valA = new ComplexNumber( 1, 2 );
      ComplexNumber copyA = new ComplexNumber( valA );
   }
}

Notice that the second constructor takes, as a parameter, another ComplexNumber value. This constructor behaves similarly to a copy constructor in C++. But instead of having to assign each field individually, you can simply assign to this, thus making a copy of the parameter's state in one line of code. Again, the this keyword acts like an out parameter in this case.

Remember that out parameters behave similarly to ref parameters, with one special difference. When a parameter is marked as an out parameter, the compiler knows that the value is uninitialized at the point the method body starts executing. Therefore, the compiler must make sure that every field of the value is initialized before the constructor exits. For example, consider the following code, which doesn't compile:

public struct ComplexNumber
{
   public ComplexNumber( double real, double imaginary )
   {
      this.real = real;
      this.imaginary = imaginary;
   }

   public ComplexNumber( double real )
   {
      this.real = real;
   }

   private double real;
   private double imaginary;
}

The problem with this code lies in the second constructor. Because value types typically are created on the stack, the allocation of such values merely requires adjustment of the stack pointer. Of course, an allocation of this sort says nothing about the state of the memory. The odds are that the memory reserved on the stack for the value contains random garbage. The CLR could elect to zero-initialize these blocks of memory, but that would defeat half the purpose of value types. Value types are meant to be lightweight and fast. If the CLR has to zero-initialize the stack memory for a value type each time the memory is reserved, that's hardly a fast operation. Of course, the default parameterless constructor generated by the system does exactly this. But you must call it explicitly by creating the instance with the new keyword. Because the this keyword is treated as an out parameter in the instance constructors, the instance constructor must initialize each field of the value before it exits. And it is the duty of the C# compiler, which is supposed to generate verifiably type-safe code, to make sure you do so. That's why the previous code example produces a compiler error.

Even though instance constructors in value types cannot use the base keyword to call base class constructors, they can have an initializer. It is valid for the initializer to use the this keyword to call other constructors on the same struct during initialization. So you can make one minor modification to the preceding code example to make it compile:

public struct ComplexNumber
{
   public ComplexNumber( double real, double imaginary )
   {
      this.real = real;
      this.imaginary = imaginary;
   }

   public ComplexNumber( double real )
      :this( real, 0 )
   {
      this.real = real;
   }

   private double real;
   private double imaginary;
}

public class EntryPoint
{
   static void Main()
   {
      ComplexNumber valA = new ComplexNumber( 1, 2 );
   }
}

Notice the difference in the second constructor. I've now introduced an initializer that calls the first constructor from the second one. Even though the single line of code in the second constructor's body is redundant, I left it there to prove a point. Notice that it only assigns the real value as in the previous example, but the compiler doesn't complain. That's because when an instance constructor contains an initializer, the this keyword behaves as a ref parameter in that constructor's body rather than an out parameter. And, because it is a ref parameter, the compiler can assume that the value has been initialized properly before entry into the method's code block. In essence, the initialization burden is deferred to the first constructor, whose duty it is to make sure it initializes all fields of the value.

One last note to consider is that the system-generated default, parameterless constructor is also available to an initializer through the this keyword. Calling it zero-initializes every field before the constructor body runs, so the body needs to assign only the fields whose values differ from the defaults. For example, the following code compiles:

public struct ComplexNumber
{
   public ComplexNumber( double real, double imaginary )
   {
      this.real = real;
      this.imaginary = imaginary;
   }

   public ComplexNumber( double real )
      :this()
   {
      this.real = real;
   }

   private double real;
   private double imaginary;
}

If you have a struct with quite a few fields and you want to initialize all but one of them to 0 or null, chaining to the default constructor in this way saves you a little bit of typing.

Finalizers

Value types are not allowed to have a finalizer. The concept of finalization, or nondeterministic destruction, is reserved for instances of classes, or objects. If structs had finalizers, the runtime would have to manage the calling of the finalizer each time the value goes out of scope.

Keep in mind that you want to be careful about initializing resources within constructors of value types. Just don't do it. Consider a value type that has a field, which is a handle to some sort of low-level system resource. Suppose this low-level resource is allocated, or acquired, in a special constructor that accepts parameters. You now have a couple of problems to deal with. Because you cannot create a default, parameterless constructor, how can you possibly acquire the resource when the user creates an instance of the value without using one of the custom constructors? The answer is, you cannot. The second problem is that you have no automatic trigger to clean up and release the resource, because you have no finalizer. You would have to force the user of the value to call some special method to clean up before the value goes out of scope. Requiring the user to remember to do something like that in order to avoid resource leaks is poor design.

Interfaces

Although it's illegal for a struct to derive from another class, it can still implement interfaces. Supported interfaces are listed in the same way as they are for classes, in a base interface list after the struct identifier. Generally, supporting interfaces for structs is the same as supporting interfaces for classes. I cover interfaces in much more detail in Chapter 5. There are performance implications of implementing interfaces on structs, in that doing so incurs a boxing operation to call methods through an interface reference on the struct value instances. I talk more about that in Chapter 13.
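To make the boxing cost concrete, here is a minimal sketch of a struct that implements an interface. ICounter, Counter, and InterfaceBoxingDemo are hypothetical names that I've made up for this illustration:

```csharp
using System;

public interface ICounter
{
    int Count { get; }
    void Increment();
}

public struct Counter : ICounter
{
    private int count;

    public int Count
    {
        get { return count; }
    }

    public void Increment()
    {
        count++;
    }
}

public static class InterfaceBoxingDemo
{
    // Returns the count of the original value followed by the count
    // seen through the interface reference.
    public static int[] IncrementThroughInterface()
    {
        Counter original = new Counter();

        // Converting to the interface type boxes a copy of the value.
        ICounter boxed = original;
        boxed.Increment();

        return new int[] { original.Count, boxed.Count };
    }
}

public class EntryPoint
{
    static void Main()
    {
        int[] counts = InterfaceBoxingDemo.IncrementThroughInterface();
        Console.WriteLine( "original: {0}, through interface: {1}",
                           counts[0], counts[1] );
    }
}
```

The call through the interface increments the boxed copy on the heap, so the original value still reports 0 while the interface reference reports 1.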

Anonymous Types

How many times have you needed a lightweight class to hold a handful of related values for use within a particular method and you lamented having to type a whole type definition complete with private fields and public property accessors? Enter anonymous types. C# allows you to introduce these types using implicitly typed local variables together with an extended syntax of the new operator. Let's see what this looks like:

using System;

public class EntryPoint
{
    static void Main() {
        var employeeInfo = new { Name = "Joe", Id = 42 };
        var customerInfo = new { Name = "Jane", Id = "AB123" };

        Console.WriteLine( "Name: {0}, Id: {1}",
                           employeeInfo.Name,
                           employeeInfo.Id );

        Console.WriteLine( "employeeInfo Type is actually: {0}",
                           employeeInfo.GetType() );
        Console.WriteLine( "customerInfo Type is actually: {0}",
                           customerInfo.GetType() );
    }
}

Notice the interesting syntax within the braces after the new keyword while declaring employeeInfo. The name/value pairs declare a property name within the anonymous type and initialize it to the given value. In this case, two anonymous types are created, each with two properties. In the first anonymous type, the first property is a System.String called Name, and the second is a System.Int32 called Id. It's important to note that the underlying type of the instance created is a strong type; it's just compiler generated, and you don't know its name. But as you can see from the following output from the code above, you can figure out the name of the type:

Name: Joe, Id: 42
employeeInfo Type is actually: <>f__AnonymousType0`2[System.String,System.Int32]
customerInfo Type is actually: <>f__AnonymousType0`2[System.String,System.String]

Note

The compiler-generated type names are implementation specific, so you should never rely on them. Additionally, you'll notice that they are "unspeakable" to the compiler; if you were to attempt to declare an instance using that type name, the compiler would complain with a syntax error.

Because you do not know the compiler-generated name of the type, you are forced to declare the variable instance as an implicitly typed local variable using the var keyword, as I did in the code.

Also, notice that the compiler-generated type is a generic type that takes two type parameters. It would be inefficient for the compiler to generate a new type for every anonymous type that contains two properties with the same names. The output above indicates that the actual type of employeeInfo looks similar to the type name below:

<>f__AnonymousType0<System.String, System.Int32>

And because the anonymous type for customerInfo contains the same number of fields with the same names, the generated generic type is reused and the type of customerInfo looks similar to the type below:

<>f__AnonymousType0<System.String, System.String>

Had the anonymous type for customerInfo contained different field names than those for employeeInfo, then another generic anonymous type would have been declared.
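You can probe this type sharing at run time without relying on the generated names. The following is a sketch only; AnonymousTypeIdentity is a hypothetical helper, and the check on the generic type definition depends on the compiler implementation, so treat that part as an observation rather than a guarantee:

```csharp
using System;

public static class AnonymousTypeIdentity
{
    // Same property names and types, in the same order.
    public static bool SameShapeSharesType()
    {
        var a = new { Name = "Joe", Id = 42 };
        var b = new { Name = "Ann", Id = 7 };
        return a.GetType() == b.GetType();
    }

    // A different property name produces a different type.
    public static bool DifferentNamesShareType()
    {
        var a = new { Name = "Joe", Id = 42 };
        var c = new { Title = "Joe", Id = 42 };
        return a.GetType() == c.GetType();
    }

    // Same names but different property types. Whether the generic
    // definition is shared is implementation specific.
    public static bool SameNamesShareGenericDefinition()
    {
        Type first = new { Name = "Joe", Id = 42 }.GetType();
        Type second = new { Name = "Jane", Id = "AB123" }.GetType();

        if( !first.IsGenericType || !second.IsGenericType ) {
            return false;
        }
        return first.GetGenericTypeDefinition() ==
               second.GetGenericTypeDefinition();
    }
}

public class EntryPoint
{
    static void Main()
    {
        Console.WriteLine( "Same shape, same type: {0}",
            AnonymousTypeIdentity.SameShapeSharesType() );
        Console.WriteLine( "Different names, same type: {0}",
            AnonymousTypeIdentity.DifferentNamesShareType() );
        Console.WriteLine( "Same names, shared generic definition: {0}",
            AnonymousTypeIdentity.SameNamesShareGenericDefinition() );
    }
}
```

Within a single assembly, the first check is true and the second is false. The third depends on how your compiler chooses to generate the types.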

Now that you know the basics about anonymous types, I want to show you an abbreviated syntax for declaring them. Pay attention to the statements that create employeeInfo and customerInfo in the following example:

using System;

public class ConventionalEmployeeInfo
{
    public ConventionalEmployeeInfo( string Name, int Id ) {
        this.name = Name;
        this.id = Id;
    }

    public string Name {
        get {
            return name;
        }

        set {
            name = value;
        }
    }

    public int Id {
        get {
            return id;
        }

        set {
            id = value;
        }
    }

    private string name;
    private int id;
}

public class EntryPoint
{
    static void Main() {
        ConventionalEmployeeInfo oldEmployee =
            new ConventionalEmployeeInfo( "Joe", 42 );

        var employeeInfo = new { oldEmployee.Name,
                                 oldEmployee.Id };

        string Name = "Jane";
        int Id = 1234;

        var customerInfo = new { Name, Id };

        Console.WriteLine( "employeeInfo Name: {0}, Id: {1}",
                           employeeInfo.Name,
                           employeeInfo.Id );
        Console.WriteLine( "customerInfo Name: {0}, Id: {1}",
                           customerInfo.Name,
                           customerInfo.Id );

        Console.WriteLine( "Anonymous Type is actually: {0}",
                           employeeInfo.GetType() );
    }
}

For illustration purposes, I have declared a type named ConventionalEmployeeInfo that is not an anonymous type. Notice that at the point where I instantiate the anonymous type for employeeInfo, I do not provide the names of the fields as before. In this case, the compiler uses the names of the properties of the ConventionalEmployeeInfo type, which is the source of the data. This same technique works using local variables, as you can see when I declare the customerInfo instance. In this case, customerInfo is an anonymous type that implements two read-only properties named Name and Id. Member declarators for anonymous types that use this abbreviated style are called projection initializers.[11]

If you inspect the compiled assembly in ILDASM, you'll notice that the generated types for anonymous types are of class type. The class is also marked private and sealed. However, the class is extremely basic and does not implement anything like a finalizer or IDisposable.

Note

Anonymous types, even though they are classes, do not implement the IDisposable interface. As I mention in Chapter 13, the general guideline for types that contain disposable types is that they, too, should be disposable. But because anonymous types are not disposable, you should avoid placing instances of disposable types within them.

Be careful not to strip the type off of anonymous types. For example, if you put instances of anonymous types in a System.Collections.ArrayList, how are you supposed to cast those instances back into the anonymous type when you reference them later? Remember, ArrayList stores references to System.Object. And even though the anonymous types derive from System.Object, how are you going to cast them back into their concrete types to access their properties? You could attempt to use reflection to overcome this. But then you introduce so much work that you lose any benefit from using anonymous types in the first place. Similarly, if you want to pass instances of anonymous types out of functions via out parameters or via a return statement, you must pass them out as references to System.Object, thus stripping the variables of their useful type information. In the previous example, if you need to pass instances out of a method, then you really should be using an explicitly defined type such as ConventionalEmployeeInfo instead of anonymous types.
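To give you a feel for how awkward the reflection route is, here is a rough sketch. The AnonymousViaReflection helper and its ReadProperty method are hypothetical names of my own; only the reflection calls themselves come from the framework:

```csharp
using System;
using System.Collections;
using System.Reflection;

public static class AnonymousViaReflection
{
    // Looks up a public property by name and returns its value,
    // or null if the instance has no such property.
    public static object ReadProperty( object instance,
                                       string propertyName )
    {
        PropertyInfo property =
            instance.GetType().GetProperty( propertyName );
        if( property == null ) {
            return null;
        }
        return property.GetValue( instance, null );
    }
}

public class EntryPoint
{
    static void Main()
    {
        // Once the instances are in the list, they are only
        // known as System.Object.
        ArrayList list = new ArrayList();
        list.Add( new { Name = "Joe", Id = 42 } );
        list.Add( new { Name = "Jane", Id = 1234 } );

        foreach( object item in list ) {
            Console.WriteLine( "Name: {0}, Id: {1}",
                AnonymousViaReflection.ReadProperty( item, "Name" ),
                AnonymousViaReflection.ReadProperty( item, "Id" ) );
        }
    }
}
```

It works, but every property access is now a string lookup that the compiler cannot check, which is exactly the kind of safety you gave up by stripping the type.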

After all of these restrictions placed on anonymous types, you may be wondering how they are useful except in rare circumstances within the local scope. It turns out that they are extremely useful when used with projection operators in LINQ (Language Integrated Query), which I will show you in Chapter 16.
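As a small preview, and only as a sketch, the following query projects each element of a sequence into an anonymous type that never leaves the method. The Employee type mirrors the shape used elsewhere in this chapter, while ProjectionDemo and its Describe method are hypothetical names:

```csharp
using System;
using System.Linq;

public class Employee
{
    public string Name { get; set; }
    public string OfficeLocation { get; set; }
}

public static class ProjectionDemo
{
    public static string[] Describe( Employee[] employees )
    {
        // Each result is an anonymous type with Name and Office
        // properties, used only inside this method.
        var query = from e in employees
                    select new { e.Name, Office = e.OfficeLocation };

        return query
            .Select( x => x.Name + " works in " + x.Office )
            .ToArray();
    }
}

public class EntryPoint
{
    static void Main()
    {
        Employee[] staff = {
            new Employee { Name = "Fred Blaze", OfficeLocation = "B1" },
            new Employee { Name = "Marisa Bozza", OfficeLocation = "L42" }
        };

        foreach( string line in ProjectionDemo.Describe( staff ) ) {
            Console.WriteLine( line );
        }
    }
}
```

Because the anonymous instances are consumed within Describe and only strings are returned, none of the restrictions described above get in the way.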

Object Initializers

C# 3.0 introduced a shorthand you can use while instantiating new instances of objects. How many times have you written code similar to this?

Employee developer = new Employee();
developer.Name = "Fred Blaze";
developer.OfficeLocation = "B1";

Right after creating an instance of Employee, you immediately start initializing the accessible properties of the instance. Wouldn't it be nice if you could do this all in one statement? Of course, you could always create a specialized overload of the constructor that accepts the parameters to use while initializing the new instance. However, there may be times where it is more convenient not to do so.

The new object initializer syntax is shown below:

using System;

public class Employee
{
    public string Name {
        get; set;
    }

    public string OfficeLocation {
        get; set;
    }
}

public class InitExample
{
    static void Main() {
        Employee developer = new Employee {
            Name = "Fred Blaze",
            OfficeLocation = "B1"
        };
    }
}

Notice how the developer instance is initialized in the Main method. Under the hood, the compiler generates the same code it would have if you had initialized the properties manually after creating the Employee instance. Therefore, this technique only works if the properties, in this case Name and OfficeLocation, are accessible at the point of initialization.
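To make that equivalence explicit, here is a sketch that writes both forms side by side. InitializerExpansion and its two methods are hypothetical names; the hand-written version approximates what the compiler produces, which assigns the properties through a hidden temporary before handing back the finished instance:

```csharp
using System;

public class Employee
{
    public string Name { get; set; }
    public string OfficeLocation { get; set; }
}

public static class InitializerExpansion
{
    // Object initializer form.
    public static Employee WithInitializer()
    {
        return new Employee {
            Name = "Fred Blaze",
            OfficeLocation = "B1"
        };
    }

    // Roughly what the initializer expands to.
    public static Employee ByHand()
    {
        Employee temp = new Employee();
        temp.Name = "Fred Blaze";
        temp.OfficeLocation = "B1";
        return temp;
    }
}

public class EntryPoint
{
    static void Main()
    {
        Employee first = InitializerExpansion.WithInitializer();
        Employee second = InitializerExpansion.ByHand();

        Console.WriteLine( "{0} in {1}",
                           first.Name, first.OfficeLocation );
        Console.WriteLine( "{0} in {1}",
                           second.Name, second.OfficeLocation );
    }
}
```

Both methods return an Employee in the same state. The initializer form simply saves you from naming the temporary yourself.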

You can even nest object initializers as shown in the example below:

using System;

public class Employee
{
    public string Name { get; set; }
    public string OfficeLocation { get; set; }
}

public class FeatureDevPair
{
    public Employee Developer { get; set; }
    public Employee QaEngineer { get; set; }
}

public class InitExample
{
    static void Main() {
        FeatureDevPair spellCheckerTeam = new FeatureDevPair {
            Developer = new Employee {
                Name = "Fred Blaze",
                OfficeLocation = "B1"
            },
            QaEngineer = new Employee {
                Name = "Marisa Bozza",
                OfficeLocation = "L42"
            }
        };
    }
}

Notice how the two properties of spellCheckerTeam are initialized using the new syntax. Each of the Employee instances assigned to those properties is itself initialized using an object initializer, too. Finally, let me show you an even more abbreviated way to initialize the object above that saves a bit more typing at the expense of hidden complexity:

using System;

public class Employee
{
    public string Name { get; set; }
    public string OfficeLocation { get; set; }
}

public class FeatureDevPair
{
    private Employee developer = new Employee();
    private Employee qaEngineer = new Employee();

    public Employee Developer {
        get { return developer; }
        set { developer = value; }
    }

    public Employee QaEngineer {
        get { return qaEngineer; }
        set { qaEngineer = value; }
    }
}

public class InitExample
{
    static void Main() {
        FeatureDevPair spellCheckerTeam = new FeatureDevPair {
            Developer = {
                Name = "Fred Blaze",
                OfficeLocation = "B1"
            },
            QaEngineer = {
                Name = "Marisa Bozza",
                OfficeLocation = "L42"
            }
        };
    }
}

Notice that I was able to leave out the new expressions when initializing the Developer and QaEngineer properties of spellCheckerTeam. However, this abbreviated syntax requires that the fields of spellCheckerTeam exist before the properties are set, that is, the fields cannot be null. Therefore, you see that I had to change the definition of FeatureDevPair to create the contained instances of the Employee type at the point of initialization.

Note

If you do not initialize fields exposed by properties during object initialization, and then later write code that initializes instances of those objects using the abbreviated syntax shown above, you will get a nasty surprise at run time. You might have guessed that your code will generate a NullReferenceException in those cases. Unfortunately, the compiler cannot detect this potential disaster at compile time. So be very careful when using the abbreviated syntax previously shown. For example, if you are using this syntax to initialize instances of objects that you did not write, then you should be even more careful because unless you look at the implementation of that third-party class using ILDASM or Reflector, you have no way of knowing if the fields are initialized at object initialization time or not.
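Here is a minimal sketch of that failure mode. UninitializedPair and NestedInitializerPitfall are hypothetical names; the important detail is that the Developer auto-implemented property is never assigned an Employee instance, so it is still null when the nested initializer runs:

```csharp
using System;

public class Employee
{
    public string Name { get; set; }
    public string OfficeLocation { get; set; }
}

public class UninitializedPair
{
    // An auto-implemented property of a reference type
    // starts out as null.
    public Employee Developer { get; set; }
}

public static class NestedInitializerPitfall
{
    // Returns true if the abbreviated syntax throws at run time.
    public static bool ThrowsOnNullProperty()
    {
        try {
            // Compiles cleanly, but Developer is null, so setting
            // Developer.Name fails.
            new UninitializedPair {
                Developer = { Name = "Fred Blaze" }
            };
            return false;
        }
        catch( NullReferenceException ) {
            return true;
        }
    }
}

public class EntryPoint
{
    static void Main()
    {
        Console.WriteLine( "Threw NullReferenceException: {0}",
            NestedInitializerPitfall.ThrowsOnNullProperty() );
    }
}
```

The compiler accepts the initializer without complaint, and the problem surfaces only when the code executes.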

Boxing and Unboxing

Allow me to introduce boxing and unboxing. All types within the CLR fall into one of two categories: reference types (objects) or value types (values). You define objects using classes, and you define values using structs. A clear divide exists between these two. Objects live on the garbage collected heap. Values normally live in temporary storage spaces, such as on the stack. The one notable exception already mentioned is that a value type can live on the heap as long as it is contained as a field within an object. However, it is not autonomous, and the GC doesn't control its lifetime directly. Consider the following code:

public class EntryPoint
{
   static void Print( object obj )
   {
      System.Console.WriteLine( "{0}", obj.ToString() );
   }
   static void Main()
   {
      int x = 42;
      Print( x );
   }
}

It looks simple enough. In Main, there is an int, which is a C# alias for System.Int32, and it is a value type. You could have just as well declared x as type System.Int32. The space allocated for x is on the local stack. You then pass it as a parameter to the Print method. The Print method takes an object reference and simply sends the results of calling ToString on that object to the console. Let's analyze this. Print accepts an object reference, which is a reference to a heap-based object. Yet, you're passing a value type to the method. What's going on here? How is this possible?

The key is a concept called boxing. At the point where a value type is defined, the CLR generates a wrapper class to contain a copy of the value type. Instances of the wrapper live on the heap and are commonly called boxing objects. This is the CLR's way of bridging the gap between value types and reference types. In fact, if you use ILDASM to look at the IL code generated for the Main method, you'll see the following:

.method private hidebysig static void Main() cil managed
{
  .entrypoint
  // Code size       15 (0xf)
  .maxstack  1
  .locals init (int32 V_0)
  IL_0000:  ldc.i4.s   42
  IL_0002:  stloc.0
  IL_0003:  ldloc.0
  IL_0004:  box        [mscorlib]System.Int32
  IL_0009:  call       void EntryPoint::Print(object)
  IL_000e:  ret
} // end of method EntryPoint::Main

Notice the IL instruction, box, which takes care of the boxing operation before the Print method is called. This creates an object, which Figure 4-2 depicts.

Figure 4-2. Result of boxing operation

Figure 4-2 depicts the action of copying the value type into the boxing object that lives on the heap. The boxing object behaves just like any other reference type in the CLR. Also, note that the boxing type implements the interfaces of the contained value type. The boxing type is a class type that is generated internally by the virtual execution system of the CLR at the point where the contained value type is defined. The CLR then uses this internal class type when it performs boxing operations as needed.
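A few quick checks illustrate that a box is a genuine heap object. This is only a sketch, and BoxIdentity is a hypothetical helper name:

```csharp
using System;

public static class BoxIdentity
{
    // Each boxing operation allocates a separate object.
    public static bool TwoBoxesAreSameObject()
    {
        int x = 42;
        object first = x;
        object second = x;
        return object.ReferenceEquals( first, second );
    }

    // The boxes are distinct objects, yet they hold equal values.
    public static bool TwoBoxesHoldEqualValues()
    {
        int x = 42;
        object first = x;
        object second = x;
        return first.Equals( second );
    }

    // The box supports interfaces that System.Int32 implements.
    public static bool BoxSupportsValueTypeInterfaces()
    {
        object boxed = 42;
        return boxed is IComparable && boxed is IFormattable;
    }
}

public class EntryPoint
{
    static void Main()
    {
        Console.WriteLine( "Same object: {0}",
            BoxIdentity.TwoBoxesAreSameObject() );
        Console.WriteLine( "Equal values: {0}",
            BoxIdentity.TwoBoxesHoldEqualValues() );
        Console.WriteLine( "Supports interfaces: {0}",
            BoxIdentity.BoxSupportsValueTypeInterfaces() );
    }
}
```

Boxing the same int twice yields two different objects, so the reference comparison is false even though Equals reports that the contents match.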

The most important thing to keep in mind with boxing is that the boxed value is a copy of the original. Therefore, any changes made to the value inside the box are not propagated back to the original value. For example, consider this slight modification to the previous code:

public class EntryPoint
{
   static void PrintAndModify( object obj )
   {
      System.Console.WriteLine( "{0}", obj.ToString() );
      int x = (int) obj;
      x = 21;
   }
   static void Main()
   {
      int x = 42;
      PrintAndModify( x );
      PrintAndModify( x );
   }
}

The output from this code might surprise you:

42
42

The fact is, the original value, x, declared and initialized in Main, is never changed. As you pass it to the PrintAndModify method, it is boxed, because the PrintAndModify method takes an object as its parameter. Even though PrintAndModify takes a reference to an object that you can modify, the object it receives is a boxing object that contains a copy of the original value. The code also introduces another operation called unboxing in the PrintAndModify method. Because the value is boxed inside an instance of an object on the heap, you can't change it directly: the only methods that object supports are those that System.Object implements. Technically, it also supports the same interfaces that System.Int32 supports. Therefore, you need a way to get the value out of the box. In C#, you can accomplish this syntactically with casting. Notice that you cast the object instance back into an int. The compiler is smart enough to know that what you're really doing is unboxing the value type, so it emits the unbox IL instruction, as the following IL for the PrintAndModify method shows:

.method private hidebysig static void PrintAndModify(object obj) cil managed
{
  // Code size       28 (0x1c)
  .maxstack  2
  .locals init (int32 V_0)
  IL_0000:  ldstr      "{0}"
  IL_0005:  ldarg.0
  IL_0006:  callvirt   instance string [mscorlib]System.Object::ToString()
  IL_000b:  call       void [mscorlib]System.Console::WriteLine(string,
                                                                object)
  IL_0010:  ldarg.0
  IL_0011:  unbox      [mscorlib]System.Int32
  IL_0016:  ldind.i4
  IL_0017:  stloc.0
  IL_0018:  ldc.i4.s   21
  IL_001a:  stloc.0
  IL_001b:  ret
} // end of method EntryPoint::PrintAndModify

Let me be very clear about what happens during unboxing in C#. The operation of unboxing a value is the exact opposite of boxing. The value in the box is copied into an instance of the value on the local stack. Again, any changes made to this unboxed copy are not propagated back to the value contained in the box. Now, you can see how boxing and unboxing can really become confusing. As shown, the code's behavior is not obvious to the casual observer who is not familiar with the fact that boxing and unboxing are going on internally. What's worse is that two copies of the int are created between the time the call to PrintAndModify is initiated and the time that the int is manipulated in the method. The first copy is the one put into the box. The second copy is the one created when the boxed value is copied out of the box.
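The copy-out behavior, along with one related rule that the cast syntax tends to hide, can be sketched as follows. UnboxingRules is a hypothetical helper name. The second method relies on the fact that a boxed value must be unboxed to its exact type before any numeric conversion is applied:

```csharp
using System;

public static class UnboxingRules
{
    // Modifying the unboxed copy leaves the box untouched.
    public static int UnboxThenModify( object boxed )
    {
        int copy = (int) boxed;   // value copied out of the box
        copy = copy + 1;          // changes only the local copy
        return (int) boxed;       // still the original value
    }

    // Unboxing a boxed int directly to long fails at run time.
    public static bool UnboxToWrongTypeThrows()
    {
        object boxed = 42;
        try {
            Console.WriteLine( (long) boxed );
            return false;
        }
        catch( InvalidCastException ) {
            return true;
        }
    }

    // Unbox to the exact type first, and then convert.
    public static long UnboxThenConvert( object boxed )
    {
        return (long) (int) boxed;
    }
}

public class EntryPoint
{
    static void Main()
    {
        Console.WriteLine( "Box still holds: {0}",
            UnboxingRules.UnboxThenModify( 42 ) );
        Console.WriteLine( "Wrong type throws: {0}",
            UnboxingRules.UnboxToWrongTypeThrows() );
        Console.WriteLine( "Converted: {0}",
            UnboxingRules.UnboxThenConvert( 42 ) );
    }
}
```

The first method shows that the box still holds 42 after the local copy is changed. The other two show that the cast in an unboxing expression must name the boxed type itself, with any widening conversion applied afterward.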

Technically, it's possible to modify the value that is contained within the box. However, you must do this through an interface. The box generated at run time that contains the value also implements the interfaces that the value type implements and forwards the calls to the contained value. So, you could do the following:

public interface IModifyMyValue
{
   int X
   {
      get;
      set;
   }
}

public struct MyValue : IModifyMyValue
{
   public int x;

   public int X
   {
      get
      {
         return x;
      }

      set
      {
         x = value;
      }
   }

   public override string ToString()
   {
      System.Text.StringBuilder output =
          new System.Text.StringBuilder();
      output.AppendFormat( "{0}", x );
      return output.ToString();
   }
}

public class EntryPoint
{
   static void Main()
   {
      // Create value
      MyValue myval = new MyValue();
      myval.x = 123;

      // box it
      object obj = myval;
      System.Console.WriteLine( "{0}", obj.ToString() );

      // modify the contents in the box.
      IModifyMyValue iface = (IModifyMyValue) obj;
      iface.X = 456;
      System.Console.WriteLine( "{0}", obj.ToString() );

      // unbox it and see what it is.
      MyValue newval = (MyValue) obj;
      System.Console.WriteLine( "{0}", newval.ToString() );
   }
}

You can see that the output from the code is as follows:

123
456
456

As expected, you're able to modify the value inside the box using the interface named IModifyMyValue. However, it's not the most straightforward process. And keep in mind that before you can obtain an interface reference to a value type, it must be boxed. This makes sense if you think about the fact that references to interfaces are object reference types.

Warning

I cannot think of a good design reason as to why you would want to define a special interface simply so you can modify the value contained within a boxed object.

When Boxing Occurs

C# handles boxing implicitly for you, so it's important to know when C# boxes a value. A value gets boxed when one of the following conversions occurs:

  • Conversion from a value type to an object reference

  • Conversion from a value type to a System.ValueType reference

  • Conversion from a value type to a reference to an interface implemented by the value type

  • Conversion from an enum type to a System.Enum reference

In each case, the conversion normally takes the form of an assignment expression. The first two cases are fairly obvious, because the CLR is bridging the gap by turning a value type instance into a reference type. The third one can be a little surprising. Any time you implicitly cast your value into an interface that it supports, you incur the penalty of boxing. Consider the following code:

public interface IPrint
{
   void Print();
}

public struct MyValue : IPrint
{
   public int x;

   public void Print()
   {
      System.Console.WriteLine( "{0}", x );
   }
}

public class EntryPoint
{
   static void Main()
   {
      MyValue myval = new MyValue();
      myval.x = 123;

      // no boxing
      myval.Print();

      // must box the value
      IPrint printer = myval;
      printer.Print();
   }
}

The first call to Print is made directly on the value, which doesn't incur boxing. However, the second call to Print is made through an interface reference. The boxing takes place at the point where you obtain the interface reference. At first, it looks like you can easily sidestep the boxing operation by not acquiring an explicit reference typed on the interface type. This is true in this case, because Print is also part of the public contract of MyValue. However, had you implemented the Print method as an explicit interface implementation, which I cover in Chapter 5, then the only way to call the method would be through the interface reference type. So, it's important to note that any time you implement an interface on a value type explicitly, you force the clients of your value type to box it before calling through that interface. The following example demonstrates this:

public interface IPrint
{
   void Print();
}

public struct MyValue : IPrint
{
   public int x;

   void IPrint.Print()
   {
      System.Console.WriteLine( "{0}", x );
   }
}

public class EntryPoint
{
   static void Main()
   {
      MyValue myval = new MyValue();
      myval.x = 123;

      // must box the value
      IPrint printer = myval;
      printer.Print();
   }
}

As another example, consider that the System.Int32 type supports the IConvertible interface. However, most of the IConvertible interface methods are implemented explicitly. Therefore, even if you want to call an IConvertible method, such as IConvertible.ToBoolean on a simple int, you must box it first.

Note

Typically, you want to rely upon the external class System.Convert to do a conversion like the one mentioned previously. I only mention calling directly through IConvertible as an example.
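As a rough sketch of the difference between the two routes, the following compares a call through IConvertible with the equivalent System.Convert call. The local variable names are illustrative:

```csharp
using System;

public class EntryPoint
{
   static void Main()
   {
      int value = 1;

      // This would not compile, because ToBoolean is implemented
      // explicitly on System.Int32:
      // bool b = value.ToBoolean( null );

      // Converting to the interface type boxes the int.
      IConvertible convertible = value;
      bool viaInterface = convertible.ToBoolean( null );

      // The more typical route.
      bool viaConvert = Convert.ToBoolean( value );

      // Prints: True True
      Console.WriteLine( "{0} {1}", viaInterface, viaConvert );
   }
}
```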

Efficiency and Confusion

As you might expect, boxing and unboxing are not the most efficient operations in the world. What's worse is that the C# compiler silently does the boxing for you. You really must take care to know when boxing is occurring. Unboxing is usually more explicit, because you typically must do a cast operation to extract the value from the box, but there is an implicit case I'll cover soon. Either way, you must pay attention to the efficiency aspect of things. For example, consider a container type, such as a System.Collections.ArrayList. It contains all of its values as references to type object. If you were to insert a bunch of value types into it, they would all be boxed! Thankfully, generics, which were introduced in C# 2.0 and .NET 2.0 and are covered in Chapter 11, can solve this inefficiency for you. Even so, boxing is inefficient and should be avoided as much as possible. Unfortunately, because boxing is an implicit operation in C#, it takes a keen eye to find all of the cases of boxing. The best tool to use if you're in doubt whether boxing is occurring or not is ILDASM. Using ILDASM, you can examine the IL code generated for your methods, and the box operations are clearly identifiable. You can find ILDASM.exe in the .NET SDK.
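The container case can be sketched as follows. This is only an illustration of where the box and unbox operations arise, not a benchmark, and the variable names are assumptions:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public class EntryPoint
{
   static void Main()
   {
      // ArrayList stores object references, so the int is boxed
      // on the way in and must be unboxed with a cast on the way out.
      ArrayList boxedList = new ArrayList();
      boxedList.Add( 42 );
      int first = (int) boxedList[0];

      // List<int> stores the ints directly; no boxing is involved.
      List<int> typedList = new List<int>();
      typedList.Add( 42 );
      int second = typedList[0];

      // Prints: 84
      Console.WriteLine( first + second );
   }
}
```

Examining the compiled IL of a program like this with ILDASM should show box and unbox instructions only around the ArrayList calls.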

As mentioned previously, unboxing is normally an explicit operation introduced by a cast from the boxing object reference to a value of the boxed type. However, unboxing is implicit in one notable case. Remember how I talked about the differences in how the this reference behaves within methods of classes vs. methods of structs? The main difference is that, for value types, the this reference acts as either a ref or an out parameter, depending on the situation. So when you call a method on a value type, the hidden this parameter within the method must be a managed pointer rather than a reference. The compiler handles this easily when you call directly through a value-type instance. However, when calling a virtual method or an interface method through a boxed instance—thus, through an object—the CLR must unbox the value instance so that it can obtain the managed pointer to the value type contained within the box. After passing the managed pointer to the contained value type's method as the this pointer, the method can modify the fields through the this pointer, and it will apply the changes to the value contained within the box. Be aware of hidden unboxing operations if you're calling methods on a value through a box object.

Note

Unboxing operations in the CLR are not inefficient in and of themselves. The inefficiency stems from the fact that C# typically combines that unboxing operation with a copy operation on the value.

System.Object

Every object in the CLR derives from System.Object. Object is the base type of every type. In C#, the object keyword is an alias for System.Object. It can be convenient that every type in the CLR and in C# derives from Object. For example, you can treat a collection of instances of multiple types homogeneously simply by casting them to Object references.

Even System.ValueType derives from Object. However, some special rules govern obtaining an Object reference. On reference types, you can turn a reference of class A into a reference of class Object with a simple implicit conversion. Going the other direction requires a run time type check and an explicit cast using the familiar cast syntax of preceding the instance to convert with the new type in parentheses. Obtaining an Object reference directly on a value type is, technically, impossible. Semantically, this makes sense, because value types can live on the stack. It can be dangerous for you to obtain a reference to a transient value instance and store it away for later use if, potentially, the value instance is gone by the time you finally use the stored reference. For this reason, obtaining an Object reference on a value type instance involves a boxing operation, as described in the previous section.
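The conversion rules above can be sketched in a few lines. The class A here is a hypothetical empty class used only for illustration:

```csharp
using System;

public class A
{
}

public class EntryPoint
{
   static void Main()
   {
      A a = new A();

      // Reference type to Object: a simple implicit conversion.
      object o = a;

      // Object back to A: an explicit cast with a run time type check.
      A back = (A) o;
      Console.WriteLine( Object.ReferenceEquals( a, back ) ); // True

      // A cast to an unrelated type fails the run time check.
      try
      {
         string s = (string) o;
         Console.WriteLine( s );
      }
      catch( InvalidCastException )
      {
         Console.WriteLine( "Invalid cast" );
      }

      // Value type to Object: the value is copied into a box.
      int n = 5;
      object boxed = n;
      n = 6;
      Console.WriteLine( boxed ); // 5
   }
}
```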

The definition of the System.Object class is as follows:

public class Object
{
   public Object();

   protected virtual void Finalize();

   public virtual bool Equals( object obj );
   public static bool Equals( object obj1,
                              object obj2 );

   public virtual int GetHashCode();
   public Type GetType();
   protected object MemberwiseClone();
   public static bool ReferenceEquals( object obj1,
                                       object obj2 );
   public virtual string ToString();
}

Object provides several methods, which the designers of the CLI/CLR deemed to be important and germane for each object. The methods dealing with equality deserve an entire discussion devoted to them; I cover them in detail in the next section. Object provides a GetType method to obtain the runtime type of any object running in the CLR. Such a capability is extremely handy when coupled with reflection—the capability to examine types in the system at run time. GetType returns an object of type Type, which represents the real, or concrete, type of the object. Using this object, you can determine everything about the type of the object on which GetType is called. Also, given two references of type Object, you can compare the result of calling GetType on both of them to find out if they're actually instances of the same concrete type.
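A minimal sketch of comparing concrete types through GetType follows. The three values are arbitrary and chosen only to show one matching pair and one mismatch:

```csharp
using System;

public class EntryPoint
{
   static void Main()
   {
      object first = 42;
      object second = 100;
      object third = "text";

      // Both references hold boxed System.Int32 values.
      Console.WriteLine( first.GetType() == second.GetType() ); // True

      // A boxed int and a string have different concrete types.
      Console.WriteLine( first.GetType() == third.GetType() );  // False

      // The Type object describes the concrete type.
      Console.WriteLine( first.GetType().FullName );  // System.Int32
   }
}
```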

System.Object contains a method named MemberwiseClone, which returns a shallow copy of the object. I have more to say about this method in Chapter 13. When MemberwiseClone creates the copy, all value type fields are copied on a bit-by-bit basis, whereas all fields that are references are simply copied such that the new copy and the original both contain references to the same object. When you want to make a copy of an object, you may or may not desire this behavior. Therefore, if objects support copying, you could consider supporting ICloneable and do the correct thing in the implementation of that interface. Also, note that MemberwiseClone is declared as protected. The main reason for this is so that only the class for the object being copied can call it, because MemberwiseClone can create an object without calling its instance constructor. Such behavior could potentially be destabilizing if it were made public.

Note

Be sure to read more about ICloneable in Chapter 13 before deciding whether to implement this interface.
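The shallow-copy behavior of MemberwiseClone can be sketched as follows. The Document class and its ShallowCopy method are hypothetical names invented for this illustration:

```csharp
using System;
using System.Text;

public class Document
{
   public int Revision;
   public StringBuilder Body = new StringBuilder();

   // MemberwiseClone is protected, so the class itself exposes the copy.
   public Document ShallowCopy()
   {
      return (Document) MemberwiseClone();
   }
}

public class EntryPoint
{
   static void Main()
   {
      Document original = new Document();
      original.Revision = 1;
      original.Body.Append( "draft" );

      Document copy = original.ShallowCopy();
      copy.Revision = 2;           // value-type field: independent copy
      copy.Body.Append( " v2" );   // reference field: shared object

      Console.WriteLine( original.Revision );  // 1
      Console.WriteLine( original.Body );      // draft v2
      Console.WriteLine(
         Object.ReferenceEquals( original.Body, copy.Body ) ); // True
   }
}
```

Whether sharing the StringBuilder between the two instances is acceptable depends on the type's intended copy semantics, which is exactly the decision the preceding paragraph describes.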

Four of the methods on Object are virtual, and if the default implementations of the methods inside Object are not appropriate, you should override them. ToString is useful when generating textual, or human-readable, output and a string representing the object is required. For example, during development, you may need the ability to trace an object out to debug output at run time. In such cases, it makes sense to override ToString so that it provides detailed information about the object and its internal state. The default version of ToString simply calls the ToString implementation on the Type object returned from a call to GetType, thus providing the name of the object's type. It's more useful than nothing, but it's probably not useful enough for you if you need to call ToString on an object in the first place.[12] Try to avoid adding side effects to the ToString implementation, because the Visual Studio debugger can call it to display information at debug time. In fact, ToString is most useful for debugging purposes and rarely useful otherwise due to its lack of versatility and localization as I describe in Chapter 8.
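To contrast the default ToString with an override, consider this small sketch. Plain and Point are hypothetical types declared in the global namespace for brevity:

```csharp
using System;

public class Plain
{
}

public class Point
{
   private int x;
   private int y;

   public Point( int x, int y )
   {
      this.x = x;
      this.y = y;
   }

   // No side effects: the method only reads state.
   public override string ToString()
   {
      return String.Format( "({0}, {1})", x, y );
   }
}

public class EntryPoint
{
   static void Main()
   {
      // The default implementation yields the type name.
      Console.WriteLine( new Plain() );        // Plain

      // The override exposes the internal state.
      Console.WriteLine( new Point( 1, 2 ) );  // (1, 2)
   }
}
```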

The Finalize method deserves special mention. C# doesn't allow you to explicitly override this method. Also, it doesn't allow you to call this method on an object. If you need to override this method for a class, you can use the destructor syntax in C#. I have much more to say about destructors and finalizers in Chapter 13.

Equality and What It Means

Equality between reference types that derive from System.Object is a tricky issue. By default, the equality semantics provided by Object.Equals represent identity equivalence. What that means is that the test returns true if two references point to the same instance of an object. However, you can change the semantic meaning of Object.Equals to value equivalence. That means that two references to two entirely different instances of an object may equate to true as long as the internal states of the two instances match. Overriding Object.Equals is such a sticky issue that I've devoted several sections within Chapter 13 to the subject.
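The difference between identity and value equivalence can be sketched as follows. Handle and Money are hypothetical classes, and the Equals override is deliberately simplified; it ignores subtleties such as derived types:

```csharp
using System;

// Relies on the default Object.Equals: identity equivalence.
public class Handle
{
   public int id;

   public Handle( int id )
   {
      this.id = id;
   }
}

// Overrides Equals to provide value equivalence.
public class Money
{
   private readonly int cents;

   public Money( int cents )
   {
      this.cents = cents;
   }

   public override bool Equals( object obj )
   {
      Money other = obj as Money;
      return other != null && other.cents == cents;
   }

   // Equal objects must produce equal hash codes.
   public override int GetHashCode()
   {
      return cents;
   }
}

public class EntryPoint
{
   static void Main()
   {
      // Two distinct instances with the same state.
      Console.WriteLine(
         new Handle( 1 ).Equals( new Handle( 1 ) ) );    // False
      Console.WriteLine(
         new Money( 100 ).Equals( new Money( 100 ) ) );  // True
   }
}
```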

The IComparable Interface

The System.IComparable interface is a system-defined interface that objects can choose to implement if they support ordering. If it makes sense for your object to support ordering in collection classes that provide sorting capabilities, then you should implement this interface. For example, it may seem obvious, but System.Int32, aliased by int in C#, implements IComparable. In Chapter 13, I show how you can effectively implement this interface and its generic cousin, IComparable<T>.
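As a preview, here is a minimal sketch of the nongeneric interface on a hypothetical Temperature class. It is an assumption-laden illustration, not the fuller treatment given in Chapter 13:

```csharp
using System;

public class Temperature : IComparable
{
   private readonly int degrees;

   public Temperature( int degrees )
   {
      this.degrees = degrees;
   }

   public int CompareTo( object obj )
   {
      // By convention, any instance sorts after null.
      if( obj == null )
      {
         return 1;
      }

      Temperature other = obj as Temperature;
      if( other == null )
      {
         throw new ArgumentException( "Object is not a Temperature" );
      }

      return degrees.CompareTo( other.degrees );
   }

   public override string ToString()
   {
      return degrees.ToString();
   }
}

public class EntryPoint
{
   static void Main()
   {
      Temperature[] readings =
      {
         new Temperature( 30 ),
         new Temperature( 10 ),
         new Temperature( 20 )
      };

      // Array.Sort calls CompareTo on the elements.
      Array.Sort( readings );

      // Prints 10, 20, and 30 on separate lines.
      foreach( Temperature t in readings )
      {
         Console.WriteLine( t );
      }
   }
}
```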

Creating Objects

Object creation is a topic that looks simple on the surface, but in reality is relatively complex under the hood. You need to be intimately familiar with what operations take place during creation of a new object instance or value instance in order to write constructor code effectively and use field initializers effectively. Also, in the CLR, not only do object instances have constructors, but so do the types they're based on. By that, I mean that even the struct and the class types have a constructor, which is represented by a static constructor definition. Static constructors allow you to get work done at the point the type is loaded and initialized into the application domain.

The new Keyword

The new keyword lets you create new instances of objects or values. However, it behaves slightly differently when used with value types than with class types. For example, new doesn't always allocate space on the heap in C#. Let's discuss what it does with value types first.

Using new with Value Types

The new keyword is only required for value types when you need to invoke one of the constructors for the type. Otherwise, value types simply have space reserved on the stack for them, and the client code must initialize them fully before you can use them. I covered this in the "Value Type Definitions" section on constructors in value types.
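Both ways of creating a value can be sketched with a hypothetical Point struct that has public fields:

```csharp
using System;

public struct Point
{
   public int x;
   public int y;
}

public class EntryPoint
{
   static void Main()
   {
      // Without new: space is reserved, but every field must be
      // assigned before the value can be read.
      Point p1;
      p1.x = 1;
      p1.y = 2;
      Console.WriteLine( "p1 = ({0}, {1})", p1.x, p1.y ); // p1 = (1, 2)

      // With new: the default constructor zeroes the fields.
      Point p2 = new Point();
      Console.WriteLine( "p2 = ({0}, {1})", p2.x, p2.y ); // p2 = (0, 0)
   }
}
```

Removing either assignment to p1 makes the first WriteLine call fail to compile, because the compiler cannot prove the value is fully initialized.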

Using new with Class Types

You need the new operator to create objects of class type. In this case, the new operator allocates space on the heap for the object being created. If it fails to find space, it will throw an exception of type System.OutOfMemoryException, thus aborting the rest of the object-creation process.

After it allocates the space, all of the fields of the object are initialized to their default values. This is similar to what the compiler-generated default constructor does for value types. Reference-type fields are set to null, and the underlying memory slots of value-type fields are filled with all zeros. Thus, the net effect is that all fields in the new object are initialized to either null or 0. Once this is done, the CLR calls the appropriate constructor for the object instance. The constructor selected is based upon the parameters given and is matched using the overloaded method parameter matching algorithm in C#. The new operator also sets up the hidden this parameter for the subsequent constructor invocation, which is a read-only reference that references the new object created on the heap, and that reference's type is the same as the class type. Consider the following example:

public class MyClass
{
   public MyClass( int x, int y )
   {
      this.x = x;
      this.y = y;
   }

   public int x;
   public int y;
}

public class EntryPoint
{
   static void Main()
   {
      // We can't do this!
      // MyClass objA = new MyClass();

      MyClass objA = new MyClass( 1, 2 );
      System.Console.WriteLine( "objA.x = {0}, objA.y = {1}",
                                objA.x, objA.y );
   }
}

In the Main method, notice that you cannot create a new instance of MyClass by calling the default constructor. The C# compiler doesn't create a default constructor for a class unless no other constructors are defined. The rest of the code is fairly straightforward. I create a new instance of MyClass and then output its values to the console. Shortly, in the section titled "Instance Constructor and Creation Ordering," I cover the minute details of object instance creation and constructors.

Field Initialization

When defining a class, it is sometimes convenient to assign the fields a value at the point where the field is declared. The fact is, you can assign a field from any immediate value or any callable method as long as the method is not called on the instance of the object being created. For example, you can initialize fields based upon the return value from a static method on the same class. Let's look at an example:

using System;

public class A
{
   private static int InitX()
   {
      Console.WriteLine( "A.InitX()" );
      return 1;
   }
   private static int InitY()
   {
      Console.WriteLine( "A.InitY()" );
      return 2;
   }
   private static int InitA()
   {
      Console.WriteLine( "A.InitA()" );
      return 3;
   }
   private static int InitB()
   {
      Console.WriteLine( "A.InitB()" );
      return 4;
   }

   private int y = InitY();
   private int x = InitX();

   private static int a = InitA();
   private static int b = InitB();
}

public class EntryPoint
{
   static void Main()
   {
      A a = new A();
   }
}

Notice that you're assigning all of the fields using field initializers and setting the fields to the return value from the methods called. All of those methods called during field initialization are static, which helps reinforce a couple of important points regarding field initialization. The output from the preceding code is as follows:

A.InitA()
A.InitB()
A.InitY()
A.InitX()

Notice that two of the fields, a and b, are static fields, whereas the fields x and y are instance fields. The runtime initializes the static fields before the class type is used for the first time in this application domain. In the next section, "Static (Class) Constructors," I show how you can relax the CLR's timing of initializing the static fields.

During construction of the instance, the instance field initializers are invoked. As expected, proof of that appears in the console output after the static field initializers have run. Note one important point: Notice the ordering of the output regarding the instance initializers and compare that with the ordering of the fields declared in the class itself. You'll see that field initialization, whether it's static or instance initialization, occurs in the order in which the fields are listed in the class definition. Sometimes this ordering can be important if your static fields are based on expressions or methods that expect other fields in the same class to be initialized first. You should avoid writing such code at all costs. In fact, any code that requires you to think about the ordering of the declaration of your fields in your class is bad code. If initialization ordering matters, you should consider initializing all of your fields in the body of the static constructor. That way, people maintaining your code at a later date won't be unpleasantly surprised when they reorder the fields in your class for some reason.
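To see why order-dependent initializers are fragile, consider the following sketch. Fragile and Robust are hypothetical classes; the first depends on declaration order, and the second makes the order explicit in a static constructor:

```csharp
using System;

public class Fragile
{
   // This initializer runs first, while baseValue still holds
   // its default value of 0.
   public static int doubled = baseValue * 2;
   public static int baseValue = 21;
}

public class Robust
{
   public static int doubled;
   public static int baseValue;

   // The ordering is spelled out here and survives any
   // reordering of the field declarations.
   static Robust()
   {
      baseValue = 21;
      doubled = baseValue * 2;
   }
}

public class EntryPoint
{
   static void Main()
   {
      Console.WriteLine( Fragile.doubled );  // 0
      Console.WriteLine( Robust.doubled );   // 42
   }
}
```

Swapping the two field declarations in Fragile changes its result to 42, which is precisely the kind of surprise a maintainer should not have to anticipate.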

Static (Class) Constructors

I already touched upon static constructors in the "Fields" section, but let's look at them in a little more detail. A class can have at most one static constructor, and that static constructor cannot accept any parameters. Static constructors can never be invoked directly. Instead, the CLR invokes them when it needs to initialize the type for a given application domain. The static constructor is called before an instance of the given class is first created or before any static member of the class is referenced. Let's modify the previous field initialization example to include a static constructor and examine the output:

using System;

public class A
{
   static A()
   {
      Console.WriteLine( "static A::A()" );
   }

   private static int InitX()
   {
      Console.WriteLine( "A.InitX()" );
      return 1;
   }
   private static int InitY()
   {
      Console.WriteLine( "A.InitY()" );
      return 2;
   }
   private static int InitA()
   {
      Console.WriteLine( "A.InitA()" );
      return 3;
   }
   private static int InitB()
   {
      Console.WriteLine( "A.InitB()" );
      return 4;
   }

   private int y = InitY();
   private int x = InitX();

   private static int a = InitA();
   private static int b = InitB();
}

public class EntryPoint
{
   static void Main()
   {
      A a = new A();
   }
}

I've added the static constructor and want to see that it has been called in the output. The output from the previous code is as follows:

A.InitA()
A.InitB()
static A::A()
A.InitY()
A.InitX()

Of course, the static constructor was called before an instance of the class was created. However, notice the important ordering that occurs. The static field initializers are executed before the body of the static constructor executes. This ensures that the static fields are initialized properly before possibly being referenced within the static constructor body.

It is the default behavior of the CLR to call the type initializer (implemented using the static constructor syntax) before any member of the type is accessed. By that, I mean that the type initializers will execute before any code accesses a field or a method on the class or before an object is created from the class. However, you can apply a metadata attribute defined in the CLR, beforefieldinit, to the class to relax the rules a little bit. In the absence of the beforefieldinit attribute, the CLR is required to call the type initializer before any member on the class is touched. With the beforefieldinit attribute, the CLR is free to defer the type initialization until as late as the point right before the first static field access. This means that if beforefieldinit is set on the class, you can call instance constructors and methods all day long without requiring the type initializer to execute first. But as soon as anything tries to access a static field on the class, the CLR invokes the type initializer first. Keep in mind that the beforefieldinit attribute gives the CLR this leeway to defer the type initialization to a later time, but the CLR could still initialize the type long before the first static field is accessed.

The C# compiler sets the beforefieldinit attribute on all classes that don't specifically define a static constructor. To see this in action, you can use ILDASM to examine the IL generated for the previous two examples. For the example in the previous section, where I didn't specifically define a static constructor, the class A metadata looks like the following:

.class public auto ansi beforefieldinit A
       extends [mscorlib]System.Object
{
} // end of class A

For the class A metadata in the example in this section, the metadata looks like the following:

.class public auto ansi A
       extends [mscorlib]System.Object
{
} // end of class A

This behavior of the C# compiler makes good sense. When you explicitly define a type initializer, you usually want to guarantee that it will execute before anything in the class is utilized or before any instance of the class is created. However, if you don't provide an explicit type initializer and you do have static field initializers, the C# compiler will create a type initializer of sorts that merely initializes all of the static fields. Because you didn't provide user code for the type initializer, the C# compiler can let the class defer the static field initializers until one of the static fields is accessed.

After all of this discussion regarding beforefieldinit, you should make note of one important point. Suppose you have a class similar to the ones in the examples, where a static field is initialized based upon the result of a method call. If your class doesn't provide an explicit type initializer, it would be erroneous to assume that the code called during the static field initialization will be called prior to an object creation based on this class. For example, consider the following code:

using System;

public class A
{
   public A()
   {
      Console.WriteLine( "A.A()" );
   }

   static int InitX()
   {
      Console.WriteLine( "A.InitX()" );
      return 1;
   }

   public static int x = InitX();
}

public class EntryPoint
{
   static void Main()
   {
      // No guarantee A.InitX() is called before this!
      A a = new A();
   }
}

If your implementation of InitX contains some side effects that are required to run before an object instance can be created from this class, then you would be better off putting that code in a static constructor so that the compiler will not apply the beforefieldinit metadata attribute to the class. Otherwise, there's no guarantee that your code with the side effect in it will run prior to a class instance being created.
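A minimal sketch of that remedy follows. It assumes the field is static, as the discussion describes, and adds an empty static constructor whose only purpose is to keep the compiler from marking the class beforefieldinit:

```csharp
using System;

public class A
{
   // The explicit static constructor forces type initialization
   // to complete before the first instance is created.
   static A()
   {
   }

   public A()
   {
      Console.WriteLine( "A.A()" );
   }

   static int InitX()
   {
      Console.WriteLine( "A.InitX()" );
      return 1;
   }

   public static int x = InitX();
}

public class EntryPoint
{
   static void Main()
   {
      // Prints A.InitX() first, then A.A().
      A a = new A();
   }
}
```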

Instance Constructor and Creation Ordering

Instance constructors follow a lot of the same rules as static constructors, except they're more flexible and powerful, so they have some added rules of their own. Let's examine those rules.

Instance constructors can have what's called an initializer expression. An initializer expression allows instance constructors to defer some of their work to other instance constructors within the class, or more importantly, to base class constructors during object initialization. This is important if you rely on the base class instance constructors to initialize the inherited members. Remember, constructors are never inherited, so you must go through explicit means such as this in order to call the base class constructors during initialization of derived types if you need to.

If your class doesn't implement an instance constructor at all, the compiler will generate a default parameterless instance constructor for you, which really only does one thing—it merely calls the base class default constructor through the base keyword. If the base class doesn't have an accessible default constructor, a compiler error is generated. For example, the following code doesn't compile:

public class A
{
   public A(int x) {
      this.x = x;
   }

   private int x;
}

public class B : A
{
}

public class EntryPoint
{
   static void Main()
   {
      B b = new B();
   }
}

Can you see why it won't compile? The problem is that a class with no explicit constructors is given a default parameterless constructor by the compiler; this constructor merely calls the base class parameterless constructor, which is exactly what the compiler tries to do for class B. However, the problem is that, because class A does have an explicit instance constructor defined, the compiler doesn't produce a default constructor for class A. So, there is no accessible default constructor available on class A for class B's compiler-provided default constructor to call. Therein lies another caveat to inheritance. In order for the previous example to compile, either you must explicitly provide a default constructor for class A, or class B needs an explicit constructor. Now, let's look at an example that demonstrates the ordering of events during instance initialization:
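One of the two fixes mentioned above, giving class B an explicit constructor, can be sketched as follows. The value 0 passed to the base constructor and the X property are arbitrary choices made for illustration:

```csharp
using System;

public class A
{
   public A( int x )
   {
      this.x = x;
   }

   public int X
   {
      get { return x; }
   }

   private int x;
}

public class B : A
{
   // The explicit constructor names the base constructor to call,
   // so no parameterless constructor is needed on A.
   public B()
      :base( 0 )
   {
   }
}

public class EntryPoint
{
   static void Main()
   {
      B b = new B();
      Console.WriteLine( b.X );  // 0
   }
}
```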

using System;

class Base
{
   public Base( int x )
   {
      Console.WriteLine( "Base.Base(int)" );
      this.x = x;
   }

   private static int InitX()
   {
      Console.WriteLine( "Base.InitX()" );
      return 1;
   }

   public int x = InitX();
}

class Derived : Base
{
   public Derived( int a )
      :base( a )
   {
      Console.WriteLine( "Derived.Derived(int)" );
      this.a = a;
   }

   public Derived( int a, int b )
      :this( a )
   {
      Console.WriteLine( "Derived.Derived(int, int)" );
      this.a = a;
      this.b = b;
   }

   private static int InitA()
   {
      Console.WriteLine( "Derived.InitA()" );
      return 3;
   }

   private static int InitB()
   {
      Console.WriteLine( "Derived.InitB()" );
      return 4;
   }

   public int a = InitA();
   public int b = InitB();
}

public class EntryPoint
{
   static void Main()
   {
      Derived b = new Derived( 1, 2 );
   }
}

Before I start detailing the ordering of events here, look at the output from this code:

Derived.InitA()
Derived.InitB()
Base.InitX()
Base.Base(int)
Derived.Derived(int)
Derived.Derived(int, int)

Are you able to determine why the ordering is the way it is? It can be quite confusing upon first glance, so let's take a moment to examine what's going on here. The first line of the Main method creates a new instance of class Derived. As you see in the output, the constructor is called. But, it's called in the last line of the output! Clearly, a lot of things are happening before the constructor body for class Derived executes.

At the bottom, you see the call to the Derived constructor that takes two int parameters. Notice that this constructor has an initializer using the this keyword. This delegates construction work to the Derived constructor that takes one int parameter.

The Derived constructor that takes one int parameter also has an initializer, except it uses the base keyword, thus calling the constructor for the class Base, which takes one int parameter. However, if a constructor has an initializer that uses the base keyword, the constructor will invoke the field initializers defined in the class before it passes control to the base class constructor. And remember, the ordering of the initializers is the same as the ordering of the fields in the class definition. This behavior explains the first two entries in the output. The output shows that the initializers for the fields in Derived are invoked first, before the initializers in Base.

After the initializers for Derived execute, control is then passed to the Base constructor that takes one int parameter. Notice that class Base has an instance field with an initializer, too. The same behavior happens in Base as it does in Derived, so before the constructor body for the Base constructor is executed, the constructor implicitly calls the initializers for the class. This is why the third entry in the output trace is that of Base.InitX. I have more to say about why this behavior is defined in this way later in this section, and it involves virtual methods.

After the Base initializers are done, you find yourself in the block of the Base constructor. Once that constructor body runs to completion, control returns to the Derived constructor that takes one int parameter, and execution finally ends up in that constructor's code block. Once it's done there, it finally gets to execute the body of the constructor that was called when the code created the instance of Derived in the Main method. Clearly, a lot of initialization work is going on under the covers when an object instance is created.

As promised, I'll explain why the field initializers of a derived class are invoked before the constructor for the base class is called through an initializer on the derived constructor, and the reason is subtle. Virtual methods, which I cover in more detail in the section titled "Inheritance and Virtual Methods," work inside constructors in the CLR and in C#.

Note

If you're coming from a C++ programming environment, you should recognize that this behavior of calling virtual methods in constructors is completely different. In C++, you're never supposed to rely on virtual method calls in constructors, because while a base class constructor is running, the object is still treated as an instance of that base class, so a virtual call never reaches an override in a derived class.

Let's look at an example:

using System;

public class A
{
   public virtual void DoSomething()
   {
      Console.WriteLine( "A.DoSomething()" );
   }

   public A()
   {
      DoSomething();
   }
}

public class B : A
{
   public override void DoSomething()
   {
      Console.WriteLine( "B.DoSomething()" );
      Console.WriteLine( "x = {0}", x );
   }

   public B()
      :base()
   {
   }

   private int x = 123;
}

public class EntryPoint
{
   static void Main()
   {
      B b = new B();
   }
}

The output from this code is as follows:

B.DoSomething()
x = 123

As you can see, the virtual invocation works just fine from the constructor of A. Notice that B.DoSomething uses the x field. Now, if the field initializers were not run before the base invocation, imagine the calamity that would ensue when the virtual method is invoked from the class A constructor. That, in a nutshell, is why the field initializers are run before the base constructor is called if the constructor has an initializer. The field initializers are also run before the constructor's body is entered, if there is no initializer defined for the constructor.
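
The ordering matters only for values set by field initializers. The following sketch, which uses hypothetical Shape and Circle classes that are not part of the earlier listings, contrasts a field set by an initializer with a field set in the derived constructor body. When the base constructor makes its virtual call, the first field already holds its value, whereas the second still holds its default:

```csharp
using System;

public class Shape
{
    public Shape()
    {
        // Virtual call from the base constructor. It dispatches to the
        // most derived override even though the Circle constructor
        // body has not run yet.
        Describe();
    }

    public virtual void Describe()
    {
        Console.WriteLine( "Shape.Describe()" );
    }
}

public class Circle : Shape
{
    private int fromInitializer = 123;   // set before Shape() is called
    private int fromBody;                // set only after Shape() returns

    // Records what the override saw, purely for illustration.
    public static string LastSeen;

    public Circle()
    {
        fromBody = 456;
    }

    public override void Describe()
    {
        LastSeen = String.Format( "fromInitializer = {0}, fromBody = {1}",
                                  fromInitializer, fromBody );
        Console.WriteLine( LastSeen );
    }
}

public class EntryPoint
{
    static void Main()
    {
        new Circle();   // prints: fromInitializer = 123, fromBody = 0
    }
}
```

In other words, if a virtual method that may be called from a base constructor depends on a field, an initializer is the safer place to set that field.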

Destroying Objects

If you thought object creation was complicated, hold onto your hats. As you know, the CLR environment contains a garbage collector, which manages memory on your behalf. You can create as many new objects as you want, but you never have to worry about freeing their memory explicitly. A huge majority of bugs in native applications come from memory allocation/deallocation mismatches, otherwise known as memory leaks. Garbage collection is a technique meant to avoid that type of bug, because the execution environment now handles the tracking of object references and destroys the object instances when they're no longer in use.

The CLR tracks every managed object reference in the system; these are just the plain-old object references that you're already used to. During a collection, if the CLR determines that an object is no longer reachable via a reference, it flags the object for deletion. As the garbage collector compacts the heap, these flagged objects either have their memory reclaimed or are moved over into a queue for finalization if they have a finalizer. It is the responsibility of another thread, the finalizer thread, to iterate over this queue of objects and call their finalizers. Once the finalizers have completed, the memory for the object is freed on the next collection pass, and the object is completely dead, never to return.

Finalizers

There are many reasons why you should rarely write a finalizer. When used unnecessarily, finalizers can degrade the performance of the CLR, because finalizable objects live longer than their nonfinalizable counterparts. Even allocating finalizable objects is more costly. Additionally, finalizers are difficult to write, because you cannot make any assumptions about the state that other objects in the system are in.

When the finalization thread iterates through the objects in the queue of finalizable objects, it calls the Finalize method on each object. The Finalize method is an override of a virtual method on System.Object; however, it's illegal in C# to explicitly override this method. Instead, you write a destructor that looks like a method that has no return type, cannot have access modifiers applied to it, accepts no parameters, and whose identifier is the class name immediately prefixed with a tilde. Destructors cannot be called explicitly in C#, and they are not inherited, just as constructors are not inherited. A class can have only one destructor.

When an object's finalizer is called, each finalizer in an inheritance chain is called, from the most derived class to the least derived class. Consider the following example:

using System;

public class Base
{
   ~Base()
   {
      Console.WriteLine( "Base.~Base()" );
   }
}

public class Derived : Base
{
   ~Derived()
   {
      Console.WriteLine( "Derived.~Derived()" );
   }
}

public class EntryPoint
{
   static void Main()
   {
      Derived derived = new Derived();
   }
}

As expected, the result of executing this code is as follows:

Derived.~Derived()
Base.~Base()

Although the garbage collector now handles the task of cleaning up memory so that you don't have to worry about it, you have a whole new host of concerns to deal with when it comes to the destruction of objects. I've mentioned that finalizers run on a separate thread in the CLR. Therefore, whatever objects you use inside your destructor must be thread-safe, but the odds are you should not even be using other objects in your finalizer, because they may have already been finalized or destroyed. This includes objects that are fields of the class that contains the finalizer. You have no guaranteed way of knowing exactly when your finalizer will be called or in what order the finalizer will be called between two independent or dependent objects. This is one more reason why you shouldn't introduce interdependencies on objects in the destructor code block. After all this dust has settled, it starts to become clear that you shouldn't do much inside a finalizer except basic housecleaning, if anything.

Essentially, you only need to write a finalizer when your object manages some sort of unmanaged resource. However, if the resource is managed through a standard Win32 handle, I highly recommend that you use the SafeHandle type to manage it. Writing a wrapper such as SafeHandle is tricky business, mainly because of the finalizer and all of the things you must do to guarantee that it will get called in all situations, even the diabolical ones such as an out-of-memory condition or in the face of unexpected exceptions. Finally, any object that has a finalizer must implement the Disposable pattern, which I cover in the forthcoming section titled "Disposable Objects."

Deterministic Destruction

So far, everything that you've seen regarding destruction of objects in the garbage-collected environment of the CLR is known as nondeterministic destruction. That means that you cannot predict the timing of the execution of the destructor code for an object. If you come from a native C++ world, you'll recognize that this is completely different.

In C++, heap object destructors are called when the user explicitly deletes the object. With the CLR, the garbage collector handles that for you, so you don't have to worry about forgetting to do it. However, for a C++-based stack object, the destructor is called as soon as the execution scope in which that object is created is exited. This is known as deterministic destruction and is extremely useful for managing resources.

Let's examine the case of an object that holds a system file handle. You can use such a stack-based object in C++ to manage the lifetime of the file handle. When the object is created, the constructor of the object acquires the file handle, and as soon as the object goes out of scope, the destructor is called and its code closes the file handle. This frees the client code of the object from having to manage the resource explicitly. It also prevents resource leaks, because if an exception is thrown from that code block where the object is used, C++ guarantees that the destructors for all stack-based objects will be called no matter how the block is exited.

This idiom is called Resource Acquisition Is Initialization (RAII), and it's extremely useful for managing resources. C# has almost completely lost this capability of automatic cleanup in a timely manner. Of course, if you had an object that held a file open and closed it in the destructor, you wouldn't have to worry about whether the file gets closed or not, but you will definitely have to consider when it gets closed. The fact is, you don't know exactly when it will get closed if the code to close it is in the finalizer, which is fallout from nondeterministic finalization. For this very reason, it would be bad design to put resource management code, such as closing file handles, in the finalizer. What if the object is already marked for finalization but has not had its finalizer called yet, and you try to create a new instance of the object whose constructor tries to open the resource? Well, with an exclusive-access resource, the code will fail in the constructor for the new instance. I'm sure you'll agree that this is not desired, and most definitely would not be expected by the client of your object.

Let's revisit the finalization ordering problem mentioned a little while ago. If an object contains another finalizable object, and the outer object is put on the finalization queue, the internal objects possibly are, too. However, the finalizer thread just goes through the queue finalizing the objects individually. It doesn't care who was an internal object of whom. So clearly, it's possible that if destructor code accesses an object reference in a field, that object could already have been finalized. Accessing such a field produces the dreaded undefined behavior.

This is a perfect example of how the garbage collector removes one bit of complexity but replaces it with another. In reality, you should avoid finalizers if possible. Not only do they add complexity, but they hamper memory management, because they cause objects to live longer than objects with no finalizer. This is because they're put on the finalization list, and it is the responsibility of an entirely different thread to clean up the finalization list. In the "Disposable Objects" section and in Chapter 13, I describe an interface, IDisposable, that was included in the Framework Class Library in order to facilitate a form of deterministic destruction.

Exception Handling

It's important to note the behavior of exceptions when inside the scope of a finalizer. If you come from a native C++ world, you know that it is bad behavior to allow exceptions to propagate out from a destructor, because in certain situations, that may cause your application to abort. In C#, an exception thrown in a finalizer that leaves the finalizer uncaught will be treated as an unhandled exception, and by default, the process will be terminated after notifying you of the exception.

Note

This behavior starting with .NET 2.0 is a breaking change from .NET 1.1. Before .NET 2.0, unhandled exceptions in the finalization thread were swallowed after notifying the user, and the process was allowed to continue. The danger with this behavior is that the system could be running in a half-baked or inconsistent state. Therefore, it's best to kill the process rather than run the risk of it causing more damage. In Chapter 7, I show you how you can force the CLR to revert to the pre-2.0 behavior if you absolutely must.

Disposable Objects

In the previous section on finalizers, I discussed the differences between deterministic and nondeterministic finalization, and you also saw that you lose a lot of convenience along with deterministic finalization. For that reason, the IDisposable interface exists, and in fact, it was only added during beta testing of the first release of the .NET Framework when developers were shouting about not having any form of deterministic finalization built into the framework. It's not a perfect replacement for deterministic finalization, but it does get the job done at the expense of adding complexity to the client of your objects.

The IDisposable Interface

The IDisposable definition is as follows:

public interface IDisposable
{
   void Dispose();
}

Notice that it has only one method, Dispose, and it is within this method's implementation that the dirty work is done. Thus, you should completely clean up your object and release all resources inside Dispose. The system never calls Dispose automatically; the client code calls it, and doing so is the client code's way of saying, "I'm done with this object and don't intend to use it ever again."

Even though the IDisposable pattern provides a form of deterministic destruction, it is not a perfect solution. Using IDisposable, the onus is thrown on the client to ensure that the Dispose method is called. There is no way for the client to rely upon the system, or the compiler, to call it for them automatically. C# makes this a little easier to manage in the face of exceptions by overloading the using keyword, which I discuss in the next section.

When you implement Dispose, you normally implement the class in such a way that the finalizer code reuses Dispose. This way, if the client code never calls Dispose, the finalizer code will take care of it at finalization time. Another factor makes implementing IDisposable painful for objects, and that is that you must chain calls of IDisposable if your object contains references to other objects that support IDisposable. This makes designing classes a little more difficult, because you must know whether a class that you use for a field type implements IDisposable, and if it does, you must implement IDisposable and you must make sure to call its Dispose method inside yours.

Given all of this discussion regarding IDisposable, you can definitely start to see how the garbage collector adds complexity to design, even though it reduces the chance for memory bugs. I'm not trying to say the garbage collector is worthless; in fact, it's very valuable when used appropriately. However, as with any design, engineering decisions typically have pros and cons in both directions.

Let's look at an example implementation of IDisposable:

using System;

public class A : IDisposable
{
   private bool disposed = false;
   private void Dispose( bool disposing )
   {
      if( !disposed ) {
         if( disposing ) {
            // It is safe to access other objects here.
         }

         Console.WriteLine( "Cleaning up object" );
         disposed = true;
      }
   }
   public void Dispose()
   {
      Dispose( true );
      GC.SuppressFinalize( this );
   }

   public void DoSomething()
   {
      Console.WriteLine( "A.DoSomething()" );
   }

   ~A()
   {
      Console.WriteLine( "Finalizing" );
      Dispose( false );
   }
}

public class EntryPoint
{
   static void Main()
   {
      A a = new A();
      try {
         a.DoSomething();
      }
      finally {
         a.Dispose();
      }
   }
}

Let's go over this code in detail to see what's really going on. The first thing to notice in the class is an internal Boolean field that registers whether or not the object has been disposed. It's there because it's perfectly legal for client code to call Dispose multiple times. Therefore, you need some way to know that you've done the work already.

You'll also see that I've implemented the finalizer in terms of the Dispose implementation. Notice that I have two overloads of Dispose. I've done this so that I know inside the Dispose(bool) method whether I got here through IDisposable.Dispose or through the destructor. It tells me whether I can safely access contained objects inside the method.
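
To make the role of the disposing flag concrete, here is a minimal sketch of an object that contains another disposable object. The Connection and Session classes are hypothetical and exist only for illustration. The contained object is disposed only when the call arrives through IDisposable.Dispose, never when it arrives through the finalizer:

```csharp
using System;

// Hypothetical inner resource; it stands in for any IDisposable field type.
public sealed class Connection : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

public class Session : IDisposable
{
    private readonly Connection connection = new Connection();
    private bool disposed = false;

    // Exposed only so that the example can inspect the inner object.
    public Connection Inner { get { return connection; } }

    private void Dispose( bool disposing )
    {
        if( !disposed ) {
            if( disposing ) {
                // Reached through IDisposable.Dispose: the contained
                // object is still valid, so chain the call to it.
                connection.Dispose();
            }

            // Reached through either path: release only what this
            // object owns directly (nothing in this sketch).
            disposed = true;
        }
    }

    public void Dispose()
    {
        Dispose( true );
        GC.SuppressFinalize( this );
    }

    ~Session()
    {
        // The finalizer must not touch the connection field, because
        // that object may already have been finalized.
        Dispose( false );
    }
}

public class EntryPoint
{
    static void Main()
    {
        Session session = new Session();
        session.Dispose();
        session.Dispose();   // the second call does nothing
        Console.WriteLine( "Inner disposed: {0}", session.Inner.IsDisposed );
    }
}
```

Running this sketch prints "Inner disposed: True", and the second call to Dispose is harmless because of the disposed flag.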

One last point: The Dispose method makes a call to GC.SuppressFinalize. This method on the garbage collector allows you to keep the garbage collector from finalizing an object. If the client code calls Dispose, and if the Dispose method completely cleans up all resources, including all the work a finalizer would have done, then there is no need for this object to ever be finalized. You can call SuppressFinalize to keep this object from being finalized. This handy optimization helps the garbage collector get rid of your object in a timely manner when all references to it cease to exist.

Now, let's take a look at how to use this disposable object. Notice the try/finally block within the Main method. I cover exceptions in Chapter 7. For now, understand that this try/finally construct is a way of guaranteeing that certain code will be executed no matter how a code block exits. In this case, no matter how the execution flow leaves the try block—whether normally, through a return statement, or even by exception—the code in the finally block will execute. View the finally block as a sort of safety net. It is within this finally block that you call Dispose on the object. No matter what, Dispose will get called.

This is a perfect example of how nondeterministic finalization throws the onus on the client code, or the user, to clean up the object, whereas deterministic finalization doesn't require the user to bother typing these ugly try/finally blocks or to call Dispose. This definitely makes life harder on the user, as it makes it much more tedious to create exception-safe and/or exception-neutral code. The designers of C# have tried to lessen this load by overloading the using keyword. Although it lessens the load, it doesn't remove the burden put on the client code altogether.

Note

C++/CLI allows you to use RAII in a way familiar to C++ developers without requiring you to call Dispose explicitly or use a using block. It would be nice if C# could do the same, but it would cause too much of a calamity to introduce such a breaking change in the language at this point.

The using Keyword

The using keyword was overloaded to support the IDisposable pattern, and the general idea is that the using statement acquires the resources within the parentheses following the using keyword, while the scope of these local variables is confined to the declaration scope of the following curly braces.

Let's look at a modified form of the previous example:

using System;

public class A : IDisposable
{
   private bool disposed = false;
   private void Dispose( bool disposing )
   {
      if( !disposed ) {
         if( disposing ) {
            // It is safe to access other objects here.
         }

         Console.WriteLine( "Cleaning up object" );
         disposed = true;
      }
   }
   public void Dispose()
   {
      Dispose( true );
      GC.SuppressFinalize( this );
   }

   public void DoSomething()
   {
      Console.WriteLine( "A.DoSomething()" );
   }

   ~A()
   {
      Console.WriteLine( "Finalizing" );
      Dispose( false );
   }
}

public class EntryPoint
{
   static void Main()
   {
      using( A a = new A() ) {
         a.DoSomething();
      }

      using( A a = new A(), b = new A() ) {
         a.DoSomething();
         b.DoSomething();
      }
   }
}

The meat of the changes is in the Main method. Notice that I've replaced the ugly try/finally construct with the cleaner using statement. Under the covers, the using statement expands to the try/finally construct I already had. Now, granted, this code is much easier to read and understand. However, it still doesn't remove the burden from the client code of having to remember to use the using statement in the first place.
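
To see roughly what that expansion looks like when two resources are acquired in one using statement, consider the following sketch. The Tracked class and its static Log are hypothetical helpers that exist only to record the order of events, and the ByHand method is an approximation of the generated code rather than its exact form:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that records when it is acquired and disposed.
public sealed class Tracked : IDisposable
{
    public static readonly List<string> Log = new List<string>();
    private readonly string name;

    public Tracked( string name )
    {
        this.name = name;
        Log.Add( "acquire " + name );
    }

    public void Dispose()
    {
        Log.Add( "dispose " + name );
    }
}

public class UsingDemo
{
    public static void WithUsing()
    {
        using( Tracked a = new Tracked( "a" ), b = new Tracked( "b" ) ) {
            Tracked.Log.Add( "body" );
        }
    }

    // Approximately what the compiler produces for WithUsing: one
    // nested try/finally per resource, disposed in reverse order.
    public static void ByHand()
    {
        Tracked a = new Tracked( "a" );
        try {
            Tracked b = new Tracked( "b" );
            try {
                Tracked.Log.Add( "body" );
            }
            finally {
                if( b != null ) ((IDisposable) b).Dispose();
            }
        }
        finally {
            if( a != null ) ((IDisposable) a).Dispose();
        }
    }

    static void Main()
    {
        WithUsing();
        Console.WriteLine( String.Join( ", ", Tracked.Log.ToArray() ) );

        Tracked.Log.Clear();
        ByHand();
        Console.WriteLine( String.Join( ", ", Tracked.Log.ToArray() ) );
    }
}
```

Both lines of output read "acquire a, acquire b, body, dispose b, dispose a", which shows that resources are disposed in the reverse order of their acquisition.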

The using statement requires that all resources acquired in the acquisition process be implicitly convertible to IDisposable. That is, they must implement IDisposable. If they don't, you'll see a compiler error.

Method Parameter Types

Method parameters follow the same general rules as those of C/C++. That is, by default, parameters declare a variable identifier that is valid for the duration and scope of the method itself. There are no const parameters as in C++, and method parameters may be reassigned at will. Unless the parameter is declared a certain way as a ref or an out parameter, such reassignment will remain local to the method.

I have found that one of the biggest stumbling blocks for C++ developers in C# is dealing with the semantics of variables passed to methods. The dominant kind of type instance within the CLR is a reference, so variables of such types merely point to their instances on the heap. As a result, methods effectively see their object arguments with reference semantics. C++ developers are used to copies of variables being made as they're passed into methods by default, unless they're passed by reference or as pointers. In other words, C++ arguments are passed using value semantics by default.

In C#, arguments are actually passed by value. However, for references, the value that is copied is the reference itself and not the object that it references. Changes in state that are made to the referenced object within the method are visible to the caller of the method.

There is no notion of a const parameter within C#, thus you should create immutable objects to pass where you would have wanted to pass a const parameter. I have more to say about immutable objects in Chapter 13.

Note

Those C++ developers who are used to using handle/body idioms to implement copy-on-write semantics must take these facts into consideration. It doesn't mean that you cannot employ those idioms in C#; rather, it just means that you must implement them differently.

Value Arguments

In reality, all parameters passed to methods are value arguments, assuming they're normal, plain, undecorated parameters that get passed to a method. By undecorated, I mean they don't have special keywords such as out, ref, and params attached to them. They can, however, have metadata attributes attached to them just as almost everything else in the CLR type system can. As with all parameters, the identifier is in scope within the method block following the parameter list (i.e., within the curly braces), and the method receives a copy of the passed variable at invocation time. Be careful about what this means, though. If the passed variable is a struct, or value type, then the method receives a copy of the value. Any changes made locally to the value are not seen by the caller. If the passed variable is a reference to an object on the heap, as any variable for a class instance is, then the method receives a copy of the reference. Thus, any changes made to the object through the reference are seen by the caller of the method.
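
The distinction can be sketched with a small example. PointValue and PointObject are hypothetical types introduced only for this illustration. Notice in particular that reassigning the parameter itself, as Reassign does, rebinds only the method's copy of the reference:

```csharp
using System;

public struct PointValue { public int X; }    // value type
public class PointObject { public int X; }    // reference type

public class ValueArgsDemo
{
    // Receives a copy of the struct; the caller's value is untouched.
    public static void Mutate( PointValue p ) { p.X = 99; }

    // Receives a copy of the reference; both copies refer to the same
    // object, so the caller sees the change.
    public static void Mutate( PointObject p ) { p.X = 99; }

    // Rebinds only the local copy of the reference; the caller's
    // variable still refers to the original object.
    public static void Reassign( PointObject p )
    {
        p = new PointObject();
        p.X = -1;
    }

    static void Main()
    {
        PointValue v = new PointValue();
        v.X = 1;
        PointObject o = new PointObject();
        o.X = 1;

        Mutate( v );
        Console.WriteLine( "struct after Mutate: {0}", v.X );     // 1

        Mutate( o );
        Console.WriteLine( "object after Mutate: {0}", o.X );     // 99

        Reassign( o );
        Console.WriteLine( "object after Reassign: {0}", o.X );   // 99
    }
}
```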

ref Arguments

Passing parameters by reference is indicated by placing the ref modifier ahead of the parameter type in the parameter list for the method. When a variable is passed by reference, a new copy of the variable is not made, and the caller's variable is directly affected by any actions within the method. As is usually the case in the CLR, this means two slightly different things, depending on whether the variable is an instance of a value type (struct) or an object (class).

When a value instance is passed by reference, a copy of the caller's value is not made. It's as if the parameter were passed as a C++ pointer, even though you access the methods and fields of the variable in the same way as value arguments. When an object (reference) instance is passed by reference, again, no copy of the variable is made, which means that a new reference to the object on the heap is not created. In fact, the variable behaves as if it were a C++ pointer to the reference variable, which could be viewed as a C++ pointer to a pointer. Additionally, the compiler ensures that the variable referenced by the ref parameter has been definitely assigned before the method call. Let's take a look at some examples to put the entire notion of ref parameters into perspective:

using System;

public struct MyStruct
{
    public int val;
}

public class EntryPoint
{
    static void Main() {
        MyStruct myValue = new MyStruct();
        myValue.val = 10;

        PassByValue( myValue );
        Console.WriteLine( "Result of PassByValue: myValue.val = {0}",
                           myValue.val );

        PassByRef( ref myValue );
        Console.WriteLine( "Result of PassByRef: myValue.val = {0}",
                           myValue.val );
    }

    static void PassByValue( MyStruct myValue ) {
        myValue.val = 50;
    }

    static void PassByRef( ref MyStruct myValue ) {
        myValue.val = 42;
    }
}

This example contains two methods: PassByValue and PassByRef. Both methods modify a field of the value type instance passed in. However, as the following output shows, the PassByValue method modifies a local copy, whereas the PassByRef method modifies the caller's instance as you would expect:

Result of PassByValue: myValue.val = 10
Result of PassByRef: myValue.val = 42

Also, pay attention to the fact that the ref keyword is required at the point of call into the PassByRef method. This is necessary because the method could be overloaded based upon the ref keyword. In other words, another PassByRef method could just as well have taken a MyStruct by value rather than by ref. Plus, the fact that you have to put the ref keyword on at the point of call makes the code easier to read in my opinion. When programmers read the code at the point of call, they can get a pretty clear idea that the method could make some changes to the object being passed by ref.
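
As a brief sketch of that point, the following hypothetical Bump methods differ only by the ref modifier. The presence or absence of ref at the point of call is what selects the overload:

```csharp
using System;

public class RefOverloadDemo
{
    public static string Bump( int value )
    {
        value++;            // increments the local copy only
        return "by value";
    }

    public static string Bump( ref int value )
    {
        value++;            // increments the caller's variable
        return "by ref";
    }

    static void Main()
    {
        int n = 10;

        string first = Bump( n );
        Console.WriteLine( "{0}: n = {1}", first, n );    // by value: n = 10

        string second = Bump( ref n );
        Console.WriteLine( "{0}: n = {1}", second, n );   // by ref: n = 11
    }
}
```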

Now, let's consider an example that uses an object rather than a value type:

using System;

public class EntryPoint
{
    static void Main() {
        object myObject = new Object();

        Console.WriteLine( "myObject.GetHashCode() == {0}",
                           myObject.GetHashCode() );
        PassByRef( ref myObject );
        Console.WriteLine( "myObject.GetHashCode() == {0}",
                           myObject.GetHashCode() );
    }

    static void PassByRef( ref object myObject ) {
        // Assign a new instance to the variable.
        myObject = new Object();
    }
}

In this case, the variable passed by reference is an object reference. As I said, the method does not receive a copy of the reference, which would merely be a second reference to the same object; instead, it operates directly on the caller's variable. Yes, this can be confusing. In the previous PassByRef method, the variable passed in is reassigned to a new object instance. The original object is left with no references to it, so it is now available for collection. To show that the myObject variable references two different instances before and after the call to PassByRef, I sent the results of myObject.GetHashCode to the console. The output that I got follows; the exact values you see may differ.

myObject.GetHashCode() == 46104728
myObject.GetHashCode() == 12289376
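
A swap routine is a common use of ref with reference types, and it may make the idea less abstract. In the following sketch, the method names are my own and are only illustrative. SwapByValue exchanges its local copies of the references, which has no effect on the caller, whereas Swap exchanges the caller's variables themselves:

```csharp
using System;

public class SwapDemo
{
    // Swaps only the method's own copies of the two references.
    public static void SwapByValue( string left, string right )
    {
        string temp = left;
        left = right;
        right = temp;
    }

    // Swaps the caller's variables, because each parameter is an
    // alias for the variable supplied at the point of call.
    public static void Swap( ref string left, ref string right )
    {
        string temp = left;
        left = right;
        right = temp;
    }

    static void Main()
    {
        string first = "one";
        string second = "two";

        SwapByValue( first, second );
        Console.WriteLine( "{0}, {1}", first, second );   // one, two

        Swap( ref first, ref second );
        Console.WriteLine( "{0}, {1}", first, second );   // two, one
    }
}
```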

out Parameters

Out parameters are almost identical to ref parameters, with two notable differences. First, instead of using the ref keyword, you use the out keyword, and you still have to provide the out keyword at the point of call as you do with the ref keyword. Second, the variable referenced by the out variable is not required to have been definitely assigned before the method is called as it is with ref parameters. That's because the method is not allowed to use the variable for anything useful until it has assigned the variable, and it must assign the variable before it returns normally. For example, the following is valid code:

public class EntryPoint
{
    static void Main() {
        object obj;
        PassAsOutParam( out obj );
    }

    static void PassAsOutParam( out object obj ) {
        obj = new Object();
    }
}

Notice that the obj variable in the Main method is not definitely assigned before the call to PassAsOutParam. That's perfectly fine, because it is marked as an out parameter. The PassAsOutParam method won't be referencing the variable unless it has already assigned it. If you were to replace the two occurrences of out with ref in the previous code, you would see a compiler error similar to the following:

error CS0165: Use of unassigned local variable 'obj'
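
A typical use of out parameters is a method that reports success through its return value and hands back a result through the out parameter, in the style of Int32.TryParse. The TryDivide method in this sketch is a hypothetical helper written for illustration. Notice that it assigns the out parameter on every path before returning:

```csharp
using System;

public class OutDemo
{
    // Hypothetical helper in the style of Int32.TryParse.
    public static bool TryDivide( int dividend, int divisor, out int quotient )
    {
        if( divisor == 0 ) {
            quotient = 0;     // an out parameter must be assigned on every path
            return false;
        }

        quotient = dividend / divisor;
        return true;
    }

    static void Main()
    {
        int result;           // deliberately left unassigned

        if( TryDivide( 10, 2, out result ) ) {
            Console.WriteLine( "10 / 2 = {0}", result );
        }

        if( !TryDivide( 1, 0, out result ) ) {
            Console.WriteLine( "Cannot divide by zero" );
        }

        int parsed;
        if( Int32.TryParse( "123", out parsed ) ) {
            Console.WriteLine( "Parsed {0}", parsed );
        }
    }
}
```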

params Arrays

C# makes it a snap to pass a variable list of parameters. Simply declare the last parameter in your parameter list as an array type and precede the array type with the params keyword. Now, if the method is invoked with a variable number of parameters, those parameters are passed to the method in the form of an array that you can easily iterate through, and the array type that you use can be based on any valid type. Here's a short example:

using System;

public class EntryPoint
{
    static void Main() {
        VarArgs( 42 );
        VarArgs( 42, 43, 44 );
        VarArgs( 44, 56, 23, 234, 45, 123 );
    }

    static void VarArgs( int val1, params int[] vals ) {
        Console.WriteLine( "val1: {0}", val1 );
        foreach( int i in vals ) {
            Console.WriteLine( "vals[]: {0}",
                               i );
        }
        Console.WriteLine();
    }
}

In each case, VarArgs is called successfully, but in each case, the array referenced by the vals parameter is different. As you can see, referencing a variable number of parameters is pretty easy in C#. You can code an efficient Add method to a container type using parameter arrays where only one call is necessary to add a variable number of items.
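
Here is a minimal sketch of such an Add method. Bag is a hypothetical container type invented for this illustration. The example also shows two details worth knowing: calling the method with no arguments passes an empty array, and an existing array can be passed directly in place of the individual items:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container used only for illustration.
public class Bag<T>
{
    private readonly List<T> items = new List<T>();

    public int Count { get { return items.Count; } }

    // One call can add zero, one, or many items.
    public void Add( params T[] newItems )
    {
        items.AddRange( newItems );
    }
}

public class ParamsDemo
{
    static void Main()
    {
        Bag<int> bag = new Bag<int>();

        bag.Add();                         // empty array; adds nothing
        bag.Add( 1 );
        bag.Add( 2, 3, 4 );
        bag.Add( new int[] { 5, 6 } );     // an existing array is passed as is

        Console.WriteLine( "Count = {0}", bag.Count );   // Count = 6
    }
}
```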

Method Overloading

C# overloading is a compile-time technique in which, at a call point, the compiler chooses a method from a set of methods with the same name. The compiler uses the argument list at the call point to choose the method that fits best. The parameter types and the ref and out parameter modifiers play a part in method overloading, because they form part of the method signature; however, two methods cannot differ only in that one uses ref where the other uses out. The params modifier is not part of the signature, but methods without variable-length parameter arrays get preference over those that have them. Similar to C++, the method return type is not part of the signature (except in one rare case of conversion operators, which I cover in Chapter 6). So you cannot have overloaded methods within a class where the only difference is the return type. Finally, if the compiler gets to a point where multiple methods are ambiguous with respect to overloading, it stops with an error.

Overall, there's really nothing different about method overloading in C# compared to C++. Normally, it can't possibly cause any runtime exceptions, because the entire algorithm is applied at compile time. When the compiler fails to find an exact match based on the parameters given, it then starts hunting for a best match based on implicit convertibility of the instances in the parameter list. Thus, if a single parameter method accepts an object of type A, and you have passed an object of type B that is derived from type A, in the absence of a method that accepts type B, the compiler will implicitly convert your instance into a type A reference to satisfy the method call. Depending on the situation and the size of the overloaded method set, the selection process can still be a tricky one. I've found that it's best to avoid confusing overloads where implicit conversion is necessary to satisfy the resolution. Too many implicit conversions can make code difficult to follow, requiring you to actually execute it in a debugger to see what happens. That makes it hard on maintenance engineers who need to come in behind you and figure out what you were doing. It's not to say that implicit conversion is bad during overload resolution, but just use it judiciously and sparingly to minimize future surprises.
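
The following sketch shows two of these rules at work. Animal, Dog, and the Pick and Count methods are hypothetical names used only for this illustration. The call that passes a Dog selects the Animal overload, because the conversion to the base class is better than the conversion to object, and a single int argument selects the overload without the parameter array:

```csharp
using System;

public class Animal { }
public class Dog : Animal { }

public class OverloadDemo
{
    public static string Pick( object o ) { return "Pick(object)"; }
    public static string Pick( Animal a ) { return "Pick(Animal)"; }

    public static string Count( int value ) { return "Count(int)"; }
    public static string Count( params int[] values ) { return "Count(params int[])"; }

    static void Main()
    {
        // No Pick(Dog) exists, so the compiler converts implicitly to
        // the closest base type.
        Console.WriteLine( Pick( new Dog() ) );   // Pick(Animal)
        Console.WriteLine( Pick( "text" ) );      // Pick(object)

        // The overload without a parameter array wins when both apply.
        Console.WriteLine( Count( 1 ) );          // Count(int)
        Console.WriteLine( Count( 1, 2 ) );       // Count(params int[])
    }
}
```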

Starting with C# 4.0, overload resolution can generate exceptions at run time when using the new dynamic type. You can reference Chapter 17 for all of the details on dynamic types.
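
As a small taste of what that means, consider the following sketch, in which the Handle and Try methods are hypothetical. Because the argument is typed as dynamic, the choice between the overloads is deferred until run time, and an argument that matches neither overload produces a RuntimeBinderException rather than a compiler error. The sketch assumes the project references the Microsoft.CSharp assembly, which is what provides the runtime binder:

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public class DynamicOverloadDemo
{
    public static string Handle( int value ) { return "Handle(int)"; }
    public static string Handle( string value ) { return "Handle(string)"; }

    public static string Try( dynamic argument )
    {
        try {
            // The overload is chosen at run time from the actual type
            // of the argument.
            return Handle( argument );
        }
        catch( RuntimeBinderException ) {
            return "no matching overload";
        }
    }

    static void Main()
    {
        Console.WriteLine( Try( 42 ) );       // Handle(int)
        Console.WriteLine( Try( "text" ) );   // Handle(string)
        Console.WriteLine( Try( 1.5 ) );      // no matching overload
    }
}
```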

Optional Arguments

The C# designers always consider a list of feature requests when designing the next version of the language. Over the years, optional method arguments have been near the top of that list. But until C# 4.0, there were never enough compelling reasons to add them to the language. In the spirit of greater interoperability, C# 4.0 introduced optional arguments as well as named arguments (covered in the next section) and the dynamic type (covered in Chapter 17). These three features greatly enhance the interoperability experience between C# and other technologies such as COM and bring C# up to the same level of interoperability convenience as Visual Basic.

Optional arguments in C# work very similarly to the way they work in C++. In the method declaration, you may provide a default value for a method parameter at the point where the parameter is declared. Any parameter without a default value is considered a required parameter. Additionally, no required parameters may follow default parameters in the method declaration. This has the effect of placing all of the default parameters at the end of the parameter list for a method. Consider the following contrived example:

using System;

class TeamMember
{
    public TeamMember( string     fullName,
                       string     title = "Unknown",
                       string     team = "Unknown",
                       bool       isFullTime = false,
                       TeamMember manager = null ) {
        FullName = fullName;
        Title = title;
        Team = team;
        IsFullTime = isFullTime;
        Manager = manager;
    }

    public string FullName { get; private set; }
    public string Title { get; private set; }
    public string Team { get; private set; }
    public bool IsFullTime { get; private set; }
    public TeamMember Manager { get; private set; }
}

static class EntryPoint
{
    static void Main() {
        TeamMember tm = new TeamMember( "Milton Waddams" );
    }
}

In this example, the constructor declares optional parameters, and the Main method relies on their default values when initializing an instance of TeamMember. Notice that all of the default parameter values are constants. Only a compile-time constant is permitted as a default parameter value (the default value of a type, as in default(T), also qualifies). If you provide anything else, you will be greeted with the following compiler error:

error CS1736: Default parameter value for 'title' must be a compile-time constant
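When the default you really want is not a constant, one common workaround is to use null as a sentinel default and substitute the real value inside the method body. The following is only a sketch of that idiom; the Meeting class and its members are hypothetical and not part of the TeamMember example:

```csharp
using System;

class Meeting
{
    // DateTime.Now is not a compile-time constant, so it cannot be a
    // default value. A null default stands in and is replaced below.
    public Meeting( string    topic,
                    DateTime? start = null,
                    int       minutes = 30 ) {
        Topic = topic;
        Start = start ?? DateTime.Now;
        Minutes = minutes;
    }

    public string Topic { get; private set; }
    public DateTime Start { get; private set; }
    public int Minutes { get; private set; }
}

static class EntryPoint
{
    static void Main() {
        Meeting m = new Meeting( "TPS reports" );
        Console.WriteLine( "{0}: {1} minutes", m.Topic, m.Minutes );
    }
}
```

The cost of this idiom is that the real default is no longer visible in the method signature, so it needs to be documented.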

Named Arguments

Named arguments are a feature introduced in C# 4.0 that complements optional arguments. Consider the TeamMember example class from the previous section. Suppose you wanted to create an instance of TeamMember and accept all of the default arguments except the isFullTime argument to the constructor. In order to do so, you must use named arguments, as shown in the following example:

using System;

class TeamMember
{
    public TeamMember( string     fullName,
                       string     title = "Unknown",
                       string     team = "Unknown",
                       bool       isFullTime = false,
                       TeamMember manager = null ) {
        FullName = fullName;
        Title = title;
        Team = team;
        IsFullTime = isFullTime;
        Manager = manager;
    }

    public string FullName { get; private set; }
    public string Title { get; private set; }
    public string Team { get; private set; }
    public bool IsFullTime { get; private set; }
    public TeamMember Manager { get; private set; }
}

static class EntryPoint
{
    static void Main() {
        TeamMember tm = new TeamMember( "Peter Gibbons",
                                        isFullTime : true );
    }
}

Notice how I have provided the isFullTime argument to the constructor. Using named arguments is easy. Simply provide the name of the argument followed by a colon and then the value you want to be passed for that argument. The rest of the arguments will contain their default values when the constructor is invoked. In fact, I could have used a named argument for the required argument in the constructor above. If I had done that, I could have swapped the order of the arguments in the argument list entirely as shown below:

static void Main() {
    TeamMember tm = new TeamMember(
                           isFullTime : true,
                           fullName : "Peter Gibbons" );
}

Named arguments have a minor effect on method overloading. Essentially, a positional list of arguments is constructed by combining the given positional arguments together with the named arguments and any applicable default arguments. Once this positional list of arguments is constructed, overload resolution proceeds as normal. About the only place where named arguments get tricky is with virtual overrides. Consider the following contrived example:

using System;

class A
{
    public virtual void DoSomething( int x, int y ) {
        Console.WriteLine( "{0}, {1}", x, y );
    }
}

class B : A
{
    public override void DoSomething( int y, int x ) {
        base.DoSomething( y, x );
    }
}

static class EntryPoint
{
    static void Main() {
        B b = new B();
        b.DoSomething( x : 1, y : 2 );

        A a = b;
        a.DoSomething( x : 1, y : 2 );
    }
}

This example is completely diabolical and something that should be avoided entirely! If you execute the code above, you will see the following output:

2, 1
1, 2

When you override a virtual method in a derived type, the only requirement is that the parameter types and their positions must match the base method's declaration. The actual names of the parameters can change, as I have shown above. The problem is that the compiler must have a reliable way of mapping a named argument to the positional parameter of a virtual method. It does this by using the method declaration found on the static type of the variable reference.

In the example above, the two references, a and b, both reference the same instance of B. However, when DoSomething is called with named arguments, it's the declaration of DoSomething associated with the static type of the reference that matters. In the first call to DoSomething via the b variable, because the static type of the variable is B, the named arguments are resolved using the B.DoSomething definition. But in the second call to DoSomething via the a variable, because the variable is of static type A, the named arguments are resolved using the definition of A.DoSomething. As you can see, renaming parameters in an override introduces unnecessary confusion into the code and must be avoided at all costs.

Finally, one more thing to consider is that any time you have expressions in an argument list, they are evaluated in the order in which they appear even if that order is different from the positional parameters of the method being called. Consider the following example:

using System;

class A
{
    public void DoSomething( int x, int y ) {
        Console.WriteLine( "{0}, {1}", x, y );
    }
}

static class EntryPoint
{
    static void Main() {
        A a = new A();

        a.DoSomething( GenerateValue1(),
                       GenerateValue2() );

        // Now use named arguments
        a.DoSomething( y : GenerateValue2(),
                       x : GenerateValue1() );

    }

    static int GenerateValue1() {
        Console.WriteLine( "GenerateValue1 called" );
        return 1;
    }

    static int GenerateValue2() {
        Console.WriteLine( "GenerateValue2 called" );
        return 2;
    }
}

When you run the code above, you will get the following output sent to the console:

GenerateValue1 called
GenerateValue2 called
1, 2
GenerateValue2 called
GenerateValue1 called
1, 2

Notice that the order of calling the GenerateValue1 and GenerateValue2 methods depends on the order in which they appear in the argument list, regardless of which positional parameter they are associated with. After the expressions are evaluated, in this case, after GenerateValue1 and GenerateValue2 are called, then the arguments are placed in their respective positions in order to find the best method.

Warning

Prior to the existence of named arguments, you could create code where the order of expression evaluation in parameter lists could be relied upon. Doing so is poor design with or without named arguments. In the previous example, imagine the methods were coded with side effects such that GenerateValue2 always assumed GenerateValue1 was called prior to it executing. And suppose you called a method such as A.DoSomething using positional arguments back when C# 3.0 was current. Later on, once named arguments exist, a maintenance engineer decides to change the code and pass the arguments in the opposite order using named arguments simply because it produces prettier looking code. Now you have a serious problem! The moral of the story is to avoid the situation entirely by not relying on the order of expression evaluation in argument lists.
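One way to stay out of this trap, sketched below, is to evaluate order-sensitive expressions in separate statements and pass the resulting locals. The Sequence class and its NextValue and Format methods are hypothetical helpers written only for this illustration:

```csharp
using System;

static class Sequence
{
    private static int counter;

    // Has a side effect: every call returns the next number.
    public static int NextValue() { return ++counter; }

    public static string Format( int x, int y ) {
        return string.Format( "{0}, {1}", x, y );
    }
}

static class EntryPoint
{
    static void Main() {
        // The evaluation order is now explicit and independent
        // of how the argument list is written.
        int first = Sequence.NextValue();
        int second = Sequence.NextValue();

        // Both call styles pass exactly the same values.
        Console.WriteLine( Sequence.Format( first, second ) );
        Console.WriteLine( Sequence.Format( y : second, x : first ) );
    }
}
```

With the side effects hoisted out of the argument list, reordering the named arguments later cannot change the program's behavior.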

Inheritance and Virtual Methods

C# implements the notion of virtual methods just as the C++ and Java languages do. That's no surprise at all, because C# is an object-oriented language, and virtual methods are the primary mechanism for implementing dynamic polymorphism. That said, some notable differences from those languages deserve special mention.

Virtual and Abstract Methods

You declare a virtual method using either the virtual or abstract modifiers on the method at the point of declaration. They both introduce the method into the declaration space as one that a deriving class can override. The difference between the two is that abstract methods are required to be overridden, whereas virtual methods are not. Abstract methods are similar to C++ pure virtual methods, except that C++ pure virtual methods may have an implementation associated with them, whereas C# abstract methods may not. Additionally, classes that contain abstract methods must also be marked abstract themselves. Virtual methods, in contrast to abstract methods, are required to have an implementation associated with them. Virtual methods, along with interfaces, are the only means of implementing polymorphism within C#.
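The following sketch contrasts the two modifiers. The Shape, Square, and Circle types are hypothetical and exist only for illustration. Area is abstract, so every concrete class must override it, whereas Name is virtual, so overriding it is optional:

```csharp
using System;

abstract class Shape
{
    // Abstract: no implementation here; concrete classes must override.
    public abstract double Area();

    // Virtual: must have an implementation; overriding is optional.
    public virtual string Name() { return "Shape"; }
}

class Square : Shape
{
    private readonly double side;
    public Square( double side ) { this.side = side; }

    public override double Area() { return side * side; }
    // Name is not overridden, so Shape.Name is used.
}

class Circle : Shape
{
    private readonly double radius;
    public Circle( double radius ) { this.radius = radius; }

    public override double Area() { return Math.PI * radius * radius; }
    public override string Name() { return "Circle"; }
}

static class EntryPoint
{
    static void Main() {
        Shape[] shapes = { new Square( 2 ), new Circle( 1 ) };
        foreach( Shape s in shapes ) {
            Console.WriteLine( "{0}: {1}", s.Name(), s.Area() );
        }
    }
}
```

Removing the abstract modifier from the Shape class, or omitting the Area override in Square, produces a compiler error rather than a runtime failure.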

Note

Under the hood, the CLR implements virtual methods differently from C++. Whereas C++ can create multiple vtables (dynamic method tables pointing to virtual methods) for an individual object of a class depending on its static hierarchical structure, CLR objects have only one method table that contains both virtual and nonvirtual methods. Additionally, the table in the CLR is built early on in the lifetime of the object. Not only does the creation order of objects affect the ordering of static initializers and constructor calls in a hierarchy, but it also gives C# a capability that C++ lacks. In C#, a virtual method call made inside a constructor body dispatches to the most-derived override, whereas in C++ it resolves only to the version belonging to the class whose constructor is currently running. For more information on how the CLR manages method tables for object instances, read Essential .NET, Volume 1: The Common Language Runtime by Don Box and Chris Sells (Addison-Wesley Professional, 2002).
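To see the constructor behavior for yourself, consider this small sketch. The Base and Derived classes are hypothetical. The override runs while the Base constructor is still executing, which means it sees Derived's field initializers but not the work done in the Derived constructor body:

```csharp
using System;

class Base
{
    public Base() {
        // Dispatches to Derived.Describe, even though the
        // Derived constructor body has not run yet.
        Log = Describe();
    }

    public string Log { get; private set; }

    protected virtual string Describe() { return "Base"; }
}

class Derived : Base
{
    // Field initializers run before the Base constructor body.
    private string label = "initialized";
    private string fromCtor;

    public Derived() { fromCtor = "set"; }

    protected override string Describe() {
        return "label=" + label + ", fromCtor=" + ( fromCtor ?? "null" );
    }
}

static class EntryPoint
{
    static void Main() {
        Console.WriteLine( new Derived().Log );
        // label=initialized, fromCtor=null
    }
}
```

Just because the call works does not make it a good idea: the override executes against a partially constructed object, so use this capability with care.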

override and new Methods

To override a method in a derived class, you must tag the method with the override modifier. If you don't, you'll get a compiler warning telling you that you need to provide either the new modifier or the override modifier in the derived method declaration. The compiler defaults to using the new modifier, which probably does the exact opposite of what you intended. This behavior is different from C++, because in C++, once a method is marked as virtual, any derived method of the same name and signature is automatically an override of the virtual method, and the virtual modifier on those derived methods is completely optional. Personally, I prefer the fact that C# requires you to tag the overriding method, simply for the purpose of code readability. I cannot tell you how many poorly designed C++ code bases I've worked on with deep hierarchies where developers were too lazy to keep tagging the virtual override methods with the virtual keyword. I had no way of knowing whether a method overrode a virtual in a base class without looking at the base class declaration. These terribly designed code bases had such deep hierarchies that they forced me to rifle through a whole plethora of files just to find the answer. C# drives a stake through the heart of this problem. Check out the following code:

using System;

public class A
{
    public virtual void SomeMethod() {
        Console.WriteLine( "A.SomeMethod" );
    }
}

public class B : A
{
    public void SomeMethod() {
        Console.WriteLine( "B.SomeMethod" );
    }
}

public class EntryPoint
{
    static void Main() {
        B b = new B();
        A a = b;

        a.SomeMethod();
    }
}

This code compiles, but not without the following warning:

test.cs(12,17): warning CS0114: 'B.SomeMethod()' hides inherited member 'A.SomeMethod()'.
To make the current member override that implementation, add the override keyword.
Otherwise add the new keyword.

When the code is executed, A.SomeMethod gets called. So what does the new keyword do? It breaks the virtual chain at that point in the hierarchy. When a virtual method is called through an object reference, the method called is determined from the method tables at run time. If a method is virtual, the runtime searches down through the hierarchy looking for the most derived version of the method, and then it calls that one. However, during the search, if it encounters a method marked with the new modifier, it backs up to the method of the previous class in the hierarchy and uses that one instead. That is why A.SomeMethod is the method that gets called. Had B.SomeMethod been marked as override, then the code would have called B.SomeMethod instead. Because C# defaults to using the new modifier when neither modifier is present, it issues the warning, possibly to get the attention of those of us who are used to the C++ syntax. Finally, the new modifier is orthogonal to the virtual modifier in meaning, in the sense that a method marked new can be either virtual or nonvirtual. In the previous example, I did not attach the virtual modifier to B.SomeMethod, so a class C derived from B cannot override B.SomeMethod, because it's not virtual. Thus, the new keyword not only breaks the virtual chain, but it also determines whether class B and the classes derived from it get a virtual SomeMethod.

Another issue to consider with regard to overriding methods is whether to call the base class version of the method and when. In C#, you call the base class version using the base identifier as shown:

using System;

public class A
{
    public virtual void SomeMethod() {
        Console.WriteLine( "A.SomeMethod" );
    }
}

public class B : A
{
    public override void SomeMethod() {
        Console.WriteLine( "B.SomeMethod" );
        base.SomeMethod();
    }
}


public class EntryPoint
{
    static void Main() {
        B b = new B();
        A a = b;

        a.SomeMethod();
    }
}

As expected, the previous code prints A.SomeMethod on the line after it prints B.SomeMethod. Is this the correct ordering of events? Should it not be the other way around? Shouldn't B.SomeMethod call the base class version before it does its work? The point is that you don't have enough information to answer this question. Therein lies a problem with inheritance and virtual method overrides. How do you know when and if to call the base class method? The answer is that the method should be well documented so that you know how to do the right thing. Thus, inheritance with virtual methods increases your documentation load, because now you must provide the consumers of your class with information above and beyond just the public interface. For example, if you follow the Non-Virtual Interface (NVI) pattern that I describe in Chapter 13, the virtual method in question is protected, so now you must document both public methods and some protected methods, and the documentation for the virtual methods must clearly state whether an override should call the base class version and when. Ouch!
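As a rough preview of the NVI idea (Chapter 13 has the full treatment), the following sketch lets the base class fix the order of events so that an override never has to decide when to call the base version. The Document and TextDocument types, and the step names they record, are hypothetical and invented for this illustration:

```csharp
using System;
using System.Collections.Generic;

abstract class Document
{
    // Public and nonvirtual: the base class owns the sequence.
    public IList<string> Save() {
        List<string> steps = new List<string>();
        steps.Add( "validate" );
        steps.Add( WriteContents() );   // the only customization point
        steps.Add( "flush" );
        return steps;
    }

    // Protected hook: overrides supply their part and
    // never need to call a base class version.
    protected abstract string WriteContents();
}

class TextDocument : Document
{
    protected override string WriteContents() { return "write text"; }
}

static class EntryPoint
{
    static void Main() {
        Console.WriteLine( string.Join( ", ", new TextDocument().Save() ) );
        // validate, write text, flush
    }
}
```

The documentation burden does not disappear, but it shrinks to describing what the protected hook is expected to do.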

sealed Methods

For the reasons stated previously, I believe you should seal your classes by default and only make classes inheritable in well-thought-out circumstances. Many times I see hierarchies where the developer was thinking, "I'll just mark all of my methods as virtual to give my deriving classes the most flexibility." All this does is create a rat's nest of bugs later down the line. This thought pattern is typical of less-experienced designers who are grappling with the complexities of inheritance and virtual methods. The fact is that inheritance coupled with virtual methods is so surprisingly complex that it's best to explicitly turn off the capability rather than leave it wide open for abuse. Therefore, when designing classes, you should prefer to create sealed, noninheritable classes, and you should document the public interface well. Consumers who need to extend the functionality can still do so, but through containment rather than inheritance. Extension through containment coupled with crafty interface definitions is far more powerful than class inheritance.

In rare instances, you're deriving from a class with virtual methods and you want to force the virtual chain for a specific method to end at your override. In other words, you don't want further derived classes to be able to override the virtual method. To do so, you also mark the method with the sealed modifier. As is obvious from the name, it means that no further derived classes can override the method. They can, however, provide a method with the same signature, as long as the method is marked with the new modifier, as discussed in the previous section. In fact, you could mark the new method as virtual, thus starting a new virtual chain in the hierarchy. This is not the same as sealing the entire class, which doesn't even allow a class to derive from this one in the first place. Therefore, if the deriving class is marked as sealed, then marking override methods within that class with sealed is redundant.
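The following sketch puts these modifiers together. The classes A through D are hypothetical. B seals its override, C hides the method with new virtual to start a fresh chain, and D overrides the method in that new chain:

```csharp
using System;

class A
{
    public virtual string Who() { return "A"; }
}

class B : A
{
    // Ends the virtual chain that started in A.
    public sealed override string Who() { return "B"; }
}

class C : B
{
    // An override here would be a compiler error, because B.Who is sealed.
    // new virtual hides B.Who and starts a separate virtual chain.
    public new virtual string Who() { return "C"; }
}

class D : C
{
    public override string Who() { return "D"; }
}

static class EntryPoint
{
    static void Main() {
        D d = new D();
        A a = d;
        C c = d;

        Console.WriteLine( a.Who() );   // B
        Console.WriteLine( c.Who() );   // D
    }
}
```

The same object answers differently depending on the static type of the reference used to call it, which is exactly the kind of surprise that makes the new modifier dangerous.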

A Final Few Words on C# Virtual Methods

Clearly, C# provides a lot of flexible keywords to make some interesting things happen when it comes to inheritance and virtual methods. However, just because the language provides them does not mean that it's wise to use them. Over the past decade, many experts have published countless books describing how to design C++- and Java-based applications safely and effectively. Many times, those works indicate things that you should not do rather than things that you should do. That's because C++, along with C#, provides you with the power to do things that don't necessarily fall within the boundaries of what's considered good design. In the end, you want to strive for classes and constructs that are intuitive to use and carry few hidden surprises.

The savvy reader probably noticed that the new modifier is the quickest way to introduce some serious surprises into a class hierarchy. If you ever find yourself using that modifier on a method, you're most likely using a class in a way it was not intended to be used. You could be deriving from a class that should have been marked sealed in the first place. And you may be cursing the developer of that class for not marking a particular method virtual so you can easily override it. Therefore, you resort to using the new modifier. Just because it exists, don't assume it's wise to use it. The designer of the class you're deriving from probably never intended you to derive from it and just forgot to mark it sealed. And if the designer intentionally left it unsealed, he probably did not intend for you to replace the method you're trying to override. Therefore, always strive to follow time-tested design techniques and avoid the whiz-bang features of the language that go against the grain of good design.

Inheritance, Containment, and Delegation

When many people started programming in object-oriented languages some years ago, they thought inheritance was the greatest thing since sliced bread. In fact, many people consider it an integral, important part of object-oriented programming. Some argue that a language that doesn't support inheritance is not an object-oriented language at all. This arguing point for many people over the years has almost taken on the form of a religious war at times. As time went on, though, some astute designers started to notice the pitfalls of inheritance.

Choosing Between Interface and Class Inheritance

When you first discover inheritance, you have a tendency to overuse it and abuse it. This is easy to do. Misuse can make software designs hard to understand and maintain, especially in languages such as C++ that support multiple inheritance. It can also make it hard for those designs to adapt to future needs, thus forcing them to be thrown out and replaced with a completely new design. In languages that only support single inheritance, such as C# and Java, you're forced to apply more diligence to your use of inheritance.

For example, when modeling a human-resources system at company XYZ, one naïve designer could be inclined to introduce classes such as Payee, BenefitsRecipient, and Developer. Then, using multiple inheritance, he could build or compose a full-time developer, represented by the class FulltimeDeveloper, by inheriting from all three, as in Figure 4-3.

Figure 4-3. Example of bad inheritance

As you can see, this forces our designer to create a new class for contract developers, where the concrete class doesn't inherit from BenefitsRecipient. After the system grows by leaps and bounds, you can quickly see the flaw in the design when the inheritance lattice becomes complex and deep. Now he has two classes for types of developers, thus making the design hard to manage. Now, let's look at a bad attempt at the same problem in a language that supports only single inheritance. Figure 4-4 shows you that this solution is hardly a good one.

Figure 4-4. Example of bad single-inheritance hierarchy

If you look closely, you can see the ambiguity that is present. It's impossible that the Developer class can be derived from both Payee and BenefitsRecipient in an environment where only single inheritance is allowed. Because of that, these two hierarchies cannot live within the same design. You could create two different variants of the Developer class—one for FulltimeDeveloper to derive from, and one for ContractDeveloper to derive from. However, that would be a waste of time. More importantly, code reuse—the main benefit of inheritance—is gone if you have to create two versions of essentially the same class.

A better approach is to have a Developer class that contains various properties that represent these qualities of developers within the company. For example, the support of a specific interface could represent the support of a certain property. An inheritance hierarchy that is multiple levels deep is a telltale sign that the design needs some rethinking.
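Here is one possible shape for such a design, offered as a sketch under my own assumptions rather than the definitive answer. The interface names IPayee and IBenefitsPlan, the StandardBenefits class, and the member names are all hypothetical. A single Developer class implements the payee contract and contains an optional benefits object instead of inheriting from BenefitsRecipient:

```csharp
using System;

interface IPayee
{
    decimal MonthlyPay { get; }
}

interface IBenefitsPlan
{
    string PlanName { get; }
}

class StandardBenefits : IBenefitsPlan
{
    public string PlanName { get { return "Standard"; } }
}

// One Developer class; employment details are contained, not inherited.
class Developer : IPayee
{
    public Developer( string        name,
                      decimal       monthlyPay,
                      IBenefitsPlan benefits = null ) {
        Name = name;
        MonthlyPay = monthlyPay;
        Benefits = benefits;
    }

    public string Name { get; private set; }
    public decimal MonthlyPay { get; private set; }

    // In this sketch, null means the developer is a contractor.
    public IBenefitsPlan Benefits { get; private set; }
    public bool ReceivesBenefits { get { return Benefits != null; } }
}

static class EntryPoint
{
    static void Main() {
        Developer fullTime = new Developer( "Full-timer", 5000m,
                                            new StandardBenefits() );
        Developer contractor = new Developer( "Contractor", 6000m );

        Console.WriteLine( fullTime.ReceivesBenefits );     // True
        Console.WriteLine( contractor.ReceivesBenefits );   // False
    }
}
```

Full-time and contract developers are now the same type, so the hierarchy stays one level deep no matter how many such qualities the company adds.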

To see what's really going on here, let's take a moment to analyze what inheritance does for you. In reality, it allows you to get a little bit of work for free by inheriting an implementation. There is an important distinction between inheritance and interface implementation. Although the object-oriented languages, including C#, typically use a similar syntax for the two, it's important to note that classes that implement an interface don't get any implementation at all. When using inheritance, not only do you inherit the public contract of the base class, but you also inherit the layout, or the guts.

A good rule of thumb is that when your purpose is primarily to inherit a contract, choose interface implementation over inheritance. This will guarantee that your design has the greatest flexibility. To understand more why that's the case, let's investigate more pitfalls of inheritance.

Delegation and Composition vs. Inheritance

Another very important, unfavorable aspect of inheritance is that it can break encapsulation and always increases coupling. I'm sure we all agree, or at least we should all agree, that encapsulation is the most fundamental and important object-oriented concept. If that's the case, then why would you want to break it? Yet any time you use inheritance where the base type contains protected fields, you're cracking the shell of encapsulation and exposing the internals of the base class. This cannot be good. Let me explain why it's not and what sorts of alternatives you have at your disposal that can create better designs.

Many describe inheritance as white-box reuse. A better form of reuse is black-box reuse, meaning that the internals of the object are not exposed to you. You can achieve this by using containment. Yes, that's correct. Instead of inheriting your new class from another, you can contain an instance of the other class in your new class, thus reusing the class of the contained type without cracking the encapsulation. The downside to this technique is that in most languages, including C#, it requires a little more coding work, but not too much. In the end, it can provide a much more adaptable design.

For a simple example of what I'm talking about, consider a problem domain where a class handles some sort of custom network communications. Let's call this class NetworkCommunicator, and let's say it looks like this:

public class NetworkCommunicator
{
   public void SendData( DataObject obj )
   {
      // Send the data over the wire.
   }

   public DataObject ReceiveData()
   {
      // Receive data over the wire.
      throw new System.NotImplementedException();   // placeholder so the sketch compiles
   }
}

Now, let's say that you come along later and decide it would be nice to have an EncryptedNetworkCommunicator object, where the data transmission is encrypted before it is sent. A common approach would be to derive EncryptedNetworkCommunicator from NetworkCommunicator. Then, the implementation could look like this:

public class EncryptedNetworkCommunicator : NetworkCommunicator
{
   public override void SendData( DataObject obj )
   {
      // Encrypt the data.
      base.SendData( obj );
   }

   public override DataObject ReceiveData()
   {
      DataObject obj = base.ReceiveData();

      // Decrypt data.

      return obj;
   }
}

There is a major drawback here. Good design dictates that if you're going to modify the functionality of the base class methods, you should override them. To override them properly, they must have been declared virtual in the first place. That requires you to predict the future when you design the NetworkCommunicator class and mark the methods as virtual. Because we could not predict the future, we did not mark them virtual, and therefore the EncryptedNetworkCommunicator class above will not compile. Yes, you can hide the methods in C# using the new keyword when you define them on the derived class. But if you do that, you're breaking the tenet that the inheritance relationship models an is-a relationship. Now, let's look at the containment solution:

public class EncryptedNetworkCommunicator
{
   public EncryptedNetworkCommunicator()
   {
      contained = new NetworkCommunicator();
   }

   public void SendData( DataObject obj )
   {
      // Encrypt the data.
      contained.SendData( obj );
   }

   public DataObject ReceiveData()
   {
      DataObject obj = contained.ReceiveData();

      // Decrypt data

      return obj;
   }

   private NetworkCommunicator contained;
}

As you can see, it's only slightly more work. But the good thing is, you're able to reuse the NetworkCommunicator as if it were a black box. The designer of NetworkCommunicator could have created the thing sealed, and you would still be able to reuse it. Had it been sealed, you definitely could not have inherited from it. While reusing NetworkCommunicator via containment, you could even provide a public contract on the container that looks slightly different from the one NetworkCommunicator implements. Such a technique is commonly referred to as the Facade pattern.

Another downfall of using inheritance is that it is not dynamic. It is static by the very fact that it is determined at compile time. This can be very limiting, to say the least. You can remove this limitation by using containment. However, in order to do that, you have to also employ your good friend, polymorphism. By doing so, the contained type can be, say, an interface type. Then, the contained object merely has to support the contract of that interface in order to be reused by the container. Moreover, you can change this object at run time. Think about this for a moment and let it sink in. Consider an object that represents a container of sortable objects. Let's say that this container type comes with a default sort algorithm. If you implement this default algorithm as a contained type that you can swap at run time, then if the problem domain required it, you could replace it with a custom sort algorithm as long as the new sort algorithm object implements the required interface that the container type expects. This technique is known as the Strategy design pattern.
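The sortable container described above could be sketched as follows. The ISortStrategy interface, the two strategy classes, and the SortableContainer members are hypothetical names chosen for this illustration. The container holds its sort algorithm as an interface reference, so the algorithm can be replaced at run time:

```csharp
using System;
using System.Collections.Generic;

interface ISortStrategy
{
    void Sort( List<int> items );
}

class AscendingSort : ISortStrategy
{
    public void Sort( List<int> items ) { items.Sort(); }
}

class DescendingSort : ISortStrategy
{
    public void Sort( List<int> items ) {
        items.Sort( ( x, y ) => y.CompareTo( x ) );
    }
}

class SortableContainer
{
    private readonly List<int> items = new List<int>();
    private ISortStrategy strategy = new AscendingSort();   // default algorithm

    public void Add( int value ) { items.Add( value ); }

    // The contained algorithm can be swapped at run time.
    public void SetStrategy( ISortStrategy newStrategy ) {
        if( newStrategy == null ) {
            throw new ArgumentNullException( "newStrategy" );
        }
        strategy = newStrategy;
    }

    public string SortedContents() {
        strategy.Sort( items );
        return string.Join( ", ", items );
    }
}

static class EntryPoint
{
    static void Main() {
        SortableContainer c = new SortableContainer();
        c.Add( 3 );
        c.Add( 1 );
        c.Add( 2 );

        Console.WriteLine( c.SortedContents() );   // 1, 2, 3

        c.SetStrategy( new DescendingSort() );
        Console.WriteLine( c.SortedContents() );   // 3, 2, 1
    }
}
```

Notice that SortableContainer never needs to change, or even be recompiled, when a new sort algorithm comes along.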

In conclusion, you can see that designs are much more flexible if you favor dynamic rather than static constructs. This includes favoring containment over inheritance in many reuse cases. This type of reuse is also known as delegation, because the work is delegated to the contained type. Containment also preserves encapsulation, whereas inheritance breaks encapsulation. One word of caution is in order, though. As with just about anything, you can overdo containment. For smaller utility classes, it may not make sense to go to too much effort to favor containment. And in some cases, you need to use inheritance to implement specialization. But, in the grand scheme of things, designs that favor containment over inheritance as a reuse mechanism are far more flexible and stand the test of time much better. Always respect the power of inheritance, including the damage it can cause through its misuse.

Summary

In this very long chapter, I've covered the important points regarding the C# type system, which allows you to create new types that have all of the capabilities of implicit types defined by the runtime. I started out by covering class definitions used to define new reference types, then I followed that with struct definitions used to create instances of new value types within the CLR, and I described the major differences between the two. Related to the topic of value types is that of boxing and unboxing, which I showed can introduce unintended inefficiencies when you don't understand all of the places boxing can be introduced by the compiler. (In Chapter 11, which covers generics, you'll see how you can eliminate boxing and unboxing entirely in some cases.)

I then turned to the complex topics of object creation and initialization, as well as object destruction. Destruction is a rather tricky topic in the CLR, because your reference types can support either deterministic or nondeterministic destruction. (I cover destruction in more detail with more examples in Chapter 13.) Then, I quickly discussed method overloading in C# and the various modifiers you can place on methods to control whether they're modified as virtual, override, or sealed. Finally, I spent some time discussing inheritance, polymorphism, and containment, and I provided some pointers for choosing when to use them.

The last sections in this chapter lead right into the next chapter, where I'll cover the all-important topic of interface-based, or contract-based, programming and how to use it in the CLR.



[10] Unsafe coding within C# is outside of the scope of this book. For more information, I suggest that you reference the MSDN documentation.

[11] Projection initializers are very handy when used together with LINQ (Language-Integrated Query) which I cover in Chapter 16.

[12] Be sure to read Chapter 8, where I give reasons why Object.ToString is not what you want when creating software for localization to various locales and cultures.
