In Chapter 2, we covered all the C# predefined types and briefly touched on the topic of reference types versus value types. In this chapter, we continue the discussion of data types with further explanation of the categories of types.
In addition, we cover combining data elements into tuples (a feature introduced in C# 7.0), followed by grouping data into sets called arrays. To begin, let’s look more closely at value types and reference types.
All types fall into one of two categories: value types and reference types. The differences between the types in each category stem from how they are copied: Value type data is always copied by value, whereas reference type data is always copied by reference.
Except for string, all the predefined types in the book so far have been value types. Variables of value types contain the value directly. In other words, the variable refers to the same location in memory where the value is stored. Because of this, when a different variable is assigned the same value, a copy of the original variable’s value is made to the location of the new variable. A second variable of the same value type cannot refer to the same location in memory as the first variable. Consequently, changing the value of the first variable will not affect the value in the second variable, as Figure 3.1 demonstrates. In the figure, number1 refers to a location in memory that contains the value 42. After assigning number1 to number2, both variables will contain the value 42. However, modifying either variable’s value will not affect the other.
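The copy behavior described above can be sketched in a few lines. This is a minimal illustration rather than a listing from the book; the class name ValueCopyDemo and the method name CopyThenModifyOriginal are hypothetical.

```csharp
using System;

public static class ValueCopyDemo
{
    // Copies number1 into number2, changes number1, and returns number2.
    public static int CopyThenModifyOriginal()
    {
        int number1 = 42;
        int number2 = number1;  // the value 42 is copied into number2's own storage
        number1 = 43;           // changes number1 only
        return number2;         // still 42
    }

    public static void Main()
    {
        Console.WriteLine(CopyThenModifyOriginal());
    }
}
```

Because number2 holds its own copy, the later assignment to number1 has no effect on it.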
Similarly, passing a value type to a method such as Console.WriteLine() will result in a memory copy, and any changes to the parameter inside the method will not affect the original value within the calling function. Since value types require a memory copy, they generally should be defined to consume a small amount of memory; value types should almost always be less than 16 bytes in size.
By contrast, the value of a variable of reference type is a reference to a storage location that contains data. Reference types store the reference where the data is located instead of storing the data directly, as value types do. Therefore, to access the data, the runtime reads the memory location out of the variable and then “jumps” to the location in memory that contains the data, an operation known as dereferencing. The memory area of the data a reference type points to is called the heap (see Figure 3.2).
A reference type does not require the same memory copy of the data that a value type does, which makes copying reference types far more efficient than copying large value types. When assigning the value of one reference type variable to another reference type variable, only the reference is copied, not the data referred to. In practice, a reference is always the same size as the “native size” of the processor—a 32-bit processor will copy a 32-bit reference, a 64-bit processor will copy a 64-bit reference, and so on. Obviously, copying the small reference to a large block of data is faster than copying the entire block, as a value type would.
Since reference types copy a reference to data, two different variables can refer to the same data. If two variables refer to the same object, changing data in the object via one variable causes the effect to be seen when accessing the same data via another variable. This happens both for assignment and for method calls. Therefore, a method can affect the data of a reference type, and that change can be observed when control returns to the caller. For this reason, a key factor when choosing between defining a reference type or a value type is whether the object is logically like an immutable value of fixed size (and therefore possibly a value type), or logically a mutable thing that can be referred to (and therefore likely to be a reference type).
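To make the shared-data behavior concrete, here is a small sketch that uses System.Text.StringBuilder, a reference type from the .NET class library. The class and method names (ReferenceCopyDemo, AppendSuffix, ShareAndModify) are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Text;

public static class ReferenceCopyDemo
{
    // The method receives a copy of the reference, so it modifies the caller's object.
    public static void AppendSuffix(StringBuilder builder)
    {
        builder.Append(" World");
    }

    public static string ShareAndModify()
    {
        StringBuilder first = new StringBuilder("Hello");
        StringBuilder second = first;  // copies the reference, not the data
        AppendSuffix(second);          // change made through the second variable
        return first.ToString();       // the change is visible through the first variable
    }

    public static void Main()
    {
        Console.WriteLine(ShareAndModify()); // Hello World
    }
}
```

Both the assignment and the method call copy only the reference, which is why the appended text is observed through first.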
Besides string and any custom classes such as Program, all types discussed so far are value types. However, most types are reference types. Although it is possible to define custom value types, it is relatively rare to do so in comparison to the number of custom reference types.
null
Often it is desirable to represent values that are “missing.” When specifying a count, for example, what do you store if the count is unknown or unassigned? One possible solution is to designate a “magic” value, such as -1 or int.MaxValue. However, these are valid integers, so it can be ambiguous as to when the magic value is a normal int or when it implies a missing value. A preferable approach is to assign null to indicate that the value is invalid or that the value has not been assigned. Assigning null is especially useful in database programming. Frequently, columns in database tables allow null values. Retrieving such columns and assigning them to corresponding variables within C# code is problematic, unless the data type in C# can contain null as well.
You can declare a type as either nullable or not nullable, meaning you can declare a type to allow a null value or not, with the nullable modifier. (Technically, C# supports the nullable modifier with value types starting in C# 2.0 and with reference types starting in C# 8.0.) To enable nullability, simply follow the type declaration with a nullable modifier: a question mark immediately following the type name. For example, int? number = null will declare a variable of type int that is nullable and assign it the value null. Unfortunately, nullability includes some pitfalls, requiring the use of special handling when nullability is enabled.
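As a brief sketch of the count scenario, the following code uses int? so that null can stand for an unknown count instead of a magic value. The names NullableCountDemo and Describe are hypothetical; HasValue and Value are the standard members of a nullable value type.

```csharp
using System;

public static class NullableCountDemo
{
    // Describes a count that may be missing; null means "unknown".
    public static string Describe(int? count)
    {
        return count.HasValue ? $"count = {count.Value}" : "count is unknown";
    }

    public static void Main()
    {
        int? count = null;                  // no value assigned yet
        Console.WriteLine(Describe(count)); // count is unknown

        count = -1;                         // -1 is now just an ordinary integer
        Console.WriteLine(Describe(count)); // count = -1
    }
}
```

With the nullable modifier, -1 no longer needs to double as a marker for a missing value.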
null Reference
While support for assigning null to a variable is invaluable (pun intended), it is not without its drawbacks. While copying or passing a null value to other variables and methods is inconsequential, dereferencing (invoking a member on) an instance of null will throw a System.NullReferenceException; for example, invoking text.GetType() when text has the value null. Anytime production code throws a System.NullReferenceException, it is always a bug. This exception indicates that the developer who wrote the code did not remember to check for null before the invocation. Further exacerbating the problem, checking for null requires an awareness on the developer’s part that a null value is possible and, therefore, an explicit action is necessary. It is for this reason that declaring a nullable variable requires explicit use of the nullable modifier, rather than the opposite approach where null is allowed by default (see “Nullability of Reference Types before C# 8.0” later in the section). In other words, when the programmer opts in to allow a variable to be null, he or she takes on the additional responsibility of being sure to avoid dereferencing a variable whose value is null.
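A minimal sketch of that responsibility follows. The class NullCheckDemo and the method SafeLength are hypothetical, and returning 0 for a missing string is an assumption made purely for illustration; the if statement it relies on is explained in Chapter 4.

```csharp
#nullable enable
using System;

public static class NullCheckDemo
{
    // Returns the length of text, or 0 (an illustrative choice) when text is null.
    public static int SafeLength(string? text)
    {
        if (text == null)
        {
            return 0;       // avoid dereferencing a null value
        }
        return text.Length; // safe: text is known to be non-null here
    }

    public static void Main()
    {
        Console.WriteLine(SafeLength(null));    // 0
        Console.WriteLine(SafeLength("Juba"));  // 4
    }
}
```

Without the check, calling text.Length on a null value would throw a System.NullReferenceException.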
Since checking for null requires the use of statements and/or operators that we haven’t discussed yet, the details on how to check for null appear in Advanced Topic: Checking for null. Full explanations, however, appear in Chapter 4.
Since a value type refers directly to the actual value, value types cannot innately contain a null because, by definition, they cannot contain references, including references to nothing. Nonetheless, we still use the term “dereferencing a value type” when invoking members on the value type. In other words, while not technically correct, using the term “dereferencing” when invoking a member, regardless of whether it is a value type, is common.1
1. Nullable value types were introduced with C# 2.0.
Prior to C# 8.0, all reference types allowed null. Unfortunately, this resulted in numerous bugs because avoiding a null reference exception required the developer to realize the need to check for null and defensively program to avoid dereferencing the null value. Further exacerbating the problem, reference types are nullable by default. If no value is assigned to a variable of reference type, the value will default to null. Moreover, if you dereferenced a reference-type local variable whose value was unassigned, the compiler would (appropriately) issue an error, "Use of unassigned local variable 'text'", for which the easiest correction was to simply assign null during declaration, rather than to ensure a more appropriate value was assigned regardless of the path that execution might follow (see Listing 3.2). In other words, developers would easily fall into the trap of declaring a variable and assigning a null value as the simplest resolution to the error, (perhaps mistakenly) expecting the code would reassign the variable before it was dereferenced.
#nullable enable
static void Main()
{
    string? text;
    // ...
    // Compile error: Use of unassigned local variable 'text'
    System.Console.WriteLine(text.Length);
}
In summary, the nullability of reference types by default was a frequent source of defects in the form of System.NullReferenceExceptions, and the behavior of the compiler led developers astray unless they took explicit actions to avoid the pitfall.
To improve this scenario significantly, the C# team introduced the concept of nullability to reference types in C# 8.0—a feature known as nullable reference types (implying, of course, that reference types could be non-nullable as well). Nullable reference types bring reference types on par with value types, in that reference type declarations can occur with or without a nullable modifier. In C# 8.0, declaring a variable without the nullable modifier implies it is not nullable.
Unfortunately, supporting the declaration of a reference type with a nullable modifier and defaulting the reference type declaration with no nullable modifier to non-nullable has major implications for code that is upgraded from earlier versions of C#. Given that C# 7.0 and earlier supported the assignment of null to all reference type declarations (for example, string text = null), does all the code fail compilation in C# 8.0?
Fortunately, backward compatibility is extremely important to the C# team, so support for reference type nullability is not enabled by default. Instead, there are a couple of options to enable it: the #nullable directive and project properties.
First, the nullable reference type feature is activated in this example with the #nullable directive:
#nullable enable
The directive supports values of enable, disable, and restore; the last of these restores the nullable context to the project-wide setting. Listing 3.2 provides an example that sets nullable to enabled with a nullable directive. In so doing, the declaration of text as string? is enabled and no longer causes a compiler warning.
Alternatively, programmers can use project properties to enable reference type nullability. By default, a project file’s (*.csproj) project-wide setting has nullable disabled. To enable it, add a Nullable project property whose value is enable, as shown in Listing 3.3.
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.0</TargetFramework>
    <Nullable>enable</Nullable>
  </PropertyGroup>
</Project>
All sample code for this book (available at https://github.com/EssentialCSharp) has nullable enabled at the project level. Alternatively, you can also set the project properties on the dotnet command line with the /p argument:
dotnet build /p:Nullable=enable
Specifying the value for Nullable on the command line will override any value set in the project file.
C# 3.0 added a contextual keyword, var, for declaring an implicitly typed local variable. If the code initializes a variable at declaration time with an expression of unambiguous type, C# 3.0 and later allow for the variable data type to be implied rather than stated, as shown in Listing 3.4.
class Uppercase
{
    static void Main()
    {
        System.Console.Write("Enter text: ");
        var text = System.Console.ReadLine();

        // Return a new string in uppercase
        var uppercase = text.ToUpper();

        System.Console.WriteLine(uppercase);
    }
}
This listing differs from Listing 2.18 in two ways. First, rather than using the explicit data type string for the declaration, Listing 3.4 uses var. The resultant CIL code is identical to using string explicitly. However, var indicates to the compiler that it should determine the data type from the value (System.Console.ReadLine()) that is assigned within the declaration.
Second, the variables text and uppercase are initialized by their declarations. To not do so would result in an error at compile time. As mentioned earlier, the compiler determines the data type of the initializing expression and declares the variable accordingly, just as it would if the programmer had specified the type explicitly.
Although using var rather than the explicit data type is allowed, consider avoiding such use when the data type is not obvious; for example, use string for the declaration of text and uppercase. Not only does this make the code more understandable, but it also allows the compiler to verify your intent, that the data type returned by the right-hand side expression is the type expected. When using a var-declared variable, the right-hand side data type should be obvious; if it isn’t, use of the var declaration should be avoided.
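The guideline can be illustrated with a short sketch. The class VarDemo and its methods are hypothetical names introduced here; the code simply contrasts a declaration where the type is obvious from the right-hand side with one where stating string documents the intent.

```csharp
using System;
using System.Text;

public static class VarDemo
{
    // Returns true when a var-declared variable has the same type as an explicitly typed one.
    public static bool VarMatchesExplicit()
    {
        var implicitText = "Essential C#";     // inferred as string by the compiler
        string explicitText = "Essential C#";  // stated as string by the programmer
        return implicitText.GetType() == explicitText.GetType();
    }

    public static string ToUpperExplicit(string text)
    {
        // The return type of ToUpper() is not visible at the call site,
        // so stating string lets the compiler verify the intent.
        string uppercase = text.ToUpper();
        return uppercase;
    }

    public static void Main()
    {
        // The type is obvious from the new expression, so var reads well here.
        var builder = new StringBuilder("var is fine here");
        Console.WriteLine(builder.ToString());
        Console.WriteLine(VarMatchesExplicit());
        Console.WriteLine(ToUpperExplicit("text"));
    }
}
```

Either form compiles to the same type; the choice affects only how readable the declaration is.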
On occasion, you will find it useful to combine data elements together. Consider, for example, information about a country such as the poorest country in the world in 2019: South Sudan, whose capital is Juba, with a GDP per capita of $275.18. Given the constructs we have established so far, we could store each data element in individual variables, but the result would be no association of the data elements together. That is, $275.18 would have no association with South Sudan, except perhaps by a common suffix or prefix in the variable names. Another option would be to combine all the data into a single string, albeit with the disadvantage that to work with each data element individually would require parsing it out.
C# 7.0 provides a third option, a tuple. Tuples allow you to combine the assignment to each variable in a single statement, as shown here for the country data:
(string country, string capital, double gdpPerCapita) = ("South Sudan", "Juba", 275.18);
Tuples have several additional syntax possibilities, as shown in Table 3.1.
In the first four examples, although the right-hand side represents a tuple, the left-hand side still represents individual variables that are assigned together using tuple syntax, a syntax involving two or more elements separated by commas and associated together with parentheses. (The term tuple syntax is used here because the underlying data type that the compiler generates on the left-hand side isn’t technically a tuple.) The result is that although we start with values combined as a tuple on the right, the assignment to the left deconstructs the tuple into its constituent parts. In example 2, the left-hand side assignment is to pre-declared variables. However, in examples 1, 3, and 4, the variables are declared within the tuple syntax. Given that we are only declaring variables, the naming and casing convention follows the guidelines we discussed in Chapter 1: “DO use camelCase for local variable names,” for example.
Note that although implicit typing (var) can be distributed across each variable declaration within the tuple syntax, as shown in example 4, you cannot do the same with an explicit type (such as string). Since tuples allow each item to be a different data type, distributing the explicit type name across all elements wouldn’t necessarily work unless all the item data types were identical (and even then, the compiler doesn’t allow it).
Table 3.1: Sample Code for Tuple Declaration and Assignment
Example | Description | Example Code
1. | Assign a tuple to individually declared variables. |
    (string country, string capital, double gdpPerCapita) =
        ("South Sudan", "Juba", 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            country}, {capital}: {gdpPerCapita}");
2. | Assign a tuple to individually declared variables that are pre-declared. |
    string country;
    string capital;
    double gdpPerCapita;
    (country, capital, gdpPerCapita) =
        ("South Sudan", "Juba", 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            country}, {capital}: {gdpPerCapita}");
3. | Assign a tuple to individually declared and implicitly typed variables. |
    (var country, var capital, var gdpPerCapita) =
        ("South Sudan", "Juba", 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            country}, {capital}: {gdpPerCapita}");
4. | Assign a tuple to individually declared variables that are implicitly typed with a distributive syntax. |
    var (country, capital, gdpPerCapita) =
        ("South Sudan", "Juba", 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            country}, {capital}: {gdpPerCapita}");
5. | Declare a named item tuple and assign it tuple values, and then access the tuple items by name. |
    (string Name, string Capital, double GdpPerCapita) countryInfo =
        ("South Sudan", "Juba", 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            countryInfo.Name}, {countryInfo.Capital}: {
            countryInfo.GdpPerCapita}");
6. | Assign a named item tuple to a single implicitly typed variable, and then access the tuple items by name. |
    var countryInfo =
        (Name: "South Sudan", Capital: "Juba", GdpPerCapita: 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            countryInfo.Name}, {countryInfo.Capital}: {
            countryInfo.GdpPerCapita}");
7. | Assign an unnamed tuple to a single implicitly typed variable, and then access the tuple elements by their item-number property. |
    var countryInfo =
        ("South Sudan", "Juba", 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            countryInfo.Item1}, {countryInfo.Item2}: {
            countryInfo.Item3}");
8. | Assign a named item tuple to a single implicitly typed variable, and then access the tuple items by their item-number property. |
    var countryInfo =
        (Name: "South Sudan", Capital: "Juba", GdpPerCapita: 275.18);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            countryInfo.Item1}, {countryInfo.Item2}: {
            countryInfo.Item3}");
9. | Discard portions of the tuple with underscores. |
    (string name, _, double gdpPerCapita) =
        ("South Sudan", "Juba", 275.18);
10. | Tuple element names can be inferred from variable and property names (starting in C# 7.1). |
    string country = "South Sudan";
    string capital = "Juba";
    double gdpPerCapita = 275.18;
    var countryInfo =
        (country, capital, gdpPerCapita);
    System.Console.WriteLine(
        $@"The poorest country in the world in 2017 was {
            countryInfo.country}, {countryInfo.capital}: {
            countryInfo.gdpPerCapita}");
In example 5, we declare a tuple on the left-hand side and then assign the tuple on the right. Note that the tuple has named items: names that we can then reference to retrieve the item values from the tuple. This is what enables the countryInfo.Name, countryInfo.Capital, and countryInfo.GdpPerCapita syntax in the System.Console.WriteLine statement. The result of the tuple declaration on the left is a grouping of the variables into a single variable (countryInfo) from which you can then access the constituent parts. This is useful because, as we discuss in Chapter 4, you can then pass this single variable around to other methods, and those methods will also be able to access the individual items within the tuple.
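As a rough preview of passing a tuple to a method (methods themselves are covered in Chapter 4), the sketch below hands a named item tuple to a hypothetical Format method. The class name TupleParameterDemo and the method name are assumptions for illustration only.

```csharp
using System;

public static class TupleParameterDemo
{
    // Receives the whole tuple as one parameter and reads its items by name.
    public static string Format(
        (string Name, string Capital, double GdpPerCapita) countryInfo)
    {
        return $"{countryInfo.Name}, {countryInfo.Capital}: {countryInfo.GdpPerCapita}";
    }

    public static void Main()
    {
        (string Name, string Capital, double GdpPerCapita) countryInfo =
            ("South Sudan", "Juba", 275.18);

        // A single variable carries all three items into the method.
        Console.WriteLine(Format(countryInfo));
    }
}
```

The exact rendering of the GDP figure depends on the current culture's number formatting.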
As already mentioned, variables defined using tuple syntax use camelCase. However, the convention for tuple item names is not well defined. Suggestions include using parameter naming conventions when the tuple behaves like a parameter—such as when returning multiple values that before tuple syntax would have used out parameters. The alternative is to use PascalCase, following the naming convention for members of a type (properties, functions, and public fields, as discussed in Chapters 5 and 6). I strongly favor the latter approach of PascalCase, consistent with the casing convention of all member identifiers in C# and .NET. Even so, since the convention isn’t broadly agreed upon, I use the word CONSIDER rather than DO in the guideline, “CONSIDER using PascalCasing for all tuple item names.”
Example 6 provides the same functionality as Example 5, although it uses named tuple items on the right-hand side tuple value and an implicit type declaration on the left. The items’ names are persisted to the implicitly typed variable, however, so they are still available for the WriteLine statement. Of course, this opens up the possibility that you could name the items on the left-hand side with different names than what you use on the right. While the C# compiler allows this, it will issue a warning that the item names on the right will be ignored, as those on the left will take precedence.
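The following sketch shows mismatched names on the two sides of a tuple assignment. The class TupleNameDemo, the method LeftNamesWin, and the right-hand item names Country and City are all hypothetical; compiling it is expected to produce the warning described above.

```csharp
using System;

public static class TupleNameDemo
{
    public static string LeftNamesWin()
    {
        // The compiler warns that Country and City are ignored,
        // because the target type on the left declares Name and Capital.
        (string Name, string Capital) countryInfo =
            (Country: "South Sudan", City: "Juba");

        // Only the left-hand names are available on the variable.
        return countryInfo.Name + ", " + countryInfo.Capital;
    }

    public static void Main()
    {
        Console.WriteLine(LeftNamesWin()); // South Sudan, Juba
    }
}
```

Attempting to use countryInfo.Country here would not compile, since that name was discarded.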
If no item names are specified, the individual elements are still available from the assigned tuple variable. However, the names are Item1, Item2, ..., as shown in Example 7. In fact, the ItemX names are always available on the tuple even when custom names are provided (see Example 8). However, when using integrated development environment (IDE) tools such as one of the recent flavors of Visual Studio (one that supports C# 7.0), the ItemX property will not appear within the IntelliSense dropdown, which is a good thing, since presumably the provided name is preferable. As shown in Example 9, portions of a tuple assignment can be discarded using an underscore, referred to as a discard.
The ability to infer the tuple item names as shown in example 10 isn’t introduced until C# 7.1. As the example demonstrates, the item name within the tuple can be inferred from a variable name or even a property name.
Tuples are a lightweight solution for encapsulating data into a single object in the same way that a bag might capture miscellaneous items you pick up from the store. Unlike arrays (which we discuss next), tuples contain item data types that can vary without constraint,2 except that they are identified by the code and cannot be changed at runtime. Also, unlike with arrays, the number of items within the tuple is hardcoded at compile time. Lastly, you cannot add custom behavior to a tuple (extension methods notwithstanding). If you need behavior associated with the encapsulated data, then leveraging object-oriented programming and defining a class is the preferred approach—a concept we begin exploring in depth in Chapter 6.
2. Technically, they can’t be pointers—a topic we introduce in Chapter 23.
One particular aspect of variable declaration that Chapter 1 didn’t cover is array declaration. With array declaration, you can store multiple items of the same type using a single variable and still access them individually using the index when required. In C#, the array index starts at zero. Therefore, arrays in C# are zero based.
Arrays are a fundamental part of nearly every programming language, so they are required learning for virtually all developers. Although arrays are frequently used in C# programming, and necessary for the beginner to understand, most C# programs now use generic collection types rather than arrays when storing collections of data. Therefore, readers should skim over the following section, “Declaring an Array,” simply to become familiar with their instantiation and assignment. Table 3.2 provides the highlights of what to note. Generic collections are covered in detail in Chapter 15.
Table 3.2: Array Highlights
Description | Example
Declaration: Note that the brackets appear with the data type. Multidimensional arrays are declared using commas, where the number of commas plus one specifies the number of dimensions. |
    string[] languages; // one-dimensional
    int[,] cells; // two-dimensional
Assignment: If an array is not assigned during declaration, the new keyword is required when assigning it later. Arrays can be assigned without literal values. As a result, the value of each item in the array is initialized to its default. If no literal values are provided, the size of the array must be specified. (The size does not have to be a constant; it can be a variable calculated at runtime.) Starting with C# 3.0, specifying the data type is optional. |
    string[] languages = {
        "C#", "COBOL", "Java",
        "C++", "TypeScript", "Pascal",
        "Python", "Lisp", "JavaScript"};
    languages = new string[9];
    languages = new string[]{"C#", "COBOL", "Java",
        "C++", "TypeScript", "Pascal",
        "Python", "Lisp", "JavaScript" };

    // Multidimensional array assignment
    // and initialization
    int[,] cells = new int[3,3]
    {
        {1, 0, 2},
        {1, 2, 0},
        {1, 2, 1}
    };
Forward Accessing an Array: Arrays are zero based, so the first element in an array is at index 0. The square brackets are used to store and retrieve data from an array. |
    string[] languages = new string[]{
        "C#", "COBOL", "Java",
        "C++", "TypeScript", "Visual Basic",
        "Python", "Lisp", "JavaScript"};
    // Retrieve fifth item in languages array
    // (TypeScript)
    string language = languages[4];
    // Write "TypeScript"
    System.Console.WriteLine(language);
Reverse Accessing an Array: Starting in C# 8.0, you can also index an array from the end. For example, languages[^3] retrieves the third item from the end. |
    // Retrieve third item from the end (Python)
    language = languages[^3];
    // Write "Python"
    System.Console.WriteLine(language);
Ranges: C# 8.0 allows you to identify and extract an array of elements using the range operator, which identifies the starting item up to but excluding the end item. |
    System.Console.WriteLine($@"^3..^0: {
        // Python, Lisp, JavaScript
        string.Join(", ", languages[^3..^0])
    }");
    System.Console.WriteLine($@"^3..: {
        // Python, Lisp, JavaScript
        string.Join(", ", languages[^3..])
    }");
    System.Console.WriteLine($@" 3..^3: {
        // C++, TypeScript, Visual Basic
        string.Join(", ", languages[3..^3])
    }");
    System.Console.WriteLine($@" ..^6: {
        // C#, COBOL, Java
        string.Join(", ", languages[..^6])
    }");
In addition, the final section of this chapter, “Common Array Errors,” provides a review of some of the array idiosyncrasies.
In C#, you declare arrays using square brackets. First, you specify the element type of the array, followed by open and closed square brackets; then you enter the name of the variable. Listing 3.7 declares a variable called languages to be an array of strings.
string[] languages;
The first part of the array declaration identifies the data type of the elements within the array. The square brackets that are part of the declaration identify the rank, or the number of dimensions, for the array; in this case, it is an array of rank 1. These two pieces form the data type for the variable languages.
Listing 3.7 defines an array with a rank of 1. Commas within the square brackets define additional dimensions. Listing 3.8, for example, defines a two-dimensional array of cells for a game of chess or tic-tac-toe.
//    |   |
// ---+---+---
//    |   |
// ---+---+---
//    |   |
int[,] cells;
In Listing 3.8, the array has a rank of 2. The first dimension could correspond to cells going across and the second dimension to cells going down. Additional dimensions are added, with additional commas, and the total rank is one more than the number of commas. Note that the number of items that occur for a particular dimension is not part of the variable declaration. This is specified when creating (instantiating) the array and allocating space for each element.
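The relationship between commas and rank can be checked with the Rank property that every array exposes. The sketch below is illustrative only; the class ArrayRankDemo and its method names are hypothetical, and the arrays are instantiated with new so that the property can be read.

```csharp
using System;

public static class ArrayRankDemo
{
    public static int RankOfGrid()
    {
        int[,] cells = new int[3, 3];    // one comma: two dimensions
        return cells.Rank;
    }

    public static int RankOfCube()
    {
        int[,,] cube = new int[2, 3, 4]; // two commas: three dimensions
        return cube.Rank;
    }

    public static void Main()
    {
        Console.WriteLine(RankOfGrid()); // 2
        Console.WriteLine(RankOfCube()); // 3
    }
}
```

The sizes 3, 3 and 2, 3, 4 appear only in the new expressions, not in the declared types int[,] and int[,,].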
Once an array is declared, you can immediately fill its values using a comma-delimited list of items enclosed within a pair of curly braces. Listing 3.9 declares an array of strings and then assigns the names of nine languages within curly braces.
string[] languages = { "C#", "COBOL", "Java", "C++", "TypeScript", "Visual Basic", "Python", "Lisp", "JavaScript"};
The first item in the comma-delimited list becomes the first item in the array, the second item in the list becomes the second item in the array, and so on. The curly brackets are the notation for defining an array literal.
The assignment syntax shown in Listing 3.9 is available only if you declare and assign the value within one statement. To assign the value after declaration requires the use of the keyword new, as shown in Listing 3.10.
string[] languages;
languages = new string[]{"C#", "COBOL", "Java",
    "C++", "TypeScript", "Visual Basic",
    "Python", "Lisp", "JavaScript" };
Starting in C# 3.0, specifying the data type of the array (string) following new is optional as long as the compiler is able to deduce the element type of the array from the types of the elements in the array initializer. The square brackets are still required.
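A short sketch of this optional syntax follows. The class ImplicitArrayDemo and the method CreateLanguages are hypothetical names, and the list of languages is shortened for brevity.

```csharp
using System;

public static class ImplicitArrayDemo
{
    public static string[] CreateLanguages()
    {
        string[] languages;
        // The element type (string) is deduced from the initializer;
        // the square brackets after new are still required.
        languages = new[] { "C#", "COBOL", "Java" };
        return languages;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", CreateLanguages())); // C#, COBOL, Java
    }
}
```

If the initializer mixed unrelated element types, the compiler could not deduce a single element type and the assignment would fail to compile.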
C# also allows use of the new keyword as part of the declaration statement, so it allows the assignment and the declaration shown in Listing 3.11.
string[] languages = new string[]{ "C#", "COBOL", "Java", "C++", "TypeScript", "Visual Basic", "Python", "Lisp", "JavaScript"};
The use of the new keyword tells the runtime to allocate memory for the data type. It instructs the runtime to instantiate the data type, which in this case is an array.
Whenever you use the new keyword as part of an array assignment, you may also specify the size of the array within the square brackets. Listing 3.12 demonstrates this syntax.
string[] languages = new string[9]{ "C#", "COBOL", "Java", "C++", "TypeScript", "Visual Basic", "Python", "Lisp", "JavaScript"};
The array size in the initialization statement and the number of elements contained within the curly braces must match. Furthermore, it is possible to assign an array but not specify the initial values of the array, as demonstrated in Listing 3.13.
string[] languages = new string[9];
Assigning an array but not initializing the initial values will still initialize each element. The runtime initializes array elements to their default values, as follows:
Reference types, whether nullable or not (such as string and string?), are initialized to null.
Nullable value types are all initialized to null.
Non-nullable numeric types are initialized to 0.
bool is initialized to false.
char is initialized to