Chapter 2. Introducing LINQ to Objects

Goals of this chapter:

• Define the capabilities of LINQ to Objects.

• Define the C# language enhancements that make LINQ possible.

• Introduce the main features of LINQ to Objects through a brief overview.

LINQ to Objects allows us to query in-memory collections and any type that implements the IEnumerable<T> interface. This chapter gives you a first real look at the language enhancements that support the LINQ story and introduces you to the main features of LINQ to Objects with a short overview. By the end of this chapter, the query syntax should be more familiar to you, and then the following chapters bring you deeper into the query syntax and features.

LINQ Enabling C# 3.0 Language Enhancements

Many new C# language constructs were added in version 3.0 to improve the general coding experience for developers. Almost all the C# features added relate in some way to the realization of an integrated query syntax within the language, called LINQ.

The features added in support of the LINQ syntax fall into two categories. The first is a set of compiler syntax additions that are shorthand for common constructs, and the second is a set of features that alter the way method names are resolved during compilation. All these features, however, combine to allow a fluent query experience when working with structured data sources.

To understand how LINQ to Objects queries compile, it is necessary to have some understanding of the new language features. Although this chapter will only give you a brief overview, the following chapters will use all these features in more advanced ways.

Note

There are a number of other new language features added in both C# 3.0 and C# 4.0 that don’t specifically add to the LINQ story covered in this introduction. The C# 4.0 features are covered in Chapter 8. C# 4.0 does require the .NET Framework 4 to be installed on machines executing the compiled code.

Extension Methods

Extension methods allow us to introduce additional methods to any type without inheriting or changing the source code behind that type. Methods introduced to a given type using extension methods can be called on an instance of that type in the same way ordinary instance methods are called (using the dot notation on an instance variable of a type).

Extension methods are built as static methods inside a static class. The first parameter of the method carries the this modifier, which tells the compiler that the parameter's type is the type being extended. The remaining parameters behave as normal; at the call site the parameter prefixed by the this modifier is skipped, so the method's second parameter becomes the first argument of the call, and so on.

The rules for defining an extension method are

  1. The extension method needs to be defined in a nongeneric static class
  2. The static class must be at the root level of a namespace (that is, not nested within another class)
  3. The extension method must be a static method (which is enforced by the compiler due to the class also having to be marked static)
  4. The first argument of the extension method must be prefixed with the this modifier; this is the type being extended

To demonstrate the mechanics of declaring an extension method, the following code extends the System.String type, adding a method called CreateHyperlink. Once this code is compiled into a project, any class file that has a using MyNamespace; declaration can simply call this method on any string instance in the following fashion:

image
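As a minimal sketch of these mechanics, the following code declares and calls such a method. The body of CreateHyperlink, its url parameter, and the example URL are assumptions made for illustration, not the original implementation.

```csharp
using System;

namespace MyNamespace
{
    // Extension methods must live in a non-nested, nongeneric static class.
    public static class StringExtensions
    {
        // Hypothetical implementation: the "this" modifier on the first
        // parameter marks System.String as the type being extended.
        public static string CreateHyperlink(this string text, string url)
        {
            return "<a href=\"" + url + "\">" + text + "</a>";
        }
    }

    public static class Program
    {
        public static void Main()
        {
            // Called like an instance method: "LINQ" is passed as the "this"
            // parameter, and the URL is the first argument at the call site.
            string link = "LINQ".CreateHyperlink("http://example.com");
            Console.WriteLine(link);
        }
    }
}
```

The same method can still be called as an ordinary static method, StringExtensions.CreateHyperlink("LINQ", "http://example.com"), which is the form the compiler produces behind the scenes.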

Listing 2-1 demonstrates how to create an extension method that returns the SHA1 Hash value for a string (with and without extra arguments). The output of this code can be seen in Output 2-1.

Listing 2-1. Adding a GetSHA1Hash method to the String type as an example extension method—see Output 2-1

image

Output 2-1

image

Extension methods declared in a namespace are available to call from any file that includes a using clause for that namespace. For instance, to make the LINQ to Objects extension methods available to your code, include the using System.Linq; clause at the top of the class code file.

The compiler will automatically give precedence to any instance methods defined for a type, meaning that it will use a method defined in a class if it exists before it looks for an extension method that satisfies the method name and signature.
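This precedence rule can be sketched with a small example. The Greeter type and its methods are hypothetical names used only to show which method the compiler binds to.

```csharp
using System;

public class Greeter
{
    // Instance method: chosen whenever a call matches its signature.
    public string Greet() { return "instance"; }
}

public static class GreeterExtensions
{
    // Same name and signature as the instance method, so g.Greet()
    // never binds to this extension method.
    public static string Greet(this Greeter g) { return "extension"; }

    // No instance method accepts a string argument, so the compiler
    // falls back to this extension method for g.Greet("Ann").
    public static string Greet(this Greeter g, string name)
    {
        return "extension: " + name;
    }
}

public static class PrecedenceDemo
{
    public static void Main()
    {
        Greeter g = new Greeter();
        Console.WriteLine(g.Greet());      // prints "instance"
        Console.WriteLine(g.Greet("Ann")); // prints "extension: Ann"
    }
}
```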

When making the choice on whether to extend a class using object-oriented principles of inheritance or extension methods, early drafts of the “Microsoft C# 3.0 Language Specification” (endnote 1) had the following advice; although the warning was removed in the final specification (endnote 2), it is still good advice in my opinion:

Extension methods are less discoverable and more limited in functionality than instance methods. For those reasons, it is recommended that extension methods be used sparingly and only in situations where instance methods are not feasible or possible.

The set of standard query operators that forms the inbuilt query functionality for LINQ to Objects is built entirely from extension methods that extend any type that implements IEnumerable<T> and, in some rare cases, IEnumerable. (Most .NET collection classes and arrays implement IEnumerable<T>; hence, the standard query operators are introduced to most of the built-in collection classes.) Although LINQ to Objects would be possible without extension methods, Microsoft would have had to add these operators to each collection type individually, and custom collections of our own type wouldn’t benefit without intervention. Extension methods allow LINQ to apply equally to the built-in collection types and to any custom collection type, with the only requirement being that the custom collection must implement IEnumerable<T>.

The current Microsoft-supplied extension methods and how to create new extension methods are covered in detail throughout this book. Understanding extension methods and how the built-in standard query operators work will lead to a deeper understanding of how LINQ to Objects is built.

Object Initializers

C# 3.0 introduced an object initialization shortcut syntax that allows a single C# statement to both construct a new instance of a type and assign property values. While it is good programming practice to use constructor arguments for all critical data in order to ensure that a new instance is stable and ready for use immediately after it is initialized (that is, to prevent objects from being instantiated into an invalid state), Object Initializers reduce the need to have a specific parameterized constructor for every variation of noncritical data argument set needed over time.

Listing 2-2 demonstrates the before and after examples of Object Initializers. Any accessible field or property (public ones, for instance) can be assigned in the initialization statement by assigning that property name to a value; multiple assignments can be made by separating the expressions with a comma. Behind the scenes, the C# compiler calls the constructor of the object (the default constructor when no constructor arguments are supplied) and then performs the individual assignments as if you had assigned the properties manually in subsequent statements. (See the C# 3.0 Language Specification in endnote 2 for a more precise description of how this initialization actually occurs.)

Listing 2-2. Object Initializer syntax—before and after

image
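A minimal sketch of the before-and-after pattern follows. The Customer type and its properties are hypothetical and chosen only for illustration.

```csharp
using System;

public class Customer
{
    public string Name { get; set; }
    public string City { get; set; }
}

public static class ObjectInitializerDemo
{
    // Before: construct the object, then assign each property in turn.
    public static Customer Before()
    {
        Customer c = new Customer();
        c.Name = "Ann";
        c.City = "Seattle";
        return c;
    }

    // After: one statement constructs the object and assigns the properties.
    public static Customer After()
    {
        return new Customer { Name = "Ann", City = "Seattle" };
    }

    public static void Main()
    {
        Customer a = Before();
        Customer b = After();
        Console.WriteLine(a.Name + ", " + a.City);
        Console.WriteLine(b.Name + ", " + b.City);
    }
}
```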

Although it seems to be a trivial (albeit useful) improvement in syntax in the general form, it is essential when you begin to write LINQ queries and need to construct object instances as part of their result set by setting property values on-the-fly. This is one of the more common scenarios when working with LINQ, and without this feature you would need to define a specific constructor for every set of properties you would like to initialize with a query.

Collection Initializers

With similar ambitions to the Object Initializer syntax, collection initialization was given similar functionality to improve the common construct-and-then-add pattern. The collection must implement System.Collections.IEnumerable and have an appropriate overload for an Add method to support the new initialization syntax. Listing 2-3 demonstrates the use of the new collection initialization syntax and shows the before and after equivalent code patterns. It also demonstrates how to combine collection initialization with Object Initialization, which helps keep the code cleaner and generally easier to read.

Listing 2-3. Collection initialization syntax—before and after

image
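The pattern can be sketched as follows. The Person type and the sample values are hypothetical; List<T> and Dictionary<TKey, TValue> are used because both implement IEnumerable and expose suitable Add methods.

```csharp
using System;
using System.Collections.Generic;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class CollectionInitializerDemo
{
    // Before: construct the list, then call Add for each element.
    public static List<string> Before()
    {
        List<string> colors = new List<string>();
        colors.Add("red");
        colors.Add("green");
        return colors;
    }

    // After: the compiler generates the same Add calls from the initializer.
    public static List<string> After()
    {
        return new List<string> { "red", "green" };
    }

    // Each nested brace pair maps to a call to Add(key, value).
    public static Dictionary<string, int> Lookup()
    {
        return new Dictionary<string, int> { { "one", 1 }, { "two", 2 } };
    }

    // Collection and Object Initializers combined.
    public static List<Person> People()
    {
        return new List<Person>
        {
            new Person { Name = "Ann", Age = 34 },
            new Person { Name = "Bob", Age = 41 }
        };
    }

    public static void Main()
    {
        Console.WriteLine(After().Count);
        Console.WriteLine(Lookup()["two"]);
        Console.WriteLine(People()[1].Name);
    }
}
```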

Implicitly Typed Local Variables

When a local variable is defined with the keyword var instead of a concrete type, the declared type of the new variable is inferred from the initialization expression. This removes the need to duplicate the type name when it can be inferred by the initial value assignment or initialization expression. Variables initialized using the var keyword are strongly-typed; the variable is simply assigned a compile-time type of the initialization value expression. Listing 2-4 demonstrates the use of implicit typing of local variables.

Listing 2-4. Local variable declaration, implicitly typed examples

image

To use Implicitly Typed Local Variables, the declaration must follow these rules:

  1. No user defined local type called var can exist; otherwise, that type is used.
  2. The variable declaration must have an initialization expression (that is, an equal sign followed by an expression, which can be a constant).
  3. The initialization expression (the expression to the right side of the equal sign) must have a compile-time type.
  4. The declaration cannot include multiple declarations (for example, var x = 10, y = 42;).
  5. The variable cannot refer to itself.
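The rules above can be sketched in code. The variable names and values are illustrative only, and the declarations that break a rule are left commented out because they do not compile.

```csharp
using System;
using System.Collections.Generic;

public static class VarDemo
{
    // Reports the run-time type name of each variable, which here
    // matches the compile-time type the compiler inferred.
    public static string[] InferredTypeNames()
    {
        var count = 10;                          // inferred as int
        var name = "LINQ";                       // inferred as string
        var ratio = 2.5;                         // inferred as double
        var numbers = new List<int> { 1, 2, 3 }; // inferred as List<int>

        // The following declarations would not compile:
        // var a;             // no initialization expression
        // var b = null;      // null has no compile-time type
        // var c = 1, d = 2;  // multiple declarations are not allowed
        // var e = e + 1;     // the variable cannot refer to itself

        return new string[]
        {
            count.GetType().Name,
            name.GetType().Name,
            ratio.GetType().Name,
            numbers.GetType().Name
        };
    }

    public static void Main()
    {
        foreach (string typeName in InferredTypeNames())
        {
            Console.WriteLine(typeName);
        }
    }
}
```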

Although the usefulness of declaring a variable in this way seems minor, it is a necessary feature for declaring some types that have no other legal way of being declared (Anonymous Types, for instance).

Anonymous Types

Anonymous types are compile-time generated types where the public properties are inferred from the object initialization expression at compile-time. These types can then be used locally within a method as temporary data storage classes, avoiding having to build a specific data class for any and every set of properties. The new class inherits directly from System.Object and for all practical purposes has no name (none that can be referenced from code anyway). So what type do you declare for a variable that holds an anonymous type? Because the type name cannot be written in code, such variables have to be declared using the implicitly typed local variable construct (var) mentioned earlier.

An anonymous type is defined by omitting the type name after a new statement and providing a valid Object Initializer expression that contains one or more assignments. Listing 2-5 demonstrates how to create an anonymous type and how it can be used in subsequent code, the output of which is shown in Output 2-2.

Listing 2-5. Declaring and using anonymous types—see Output 2-2

image

Output 2-2

image
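A minimal sketch of declaring and using an anonymous type follows. The property names and values are hypothetical.

```csharp
using System;

public static class AnonymousTypeDemo
{
    public static string Describe()
    {
        // No type name follows "new"; the compiler generates a class with
        // read-only Name and Age properties inferred from the initializer.
        var contact = new { Name = "Ann", Age = 34 };

        // The properties are strongly typed: Name is a string, Age is an int.
        return contact.Name + " is " + contact.Age;
    }

    public static void Main()
    {
        Console.WriteLine(Describe());

        // The generated type also overrides ToString to list its properties.
        var point = new { X = 1, Y = 2 };
        Console.WriteLine(point);
    }
}
```

The variable must be declared with var because the generated type has no name that code can reference.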

The importance of anonymous types becomes evident when writing queries that build collections using a subset of properties from an existing type (known as a projection). Imagine if when working with a relational database you couldn’t define the column list to be used as the return of an SQL query. The same issue occurs with LINQ; without anonymous types, you would need to define a concrete type for each and every return result set that altered the columns of interest. Optional parameters added in C# 4.0 (covered in Chapter 8) make this somewhat easier, but anonymous types should be considered when results are needed temporarily.

Beyond their usefulness in supporting LINQ, anonymous types allow temporary data structures to be created during processing of that data. It was often necessary to build a plain, old class with nothing more than property declarations to support holding data; anonymous types fulfill this need perfectly. Although it is possible to pass anonymous types beyond the method that declared them by declaring an argument type as Object, this is highly discouraged. When using anonymous types outside of LINQ query functionality, keep them scoped to the method in which they are declared; otherwise, define a concrete class or struct.

Lambda Expressions

Lambda expressions build upon the anonymous methods feature added in C# 2.0. Anonymous methods added the ability to inline the code by using the delegate keyword, which avoids the need to declare a delegate method in a separate body of code. C# 3.0 introduces the lambda operator =>, which can be used to create delegates or expression trees in an even briefer syntax.

Note

Delegate is a fancy term for a type-safe reference (pointer) to a function or method. For example, when you need to call code when someone clicks a button, you actually attach a delegate to the event. The delegate refers to the method that implements the code elsewhere. However, having to locate that code separately (such as within a concrete method) was a pain, and in C# 2.0 the ability to inline that code was added. In C# 3.0, lambda expressions clean the syntax of writing that in-line code even more.

The C# 2.0 syntax for an anonymous method is

image

The same delegate using the C# 3.0 lambda expression syntax is

[parameter] => [expression]

Lambda expressions are used when passing method-based arguments to any of the LINQ standard Query operators, like Where for instance. A before and after example of how a Where clause would need to be formatted with and without the new syntax would be

image

Lambda expressions build upon the original anonymous method syntax by improving compile time type inference (the compiler works out the types of the arguments in most cases, eliminating the need to specify the input argument parameter types) and simplifying the syntax by eliminating the delegate keyword and the return statement for Expression bodies.
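The difference can be sketched by passing the same filter to Where in both styles. The sample array is an illustrative assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhereDemo
{
    static readonly int[] Numbers = { 1, 2, 3, 4, 5, 6 };

    // C# 2.0 style: the delegate keyword, an explicit parameter type,
    // and a return statement are all required.
    public static List<int> EvensWithAnonymousMethod()
    {
        return Numbers.Where(delegate(int n) { return n % 2 == 0; }).ToList();
    }

    // C# 3.0 style: the compiler infers that n is an int and that the
    // expression n % 2 == 0 is the return value.
    public static List<int> EvensWithLambda()
    {
        return Numbers.Where(n => n % 2 == 0).ToList();
    }

    public static void Main()
    {
        foreach (int n in EvensWithLambda())
        {
            Console.WriteLine(n); // prints 2, 4, and 6 on separate lines
        }
    }
}
```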

Note

When used with LINQ, lambda expressions are generally single-expression delegates (they contain only a single expression), because a lambda with a statement body cannot be converted to an expression tree. Anonymous methods can contain multiple statements and span multiple lines.

Some more general forms of lambda expressions taking input arguments and specifying explicit types when necessary (the compiler often can infer the type without assistance) are

image

Surrounding parentheses for lambdas with a single parameter are optional. Lambda expressions infer the parameter type and return types in most cases, but if the compiler cannot determine the types, you must explicitly type them by adding a type before the parameter identifier.

image
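Some of these forms can be sketched using the built-in Func delegate types. The field names and the arithmetic are illustrative assumptions.

```csharp
using System;

public static class LambdaForms
{
    // No parameters: empty parentheses are required.
    public static readonly Func<int> Answer = () => 42;

    // One parameter: parentheses are optional and the type is inferred.
    public static readonly Func<int, int> Square = x => x * x;

    // One parameter with an explicit type: parentheses are required.
    public static readonly Func<int, int> SquareExplicit = (int x) => x * x;

    // Several parameters: parentheses are required, types are inferred.
    public static readonly Func<int, int, int> Add = (x, y) => x + y;

    // Statement body: braces and an explicit return statement.
    public static readonly Func<int, int, bool> SumExceedsTen = (x, y) =>
    {
        int sum = x + y;
        return sum > 10;
    };

    public static void Main()
    {
        Console.WriteLine(Answer());            // prints 42
        Console.WriteLine(Square(4));           // prints 16
        Console.WriteLine(SquareExplicit(5));   // prints 25
        Console.WriteLine(Add(2, 3));           // prints 5
        Console.WriteLine(SumExceedsTen(6, 7)); // prints True
    }
}
```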

Lambda expressions will become second nature when you begin writing LINQ code; for now, just get comfortable with the fact that a lambda expression is simply a nicer way of passing code to another function to execute. Listing 2-6 demonstrates some before and after shots of inlining code as a delegate.

Listing 2-6. Anonymous method (C# 2.0) and Lambda Expression (C# 3.0) examples

image

Query Expressions

Query expressions is the feature where all of the previous new language constructs merge, and their pivotal role in LINQ can be seen. The following is a brief overview; future chapters cover how to write query expressions in detail, but for now focus on how the language enhancements combine into a query language integrated into C#.

Query expressions are a language syntax enhancement only; the compiler maps the query expression into a series of extension method calls and then into a series of static method calls. (This is true for LINQ to Objects; other LINQ providers, such as LINQ to SQL, do not operate this way. They build expression trees and are beyond the scope of this book.) The advantage of the query expression syntax, however, is its clarity and how familiar it appears to anyone who has worked in other query languages. Both query syntax styles and their benefits and tradeoffs are covered in Chapter 3, “Writing Basic Queries,” and throughout the rest of the book.

Listing 2-7 shows two functionally identical LINQ queries; the first query1 uses the query expression language syntax, and query2 uses pure extension method calls to compose the query. query1 makes use of anonymous types (line 7), Object Initialization (line 9 and 10), and implicitly typed local variables (line 5). The code shown in query2 is similar to how the compiler converts the query expression syntax into extension method calls, and free-form expressions throughout the query into lambda expressions. This step is transparent and automatic during compilation and is shown here to give you a glimpse into how LINQ query syntax is decomposed into more traditional code constructs. Lambda expressions can be seen where expressions were used (lines 15 and 16), and you can also see extension methods taking the place of the keywords where and select (lines 15 and 16).

Listing 2-7. Example query expression showing the new language constructs in unison

image
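The equivalence of the two styles can be sketched with a small, self-contained example. The word list and the projected property names are assumptions and are not the contents of Listing 2-7.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QuerySyntaxDemo
{
    static readonly string[] Words = { "apple", "kiwi", "banana", "fig" };

    // Query expression syntax, using an implicitly typed local variable,
    // an anonymous type, and Object Initializer syntax.
    public static List<string> WithQueryExpression()
    {
        var query1 = from w in Words
                     where w.Length > 3
                     orderby w
                     select new { Word = w, Length = w.Length };

        return query1.Select(r => r.Word + ":" + r.Length).ToList();
    }

    // The same query written directly as extension method calls; each
    // clause becomes a method call and each expression becomes a lambda.
    public static List<string> WithExtensionMethods()
    {
        var query2 = Words
            .Where(w => w.Length > 3)
            .OrderBy(w => w)
            .Select(w => new { Word = w, Length = w.Length });

        return query2.Select(r => r.Word + ":" + r.Length).ToList();
    }

    public static void Main()
    {
        // Both lines print: apple:5, banana:6, kiwi:4
        Console.WriteLine(string.Join(", ", WithQueryExpression().ToArray()));
        Console.WriteLine(string.Join(", ", WithExtensionMethods().ToArray()));
    }
}
```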

A full explanation of the query expression syntax is forthcoming in Chapter 3; for now, the goal is to demonstrate how the language features build the foundation for a solid query language system, as the name suggests—Language Integrated Query.

LINQ to Objects Five-Minute Overview

LINQ to Objects allows .NET developers to write “queries” over collections of objects. Microsoft provides a large set of query operators out of the box, and these operators offer a similar depth of functionality to what is expected from any SQL language working with a relational database.

Traditionally, working with collections of objects meant writing a lot of looping code using for loops or foreach loops to iterate through a collection, carrying out filtering using if statements, while performing other computations like keeping a running sum of a total property.

LINQ frees you from having to write looping code; it allows you to write queries that filter a list or calculate aggregate functions on elements in an in-memory collection, among many other capabilities. You can write queries against any collection type that implements an interface called IEnumerable<T>, which most built-in collection classes available in the .NET Framework class libraries certainly do, including simple arrays.

To understand the basic query syntax, Listings 2-8 and 2-9 demonstrate simple LINQ to Objects queries, with their respective console output shown in Outputs 2-3 and 2-4.

Listing 2-8. Simple LINQ query over an array of integers—see Output 2-3

image

Output 2-3

image

Listing 2-9. Simple LINQ query that calculates the sum of all values in an integer array—see Output 2-4

image

Output 2-4

image
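A minimal sketch of both kinds of query over an integer array follows. The array contents and the filter condition are assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SimpleQueries
{
    static readonly int[] Numbers = { 5, 10, 3, 8, 1 };

    // Filters and sorts without any explicit looping code.
    public static List<int> SmallNumbersSorted()
    {
        var query = from n in Numbers
                    where n < 6
                    orderby n
                    select n;

        return query.ToList();
    }

    // An aggregate operator replaces the running-total loop.
    public static int Total()
    {
        return Numbers.Sum();
    }

    public static void Main()
    {
        foreach (int n in SmallNumbersSorted())
        {
            Console.WriteLine(n); // prints 1, 3, and 5 on separate lines
        }

        Console.WriteLine(Total()); // prints 27
    }
}
```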

LINQ to Objects extends any type that implements IEnumerable<T> (using extension methods as described earlier), introducing a set of query operations similar to those in an SQL language. All basic collection types and arrays built into the .NET Base Class Libraries implement the IEnumerable<T> interface, so with one set of extension methods, Microsoft has added query ability to all collections. Table 1-1 in the previous chapter listed the built-in standard query operators. Each operator is covered in detail later in this book (Chapters 3 to 6).

Most of the operators should be familiar if you have ever worked with a relational database, writing queries in SQL. One important distinction between writing SQL queries and LINQ queries is the keyword order: the select clause moves from the start of the query to the end. If you are used to Select-From-Where-OrderBy, it might take some time to overcome the muscle memory and move to From-Where-OrderBy-Select.

The keyword order difference is for a good reason, and although initially the VB.Net team implemented LINQ in the SQL order, they moved to the C# keyword ordering after realizing the benefits of Intellisense support. Specifying the collection first (with the from clause) allows Visual Studio to pop up the public fields and properties in a very powerful tooltip when entering the where and select clauses, as shown in Figure 2-1. If select came first, no pop-up assistance could be offered because the editor wouldn’t yet know what object type will be used throughout the query.

Figure 2-1. By specifying the from clause first, the editor can offer a field list when specifying the select clause. This would not be possible if the select clause came first.

image

To demonstrate some of LINQ’s query capabilities, the rest of this overview works with the sample data shown in Tables 2-1 and 2-2. The examples in this chapter and subsequent chapters query this data in various ways to explore LINQ to Objects in detail.

Table 2-1. Sample Contact Data Used Throughout the Rest of This Book

image

Table 2-2. Sample Call Log Data Used Throughout the Rest of This Book

image

The Contact and CallLog types used throughout this book are simple types declared in the following way:

image

The example shown in Listing 2-10 demonstrates how to retrieve a list of contacts who are less than 35 years of age, sorted in descending order by age. This query builds a list of formatted strings as the result set as shown in Output 2-5 (although any type concretely defined or anonymous can be returned from a select expression).

Listing 2-10. Query returning a list of formatted strings based on data from a query over a collection of Contacts records—see Output 2-5

image

Output 2-5

image
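A query of this shape can be sketched as follows. The Contact type shown here, its Age property, and the sample records are hypothetical stand-ins; they are not the declarations or data from Table 2-1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the Contact type.
public class Contact
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class FilterSortDemo
{
    public static List<string> YoungContacts()
    {
        List<Contact> contacts = new List<Contact>
        {
            new Contact { Name = "Ann", Age = 28 },
            new Contact { Name = "Bob", Age = 41 },
            new Contact { Name = "Cy", Age = 33 },
            new Contact { Name = "Dee", Age = 19 }
        };

        // Filter, sort in descending order, and project to a string.
        var query = from c in contacts
                    where c.Age < 35
                    orderby c.Age descending
                    select c.Name + " (" + c.Age + ")";

        return query.ToList();
    }

    public static void Main()
    {
        foreach (string line in YoungContacts())
        {
            Console.WriteLine(line); // Cy (33), then Ann (28), then Dee (19)
        }
    }
}
```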

In addition to filtering items in a collection and projecting the results into a new form, LINQ offers the ability to group the collection items in any form you need. Listing 2-11 and Output 2-6 demonstrate the simplest form of grouping, using the group by construct to create a sub-collection of elements from the original collection.

Listing 2-11. Query groups Contact records into sub-collections based on their State—see Output 2-6

image

Output 2-6

image
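The group by construct can be sketched as follows. The Contact type and the sample records are hypothetical and are not the data from Table 2-1.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-in for the Contact type.
public class Contact
{
    public string Name { get; set; }
    public string State { get; set; }
}

public static class GroupDemo
{
    public static List<string> ContactsByState()
    {
        List<Contact> contacts = new List<Contact>
        {
            new Contact { Name = "Ann", State = "WA" },
            new Contact { Name = "Bob", State = "CA" },
            new Contact { Name = "Cy", State = "WA" },
            new Contact { Name = "Dee", State = "FL" }
        };

        // Each group exposes a Key (the State) and is itself a
        // sub-collection of the Contact records sharing that key.
        var groups = from c in contacts
                     group c by c.State;

        List<string> lines = new List<string>();
        foreach (var g in groups)
        {
            string names = string.Join(", ", g.Select(c => c.Name).ToArray());
            lines.Add(g.Key + ": " + names);
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in ContactsByState())
        {
            Console.WriteLine(line); // WA: Ann, Cy / CA: Bob / FL: Dee
        }
    }
}
```

For in-memory collections, the groups are produced in the order in which each key first appears in the source.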

A key aspect of accessing relational data is the concept of joining to related data. Relational database systems (such as Microsoft SQL Server) have powerful join capabilities built in to allow queries to be written against normalized data, which is a fancy term for “not repeating data” by separating data across multiple tables and linking by a common value. LINQ allows you to join multiple object collections together using syntax similar to SQL, as shown in Listing 2-12 and Output 2-7, which joins the call log data from Table 2-2 with the Contact data from Table 2-1. In this example, the telephone number is used as common matching criteria, and a temporary anonymous type is used to hold the combined result before writing to the console window.

Listing 2-12. Query joins two collections based on the phone number—see Output 2-7. Notice the use of the temporary anonymous type.

image

Output 2-7

image
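A join of this kind can be sketched as follows. The Contact and CallLog types, their property names, and the sample records are hypothetical stand-ins for the types and tables used in this book.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the Contact and CallLog types.
public class Contact
{
    public string Name { get; set; }
    public string Phone { get; set; }
}

public class CallLog
{
    public string Number { get; set; }
    public int Duration { get; set; }
}

public static class JoinDemo
{
    public static List<string> CallsWithNames()
    {
        List<Contact> contacts = new List<Contact>
        {
            new Contact { Name = "Ann", Phone = "555-0100" },
            new Contact { Name = "Bob", Phone = "555-0101" }
        };
        List<CallLog> calls = new List<CallLog>
        {
            new CallLog { Number = "555-0100", Duration = 2 },
            new CallLog { Number = "555-0101", Duration = 5 },
            new CallLog { Number = "555-0100", Duration = 4 },
            new CallLog { Number = "555-9999", Duration = 3 } // no contact
        };

        // The phone number is the common matching value; the anonymous
        // type temporarily holds the combined result.
        var query = from call in calls
                    join c in contacts on call.Number equals c.Phone
                    select new { c.Name, call.Number, call.Duration };

        List<string> lines = new List<string>();
        foreach (var row in query)
        {
            lines.Add(row.Name + " (" + row.Number + "): " + row.Duration + " min");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in CallsWithNames())
        {
            Console.WriteLine(line);
        }
    }
}
```

Calls with no matching contact are dropped, because the join clause performs an inner join.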

To demonstrate the full power of LINQ to Objects query functionality, the query shown in Listing 2-13 summarizes the data from two in-memory collections by using joins, groups, and also aggregate operators in the select clause. The final output of this query is shown in Output 2-8.

Listing 2-13. Incoming call log summary shows filtering, ordering, grouping, joining, and selection using aggregate values—see Output 2-8

image

Output 2-8

image
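A summary query combining these features can be sketched as follows. As before, the Contact and CallLog types, their property names, and the sample records are hypothetical and are not the data behind Output 2-8.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified stand-ins for the Contact and CallLog types.
public class Contact
{
    public string Name { get; set; }
    public string Phone { get; set; }
}

public class CallLog
{
    public string Number { get; set; }
    public int Duration { get; set; }
    public bool Incoming { get; set; }
}

public static class SummaryDemo
{
    public static List<string> IncomingCallSummary()
    {
        List<Contact> contacts = new List<Contact>
        {
            new Contact { Name = "Bob", Phone = "555-0101" },
            new Contact { Name = "Ann", Phone = "555-0100" }
        };
        List<CallLog> calls = new List<CallLog>
        {
            new CallLog { Number = "555-0100", Duration = 2, Incoming = true },
            new CallLog { Number = "555-0101", Duration = 5, Incoming = true },
            new CallLog { Number = "555-0100", Duration = 4, Incoming = true },
            new CallLog { Number = "555-0101", Duration = 1, Incoming = false }
        };

        // Filter, join, group, order, and aggregate in a single query.
        var query = from call in calls
                    where call.Incoming
                    join c in contacts on call.Number equals c.Phone
                    group call by c.Name into g
                    orderby g.Key
                    select new
                    {
                        Name = g.Key,
                        Count = g.Count(),
                        Total = g.Sum(x => x.Duration)
                    };

        List<string> lines = new List<string>();
        foreach (var row in query)
        {
            lines.Add(row.Name + ": " + row.Count + " call(s), " + row.Total + " min");
        }
        return lines;
    }

    public static void Main()
    {
        foreach (string line in IncomingCallSummary())
        {
            Console.WriteLine(line);
        }
    }
}
```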

Summary

This concludes our introduction to the language enhancements supporting LINQ and our five-minute overview of LINQ to Objects. The intention of this overview was to give you enough familiarity with LINQ to Objects and the language syntax to delve deeper into writing LINQ queries, which the following chapters encourage.

References

1. C# Version 3.0 Specification, September 2005. This draft version of the C# specification has advice on when to use Extension Methods, http://download.microsoft.com/download/9/5/0/9503e33e-fde6-4aed-b5d0-ffe749822f1b/csharp%203.0%20specification.doc.

2. C# Language Specification Version 3.0, Microsoft 2007, http://download.microsoft.com/download/3/8/8/388e7205-bc10-4226-b2a8-75351c669b09/CSharp%20Language%20Specification.doc.
