Chapter 1. Introducing LINQ

Goals of this chapter:

• Define “Language Integrated Query” (LINQ) and why it was built.

• Define the various components that make up LINQ.

• Demonstrate how LINQ improves existing code.

This chapter introduces LINQ, from Microsoft’s design goals to how it improves the code we write for data access-based applications. By the end of this chapter, you will understand why LINQ was built, what components make up the LINQ family, and LINQ’s advantages over previous technologies. You will also see the LINQ syntax at work while reviewing some before-and-after code makeovers.

Although this book is primarily about LINQ to Objects, it is important to have an understanding of the full scope and goals of all LINQ technologies in order to make better design and coding decisions.

What Is LINQ?

Language Integrated Query, or LINQ for short (pronounced “link”), is a set of Microsoft .NET Framework language enhancements and libraries built by Microsoft to make working with data (for example, a collection of in-memory objects, rows from a database table, or elements in an XML file) simpler and more intuitive. LINQ provides a layer of programming abstraction between .NET languages and an ever-growing number of underlying data sources.

Why is this so inviting to developers? Although many programming interfaces already exist for accessing and manipulating different sources of data, most of them use a language or syntax of their own. If an application accesses and manipulates data (as most do), LINQ allows developers to query that data using the same C# (or Visual Basic .NET [VB.Net]) language syntax, independent of the source of the data. Today, a different language is used for each source (Transact-SQL for Microsoft SQL Server development, XPath or XQuery for XML data, and nested for/if statements in code for in-memory collections); LINQ lets you use C# (or VB.Net) in a consistent, type-safe way, with syntax checked at compile time.
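To make the contrast concrete for in-memory collections, here is a minimal sketch that filters and sorts a list of names twice: once with an explicit loop and condition, and once with a LINQ query expression. The class name, method names, and sample data are assumptions made for illustration and do not come from the book's sample code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryStyles
{
    // Pre-LINQ style: an explicit loop and condition, followed by an explicit sort.
    public static List<string> ShortNamesLoop(IEnumerable<string> names, int maxLength)
    {
        List<string> result = new List<string>();
        foreach (string name in names)
        {
            if (name.Length <= maxLength)
            {
                result.Add(name);
            }
        }
        result.Sort(); // default string comparer, same as orderby below
        return result;
    }

    // LINQ style: the same filter and sort written as a query expression.
    public static List<string> ShortNamesLinq(IEnumerable<string> names, int maxLength)
    {
        var query = from name in names
                    where name.Length <= maxLength
                    orderby name
                    select name;
        return query.ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data.
        string[] names = { "Kamph", "Gottshall", "Deane", "Gauwain" };
        Console.WriteLine(string.Join(", ", ShortNamesLoop(names, 5)));
        Console.WriteLine(string.Join(", ", ShortNamesLinq(names, 5)));
    }
}
```

Both methods return the same list; the query expression states what is wanted and leaves the iteration to the library.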

One of Microsoft’s first public whitepapers on the LINQ technology, “LINQ Project Overview” [1], authored by Don Box and Anders Hejlsberg, set out the problem as they saw it and how they planned to solve it with LINQ:

After two decades, the industry has reached a stable point in the evolution of object-oriented (OO) programming technologies. Programmers now take for granted features like classes, objects, and methods. In looking at the current and next generation of technologies, it has become apparent that the next big challenge in programming technology is to reduce the complexity of accessing and integrating information that is not natively defined using OO technology. The two most common sources of non-OO information are relational databases and XML.

Rather than add relational or XML-specific features to our programming languages and runtime, with the LINQ project we have taken a more general approach and are adding general purpose query facilities to the .NET Framework that apply to all sources of information, not just relational or XML data. This facility is called .NET Language Integrated Query (LINQ).

We use the term language integrated query to indicate that query is an integrated feature of the developer’s primary programming languages (e.g., C#, Visual Basic). Language integrated query allows query expressions to benefit from the rich metadata, compile-time syntax checking, static typing and IntelliSense that was previously available only to imperative code. Language integrated query also allows a single general-purpose declarative query facility to be applied to all in-memory information, not just information from external sources.

A single sentence pitch describing the principles of LINQ is simply: LINQ normalizes language and syntax for writing queries against many sources, allowing developers to avoid having to learn and master many different domain-specific languages (DSLs) and development environments to retrieve and manipulate data from different sources.

LINQ has simple goals on the surface, but it has a massive impact on the way programs are written now and how they will be written in the future. A foundational piece of LINQ technology (although not directly used when executing LINQ to Objects queries) is a feature that turns C# and VB.Net code into a data structure called an expression tree. Although expression trees are not covered in this book, they allow code to be processed at runtime and used to generate statements in a domain-specific query language, such as SQL. This layer of abstraction between the developer’s coding language and a domain-specific query language and execution runtime gives LINQ an almost limitless ability to expand as new sources of data emerge or as new ways to optimize access to existing data sources become available.
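As a small illustration of code stored as a data structure, the sketch below assigns a lambda to an `Expression<Func<int, bool>>` and then inspects the resulting tree before compiling it. The class and method names are hypothetical; the types come from the `System.Linq.Expressions` namespace. A real provider such as LINQ to SQL walks far larger trees, so treat this only as a glimpse of the idea.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeDemo
{
    public static Expression<Func<int, bool>> IsEven()
    {
        // Because the target type is Expression<TDelegate>, the compiler emits
        // a tree describing the lambda instead of directly executable code.
        return n => n % 2 == 0;
    }

    public static void Main()
    {
        Expression<Func<int, bool>> expr = IsEven();

        // The body is a binary node (an equality test) with two child nodes.
        BinaryExpression body = (BinaryExpression)expr.Body;
        Console.WriteLine("Root node:  " + body.NodeType);
        Console.WriteLine("Left node:  " + body.Left.NodeType);
        Console.WriteLine("Right node: " + body.Right.NodeType);
        Console.WriteLine("Parameter:  " + expr.Parameters[0].Name);

        // The tree can still be turned back into a callable delegate.
        Func<int, bool> compiled = expr.Compile();
        Console.WriteLine("IsEven(4) = " + compiled(4));
    }
}
```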

The (Almost) Current LINQ Story

The current LINQ family of technologies and concepts allows an extensible set of operators that work over structured data, independent of how that data is stored or retrieved. The generalized architecture of the technology also allows the LINQ concepts to be expanded to almost any data domain or technology.

The loosely coupled product names that form the marketed LINQ family can distract from the true story. Each specific flavor of LINQ has its own underlying query mechanism and features that often aren’t LINQ-specific, but they all eventually converge into a standard C# or VB.Net programming query interface for data; hence, these products get the LINQ moniker. The following list of Microsoft-specific products and technologies forms the basis of the features that currently constitute LINQ. This list doesn’t even begin to cover the community efforts contributing to the overall LINQ story and is intended only to outline the current scope broadly:

• LINQ Language Compiler Enhancements

    • C# 3.0 and C# 4.0: New language constructs in C# to support writing queries (these often build on groundwork laid in C# 2.0, namely generics, iterators, and anonymous methods)

    • VB.Net 9: New language constructs in VB.Net to support writing queries

    • A mechanism for storing code as a data structure and a way to convert user code into this data structure (called an expression tree)

    • A mechanism for passing the data structure containing user code to a query implementation engine (like LINQ to SQL, which converts code expressions into Transact-SQL, Microsoft SQL Server’s native language)

• LINQ to Objects

    • A set of standard query operators for working with in-memory data (normally any collection implementing the IEnumerable<T> interface) using LINQ language syntax

• LINQ to XML

    • A new API for creating, importing, and working with XML data

    • A set of query operators for working with XML data using LINQ language syntax

• LINQ to Entities (part of the Entity Framework)

    • A mechanism for connecting to any ADO.Net-enabled data source to support the Entity Framework features

    • A set of query operators for querying any ADO.Net Entity Framework-enabled data source

• LINQ to SQL (Microsoft has chosen to focus predominantly on the LINQ to Entities API going forward; the LINQ to SQL API will be maintained, but its features will not be expanded with any vigor.)

    • A set of query operators for working with SQL Server data using LINQ language syntax

    • A mechanism by which SQL data can be retrieved from SQL Server and represented as in-memory data

    • An in-memory data change tracking mechanism to support adding, deleting, and updating records safely in a SQL database

    • A class library for creating, deleting, and manipulating databases in SQL Server

• Parallel Extensions to .NET and Parallel LINQ (PLINQ)

    • A library to assist in writing multi-threaded applications that utilize all available processor cores, called the Task Parallel Library (TPL)

    • Implementations of the standard query operators that fully utilize concurrent operations across multiple cores, called Parallel LINQ

• LINQ to DataSet

    • Query language over typed and untyped DataSets

    • A mechanism for using LINQ in current DataSet-based applications without rewriting them to use LINQ to SQL

    • A set of extensions to the DataRow and DataTable classes that support conversion to and from LINQ sequences (for full details see http://msdn.microsoft.com/en-us/library/bb387004.aspx)

This list may be out of date and incomplete by the time you read this book. Microsoft has exposed many extension points, and both Microsoft and third parties are adding to the LINQ story all the time. These same extension points form the basis of Microsoft’s specific implementations; LINQ to SQL, for instance, is built upon the same interface that is available for any developer to extend. This openness ensures that the open-source community, Microsoft, and even its competitors have equal footing to embrace LINQ and its essence: the one query language to rule them all.

LINQ Code Makeover—Before and After Code Examples

The following examples demonstrate the approach to a coding problem both with and without using LINQ. These examples offer insight into how current coding practices are changed with the introduction of language-supported query constructs. The intention of these examples is to help you understand how LINQ will change the approach to working with data from different sources, and although you may not fully understand the LINQ syntax at this time, the following chapters cover this gap in understanding.

LINQ to Objects—Grouping and Sorting Contact Records

The first scenario to examine is one in which a set of customer records in a List<Contact> collection is grouped by State (with states ordered alphabetically), and the contacts within each state are ordered alphabetically by last name.

C# 2.0 Approach

Listing 1-1 shows the code required to sort and group an in-memory collection of the type Contact. It makes use of features introduced in C# 2.0, namely anonymous methods (inline delegates) and generic types. It first sorts the collection by the LastName property using a comparison delegate, and then groups the collection by the State property in a SortedDictionary collection.

Note

All of the code displayed in the listings in this book is available for download from http://hookedonlinq.com/LINQBook.ashx. The example application is fully self-contained and allows each example to be run and browsed while you read along with the book.

Listing 1-1. C# 2.0 code for grouping and sorting contact records—see Output 1-1

[Image: Listing 1-1 source code]
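The approach described for the C# 2.0 version (sort by LastName with a comparison delegate, then group by State in a SortedDictionary) can be sketched roughly as follows. This is not the book's Listing 1-1: the Contact class here is a hypothetical stand-in with only the fields the scenario mentions, and the sample data is invented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the book's Contact type.
public class Contact
{
    public string FirstName, LastName, State;
    public Contact(string firstName, string lastName, string state)
    { FirstName = firstName; LastName = lastName; State = state; }
}

public static class GroupingCSharp2
{
    public static SortedDictionary<string, List<Contact>> GroupByState(List<Contact> contacts)
    {
        // Step 1: sort a copy by LastName, using an anonymous method
        // as the comparison delegate.
        List<Contact> sorted = new List<Contact>(contacts);
        sorted.Sort(delegate(Contact a, Contact b)
        {
            return string.Compare(a.LastName, b.LastName, StringComparison.Ordinal);
        });

        // Step 2: group by State; the SortedDictionary keeps its keys ordered.
        SortedDictionary<string, List<Contact>> groups =
            new SortedDictionary<string, List<Contact>>(StringComparer.Ordinal);
        foreach (Contact c in sorted)
        {
            List<Contact> list;
            if (!groups.TryGetValue(c.State, out list))
            {
                list = new List<Contact>();
                groups.Add(c.State, list);
            }
            list.Add(c);
        }
        return groups;
    }

    public static void Main()
    {
        // Hypothetical sample data.
        List<Contact> contacts = new List<Contact>();
        contacts.Add(new Contact("Ann", "Young", "WA"));
        contacts.Add(new Contact("Bob", "Adams", "WA"));
        contacts.Add(new Contact("Cid", "Brown", "CA"));

        foreach (KeyValuePair<string, List<Contact>> pair in GroupByState(contacts))
        {
            Console.WriteLine("State: " + pair.Key);
            foreach (Contact c in pair.Value)
                Console.WriteLine("  " + c.LastName + ", " + c.FirstName);
        }
    }
}
```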

LINQ Approach

LINQ to Objects, the LINQ features designed to add query functionality over in-memory collections, makes this scenario very easy to implement. Although the syntax is foreign at the moment (all will be explained in subsequent chapters), the code in Listing 1-2 is much shorter, and the coding gymnastics of sorting and grouping far less extreme.

Listing 1-2. C# 3.0 LINQ to Objects code for grouping and sorting contact records (see Output 1-1)

[Image: Listing 1-2 source code]
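For comparison, a LINQ to Objects query for the same scenario might look like the sketch below. It is an illustration under assumptions rather than a copy of Listing 1-2: the Contact class and sample data are hypothetical, and the query relies on grouping preserving the order produced by the orderby clause.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the book's Contact type.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string State { get; set; }
}

public static class GroupingLinq
{
    public static IEnumerable<IGrouping<string, Contact>> GroupByState(IEnumerable<Contact> contacts)
    {
        // Order by state, then last name; grouping keeps that order,
        // so groups appear by state and members by last name.
        return from c in contacts
               orderby c.State, c.LastName
               group c by c.State;
    }

    public static void Main()
    {
        // Hypothetical sample data.
        var contacts = new List<Contact>
        {
            new Contact { FirstName = "Ann", LastName = "Young", State = "WA" },
            new Contact { FirstName = "Bob", LastName = "Adams", State = "WA" },
            new Contact { FirstName = "Cid", LastName = "Brown", State = "CA" }
        };

        foreach (var stateGroup in GroupByState(contacts))
        {
            Console.WriteLine("State: " + stateGroup.Key);
            foreach (var c in stateGroup)
                Console.WriteLine("  " + c.LastName + ", " + c.FirstName);
        }
    }
}
```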

The Result

The outputs for both solutions are identical and shown in Output 1-1. The advantages of using LINQ in this scenario are clearly seen in code readability and far less code. In the traditional pre-LINQ code, it was necessary to explicitly choose how data was sorted and grouped; there was substantial “how to do something” code. LINQ does away with the “how” code, requiring only minimal “what to do” code.

Output 1-1. The console output for the code in Listings 1-1 and 1-2

[Image: Output 1-1 console output]

LINQ to Objects—Summarizing Data from Two Collections and Writing XML

The second scenario to examine summarizes incoming calls from a List<CallLog> collection. The contact name for a given phone number is looked up by joining to a second collection, a List<Contact>, which is sorted by last name and then first name. Each contact who has made at least one incoming call is written to an XML document, along with the number of calls, the total duration of those calls, and the average call duration.

C# 2.0 Approach

Listing 1-3 shows the hefty code required to fulfill the aforementioned scenario. It starts by grouping incoming calls into a Dictionary keyed by the phone number. Contacts are sorted by last name, then first name, and the sorted list is looped through, writing out the call statistics looked up by phone number from the groups created earlier. XML is written out using the XmlTextWriter class (in this case, to a string so that it can be written to the console), which creates a well-formed, nicely indented XML document.

Listing 1-3. C# 2.0 code for summarizing data, joining to a second collection, and writing out XML—see Output 1-2

[Image: Listing 1-3 source code]
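The steps described for the C# 2.0 version can be sketched as follows. This is a rough illustration and not the book's Listing 1-3: the Contact and CallLog classes, their field names, the XML element names, and the sample data are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical stand-ins for the book's Contact and CallLog types.
public class Contact
{
    public string FirstName, LastName, Phone;
    public Contact(string first, string last, string phone)
    { FirstName = first; LastName = last; Phone = phone; }
}

public class CallLog
{
    public string Number;
    public int Duration; // assumed to be whole minutes
    public bool Incoming;
    public CallLog(string number, int duration, bool incoming)
    { Number = number; Duration = duration; Incoming = incoming; }
}

public static class CallSummaryCSharp2
{
    public static string Summarize(List<Contact> contacts, List<CallLog> calls)
    {
        // Group incoming calls into a Dictionary keyed by phone number.
        Dictionary<string, List<CallLog>> byNumber = new Dictionary<string, List<CallLog>>();
        foreach (CallLog call in calls)
        {
            if (!call.Incoming) continue;
            List<CallLog> list;
            if (!byNumber.TryGetValue(call.Number, out list))
            {
                list = new List<CallLog>();
                byNumber.Add(call.Number, list);
            }
            list.Add(call);
        }

        // Sort a copy of the contacts by last name, then first name.
        List<Contact> sorted = new List<Contact>(contacts);
        sorted.Sort(delegate(Contact a, Contact b)
        {
            int byLast = string.Compare(a.LastName, b.LastName, StringComparison.Ordinal);
            return byLast != 0
                ? byLast
                : string.Compare(a.FirstName, b.FirstName, StringComparison.Ordinal);
        });

        // Write indented XML to a string for each contact with incoming calls.
        StringWriter output = new StringWriter();
        XmlTextWriter xml = new XmlTextWriter(output);
        xml.Formatting = Formatting.Indented;
        xml.WriteStartElement("contacts");
        foreach (Contact c in sorted)
        {
            List<CallLog> list;
            if (!byNumber.TryGetValue(c.Phone, out list)) continue;
            long total = 0;
            foreach (CallLog call in list) total += call.Duration;

            xml.WriteStartElement("contact");
            xml.WriteElementString("lastName", c.LastName);
            xml.WriteElementString("firstName", c.FirstName);
            xml.WriteElementString("count", XmlConvert.ToString(list.Count));
            xml.WriteElementString("totalDuration", XmlConvert.ToString(total));
            xml.WriteElementString("averageDuration",
                XmlConvert.ToString((double)total / list.Count));
            xml.WriteEndElement();
        }
        xml.WriteEndElement();
        xml.Flush();
        return output.ToString();
    }

    public static void Main()
    {
        // Hypothetical sample data.
        List<Contact> contacts = new List<Contact>();
        contacts.Add(new Contact("Ann", "Young", "555-0101"));
        contacts.Add(new Contact("Bob", "Adams", "555-0102"));
        List<CallLog> calls = new List<CallLog>();
        calls.Add(new CallLog("555-0101", 2, true));
        calls.Add(new CallLog("555-0101", 4, true));
        calls.Add(new CallLog("555-0102", 9, false));
        Console.WriteLine(Summarize(contacts, calls));
    }
}
```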

LINQ Approach

LINQ to Objects and the new XML programming interface included in .NET Framework 3.5 (LINQ to XML; this example uses the generation side of the API rather than the query side) allow the grouping, joining, and calculation of the numerical average and sum to be expressed in two statements. Listing 1-4 shows the LINQ code that performs the scenario described. LINQ excels at grouping and joining data, and when combined with the XML generation capabilities of LINQ to XML, it creates code that is far smaller in line count and more comprehensible in intention.

Listing 1-4. C# 3.0 LINQ to Objects code for summarizing data, joining to a second collection, and writing out XML—see Output 1-2

[Image: Listing 1-4 source code]
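A two-statement LINQ version of the same scenario might be sketched as shown below, with one query to summarize incoming calls and one expression to join, sort, and build the XML. It is not the book's Listing 1-4: the Contact and CallLog classes, the XML element names, and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-ins for the book's Contact and CallLog types.
public class Contact
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Phone { get; set; }
}

public class CallLog
{
    public string Number { get; set; }
    public int Duration { get; set; } // assumed to be whole minutes
    public bool Incoming { get; set; }
}

public static class CallSummaryLinq
{
    public static XElement Summarize(List<Contact> contacts, List<CallLog> calls)
    {
        // Statement 1: group incoming calls by number and aggregate each group.
        var callGroups = from call in calls
                         where call.Incoming
                         group call by call.Number into g
                         select new
                         {
                             Number = g.Key,
                             Count = g.Count(),
                             Total = g.Sum(c => c.Duration),
                             Average = g.Average(c => c.Duration)
                         };

        // Statement 2: join contacts to the summaries, sort, and build the XML.
        return new XElement("contacts",
            from c in contacts
            join g in callGroups on c.Phone equals g.Number
            orderby c.LastName, c.FirstName
            select new XElement("contact",
                new XElement("lastName", c.LastName),
                new XElement("firstName", c.FirstName),
                new XElement("count", g.Count),
                new XElement("totalDuration", g.Total),
                new XElement("averageDuration", g.Average)));
    }

    public static void Main()
    {
        // Hypothetical sample data.
        var contacts = new List<Contact>
        {
            new Contact { FirstName = "Ann", LastName = "Young", Phone = "555-0101" },
            new Contact { FirstName = "Bob", LastName = "Adams", Phone = "555-0102" }
        };
        var calls = new List<CallLog>
        {
            new CallLog { Number = "555-0101", Duration = 2, Incoming = true },
            new CallLog { Number = "555-0101", Duration = 4, Incoming = true },
            new CallLog { Number = "555-0102", Duration = 9, Incoming = false }
        };
        Console.WriteLine(Summarize(contacts, calls));
    }
}
```

Because the join is an inner join, contacts with no incoming calls drop out of the result without any explicit check.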

The Result

The outputs for both of these solutions are identical and shown in Output 1-2. When working with data from multiple collections, grouping and aggregating the results, and writing them to XML, the advantage of LINQ syntax is clear: there is less code, and it is easier to comprehend.

Output 1-2. The console output for the code in Listings 1-3 and 1-4

[Image: Output 1-2 console output]

Benefits of LINQ

LINQ appeals to different people for different reasons. Some benefits might not be completely obvious with the current state of the many LINQ elements that have shipped. The extensibility designed into the LINQ libraries and compilers will ensure that LINQ will grow over time, remaining a current and important technology to understand for many years to come.

Single Query Language to Remember

This is the prime advantage LINQ offers developers day to day. Once you learn the set of Standard Query Operators that LINQ makes available in either C# or VB, only minor changes are required to access any LINQ-enabled data source.

Compile-Time Name and Type Checking

LINQ queries are fully name- and type-checked at compile time, reducing (or eliminating) runtime error surprises. Queries in many domain languages, such as T-SQL, are embedded in code as string literals. The compiler cannot check the contents of these strings, so errors are often found only at runtime (hopefully during testing). With LINQ, many type errors and mistyped field names are found by the compiler and can be fixed at that time.
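The difference can be seen in a small sketch. The SQL text below contains a deliberately misspelled column name that the C# compiler cannot detect, because to the compiler it is only a string; the equivalent mistake in the LINQ query would stop the build. The Contact class, table and column names, and sample data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the book's Contact type.
public class Contact
{
    public string LastName { get; set; }
    public string State { get; set; }
}

public static class CompileTimeChecking
{
    // Query text in a string literal: "LastNmae" is a typo, but this compiles
    // and would fail only when a database executes it.
    public const string SqlText =
        "SELECT LastNmae FROM Contact WHERE State = 'WA'";

    public static List<string> LastNamesInState(IEnumerable<Contact> contacts, string state)
    {
        // Writing c.LastNmae here would be a compile-time error, because
        // the compiler knows the members of Contact.
        var query = from c in contacts
                    where c.State == state
                    select c.LastName;
        return query.ToList();
    }

    public static void Main()
    {
        // Hypothetical sample data.
        var contacts = new List<Contact>
        {
            new Contact { LastName = "Young", State = "WA" },
            new Contact { LastName = "Brown", State = "CA" }
        };
        Console.WriteLine(string.Join(", ", LastNamesInState(contacts, "WA")));
    }
}
```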

Easier to Read Code

The examples in this chapter show how code that carries out common tasks with data is simplified, even if you are unfamiliar with LINQ syntax at the moment. Reducing complex looping, sorting, grouping, and conditional code to a single query statement means fewer logic errors and simpler debugging.

It is possible to misuse any programming language construct. LINQ queries offer far greater ability to write human- (and compiler-) comprehensible code when working with structured data sources if that is the author’s intention.

Over Fifty Standard Query Operators

The built-in set of Standard Query Operators makes easy work of grouping, sorting, joining, aggregating, filtering, or selecting data. Table 1-1 lists the set of operators available in the .NET Framework 4 release (these operators are covered in upcoming chapters of this book; for now I just want to show you the range and depth of operators).

Table 1-1. Standard Query Operators in the .NET Framework 4 Release

[Image: Table 1-1]

Many of the Standard Query Operators are identical to those found in database query languages, which makes sense; if you were going to design the features a query language should have, looking at the current implementations that have been refined over 30 years is a good starting point. However, some of the operators introduce new approaches to working with data, simplifying what would have been complex traditional code into a single statement.

Open and Extensible Architecture

LINQ has been designed with extensibility in mind. Not only can new operators be added when a need arises, but entire new data sources can be added to the LINQ framework (caveat: operator implementation often needs to consider the data source, and this can be complex; my point is that it’s possible and, for LINQ to Objects, actually pretty simple).

Not only are the LINQ extension points exposed, but Microsoft has also implemented its own providers using these same extension points. This ensures that any provider, whether from open-source community projects or competing data-access platforms, competes on a level playing field.
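For LINQ to Objects, adding an operator amounts to writing an extension method over IEnumerable<T>. The sketch below defines a hypothetical operator named TakeEvery (it is not part of the .NET Framework) that yields every nth element, and then composes it with a built-in operator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CustomOperators
{
    // Hypothetical operator: yields the first element and every
    // step-th element after it.
    public static IEnumerable<T> TakeEvery<T>(this IEnumerable<T> source, int step)
    {
        // Validate eagerly so that bad arguments fail at the call site
        // rather than on first enumeration.
        if (source == null) throw new ArgumentNullException("source");
        if (step < 1) throw new ArgumentOutOfRangeException("step");
        return TakeEveryIterator(source, step);
    }

    private static IEnumerable<T> TakeEveryIterator<T>(IEnumerable<T> source, int step)
    {
        int index = 0;
        foreach (T item in source)
        {
            if (index % step == 0) yield return item;
            index++;
        }
    }

    public static void Main()
    {
        // The new operator chains with the built-in ones.
        var result = Enumerable.Range(1, 10).TakeEvery(3).Select(n => n * 10);
        Console.WriteLine(string.Join(", ", result)); // 10, 40, 70, 100
    }
}
```

Splitting the argument checks from the iterator method is a common pattern, because code inside an iterator block does not run until the sequence is enumerated.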

Expressing Code as Data

Although not completely relevant to the LINQ to Objects story at this time, the ability to express LINQ queries as a data structure opens new opportunities as to how that query might be optimized and executed at runtime. Beyond the basic features of LINQ providers that turn your C# and VB.Net code into a specific domain query language, the full advantage of code built using data or changed at runtime hasn’t been fully leveraged at this time. One concept being explored by Microsoft is the ability to build and compile snippets of code at runtime; this code might be used to apply custom business rules, for instance. When code is represented as data, it can be checked and modified depending on its security implications or on how well it might operate concurrently in the actual environment in which that code is executed (whether that be your laptop or a massive multi-core server).
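As a minimal sketch of building and compiling a snippet of code at runtime, the example below assembles the rule "value > threshold" from expression tree nodes and compiles it into a delegate. The class and method names are hypothetical, and the rule is deliberately trivial; a threshold could equally come from configuration or user input.

```csharp
using System;
using System.Linq.Expressions;

public static class RuntimeRules
{
    // Builds the equivalent of "value => value > threshold" from data
    // that is known only at runtime.
    public static Func<int, bool> GreaterThan(int threshold)
    {
        ParameterExpression value = Expression.Parameter(typeof(int), "value");
        BinaryExpression test =
            Expression.GreaterThan(value, Expression.Constant(threshold));
        Expression<Func<int, bool>> lambda =
            Expression.Lambda<Func<int, bool>>(test, value);

        // At this point the rule is still data and could be inspected or
        // rewritten; Compile turns it into executable code.
        return lambda.Compile();
    }

    public static void Main()
    {
        Func<int, bool> rule = GreaterThan(10);
        Console.WriteLine(rule(11)); // True
        Console.WriteLine(rule(10)); // False
    }
}
```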

Summary

Defining LINQ is a difficult task. LINQ is a conglomerate of loosely labeled technologies released in tandem with the .NET Framework 3.5 and further expanded in .NET Framework 4. The other complexity of answering the question of “What is LINQ?” is that it’s a moving target. LINQ is built using an open and extensible architecture, and new operators and data sources can be added by anyone.

One point is clear: LINQ will change the approach to writing data-driven applications. Code will be simpler, often faster, and easier to read. There is no inherent downside to using the LINQ features; it is simply the next installment of how the C# and VB.Net languages are being improved to support tomorrow’s coding challenges.

The next chapter looks more closely at how to construct basic LINQ queries in C#, a prerequisite to understanding the more advanced features covered in later chapters.

References

1. Box, Don and Hejlsberg, Anders. 2006. LINQ Project Overview, May. Downloaded from http://download.microsoft.com/download/5/8/6/5868081c-68aa-40de-9a45-a3803d8134b8/LINQ_Project_Overview.doc.
