Chapter 14. An Introduction to LINQ

In Chapter 12, "Storing Data in Collections," you learned how to create collections, from simple arrays to specialized collections such as HashTables and Lists, and how to iterate through them with loops to locate items. Typical collections contain objects, and you already know how to create and manipulate custom objects. In Chapter 13, "XML In Modern Programming," you learned how to serialize these collections into XML documents, as well as how to create XML documents from scratch. And in Part V of this book, you'll learn how to manipulate large amounts of data stored in databases.

Each data source provides its own technique for searching and manipulating individual items. What is common in all data sources are the operations we perform with the data; we want to be able to query the data, select the values we're interested in, and update the data sources by adding new data or editing the existing data. It's therefore reasonable to assume a common query language for all data sources. This common query language was introduced with version 3.5 of the Framework and is now part of all .NET languages. It's the LINQ component.

LINQ stands for Language Integrated Query, a small language for querying data sources. For all practical purposes, it's an extension to Visual Basic. More specifically, LINQ consists of statements that you can embed into a program to select items from a collection based on various criteria. Unlike a loop that examines each object's properties and either selects or rejects the object, it, LINQ is a declarative language: It allows you to specify the criteria, instead of specifying how to select the objects. A declarative language, as opposed to a procedural language, specifies the operation you want to perform, and not the steps to take. VB is a procedural language; the language of SQL Server, T-SQL, is a declarative language.

In this chapter, you'll learn how to do the following:

  • Perform simple LINQ queries.

  • Create and process XML files with LINQ to XML.

  • Process relational data with LINQ to SQL.

What Is LINQ?

Although defining LINQ is tricky, a simple example will demonstrate the structure of LINQ and what it can do for your application. Let's consider an array of integers:

Dim data() As Int16 = {3, 2, 5, 4, 6, 4, 12, 43, 45, 42, 65}

To select specific elements of this array, you'd write a For...Next loop, examine each element of the array, and either select it by storing it into a new array or ignore it. To select the elements that are numerically smaller than 10, you'd write a loop like the following:

Dim smallNumbers As New ArrayList
Dim itm As Integer
For Each itm In data
    If itm < 10 Then
        smallNumbers.Add(itm)
    End If
Next

Can you see why I'm using an ArrayList to store the selected values? I could have used another array, but arrays must be declared with dimensions, and I'd have to resize the array or start with an array large enough to store all the elements. Let's do the same with LINQ:

Dim smallNumbers = From n In data
                    Where n < 10
                    Select n

This is a peculiar statement indeed, unless you're familiar with SQL, in which case you can easily spot the similarities. LINQ, however, is not based on SQL, and not every operation has an equivalent in both. Both SQL and LINQ, however, are declarative languages that have many similarities. If you're familiar with SQL, you have already spotted the similarities and the fact that LINQ rearranges the basic elements. Whereas SQL can be used only with database tables, LINQ can be applied to collections, XML files, and SQL tables. It's a uniform language for querying data from several data sources.

Let's look at the LINQ query in greater detail:

Dim smallNumbers = From n In data
                    Where n < 10
                    Select n

Start with the structure where the selected elements will be stored; this structure will hold the result of the query. The smallNumbers variable is declared without a type, because its type is determined by the type of the collection you're querying. Type inference, which must be turned on for this statement to work, makes this possible. If type inference is off, you must declare the variable that will accept the result of the query.

Because the query selects elements from the data array, the smallNumbers variable can't represent anything but an array of integers. Actually, it's not exactly an array of integers; it's a typed collection of integers that implements the IEnumerable interface. If you hover the pointer over the name of the collection, you'll see that its type is IEnumerable(Of Short).

The LINQ query starts after the equals sign with the From keyword, which is followed by a variable that represents the current item in the collection, followed by the In keyword and the name of the collection. The variable is called the control variable, just like the variable of a For...Next or For Each loop, and it represents the entity you're working with. If the collection is an array of integers, the control variable is an integer; if the collection is a List of Customer objects, then the control variable is a Customer object. The first part of the query specifies the collection we're going to query.

Then comes the Where clause that limits the selection. The Where keyword is followed by an expression that involves the control variable; this expression limits your selection. In this extremely trivial example, I select the elements that are less than 10. The expression of the Where clause can become quite complicated, as you will see shortly. It's a VB-like expression that's evaluated for every element in the collection; it returns a True/False value; and it determines which elements will be selected, and which elements will be ignored.

The last keyword in the expression, the Select keyword, determines what you're selecting. In most cases, you will select the control variable itself but not always. Here's a variation of the previous query expression:

Dim evenValues = From n In data _
       Where m mod 2 = 0
       Select "Number " & n.ToString & " is even"

Here, I select the even numbers from the original array and then form a string for each of the selected values. This time, the result of the query (the evenValues variable) is an IEnumerable of strings, as determined by the compiler. The Select clause is optional; if you omit it, the query will return the control variable itself.

But why bother with a new component to select values from an array? A For Each loop that processes each item in the collection is not really complicated and is quite efficient. LINQ is a universal approach to querying data, and it can be used to query all kinds of data, from XML documents to databases. When you work with databases, for example, you can't afford to move all the data to the client, select a few of them from within a loop, and ignore the rest. A database table may contain thousands and thousands of rows, and you may end up selecting a few hundred of them. LINQ is smart enough to execute the appropriate statements against the database and return to the client only the data you're interested in. I'll discuss these aspects of LINQ and databases (LINQ to SQL) in much detail later in this chapter.

As I have mentioned in the introduction, LINQ isn't limited to arrays and collections. Although iterating through an array of integers, or even objects, may be trivial, the same isn't true for XML documents. LINQ allows you to select the elements of an XML document you're interested in without having to navigate through the nodes of the document, as we did in the previous chapter. At any rate, LINQ is the latest and hottest technology from Redmond—it may very well become a universal language, just like SQL. It's a very powerful and flexible tool that can be used in many diverse situations. It's certainly much more flexible than straight VB code, because it's a declarative language. Remember that with LINQ, you specify what you want to do, not how the computer will do it.

LINQ Components

To support such a wide range of data sources, LINQ is comprised of multiple components, which are the following:

LINQ to XML

This component enables you to create and process XML documents. In effect, it replaces XQuery expressions that are used today to select the items of interest in an XML document. It also replaces the Where method of the Elements and Descendants properties. In effect, the compiler calls these methods behind the scenes, and LINQ is only "syntactic sugar" that hides much of the complexity of querying collections with extension methods. Because of LINQ to XML, some new classes that support XML were introduced to Visual Basic, and XML has become a basic data type of the language, as discussed in the preceding chapter. In this chapter, you'll learn how to use LINQ to XML to create, query, and transform XML files.

LINQ to Objects

This component enables you to search any collection of built-in or custom objects. If you have a collection of Color objects, for example, you can select the colors with a brightness of 0.5 or more via the following expression:

Dim colors() As Color = {Color.White, _
                         Color.LightYellow, Color.Cornsilk, _
                         Color.Linen, Color.Blue, Color.Violet}
Dim brightColors = From c In colors _
                   Where c.GetBrightness > 0.5

Likewise, you can select the rectangles with a minimum or maximum area by using a query like the following:

Dim rects() As Rectangle = _
            {New Rectangle(0, 0, 100, 120), _
             New Rectangle(10, 10, 6, 8)}
Dim query = From R In rects _
            Where R.Width * R.Height > 100

In addition to querying data, you can also group related data together, calculate aggregates, and even transform the original data (creating an HTML or XML file with the data of a collection is a trivial process, as you will see later in this chapter).

LINQ to SQL

This component enables you to query relational data by using LINQ rather than SQL. If you're familiar with SQL, you recognize the following statement that retrieves the products in a specific category of the Northwind database:

SELECT * FROM Products
WHERE Products.CategoryID = 2

To execute this statement against the Northwind database, however, you must set up a Connection object to connect to the database, a Command object to execute the command, and a DataReader object to read the results returned by the query. (Don't worry if you're not familiar with SQL and databases, which are discussed later in this chapter.) With LINQ to SQL, you can set up a DataContext object, which is roughly equivalent to the Connection object, and then use it to execute LINQ queries against the database.

customers = From prod In db.GetTable(Of Product)()
            Where prod.CategoryID = 2
            Select New Product With _
                        {.ID = prod.ProductID,
                         .Name = prod.ProductName,
                         .Price = prod.UnitPrice}

In the preceding statement, db is a properly initialized DataContext object. This query returns a collection of an anonymous type with three properties (ID, Name, and Price). This is a trivial example, meant to demonstrate how you can access diverse data sources and query them in a consistent way with the basic LINQ syntax; the details of LINQ to SQL are discussed in detail later in this chapter. The point I'm trying to make here is that LINQ allows you to work against database tables as if they were collections of typed objects. The same query would have worked with a collection of Product objects. All you have to do is change the source of the data by replacing the expression db.GetTable(Of Product) with the name of the collection.

LINQ to DataSet

This component is similar to LINQ to SQL, in the sense that they both query relational data. The LINQ to DataSet component allows you query data that have already been stored in a DataSet at the client. DataSets are discussed in detail later in this book, but I won't discuss the LINQ to DataSet component, because the DataSet is an extremely rich object and quite functional on its own.

LINQ to Entities

This is similar to the LINQ to Objects component, only the objects are based on relational data. Entities are discussed in the last part of this book, and they're classes that reflect the tables in a database. These classes are generated automatically, and they handle the basic data operations: querying tables and updating, deleting or inserting rows in the tables. The LINQ to Entities component is discussed in detail in Chapter 17, "Using the Data Entity Model," of this book. Actually, LINQ is one of the empowering technologies for data entities.

New flavors of LINQ keep coming up—you may end up designing your own flavor of LINQ to support your custom data so that other developers can use them with standard tools. There's already a LINQ to Amazon component that allows you to query the books at Amazon.com using LINQ. The essence of LINQ is that it allows developers to query widely differently data sources using the same syntax. You can even query database using LINQ.

LINQ to Objects

This section focuses on querying collections of objects. Although you can query many types of collections, including arrays, in this section I'll focus on List collections to promote the use of strongly typed data and to show you how nicely typed data integrate with LINQ. As you can guess, the most interesting application of LINQ to Objects is to select items from a collection of custom objects. Let's create a custom class to represent products:

Public Class Product
Public Property ProductID() As String
Public Property ProductName() As String
Public Property ProductPrice() As Decimal
Public Property ProductExpDate() As Date
End Class

I'm not showing the implementation of various properties; let's work with auto-implemented properties for now. This class definition may appear in a class file or in a form outside any procedure. The Products collection is a List(Of Product) collection, and it's populated with statements like the following:

Dim Products As New List(Of Product)
Dim P As Product
P = New Product
P.ProductID = "10A-Y"
P.ProductName = "Product 1"
P.ProductPrice = 21.45
P.ProductExpDate = #8/1/2009#
Products.add(P)

Assuming that you have added a few more custom objects to your list, you can use LINQ to query your collection of products based on any property (or combination of properties) of its items. To find the products that cost more than $20 and are expired already, you can formulate the following query:

Dim query = From prod In products
            Where prod.ProductPrice < 20
                And Year(prod.ProductExpDate) < 2010
            Select prod

The result of the query is also a List collection, and it contains the products that meet the specified criteria. To iterate through the selected items and display them in a TextBox control, we use a For...Each loop, as shown next:

For Each prod In query
    TextBox1.AppendText(prod.ProductID & vbTab &
        prod.ProductName & vbTab &
        prod.ProductPrice.ToString("##0.00") & vbTab &
        prod.ProductExpDate.ToShortDateString & vbCrLf)
Next

Another component of a LINQ expression is the Order By clause, which determines how the objects will be ordered in the output list. To sort the output of the preceding example in descending order, append the following Order By clause to the expression:

Dim query = From prod In products
            Where prod.ProductPrice < 20
                And Year(prod.ProductExpDate) < 2010
             Select prod
             Order By prod.ProductName

Anonymous Types and Extension Methods

LINQ is based on three pillars that are not related to each other—they aren't even part of LINQ. These pillars are anonymous types, extension methods, and object and collection initializers. You're already familiar with all three of them. Object initializers and anonymous types were presented in Chapter 8, "Working with Objects." Extension methods were introduced in Chapter 13, "XML in Modern Programming." I'll now review these topics and how they're combined in LINQ.

Anonymous types allow you to create typed objects on the fly; this is what the Select keyword of LINQ does. It allows you to create arbitrary objects and form collections of anonymous types. The compiler knows their properties (it actually generates classes that describe these arbitrary objects on the fly) and incorporates them in IntelliSense. Object initializers allow you to create anonymous types using the With keyword right in the Select clause of a LINQ query. You may be querying a collection of Product objects but are only interested in their IDs and prices. Instead of selecting the control variable that contains all the fields of each product, you can create a new anonymous type on the fly with an expression like the following:

Select New With {.ProductName = prod.Name, .ProductPrice = prod.UnitPrice}

As you will see in the following sections, it's quite common to create anonymous types with selected fields on the fly in LINQ queries, because we rarely need all the fields of the objects we're querying. If you're familiar with SQL, you already know that it's a bad practice to select all fields from one or more tables. They are called anonymous because they have no name you can use to create instances of these types; other than that, they're regular types with strongly typed properties.

Extension methods are methods that you implement for specific types without seeing the code of the original type. The Array class, for example, was extended by the addition of several such methods, including the Count, Skip, and Take methods. These methods are not actual members of the array; they're extension methods that apply to all collections that implement the IEnumerable interface.

As you recall from the previous chapter, the collections expose the Where method, which accepts a lambda expression as an argument to select specific elements. The syntax of the Where method, however, is quite awkward. The Where keyword of LINQ does the same, but it does so in a more elegant and intuitive manner. In effect, the compiler translates the LINQ queries into expressions with extension methods.

As you recall from the previous chapter, lambda expressions are what make extension methods so powerful. Let's look at a few more interesting extension methods that make collections, including arrays, extremely flexible and powerful. The Sum method returns the sum of the elements in a collection that holds numeric data. This is a pretty limited aggregate operator, however. What if you wanted the sum of the squares or another metric of the collection's members? An overloaded form of the Sum method accepts a lambda expression as an argument; this allows you to specify your own aggregate. To calculate the sum of the squares, for example, use the following expression:

Dim numbers = {1.01, 31.93, 8.12, 5.05, 21.81, 17.33, 2.45}
Dim sumOfSquares = numbers.Sum(Function(n) n ˆ 2)

This returns the sum of the squares in the numbers array. If you wanted to calculate the sum of the logarithms, you'd use a lambda function like this: Function(n) Math.Log(n).

To calculate averages instead of sums, use the Average extension method. The average of the squares of the numeric values is as follows:

Dim averageOfSquares = numbers.Average(Function(n) n ˆ 2)

Experiment with the extension methods of the Array class, or any collection you wish. The extension methods are shown in the IntelliSense box with the same icon as the regular method and a little arrow pointing down.

Querying Arbitrary Collections

As I mentioned already, you can apply LINQ to all classes that implement the IEnumerable interface. Many methods of the Framework return their results as a collection that implements the IEnumerable interface. As discussed in the tutorial "Accessing Folders and Files," (available for download from www.sybex.com/go/masteringvb2010), the GetFiles method of the IO.Directory class retrieves the files of a specific folder and returns them as a collection of strings:

Dim files = Directory.GetFiles("C:")

I'm assuming that you have turned on type inference for this project (it's on by default), so I'm not declaring the type of the files collection. If you hover over the files keyword, you'll see that its type is String()—an array of strings. This is the GetFiles method's return type, so you need not declare the files variable with the same type. The variable type is inferred from its value.

To find out the properties of each file, create a new FileInfo object for each file, and then examine the values of the FileInfo object's properties. To create an instance of the FileInfo class that represents a file, you'd use the following statement:

Dim FI As New FileInfo(file_name)

(As a reminder, the FileInfo class as well as the DirectoryInfo class belong to the IO namespace. You must either import the namespace into the current project or prefix the class names with the IO namespace: IO.FileInfo.) The value of the FI variable can now be used in the Where clause of the expression to specify a filter for the query:

Dim smallFiles = _
             From file In Directory.GetFiles("C:")
             Where New FileInfo(file).Length < 10000
             Order By file
             Select file

The file control variable is local to the query, and you cannot access it from the rest of the code. You can actually create a new file variable in the loop that iterates through the selected files, as shown in the following code segment:

For Each file In smallFiles
    Debug.WriteLine(file)
Next

The selection part of the query is not limited to the same variable as specified in the From clause. To select the name of the qualifying files, instead of their paths, use the following selection clause:

Select New FileInfo(file).Name

The smallFiles variable should still be an array of strings, right? Not quite. This time, if you hover the pointer over the name of the smallFiles variable, you'll see that its type is IEnumerable(Of String). And this makes sense, because the result of the query is not of the same type as its source. This time we created a new string for each of the selected items, so smallFiles is an IEnumerable of strings. Let's select each file's name and size with the following query:

Dim smallFiles = _
         From file In Directory.GetFiles("C:")
         Where New FileInfo(file).Length > 10000
         Select New FileInfo(file).Name,
                New FileInfo(file).Length

This time, smallFiles is of the IEnumerable(Of <anonymous type>) type. The anonymous type isn't really anonymous; the compiler generates a class to represent the anonymous type internally and gives it a name like VB_AnonymousType_1'3 or something—and yes, the single quote is part of the name, not a typo. However, you can't use this class to declare new objects. Anonymous types are very convenient (like auto-implemented properties, in a way), but they can't be used outside their scope (the module in which they were created).

Because I selected the two properties of interest in the query (the file's name and size), I can display them with the following loop:

For Each file In smallFiles
    Debug.WriteLine(file.Name & vbTab & file.Length.ToString)
Next

As soon as you type in the name of the file variable and the following period, you will see the Name and Length properties of the anonymous type in the IntelliSense box (proof that the compiler has generated a new type on the fly).

The properties of the new type are named after the items specified in the Select clause, and they have the same type. You can also control the names of the properties of the anonymous type with the following syntax:

Select New With {.FileName = New FileInfo(file).Name,
                 .FileSize = New FileInfo(file).Length}

This time you select a new object, and a new variable is created on the fly and has two properties named FileName and FileSize. The values of the two properties are specified as usual. Note that, although you can specify the names of the properties of the anonymous type, you can't specify the anonymous type's name. To display each selected file's name and size, modify the For...Each loop as follows:

For Each file In smallFiles
    Debug.WriteLine(file.FileName & vbTab &
                    file.FileSize.ToString)
Next

As you can see, LINQ is not a trivial substitute for a loop that examines the properties of the collection items; it's a powerful and expressive syntax for querying data in your code that creates data types on the fly and exposes them in your code.

You can also limit the selection by applying the Where method directly to the collection:

Dim smallFiles = Directory.GetFiles("C:").Where (
                         Function(file) (New FileInfo(file).Length > 10000))

The functions you specify in certain extended methods are called lambda functions, and they're declared either inline, if they're single-line functions, or as delegates.

Let me explain how the Where extension method of the previous sample code segment works. The Where method should be followed by an expression that evaluates to a True/False value—the lambda function. First specify the signature of the function; in our case, the function accepts a single argument, which is the current item in the collection. Obviously, the Where clause will be evaluated for each item in the collection, and for each item, the function will accept a different object as an argument. In the following section, you'll see lambda functions that accept two arguments. The name of the argument can be anything; it's a name that you will use in the definition of the function to access the current collection item. Then comes the definition of the function, which is the expression that compares the current file's size to 10,000 bytes. If the size exceeds 10,000 bytes, the function will return True—otherwise, False.

In this example, the lambda function is implemented in-line. To implement more complicated logic, you can write a function and pass the address of this function to the Where clause. Let's consider that the function implementing the filtering is the following:

Private Function IsLargeTIFFFile(ByVal fileName As String) As Boolean
    Dim file As FileInfo
    file = New FileInfo(fileName)
    If file.Length > 10000 And file.Extension.ToUpper = ".TIF" Then
            Return True
        Else
            Return False
        End If
    End Function

To call this function from within a LINQ expression, use the following syntax:

Dim largeImages =
        Directory.GetFiles("C:").Where(AddressOf IsLargeTIFFFile)
    MsgBox(smallFiles.Count)

Aggregating with LINQ

Another very important aspect of LINQ is that it allows you to query for aggregates. More often than not, you're not interested in the qualifying rows in a database (or objects in a collection) but rather in their aggregates. In many scenarios, the orders placed by each customer are not nearly as important as the total monthly revenue or the average of the orders in a given period. Aggregate data can be easily compared and used to allow managers to adjust their marketing or selling strategies. Likewise, the total cost of discontinued items is a useful metric, as opposed to a list of discontinued items and prices, and so on. Now that you can write LINQ queries to select objects based on any criteria, you can certainly iterate through the selected objects and calculate all the metrics you're interested in. LINQ can do better than that; it allows you to select the desired aggregates, rather than individual objects. And it's much faster to let LINQ perform the calculations, rather than doing it yourself by looping through the data. If the data resides in a database, LINQ can calculate the aggregates over millions of records, without moving all the qualifying rows from the database to the client computer and processing them there.

By default, LINQ adds a few extension methods for calculating aggregates to any collection that supports the IEnumerable interface. Let's return to the array of integers, the data array. To calculate the count of all values, call the Count method of the data array. The count of elements in the data array is given with the following expression:

Dim count = data.Count

In addition to the Count method, any LINQ-capable class exposes the Sum method, which sums the values of a specific element or attribute in the collection. To calculate the sum of the selected values from the data array, use the following LINQ expression:

Dim sum = From n data
          Where n > 10
          Select n.Sum

You can also calculate arbitrary aggregates by using the Aggregate method, which accepts a lambda expression as an argument. This expression, in turn, accepts two arguments: the current value and the aggregate. The implementation of the function, which is usually a single-liner, calculates the aggregate. Let's consider a lambda expression that calculates the sum of the squares over a sequence of numeric values. The declaration of the function is as follows:

Function(aggregate, value)

Its implementation is shown here. (This is just the code that increases the aggregate at each step; no function declaration is needed.)

aggregate + value ˆ 2

Note that the preceding expression evaluates the aggregate by adding a new value (the square of the current number in the collection) to the aggregate; it doesn't actually update the aggregate. In other words, you can't use a statement like this one (as intuitive as it may look):

aggregate += value ˆ 2

To calculate the sum of the squares of all items in the data array, use the following LINQ expression:

Dim sumSquares = data.Aggregate(
                         Function(sumSquare As Long, n As Integer)
                                  sumSquare + n ˆ 2

The single statement that implements the aggregate adds the square of the current element to the sumSquare argument. When you're done, the sumSquare variable holds the sum of the squares of the array's elements. If you hover the pointer over the sumSquare variable, you'll see that it is an Integer or Double type, depending on the type of the values in the collection you're aggregating. In short, the lambda expression for evaluating aggregates accepts two arguments: the aggregate and the current value. The code is an expression that updates the aggregate, and this is the function's return value, which is assigned to the aggregate.

Aggregates are not limited to numeric values. Here's an interesting example of a LINQ expression that reverses the words in a sentence. The code starts by splitting the sentence into words, which are returned in an array of strings. Then, it calls the Aggregate method, passing a lambda expression as an argument. This expression is a function that prefixes the aggregate (a new string with the words in reverse order) with the current word:

Dim sentence =
    "The quick brown fox jumped over the lazy dog"
Dim reverseSentence =
           sentence.Split(" "c).Aggregate(
           Function(newSentence, word) (
           word & " " & newSentence))

This code segment is a bit complicated and calls for an explanation. It starts by creating an array of strings: The words in the sentence are extracted by the Split method into an array of strings. Then we apply the Aggregate extension method to the array. The Aggregate method accepts the lambda expression as an argument, which creates a new string, the newSentence string, by inserting each new word at the beginning of the sentence. If you were appending the words to the newSentence variable, you'd be re-creating the original string from its pieces; here, I reversed the order of the words.

The lambda expression you pass to the Aggregate method is a function that accepts two arguments: the control variable and the aggregate variable. The function is executed for each item in the collection, and the value of the current item is the control variable (in our case, each consecutive word in the sentence). The aggregate variable newSentence is the variable in which you aggregate the desired quantity. At the end of the loop, the newSentence variable contains the words extracted from the string but in reverse order.

The following are a few more interesting extension methods:

Take (N), TakeWhile (Expression)

This method selects a number of elements from the collection. The first form selects the first n elements, while the second form continues selecting elements from the collection while the expression is True. To select values while they're smaller than 10, use the following lambda expression:

Function(n) n < 10

This expression selects values until it finds one that exceeds 10. The selection stops there, regardless of whether some of the following elements drop below 10.

Skip and SkipWhile

The Skip and SkipWhile methods are equivalent to the Take and TakeWhile methods: They skip a number of items and select the remaining ones. You can use the Skip and Take methods in tandem to implement paging. Assuming that pageSize is the number of items you want to display at once and page is the number of the current page, you can retrieve any page's data with a statement like the following:

Data.Skip((pageSize – 1) * page).Take(pageSize)
Distinct

The Distinct method, finally, returns the distinct values in the collection:

Dim uniqueValues = data.Distinct

Some Practical LINQ Examples

At this point, you can create an array of simple objects to explore some of the more advanced features of LINQ to Objects. The sample code presented in this section comes from the IEnumerables project (available for download from www.sybex.com/go/masteringvb2010), whose interface is shown in Figure 14.1. Each button generates some output, which is displayed on the TextBox that takes up most of the form.

The IEnumerables sample application

Figure 14.1. The IEnumerables sample application

The following array contains a number of objects representing people, but they're anonymous types (there's no collection that implements the specific type). In the IEnumerables project, the ArrayOfPersons array is declared outside any procedure so that multiple event handlers can access it.

Dim ArrayOfPersons() = {New With {.SSN = "555-00-9001",
                                  .First = "Peter", .Last = "Evans", .Age = 27},
                        New With {.SSN = "555-00-9002", .
                                  First = "James", .Last = "Dobs", .Age = 42},
                        New With {.SSN = "555-00-9003", .First = "Richard",
                                  .Last = "Manning", .Age = 33},
                        New With {.SSN = "555-00-9004",
                                  .First = "Rob", .Last = "Johnson", .Age = 52},
                        New With {.SSN = "555-00-9005",
                                  .First = "Peter", .Last = "Smith", .Age = 38},
                        New With {.SSN = "555-00-9006",
                                  .First = "John", .Last = "Tall", .Age = 22},
                        New With {.SSN = "555-00-9007",
                                  .First = "Mike", .Last = "Johnson", .Age = 33},
                        New With {.SSN = "555-00-9008",
                                  .First = "Peter", .Last = "Larson", .Age = 43},
                        New With {.SSN = "555-00-9009",
                                  .First = "James", .Last = "Smith", .Age = 37}}

The basic age statistics for the array (minimum, maximum, and average values) can be obtained with the following expressions:

ArrayOfPersons.Average(Function(p) p.Age)
ArrayOfPersons.Min(Function(p) p.Age)
ArrayOfPersons.Max(Function(p) p.Age)

There's no need to iterate through the array's elements; the Average, Min, and Max extension methods return the desired aggregates. To select items from the array based on some criteria, use the Where method. The following queries extract the young and not-so-young people in the array (the threshold is 30 years):

Dim youngPersons = From p In ArrayOfPersons Where p.Age < 30 Select p
Dim notSoYoungPersons = From p In ArrayOfPersons Where p.Age >= 30 Select p

To combine two IEnumerable collections, use the Concat method, which is applied to a collection and accepts as an argument another collection:

Dim allPersons = youngPersons.Concat(notSoYoungPersons)

If you don't care about the names and other data for the "not-so-young" people, you can calculate the average age in this age group with the following query:

Dim youngPersonsAvgAge =
        (From p In ArrayOfPersons Where p.Age < 30 Select p.Age).Average

Pretty simple. First determine the segment of the population you want to aggregate over, and then call the Average method. Another, less obvious and certainly less recommended approach is to calculate the sum of the ages and their count and divide the two values:

Dim youPersonsAvgAge2 =
            (From p In ArrayOfPersons Where p.Age < 30 Select p.Age).Sum /
            (From p In ArrayOfPersons Where p.Age < 30).Count

Aggregates are the most useful operations you can perform on a collection, but you'll rarely aggregate over the entire collection. The average age of an entire population or the purchases of an entire population often are not as useful as the aggregates of specific segments of a population. In the preceding example, I created a second collection made up only of the people we were interested in and then applied the aggregation operators over that entire population. Let's do the same without an intermediate collection.

Grouping and Aggregating

To demonstrate the grouping operator of LINQ, let's break down the population according to age. Specifically, let's break the population into segments by decade (people in their 20s, 30s, and so on). Breaking a set of data into groups of related items is known as grouping, and it's supported by the GroupBy extension method. The GroupBy extension method accepts as an argument a lambda expression, which indicates the various groups. To group ages according to their decade, you'd use an expression like the following:

Math.Floor(Age / 10) * 10

Consider the ages 41 and 49. The Floor function returns the largest integer that doesn't exceed its argument, and in this case it will return 4 for both values (4.1 and 4.9). Then, multiply this integer value by 10 to obtain the actual decade (40). Once you have come up with an expression to group people according to their age, you can write the query to retrieve the groups:

Dim grouped = ArrayOfPersons.GroupBy(Function(p) Math.Floor(p.Age / 10) * 10)

If you hover the cursor over the grouped variable, you will see that its type is IEnumerable. But what does it contain? The exact type of the return value is IEnumerable(Of Double, anonymous_type). The return value of the GroupBy method is an IEnumerable collection with the keys of the groups (the age decade) and the actual items. As you can see, the compiler has figured out that the key is a Double value (the age), and the actual object is of an anonymous type—the exact same type as the elements of the array.

The following loop prints the names of the groups and their count (number of people in that age decade):

For Each grp In grouped
     TextBox1.AppendText("Group: " & grp.Key &
                         " contains " & grp.Count.ToString & " persons" & vbCrLf)
Next

Each item in the collection has a Key property (the value according to which the items are grouped) and a Count property. The actual value of the grp variable is a collection of Person objects. To print the people in each age group, write a loop like the following:

TextBox1.AppendText("GROUPS AND THEIR MEMBERS" & vbCrLf)
For Each group In grouped
    TextBox1.AppendText("GROUP OF " & group.Key.ToString & vbCrLf)
    TextBox1.AppendText("Group Members:" & vbCrLf)
    For Each person In group.OrderBy(Function(p) p.Age)
        TextBox1.AppendText(vbTab & person.First & " " &
                            person.Last & " : " & person.Age.ToString & vbCrLf)
    Next
Next

The outer loop goes through each group. At this level, you access the Key property and use it as a header. The inner loop goes through items in the current group. The preceding statements will generate the following output:

GROUP OF 20
Group Members:
    John Tall : 22
    Peter Evans : 27
GROUP OF 30
Group Members:
    Richard Manning : 33
    Mike Johnson : 33
    James Smith : 37
    Peter Smith : 38
GROUP OF 40
Group Members:
    James Dobs : 42
    Peter Larson : 43
GROUP OF 50
Group Members:
    Rob Johnson : 52

How about calculating aggregates on each group? To perform aggregates on the groups, you must create another collection of anonymous types, just like the original, with an extra field that groups the items together. The following statement generates a new collection of objects that includes each person's age group (the ageGroup property):

Dim tmpGroups = From person In ArrayOfPersons
                Select New With {.SSN = person.SSN, .First = person.First,
                                 .Last = person.Last, .Age = person.Age,
                                 .ageGroup = Math.Floor(person.Age / 10) * 10}

Now, you can calculate the average age in each group. The following query goes through each group and calculates the average age in each group. To make the query a little more interesting, I've added the number of members in each group:

Dim ageGroups = From P In tmpGroups
                Order By P.ageGroup
                Group By P.ageGroup Into G = Group,
                   AgeGroupCount = Count(),
                   ageGroupAverage =  Average(System.Convert.ToDecimal(P.Age))

This query groups the members of the tmpGroups collection into groups based on the ageGroup property and then calculates the count of the members and the average age in each group. If you iterate through the members of the collection and print the age group and the two statistics, you'll get something like this:

AGE GROUP:  20    COUNT:  2   AVERAGE AGE IN GROUP: 24.50
AGE GROUP:  30    COUNT:  4   AVERAGE AGE IN GROUP: 35.25
AGE GROUP:  40    COUNT:  2   AVERAGE AGE IN GROUP: 42.50
AGE GROUP:  50    COUNT:  1   AVERAGE AGE IN GROUP: 52.00

If you're wondering why you had to create an intermediate collection to add the grouping field instead of grouping by the expression Math.Floor(P.Age / 10) * 10, the answer is that you can't use functions or methods in a group's control variable. Some readers may have noticed that the collection should hold birth dates, not ages. Try to perform the same queries after replacing the Age property with a BirthDate property.

You have seen quite a few nontrivial LINQ queries, and you should have a good idea about LINQ and how flexible it is. Writing efficient queries will take a bit of effort, and you should explore on your own as many sample queries as you can. A good starting point is the 101 LINQ Samples page at http://msdn.microsoft.com/en-us/vbasic/bb688088.aspx (a great resource on LINQ by Microsoft). You won't find any explanations on the queries (not as of this writing, at least), but try to understand as many of the queries as you can. Personally, I find it very convenient to use LINQ in my code whenever possible, even with anonymous types.

Keep in mind that LINQ is a declarative language: It tells the compiler what to do, not how to do it. And that's what makes it very flexible. LINQ statements are executed in a single step, and you can't place a breakpoint inside a LINQ query to follow its execution. You just try to write a syntactically correct query and then test the results to make sure you got what you aimed for from the collection.

Transforming Objects with LINQ

In this section, you'll learn how to use LINQ to transform collections of objects. I'll start with a collection of custom objects and transform them into objects with a different structure. You'll also see how easy it is to create HTML or XML files to describe the same collection using LINQ.

In the examples so far, I used anonymous types, which are quite common and convenient. In large applications, however, where the same data may be reused in several parts of the application or by many developers, it's strongly recommended that you work with typed data. Let's start with a class that represents books, the Books class. It's a simple but not trivial class, because its definition contains two custom types and an array. Author information is stored in an array of Author objects. Each book's price and stocking information is stored in two custom objects; they're the Price and Stock properties. The Price property can have multiple values (retail and wholesale prices or any other price that suits you), and the Stock property is an object with two properties, the InStock and OnOrder properties. Here's the definition of the Books class:

Public Class Books
    Public Class Book
        Public Property ISBN As String
        Public Property Title As String
        Public Property Publisher As String
        Public Property Authors As Author()
        Public Property Price As Prices
        Public Property Stock As Stock
    End Class

    Public Class Prices
        Private _RetailPrice? As Decimal
        Private _StorePrice? As Decimal
        Private _OnlinePrice? As Decimal

        Public Property RetailPrice As Decimal
        ...
        End Property

        Public Property StorePrice As Decimal
        ...
        End Property

        Public Property OnlinePrice As Decimal
        ...
        End Property

    End Class

    Public Class Stock
        Private _OnOrder? As Integer
        Public Property InStock As Integer

        Public Property OnOrder As Integer
        ...
        End Property
    End Class

    Public Class Author
Public Property FirstName As String
        Public Property LastName As String
    End Class
End Class

Some of properties are auto-implemented, while others are implemented with the usual setters and getters. The latter includes some business logic, such that prices can't be negative, but I'm not showing the equivalent code (which you can find in the IEnumerables project).

Create a few objects of the Book type and add them to a List collection. I'm using object initializers in the code, because that's the most compact notation, but you can create individual Book objects and add them to the collection if you want. Listing 14.1 shows the statements that initialize the allBooks collection.

Example 14.1. Initializing a list of Book objects

Dim allBooks As New List(Of Books.Book)
        allBooks.Add(New Books.Book With
                    {.ISBN = "1000000100", .Title = "Book 100",
                     .Publisher = "Wiley",
                     .Stock = New Books.Stock With
                              {.InStock = 14, .OnOrder = 20},
                     .Authors = {New Books.Author With
                                {.FirstName = "Richard",
                                 .LastName = "Dobson"}},
                     .Price = New Books.Prices With
                                {.StorePrice = 12.95,
                                 .RetailPrice = 15.95}})

        allBooks.Add(New Books.Book With
                    {.ISBN = "1000000101", .Title = "Book 101",
                     .Publisher = "Sybex",
                     .Stock = New Books.Stock With
                              {.InStock = 8, .OnOrder = 25},
                     .Authors = {New Books.Author With
                                {.FirstName = "Bob",
                                 .LastName = "Smith"}},
                     .Price = New Books.Prices With
                               {.StorePrice = 24.95,
                                .RetailPrice = 29.99}})

Let's start by transforming the list of books into an HTML document, like the one shown in Figure 14.2. We'll write a LINQ query as usual, but we'll embed it into an XMLElement variable. I'll start with the query that transforms the data and then explain it in detail. Listing 14.2 shows the code that creates the HTML document that produced the page shown in Figure 14.2.

Example 14.2. Transforming a collection into an XML segment

Dim html = <html>
               <h2>List of Books</h2>
               <table border="all">
                   <tr>
                       <td margin="10" width="90px">ISBN</td>
                       <td margin="10" width="280px">Title</td>
                       <td margin="10" width="140px">Author(s)</td>
                       <td margin="10" width="70px">Price</td>
                   </tr>
                   <%= From bk As Books.Book In allBooks
                       Select <tr>
                                  <td><%= bk.ISBN %></td>
                                  <td><%= bk.Title %></td>
                                  <td>
                                  <table>
                       <%= From au As Books.Author In bk.Authors
                           Order By au.LastName, au.FirstName
                           Select <tr><td>
                           <%= au.LastName & ", " &
                               au.FirstName %>
                           </td></tr> %>
                           </table>
                       </td>
                           <td align="right" valign="middle">
                           <%= bk.Price.RetailPrice.ToString %></td>
                   </tr>
                   %>
               </table>
           </html>
A collection of simple objects rendered as an HTML document

Figure 14.2. A collection of simple objects rendered as an HTML document

The code is really simple. It starts by generating an XML document and embeds the actual data in expression holes, which are delimited with the <%= and %> tags. The html variable is an XML document, and the first expression hole is a LINQ query. In the Select clause of the query, I created another XML document, which in turn contains expression holes. If you find this code segment complicated, please refer to the material in Chapter 12, where I discuss in detail how to build XML documents with expression holes. To read an expression like the one that defines the html variable, keep in mind the following simple rules:

  • Every time you see the opening angle bracket, you switch into XML mode.

  • Every time you see the <% tag, you switch into VB mode.

Note that while in VB mode, you can switch to XML mode without the closing %> tag; all it takes is another opening bracket to switch into XML mode. Of course in the end, all opening <% tags must be closed with the matching %> tag.

HTML tags are no different to the compiler than XML tags, so you easily create an XML document with the same data and the structure you want. After all, HTML documents are special cases of XML documents. You can use the XMLSerializer class to convert the collection of custom objects to an XML document with a single statement, but LINQ allows you to transform your data in just about any way you see fit. Try to create a comma-delimited file with the book data or a file with fixed-length columns for each of the properties.

Our next LINQ stop is LINQ to XML. In the following section, you'll learn how to combine the basic methods of the XElement class with LINQ to retrieve the desired elements and/or attributes from an XML document using LINQ. As you will see, you can use LINQ to query XML documents just as you did with collections.

LINQ to XML

The second component of LINQ is the LINQ to XML component. XML is gaining in popularity and acceptance, and Microsoft has decided to promote XML to a basic data type. Yes, XML is a data type like integers and strings! To understand how far VB is taking XML, type the following in a procedure or event handler:

Dim products = <Books>
                <Book ISBN="0000000000001">
                 <Name>Book Title 1</Name>
                 <Price>11.95</Price>
                </Book>
                <Book ISBN="000000000002">
                 <Name>Book Title 2</Name>
                 <Price>10.25</Price>
                </Book>
               </Books>

You need not worry too much about getting the document exactly right, because the editor works just like the XML editor. Every time you type an opening tag, it inserts the matching closing tag and ensures that what you're typing is a valid XML document. You can't apply a schema to the XML document you're creating, but you should expect this feature in a future version of Visual Studio.

You can create a new XML document in your code, but what can you do with it? You need a mechanism to manipulate the XML document with simple tools, and these tools are available through the following XML helper objects:

  • XDocument represents the XML document.

  • XComment represents a comment in the XML document.

  • XElement represents an XML element.

  • XAttribute represents an attribute in an XML element.

These objects can be used to access the document but also to create it. Instead of creating an XML document directly in your code, you can use the XML helper objects and a structural approach to create the same document. A simple XML document consists of elements, which may include attributes. To create a new XElement object, pass the element's name and value to its constructor:

New XElement(element_name, element_value)

The following statement will create a very simple XML document:

Dim XmlDoc = New XElement("Books")
MsgBox(XmlDoc.ToString)

You will see the string <Books /> in a message box. This is a trivial, yet valid, XML document. To create the same book collection as we did earlier by using the helper objects, insert the following statements in a button's Click event handler:

Dim doc = _
     New XElement("Books", _
         New XElement("Book", _
             New XAttribute("ISBN", "0000000000001"), _
             New XElement("Price", 11.95), _
             New XElement("Name", "Book Title 1"), _
             New XElement("Stock", _
                 New XAttribute("InStock", 12), _
                 New XAttribute("OnOrder", 24))), _
             New XElement("Book", _
                 New XAttribute("ISBN", "0000000000002"), _
             New XElement("Price", 10.25), _
             New XElement("Name", "Book Title 2"), _
             New XElement("Stock", _
                 New XAttribute("InStock", 7), _
                 New XAttribute("OnOrder", 10))))

I've added a twist to the new document to demonstrate the use of multiple attributes in the same element. The Stock element contains two attributes, InStock and OnOrder. Each element's value can be a basic data type, such as a string or a number, or another element. The Price element is a decimal value, and the Name element is a string. The Book element, however, contains three subelements: the Price, Name, and Stock elements.

The doc variable is of the XElement type. An XML document is not necessarily based on the XDocument class. The two basic operations you can perform with an XElement (and XDocument) object are to save it to a file and reload an XElement object from a file. The operations are performed with the Save and Load methods, which accept the file's name as an argument.

Adding Dynamic Content to an XML Document

The XML documents we've built in our code so far were static. Because XML support is built into VB, you can also create dynamic context, and this is where things get quite interesting. To insert some dynamic content into an XML document, insert the characters <%=. The editor will automatically insert the closing tag, which is %>. Everything within these two tags is treated as VB code and compiled. The two special tags create a placeholder in the document (or an expression hole), and the expression you insert in them is an embedded expression: You embed a VB expression in your document, and the compiler evaluates the expression and inserts the result in the XML document.

Here's a trivial XML document with an embedded expression. It's the statement that creates a document with a Book element (I copied it from a code segment presented in the preceding chapter), and I inserted the current date as an element:

Dim doc = _
   New XElement("Books", _
     New XElement("Book", _
         New XAttribute("ISBN", "0000000000001"), _
         New XAttribute("RecordDate", <%= Today %>), _
         New XElement("Price", 11.95), _
         New XElement("Name", "Book Title 1"), _
         New XElement("Stock", _
             New XAttribute("InStock", 12), _
    New XAttribute("OnOrder", 24))))

Let's say you have an array of Product objects and you want to create an XML document with these objects. Listing 14.3 shows the array with the product names.

Example 14.3. An array of Product objects

Dim Products() As Product = _
      {New Product With
         {.ProductID = 3, .ProductName = "Product A", _
          .ProductPrice = 8.75, _
          .ProductExpDate = #2/2/2009#}, _
        New Product With _
         {.ProductID = 4, .ProductName = "Product B", _
          .ProductPrice = 19.5}, _
        New Product With _
         {.ProductID = 5, .ProductName = "Product C", _
          .ProductPrice = 21.25, _
          .ProductExpDate = #12/31/2010#}}

The code for generating an XML document with three elements is quite short, but what if you had thousands of products? Let's assume that the Products array contains instances of the Product class. You can use the XMLSerializer class to generate an XML document with the array's contents. An alternative approach is to create an inline XML document with embedded expressions, as shown in Listing 14.4.

Example 14.4. An XML document with Product objects

Dim prods = <Products>
       <%= From prod In Products _
       Select <Product>
              <ID><%= prod.ProductID %></ID>
              <Name><%= prod.ProductName %></Name>
              <Price><%= prod.ProductPrice %></Price>
              <ExpirationDate>
                   <%= prod.ProductExpDate %></ExpirationDate>
              </Product> %>
              </Products>

This code segment looks pretty ugly, but here's how it works: In the first line, we start a new XML document. (The prods variable is actually of the XElement type, but an XElement is in its own right an XML document.) Notice that there's no line continuation character at the end of the first line of the XML document. Then comes a LINQ query embedded in the XML document with the <%= and %> tags. Notice the line continuation symbol at the end of this line(_). When we're in an expression hole, we're writing VB code, so line breaks matter. That makes the line continuation symbol necessary. Here's a much simplified version of the same code:

Dim prods = <Products>
            <%= From prod In Products _
                Select <Product>some product</Product> %>
            </Products>

This code segment will generate the following XML document:

<Products>
   <Product>some product</Product>
   <Product>some product</Product>
   <Product>some product</Product>
</Products>

The file contains no real data but is a valid XML document. The two tags with the percent sign switch into VB code, and the compiler executes the statements embedded in them. The embedded statement of our example is a LINQ query, which iterates through the elements of the Products array and selects literals (the XML tags shown in the output). To insert data between the tags, we must switch to VB again and insert the values we want to appear in the XML document. In other words, we must replace the string some product in the listing with some embedded expressions that return the values you want to insert in the XML document. These values are the properties of the Product class, as shown in Listing 14.3. The code shown in Listing 14.4 will produce the output shown in Listing 14.5.

Example 14.5. An XML document with the data of the array initialized in Listing 14.4

<Products>
   <Product>
     <ID>3</ID>
     <Name>Product A</Name>
     <Price>8.75</Price>
     <ExpirationDate>2009-02-02T00:00:00</ExpirationDate>
   </Product>
   <Product>
     <ID>4</ID>
     <Name>Product B</Name>
     <Price>19.5</Price>
     <ExpirationDate>0001-01-01T00:00:00</ExpirationDate>
   </Product>
   <Product>
     <ID>5</ID>
     <Name>Product C</Name>
     <Price>21.25</Price>
     <ExpirationDate>2010-12-31T00:00:00</ExpirationDate>
   </Product>
 </Products>

Transforming XML Documents

A common operation is the transformation of an XML document. If you have worked with XML in the past, you already know Extensible Stylesheet Language Transformations (XSLT), which is a language for transforming XML documents. If you're new to XML, you'll probably find it easier to transform XML documents with the LINQ to XML component. Even if you're familiar with XSLT, you should be aware that transforming XML documents with LINQ is straightforward. The idea is to create an inline XML document that contains HTML tags and an embedded LINQ query, like the following:

Dim HTML = <htlm><b>Products</b>
           <table border="all"><tr>
           <td>Product</td><td>Price</td>
           <td>Expiration</td></tr>
           <%= From item In Products.Descendants("Product") _
               Select <tr><td><%= item.<Name> %></td>
                      <td><%= item.<Price> %></td>
                      <td><%= Convert.ToDateTime( _
                      item.<ExpirationDate>.Value). _
                            ToShortDateString %>
            </td></tr> %></table>
            </htlm>
HTML.Save("Products.html")
Process.Start("Products.html")

The HTML variable stores plain HTML code. HTML is a subset of XML, and the editor will treat it like XML: It will insert the closing tags for you and will not let you nest tags in the wrong order. The Select keyword in the query is followed by a mix of HTML tags and embedded holes for inline expressions, which are the fields of the item object. Note the VB code for formatting the date in the last inline expression. The output of the previous listing is shown in Figure 14.3.

A simple XML segment (top) viewed as an HTML table (bottom). Transformation courtesy of LINQ.

Figure 14.3. A simple XML segment (top) viewed as an HTML table (bottom). Transformation courtesy of LINQ.

The last two statements save the HTML file generated by our code and then open it in Internet Explorer (or whichever application you've designated to handle by default the HTML documents).

Another interesting application of XML transformation is the transformation of XML data into instances of custom objects. Let's say you need to work with an XML file that contains product information, and you want to create a list of Product objects out of it. Let's also assume that the XML file has the following structure:

<Products>
<product ProductID="1" ProductName="Chai"
         CategoryID="1"  UnitPrice="18.0000"
         UnitsInStock="39" UnitsOnOrder="0" >
</product>
<product ProductID="2" ProductName="Chang"
         CategoryID="1" QuantityPerUnit="24 - 12 oz bottles"
         UnitPrice="19.0000"
         UnitsInStock="19" UnitsOnOrder="40" >
</product>
...
</Products>

First, you must load the XML file into an XElement variable with the following statement (I'm assuming that the XML file is in the same folder as the project):

Dim XMLproducts = XElement.Load("../../Products.xml")

Now, you can write a LINQ query that generates anonymous types, like the following:

Dim prods = From prod In XMLproducts.Descendants("product")
            Select New With {.Name = prod.Attribute("ProductName").Value,
                             .Price = prod.Attribute("UnitPrice").Value,
                             .InStock = prod.Attribute("UnitsInStock").Value,
                             .OnOrder = prod.Attribute("UnitsOnOrder").Value}}

The prods collection consists of objects with the four scalar properties. To make the example a touch more interesting, let's say that you don't want to create a "flat" object. The InStock and OnOrder properties will become properties of another object, the Stock property. The new anonymous type will have the following structure:

Product.Name
Product.Price
Product.Stock.InStock
Product.Stock.OnOrder

To create an anonymous type with the revised structure, you must replace the InStock and OnOrder properties with a new object, the Stock object, which will expose the InStock and OnOrder properties. The revised query is shown next:

Dim prods =
    From prod In XMLproducts.Descendants("product")
    Select New With {.Name = prod.Attribute("ProductName").Value,
                     .Price = prod.Attribute("UnitPrice").Value,
                     .Stock = New With {
                              .InStock = prod.Attribute("UnitsInStock").Value,
                .OnOrder = prod.Attribute("UnitsOnOrder").Value}}

A simple LINQ query allows you to move from XML into objects and replace the code that would normally use the XML axis methods (Elements and Descendents) with pure objects. Of course, anonymous types can be used only in the context of the procedure in which they were created. If you want to pass the prods collection between procedures, you should create a new Product class and use it to create instances of this object, because the anonymous types can't be used outside the routine in which they were created. The definition of the Product class, and the accompanying Stock class, is quite trivial:

Public Class Product
    Public Property Name As String
    Public Property Price As Decimal
    Public Property Stock As Stock
End Class
Public Class Stock
    Public InStock As Integer
    Public OnOrder As Integer
End Class

With the two class definitions in place, you can revise the LINQ query to populate the products collection with instances of the Product class:

Dim Products =
From prod In XMLproducts.Descendants("product")
       Select New Product With {
                  .Name = prod.Attribute("ProductName").Value,
                  .Stock = New Stock With {
                            .InStock = prod.Attribute("UnitsInStock").Value,
                            .OnOrder = prod.Attribute("UnitsOnOrder").Value}}

It shouldn't come as a surprise that you can iterate through both collections with the same statements:

For Each p In prods
    Debug.WriteLine("PRODUCT: " & p.Name & vbTab &
                    "   PRICE: " & p.Price.ToString &
                    "   STOCK = " & p.Stock.InStock & "/" & p.Stock.OnOrder)
Next

When executed, the preceding statements will generate the following output:

PRODUCT: Grandma's Boysenberry Spread          PRICE: 25.0100   STOCK = 179/0
PRODUCT: Uncle Bob's Organic Dried Pears       PRICE: 30.0100   STOCK = 27/0
PRODUCT: Northwoods Cranberry Sauce            PRICE: 40.0000   STOCK = 164/0

Working with XML Files

In this section, we're going to build a functional interface for viewing customers and orders. And this time we aren't going to work with a small sample file. We'll actually get our data from one of the sample databases that comes with SQL Server: the Northwind database. The structure of this database is discussed in Chapter 15, "Programming with ADO.NET," in detail, but for now I'll show you how to extract data in XML format from SQL Server. If you don't have SQL Server installed or if you're unfamiliar with databases, you can use the sample XML files in the folder of the VBLINQ project. Figure 14.4 shows the main form of the application, which retrieves the same data either from an XML file or directly from the database.

You may be wondering why you would extract relational data and process them with LINQ instead of executing SQL statements against the database. XML is the standard data-exchange format, and you may get data from any other source in this format. You may get an XML file generated from someone's database or even an Excel spreadsheet. In the past, you had to convert the data to another, more flexible format and then process it. With LINQ, you can directly query the XML document, transform it into other formats, and of course save it.

Displaying related data from XML files

Figure 14.4. Displaying related data from XML files

Start SQL Server, and execute the following query:

SELECT * FROM Customers FOR XML AUTO

This statement selects all columns and all rows for the Customers table and generates an element for each row. The field values are stored in the document as attributes of the corresponding row. The output of this statement is not a valid XML document because its elements are not embedded in a root element. To request an XML document in which all elements are embedded in a root element, use the ROOT keyword:

SELECT * FROM Customers FOR XML AUTO, ROOT('AllCustomers')

I'm using the root element AllCustomers because the elements of the XML document are named after the table. The preceding statement will generate an XML document with the following structure:

<AllCustomers>
   <Customers CustomerID="..." CompanyName="xxx" ... />
   <Customers CustomerID="..." CompanyName="xxx" ... />
    ...
</AllCustomers>

It would make more sense to generate an XML document with the Customers root element and name the individual elements Customer. To generate this structure, use the following statement:

SELECT * FROM Customers Customer FOR XML AUTO, ROOT('Customers')

Here's a segment of the XML document with the customers:

<Customers>
  <Customer CustomerID="ALFKI" CompanyName=
   "Alfreds Futterkiste" ContactName="Maria Anders"
   ContactTitle="Sales Representative"
   Country="Germany" />
  <Customer CustomerID="ANATR" CompanyName=
   "Ana Trujillo Emparedados y helados"
    ContactName="Ana Trujillo" ContactTitle="Owner"
    Country="Mexico" />

Finally, you can create an XML document where the fields are inserted as elements, rather than attributes. To do so, use the ELEMENTS keyword:

SELECT * FROM Customers Customer FOR XML AUTO,
   ELEMENTS ROOT('Customers')

The other statements that generated the XML files with the rows of the tables Orders, Order Details, and Products are as follows:

SELECT * FROM Orders Order FOR XML AUTO,  ROOT('Orders')
SELECT * FROM [Order Details] Detail FOR XML AUTO,
   ELEMENTS, ROOT('Details')
SELECT ProductID, ProductName FROM Products
   FOR XML AUTO, ELEMENTS ROOT('Products')

Notice that all files are attribute based, except for the Details.xml file, which is element based. I had no specific reason for choosing this structure; I just wanted to demonstrate both styles for processing XML in the sample project's code. Also, the reason I've included the Products table is because the Order Details table, which contains the lines of the order, stores the IDs of the products, not the product names. When displaying orders, as shown in Figure 14.4, you must show product names, not just product IDs. The four collections with the entities we extracted from the Northwind database are declared and populated at the form's level via the following statements:

Dim customers As XElement = XElement.Load("......Customers.xml")
Dim orders As XElement = XElement.Load("......Orders.xml")
Dim details As XElement = XElement.Load("......Details.xml")
Dim products As XElement = XElement.Load("......Products.xml")

As is apparent from the code, I've placed the four XML files created with the SQL statements shown earlier in the project's folder. The Display Data button populates the top ListView control with the rows of the Customers table, via the following statements:

Private Sub bttnShow_Click(...) Handles bttnShow.Click
    For Each c In customers.Descendants("Customer")
        Dim LI As New ListViewItem
LI.Text = c.@CustomerID
        LI.SubItems.Add(c.@CompanyName)
        LI.SubItems.Add(c.@ContactName)
        LI.SubItems.Add(c.@ContactTitle)
        ListView1.Items.Add(LI)
    Next
End Sub

The code is quite simple. It doesn't even use LINQ; it iterates through the Customer elements of the customers collection and displays their attributes on the control. Notice the use of the shortcut for the Attribute property of the current XElement.

When the user clicks a customer name, the control's SelectedIndexChanged event is fired. The code in this handler executes a LINQ statement that selects the rows of the Orders table that correspond to the ID of the selected customer. Then, it iterates through the selected rows, which are the orders of the current customer, and displays their fields on the second ListView control via the following statements:

Private Sub ListView1_SelectedIndexChanged(...) _
            Handles ListView1.SelectedIndexChanged
    If ListView1.SelectedItems.Count = 0 Then Exit Sub
    ListView2.Items.Clear()
    Dim scustomerID = ListView1.SelectedItems(0).Text
    Dim query = From o In orders.Descendants("Order")
                Where Convert.ToString(o.@CustomerID) = scustomerID
                Select o
    For Each o In query
        Dim LI As New ListViewItem
        LI.Text = [email protected]
        LI.SubItems.Add(Convert.ToDateTime
                 (o.@OrderDate).ToShortDateString)
        LI.SubItems.Add(Convert.ToDecimal(o.@Freight).ToString("#,###.00"))
        LI.SubItems.Add([email protected])
        ListView2.Items.Add(LI)
    Next
End Sub

The LINQ query selects Order elements based on their CustomerID attribute. Finally, when an order is clicked, the following LINQ query retrieves the selected order's details:

Dim query = From itm In details.Descendants("Detail")
            Where Convert.ToInt32(itm.<OrderID>.Value) = orderID
            Select itm

The Details.xml file contains elements for all columns, not attributes, and I use statements such as <dtl.UnitPrice> to access the subelements of the current element. To display product names, the code selects the row of the Products collection that corresponds to the ID of each detail line as follows:

Dim product = _
      From p In products.Descendants("Product")
      Where Convert.ToInt32(p.@ProductID) =
            Convert.ToInt32(dtl.<ProductID>.Value)
      Select p

The product variable is actually a collection of XElements, even though it can never contain more than a single element (product IDs are unique). You access the ProductName column of the selected row with the expression product(0).@productName. You can call the First method to make sure you've selected a single product, no matter what:

Dim product = _
      (From p In products.Descendants("Product")
       Where Convert.ToInt32(p.@ProductID) =
             Convert.ToInt32(dtl.<ProductID>.Value)
       Select p).First

LINQ to SQL

SQL stands for Structured Query Language, a language for querying databases. SQL is discussed in the last part of this book, and as you will see, SQL resembles LINQ. SQL is a simple language, and I will explain the SQL statements used in the examples; readers who are somewhat familiar with databases should be able to follow along.

Now, let's build another application for displaying customers, orders, and order details. The difference is that this time you won't get your data from an XML document; you'll retrieve them directly from the database. As you will see, the same LINQ queries will be used to process the rows returned by the queries. The code won't be identical to the code presented in the preceding section, but the differences are minor. The same principles will be applied to a very different data source.

You need a mechanism to connect to the database so you can retrieve data, and this mechanism is the DataContext class. The DataContext class talks to the database, retrieves data, and submits changes back to the database. To create a DataContext object, pass a string with the information about the database server, the specific database, and your credentials to the DataContext class's constructor, as shown here:

Dim db As New DataContext("Data Source=localhost;
                  initial catalog=northwind;
                  Integrated Security=True")

To use the DataContext class in your code, you must add a reference to the System.Data.Linq namespace and then import it into your code with this statement:

Imports System.Data.Linq

You will find more information on connecting to databases in Chapter 15. For the purposes of this chapter, the preceding connection string will connect your application to the Northwind database on the local database server, provided that you have SQL Server or SQL Server Express installed on the same machine as Visual Studio. If you do not, replace "localhost" in the connection string with the name or IP address of the machine on which SQL Server is running.

After you have initialized the DataContext object, you're ready to read data from tables into variables. To do so, call the GetTable method of the db object to retrieve the rows of a table. Note that the name of the table is not specified as an argument. Instead, the table is inferred from the type passed to the GetTable method as an argument. The GetTable(Of Customer) method will retrieve the rows of the Customers table, because the name of the table is specified in the definition of the class, as you will see shortly.

customers = From cust In db.GetTable(Of Customer)()
            Select New Customer With
            {.CustomerID = cust.CustomerID,
             .CompanyName = cust.CompanyName,
             .ContactName = cust.ContactName,
             .ContactTitle = cust.ContactTitle}
orders = From ord In db.GetTable(Of Order)()
         Select New Order With
         {.OrderID = ord.OrderID,
          .OrderDate = ord.OrderDate,
          .CustomerID = ord.CustomerID,
          .Freight = ord.Freight,
          .ShipName = ord.ShipName}
details = From det In db.GetTable(Of Detail)()
          Select New Detail With
          {.OrderID = det.OrderID,
           .ProductID = det.ProductID,
           .Quantity = det.Quantity,
           .UnitPrice = det.UnitPrice,
           .Discount = det.Discount}
products = From prod In db.GetTable(Of NWProduct)()
           Select New NWProduct With
           {.ProductID = prod.ProductID,
            .ProductName = prod.ProductName}

The type of the customers, orders, details, and products variables is IQueryable(of entity), where entity is the appropriate type for the information you're reading from the database. The four variables that will store the rows of the corresponding tables must be declared at the form level with the following statements:

Dim customers As System.Linq.IQueryable(Of Customer)
Dim orders As System.Linq.IQueryable(Of Order)
Dim details As System.Linq.IQueryable(Of Detail)
Dim products As System.Linq.IQueryable(Of NWProduct)

The variables must be declared explicitly at the form level, because they will be accessed from within multiple event handlers.

To make the most of LINQ to SQL, you must first design a separate class for each table that you want to load from the database. You can also specify the mapping between your classes and the tables from which their instances will be loaded, by prefixing them with the appropriate attributes. The Customer class, for example, will be loaded with data from the Customers table. To specify the relationship between the class and the table, use the Table attribute, as shown here:

<Table(Name:="Customers")>Public Class Customer
End Class

Each property of the Customer class will be mapped to a column of the Customers table. In a similar manner, decorate each property with the name of the column that will populate the property:

<Column(Name:="CompanyName")>Public Property Name
End Property

If the name of the property matches the name of the relevant column, you can omit the column's name:

<Column()>Public Property Name
End Property

Listing 14.6 shows the definition of the four classes we'll use to store the four tables (Customers, Orders, Order Details, and Products).

Example 14.6. The classes for storing customers and orders

<Table(Name:="Customers")> Public Class Customer
    Private _CustomerID As String
    Private _CompanyName As String
    Private _ContactName As String
    Private _ContactTitle As String

    <Column()> Public Property CustomerID() As String
        Get
            Return _customerID
        End Get
        Set(ByVal value As String)
            _customerID = value
        End Set
    End Property

    <Column()> Public Property CompanyName() As String
        Get
            Return _CompanyName
        End Get
        Set(ByVal value As String)
            _CompanyName = value
End Set
    End Property

    <Column()> Public Property ContactName() As String
        ....
    End Property

    <Column()> Public Property ContactTitle() As String
        ....
    End Property
End Class

<Table(Name:="Orders")> Public Class Order
    Private _OrderID As Integer
    Private _CustomerID As String
    Private _OrderDate As Date
    Private _Freight As Decimal
    Private _ShipName As String

    <Column()> Public Property OrderID() As Integer
        ....
    End Property

    <Column()> Public Property CustomerID() As String
        ....
    End Property

    <Column()> Public Property OrderDate() As Date
        ....
    End Property


    <Column()> Public Property Freight() As Decimal
        ....
    End Property

    <Column()> Public Property ShipName() As String
        ....
    End Property
End Class

<Table(Name:="Order Details")> Public Class Detail
    Private _OrderID As Integer
    Private _ProductID As Integer
    Private _Quantity As Integer
    Private _UnitPrice As Decimal
    Private _Discount As Decimal

    <Column()> Public Property OrderID() As Integer
....

    End Property

    <Column()> Public Property ProductID() As Integer
        ....
    End Property

    <Column()> Public Property Quantity() As Short
        ....
    End Property

    <Column()> Public Property UnitPrice() As Decimal
        ....
    End Property


    <Column()> Public Property Discount() As Double
        ....
    End Property
End Class

<Table(Name:="Products")> Public Class NWProduct
    Private _ProductID As Integer
    Private _ProductName As String

    <Column()> Public Property ProductID() As Integer
        ....
    End Property

    <Column()> Public Property ProductName() As String
        ....
    End Property

End Class

I didn't show the implementation of most properties, because it's trivial. What's interesting in this listing are the Table and Column attributes that determine how the instances of the classes will be populated from the database, as you saw earlier.

The code that displays the selected customer's orders and the selected order's details is similar to the code you saw in the previous section that displays the data from the XML files. It selects the matching rows in the relevant table and shows them in the corresponding ListView control.

Retrieving Data with the ExecuteQuery Method

You can also retrieve a subset of the table, or combine multiple tables, by executing a SQL query against the database. The ExecuteQuery method, which accepts as arguments the SELECT statement to be executed and an array with parameter values, returns a collection with the selected rows as objects. To call the ExecuteQuery method, you must specify the class that will be used to store the results with the Of keyword in parentheses following the method's name. Then you specify the SELECT statement that will retrieve the desired rows. If this query contains any parameters, you must also supply an array of objects with the parameter values. Parameters are identified by their order in the query, not by a name. The first parameters is 0, the second parameter is 1, and so on. The following statement will retrieve all customers from Germany and store them in instances of the Customer class:

Dim params() = {"Germany"}
Dim GermanCustomers = _
          db.ExecuteQuery(Of Customer)( _
          "SELECT CustomerID, CompanyName," & _
          "ContactName, ContactTitle " &
          "FROM Customers WHERE Country={0}", params)"

After the GermanCustomers collection has been populated, you can iterate through its items as usual, with a loop like the following:

For Each cust In GermanCustomers
    Debug.WriteLine(cust.CompanyName & " " & _
                     cust.ContactName)
Next

Once you have retrieved the results from the database, you can execute LINQ queries against the collection. To find out the number of customers from Germany, use the following expression:

Dim custCount = GermanCustomers.Count

To apply a filtering expression and then retrieve the count, use the following LINQ expression:

Dim g = GermanCustomers.Where(Function(c As Customer) _
            c.CompanyName.ToUpper Like "*DELIKATESSEN*").Count

To appreciate the role of the DataContext class in LINQ to SQL, you should examine the ToString property of a LINQ query that's executed against the database. Insert a statement to display the expression GermanCustomers.ToString() in your code, and you will see that the DataContext class has generated and executed the following statement against the database. If you're familiar with SQL Server, you can run the SQL Server Profiler and trace all commands executed against SQL Server. Start SQL Server Profiler (or ask the database administrator to create a log of all statements executed by your workstation against a specific database), and then execute a few LINQ to SQL queries. Here's the statement for selecting the German customers as reported by the profiler:

exec sp_executesql N'SELECT Customers.CompanyName,
     Orders.OrderID, SUM(UnitPrice*Quantity) AS
         OrderTotal FROM Customers INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
     INNER JOIN [Order Details] ON
         [Order Details].OrderID = Orders.OrderID
     WHERE Customers.Country=@p0
     GROUP BY Customers.CompanyName,
     Orders.OrderID',N'@p0 nvarchar(7)',@p0=N'Germany'

Working with LINQ to SQL Classes

The process of getting data out of a database and into a custom class is as straightforward as it can get. You create a class with properties that match the columns of the equivalent table, and then you use the DataContext object to populate these classes. You may be thinking already about a class generator that will take care of the mapping between the class properties and the table columns. Visual Studio does that for you with a component called LINQ to SQL Classes.

A LINQ to SQL Classes component encapsulates a segment of a database, or the entire database, and lets you work against a database as if the database entities were objects. While in traditional database programming you code against tables that are made up of rows, with LINQ to SQL Classes you will work against the same database, only this time the tables will be collections made up of custom objects. The Customers table of the Northwind database, for example, contains rows that represent customers. When you work with a LINQ to SQL Classes component, the Customers table becomes a collection, and its rows become instances of Customer objects. As you will see shortly, the idea behind LINQ to SQL Classes is to bridge the gap between traditional database programming and the object-oriented features of modern programming languages. You'll see the advantages of accessing databases as collections of strongly typed objects in just a few pages.

To add this component to your solution, right-click the solution name, and from the context menu select Add New Item. In the Add New Item dialog box, select the LINQ to SQL Classes component, as shown in Figure 14.5, and set the component's name (use the NWind name for this example).

Start by adding a LINQ to SQL Classes component to your project.

Figure 14.5. Start by adding a LINQ to SQL Classes component to your project.

Once the new component has been added to the project, the Server Explorer window will open. Here you can select a connection to one of the databases on your system (I'm assuming you have installed either SQL Server or SQL Server Express). Create a new connection to the Northwind database if you don't already have a connection to this database, and open it. If you don't know how to create connections to databases, follow this procedure:

  1. Switch to Server Explorer, and right-click the Data Connections item. From the context menu, select Add Connection to open the dialog box shown in Figure 14.6.

  2. In the Add Connection dialog box that appears, select the name of the database server you want to use. I'm assuming that most readers have a version of SQL Server 2008 installed on their machines, so you can specify localhost as the server name. If you're connected to a remote database server on the network, the database administrator will give the proper database name and credentials.

    Creating a new database connection

    Figure 14.6. Creating a new database connection

  3. Select the authentication method for the database server. Again, most readers can select the option Use Windows Authentication. To connect to a remote server, you will most likely have to select Use SQL Server Authentication and supply your credentials, as shown in Figure 14.6.

  4. Expand the list of databases in the drop-down list in the lower pane of the dialog box, and select Northwind. If you haven't installed the Northwind database, then you should download and install it, as explained in Chapter 15.

As soon as you close the Add Connection dialog box, the designer will add a new component to the class, the DataClasses1.dbml component, and will open it in design mode. DataClasses1 is the default name of a LINQ to SQL Classes component, and I suggest you change the name to something more meaningful. The VBLINQ project uses the name TableClasses.

The designer is initially an empty space. But here's how easy it is to create custom objects based on the database entities, as shown in Figure 14.7.

Designing a LINQ to SQL Classes class with visual tools

Figure 14.7. Designing a LINQ to SQL Classes class with visual tools

Server Explorer will display all the items in the database. Select the Customers, Orders and Order Details tables from Server Explorer, and drop them onto the designer's surface. The designer will pick up the relations between the tables from the database and will depict them as arrows between related classes. It will also create the appropriate classes on the fly, one for each table. Specifically, the designer will create the Customer, Order, and Order_Detail classes that represent the entities stored in the Customers, Orders, and Order Details tables. Notice how the designer singularized the names of the entities.

The designer has also created three collections to represent the three tables, as well as a DataContext object to connect to the database. To exercise the autogenerated classes, build the sample form shown in Figure 14.8. This form loads the countries and uses their names to populate the ComboBox control at the top. Every time the user selects a country, the application makes another trip to the database, retrieves the customers in the selected country, and displays the customer names on the Select Customer ListBox control.

A form for viewing customers, their orders, and the details for each order

Figure 14.8. A form for viewing customers, their orders, and the details for each order

Once a customer has been selected, the application makes another trip to the database and selects the customer's orders, which are displayed on the top ListView control. Finally, when an order is selected, the application reads the order's details and displays them on the lower ListView control on the right. The idea is to get as little information as possible from the database depending on the user's action. There's no need to retrieve all the customers when the application starts, because the user many not even view any customer. In general, you should try to limit the user's selection so that you can minimize the information you request from the database and download at the client. Although there will be times you need to minimize trips to the database, in that case you will pull data that you might need and then possibly throw some of it away.

The DataContext Object

The following statement creates a new DataContext object for accessing the Northwind database:

Dim ctx As New NwindDataContext

The NWindDataContext class was generated by the designer; it gives you access to the database's tables (and stored procedures). The database tables are properties of the ctx variable, and they return an IQueryable collection with the rows of each table. To access the Customers table, for example, request the Customers property:

Ctx.Customers

Each item in the Customers collection is an object of the Customer type. The designer also generated a class for the entities stored in each of the tables. Not only that, but it singularized the names of the tables.

Accessing the Tables with LINQ

Since the Customers property of the ctx variable returns the rows of the Customers table as a collection, you can use LINQ to query the table. The following query returns the German customers:

Dim germanCustomers = From cust In ctx.Customers
    Where cust.Country = "Germany"
    Select cust

The compiler knows that cust is a variable of the Customer type, so it displays the fields of the Customers table (which are now properties of the Customer object) in the IntelliSense drop-down list. In effect, the LINQ to SQL component has mapped the selected tables into objects that you can use to access your database using OOP techniques.

But the LINQ to SQL Classes component has done much more. germanCustomers is a query that isn't executed until you request its elements. The expression ctx.Customers doesn't move the rows of the Customers table to the client so you can query them. Instead, it parses your LINQ query, builds the appropriate SQL query, and executes it when you iterate through the query results. To see the queries that are executed as per your LINQ query when they're submitted to the database, insert the following simple statement right after the declaration of the ctx variable:

ctx.Log = Console.Out

This statement tells the compiler to send all the commands that the DataContext object submits to the database to the Output window. Place a button on the main form of the project, and in its Click event handler insert the following statements:

ctx.Log = Console.Out
Dim selCustomers = From cust In ctx.Customers
                   Where cust.Country = "Germany"
                   Select cust
MsgBox("No query executed so far!")
For Each cust In selCustomers
    ListBox1.Items.Add(cust.CustomerID & vbTab & cust.CompanyName)
Next

Execute these statements, and watch the Output window. The message box will be displayed and nothing will be shown in the Output window, because no data has been requested from the database yet. selCustomers is just a query that the compiler has analyzed, but it hasn't been executed yet. As soon as you close the message box, the following SQL statement will be submitted to the database to request some data:

SELECT [t0].[CustomerID], [t0].[CompanyName], [t0].[ContactName],
[t0].[ContactTitle], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[Phone], [t0].[Fax]
FROM [dbo].[Customers] AS [t0]
WHERE [t0].[Country] = @p0
-- @p0: Input NVarChar (Size = 4000; Prec = 0; Scale = 0) [Germany]
-- Context: SqlProvider(Sql2008) Model: AttributedMetaModel Build: 4.0.20506.1

If you're not familiar with SQL, don't panic. You'll find the basics later in this book. If you're a totally result-oriented developer, don't even bother with the SQL statement; VB does it all for you. It knows how to request the data you need, and it won't bring even one extra row from the Customers table back to the client. Of course, you shouldn't iterate through all the rows of the table, because this could ruin application performance. Never, never bring an entire table to the client, unless it's a small table (categories, state names, and so on). Even then, it's not a good idea to keep the data at the client for long periods of time; other users may edit the data at the database, and when that happens, the data at the client become "stale," because the data at the client are not directly associated with the database and will not be updated. The short of the story is that the DataContext object establishes a connection to the database and lets you view your database tables as collections of objects. Use it to grab the data you need, and submit changes as the user edits the data.

To limit the number of rows you bring to the client, try to give users the option to specify selection criteria. The sample application you'll build in this section requests that users select a country and then brings only the customers from that particular country to the client. Actually, you don't even need to bring all the fields of the Customers table. All you will need is the CompanyName that's displayed on the ListBox control and the ID that you'll use in subsequent queries to identify the selected customer. If you have too much data, you can limit the number of rows you bring to the client (to an arbitrary value, say, 1,000 rows). If the user has selected more rows, they should specify a more specific search pattern to limit the number of rows. After all, who needs to see thousands and thousands of rows just because they exist?

Navigation Methods

Tables are rarely isolated in a database, and no one ever really cares about the rows of a single table. Tables are (almost always) related to one another. Customers place orders, and orders contain products and prices. Orders belong to customers, but they're created by employees. The most common operation in data-driven applications is not entering new rows or editing existing ones. Instead, data-driven applications spend most of their time going from a row in a specific table to one or more related rows in another table. Of course, retrieving the related rows implies that you will also design an interface to display the data you gather from multiple tables to the user. Let's say you have landed on a specific customer, represented by the selCustomer variable (it's a variable of the Customer type, of course). You could select a customer by index, with a LINQ query, or let the user select it from a list. The idea is that you have retrieved the customer you're interested in.

To access the related rows in other tables, you can request the property of the Customer object that corresponds to the related table. To access the orders of the customer represented by the cust object, use the following expression:

cust.Orders

This expression returns an IQueryable collection as usual, with the rows of the Orders table that correspond to the selected customer. The Orders property represents the Orders table and returns an IQueryable collection of Order objects, each representing an order, as you might expect. However, it doesn't return all rows of the Orders table—just the ones that belong to the selected customer. Each Order object in turn exposes the columns of the Orders table as properties. To access the OrderDate field of the first order of the first customer in the germanCustomers collection, you'd use an expression like the following:

selCustomers.ToList(0).Orders.ToList(0).OrderDate

You have to apply the ToList operator to every collection to force the execution of the appropriate query and then select a specific item of the collection.

Now we can take a closer look at the code of the sample project code. The following code loads all countries and displays their names on the ListBox control:

Private Sub Button1_Click(...) Handles Button1.Click
    ctx = New NWindDataContext
    ctx.Log = Console.Out
    Dim countries = From cust In ctx.Customers
                    Select cust.Country Distinct
    For Each country As String In countries
        cbCountries.Items.Add(country)
    Next
End Sub

The very first query is peculiar indeed. The Northwind database doesn't store the countries in a separate table, so you have to go through the Customers table and collect all the unique country names. This is what the Distinct keyword does: It forces the query to return each unique country name from the Customers table only once. LINQ doesn't download all the rows of the Customers table to the client to select the unique country names. The actual query sent to the database by LINQ is the following, which instructs SQL Server to return only the unique country names:

SELECT DISTINCT [t0].[Country]
FROM [dbo].[Customers] AS [t0]

LINQ is very efficient when it comes to talking to the database, and you will see shortly how you can monitor the queries it submits to the database.

When the user selects a customer on the ListBox control, the statements included in Listing 14.7 are executed to display the number of orders placed by the customer and the total revenue generated by the selected customer, as well as the headers of all orders.

Example 14.7. Retrieving the orders of the selected customer

Private Sub ListBox1_SelectedIndexChanged(...)
                Handles ListBox1.SelectedIndexChanged
    If ListBox1.SelectedItem Is Nothing Then Exit Sub
    Dim custID As String = ListBox1.SelectedItem.ToString.Substring(0, 5)
    Dim custName As String = ListBox1.SelectedItem.ToString.Substring(5).Trim
    Dim customerOrders = From ord In ctx.Orders
                     Where ord.CustomerID = custID
                     Select New With {.order = ord, .details = ord.Order_Details}
    Dim orders = customerOrders.Count
    If orders > 0 Then
Dim tot = From o In customerOrders
                  Select Aggregate det In o.details Into
                         Sum(det.UnitPrice * det.Quantity * (1 - det.Discount))
        TextBox1.Text = "Customer " & custName & " has placed " &
                         orders.ToString & " orders totalling $" &
                         tot.Sum.ToString
    Else
        TextBox1.Text = "There are no order for customer " & custName
    End If
    lvOrders.Items.Clear()
    For Each ord In customerOrders
        Dim LI As New ListViewItem
        LI.Text = ord.order.OrderID.ToString
        LI.SubItems.Add(ord.order.OrderDate.Value.ToShortDateString)
        LI.SubItems.Add((Aggregate dtl In ord.details Into
                   Sum(dtl.UnitPrice * dtl.Quantity *
                      (1 - dtl.Discount))).ToString)
        LI.SubItems.Add(Aggregate dtl In ord.details Into Sum(dtl.Quantity))
        LI.SubItems.Add(ord.order.Freight.ToString)
        lvOrders.Items.Add(LI)
    Next
End Sub

The code extracts the selected customer's ID from the ListBox control and stores it to the custID variable. It uses this variable to select the customer's orders into the customerOrders collection. Next, it calculates the number of orders and the total revenue generated by the customer and displays it on the form. Finally, it iterates through the customerOrders collection and displays the orders on a ListView control. One of the items shown in the ListView control is the total of each order, which is calculated by another LINQ query that aggregates the current order's details:

Aggregate dtl In ord.details Into
                 Sum(dtl.UnitPrice * dtl.Quantity * (1 - dtl.Discount))

This query returns a Double value, which is formatted and displayed like a variable. The query isn't assigned to a variable, and there's no Select clause—just the aggregate value.

You may be tempted to write a loop that iterates through all the rows in the Customers table to calculate aggregates. Just don't! Use LINQ to formulate the appropriate query, and then let the compiler figure out the statement that must be executed against the database to retrieve the information you need, and no more. If you execute a loop at the client, LINQ to SQL will move all the rows of the relevant tables to the client, where the loop will be executed. Although this may work for Northwind, as the database grows larger it will be an enormous burden on the database and the local network. The query might get a little complicated, but it saves you from the performance issues you'd face when the application is released to many clients. Eventually, you will be forced to go back and rewrite your code.

Let's say you need to know the revenue generated by all customers in each country and in a specific year. The following LINQ query does exactly that and returns a collection of anonymous types with just two fields: the country name and the total revenue per country.

Dim revenueByCountry = From cust In ctx.Customers,
                       ord In cust.Orders,
                       det In ord.Order_Details
                       Group By cust.Country Into countryTotals = Group,
                       countryRev =
                             Sum(det.Quantity * det.UnitPrice * (1 - det.Discount))
                       Select Country, countryRev

This statement returns a collection of country names and totals like this:

Austria  57401.84
Belgium  11434.48
Brazil   41941.19
Canada   31298.06
Denmark  25192.54

To execute this query, which in effect scans a large segment of the database and returns a few totals, LINQ generates and executes the following SQL statement:

SELECT [t4].[Country], [t4].[value] AS [countryRev]
FROM (
    SELECT SUM([t3].[value]) AS [value], [t3].[Country]
    FROM (
        SELECT (CONVERT(Real,
               (CONVERT(Decimal(29,4),CONVERT(Int,[t2].[Quantity]))) *
               [t2].[UnitPrice]))
                         * (@p0 - [t2].[Discount]) AS [value],
                [t1].[OrderDate], [t1].[CustomerID], [t0].[CustomerID] AS
                [CustomerID2],
                [t2].[OrderID], [t1].[OrderID] AS [OrderID2], [t0].[Country]
        FROM [dbo].[Customers] AS [t0], [dbo].[Orders] AS [t1],
             [dbo].[Order Details] AS [t2]) AS [t3]
        WHERE (DATEPART(Year, [t3].[OrderDate]) = @p1) AND
                   ([t3].[CustomerID] = [t3].[CustomerID2]) AND
                   ([t3].[OrderID] = [t3].[OrderID2])
GROUP BY [t3].[Country] ) AS [t4]
ORDER BY [t4].[Country]
-- @p0: Input Real (Size = −1; Prec = 0; Scale = 0) [1]
-- @p1: Input Int (Size = −1; Prec = 0; Scale = 0) [1997]

This is a fairly complicated SQL query, unless you're familiar with SQL. If not, try to master LINQ and let the compiler generate the SQL queries for you. If you haven't used SQL before, you'll find an introduction to SQL in the following chapter.

Likewise, when you click an order, the program retrieves the details of the selected order and displays them on the second ListView control, using the following query:

Dim selectedDetails = From det In ctx.Order_Details, prod In ctx.Products
                      Where prod.ProductID = det.ProductID And
det.OrderID = OrderID
                      Select New With {.details = det}

Note that this query combines the Products table, because the Order Details table contains only the IDs of the products, not their names. This is how the data are organized in the database, but not how you want to present the data to the user.

Updates

In addition to querying the database, you can also update it by inserting new rows or updating and deleting existing rows. To insert a new row into a table, create a new object of the appropriate type and then add it to its table by calling the InsertOnSubmit method. The InsertOnSubmit method doesn't submit the new row to the database; the new row lives at the client and is submitted to the database when you call the SubmitChanges method. The following statements create a new product and add it to the Products table:

Dim P As New Product
P.ProductName = "New Product"
P.CategoryID = 3
P.UnitPrice = 9.45
ctx.Products.InsertOnSubmit(P)
ctx.SubmitChanges

You can accumulate multiple changes to the Products table (or any other table for that matter) and submit them all at once to the database with a single call to the SubmitChanges method. The compiler will figure out the appropriate order for submitting the rows. For example, it will first insert new categories and then products, because a new product may contain a reference to a new category.

To update a row, just change some of its properties, and the edits will be submitted to the database when the SubmitChanges method is called. Finally, to delete a row, call the DeleteOnSubmit method and pass the Product object to be deleted as an argument. There are also two related methods, the InsertAllOnSubmit and DeleteAllOnSubmit methods, which accept an IEnumerable collection of objects as an argument.

Submitting updates to the database isn't a trivial topic. For example, one or more other users might have edited or deleted the row you're trying to update since your application read it. You can't take it for granted that all updates will be submitted successfully to the database. Consider, too, one of the restrictions in the Northwind database is that product prices can't be negative. The Product class generated by the designer doesn't enforce this restriction at the client. At the client, it's perfectly legal to assign a negative value to a product's UnitPrice property, but a row that contains a negative price value will fail to update the database. The database itself will reject any updates that violate any database constraint. You can also set a product's CategoryID field to an arbitrary value, but unless this value matches the ID of an existing category in the Categories table, the changes will be rejected by the database. Handling these conditions requires additional code. The topic of handling update errors is discussed in detail in Part V of this book. This section is a quick introduction to a component that allows you to handle database tables as objects and manipulate them with LINQ.

The VBLINQ2SQL project (available for download from www.sybex.com/go/masteringvb2010) contains a form that displays all products on a ListView control, as shown in Figure 14.9. The Add Another Product button brings up the form shown in the same figure, which allows you to specify a new product and submit it to the database. The new product is added automatically to the list with the products. You can also click the Reload All Products button to confirm that the product has been committed to the database. If the new product violates one of the database constraints (for example, it has a negative value), the operation will fail, and you will see an appropriate error message.

Viewing all products and inserting/editing individual products

Figure 14.9. Viewing all products and inserting/editing individual products

If you double-click a product row in the Northwind Products form, the auxiliary form will be displayed again, this time populated with the fields of the selected product, and you can edit them.

Creating a New Order

I'll complete the presentation of the VBLINQ2SQL sample application by discussing the code that creates a new order. The Order object combines several tables and the most interesting object in the Northwind database. Orders are placed by customers and credited to employees. They also contain a number of products, along with their quantities and prices. A proper interface should allow users to specify all these items, and you will see a suitable interface for creating orders in Chapter 15. For the purposes of this sample application, I've decided to select the appropriate items at random, but the application does generate actual orders and submits them to the database. The Add New Order button on Form2, which is shown in Figure 14.10, does exactly that with the statements included in Listing 14.8.

Example 14.8. Adding a new order to the Northwind database

Dim RND As New Random
' select a customer at random
Dim cust = CType(ctx.Customers.Skip(
                     RND.Next(1, 50)).Take(1).First, Customer)
' select an employee at random
Dim emp = CType(ctx.Employees.Skip(
RND.Next(1, 10)).Take(1).First, Employee)
' and create order's header
Dim order As New Order
order.OrderDate = Now
order.Customer = cust
order.Employee = emp
' select a random freight for the order in the range from $3 to $75
order.Freight = RND.Next(300, 7500) / 100
Dim discount As Decimal
' select a random discount value for the order
discount = RND.Next(0, 45) / 100
Dim prod As Product
' create a random number of detail lines in the range from 10 to 50
For i As Integer = 1 To RND.Next(10, 50)
    prod = CType((From p As Product In ctx.Products
                  Where p.ProductID = RND.Next(1, 50) Select p).Single, Product)
' add product to order only if it doesn't exist already
' because the Order Details table has a unique costraint
' on fields OrerID + ProductID
    If order.Order_Details.Where(
              Function(d) d.ProductID = prod.ProductID).Count = 0 Then
        order.Order_Details.Add(
            New Order_Detail With
               {.ProductID = prod.ProductID,
                .Quantity = RND.Next(5, 15),
                .UnitPrice = prod.UnitPrice,
                .Discount = discount})
    End If
 Next
' and now submit the order to the database
ctx.Orders.InsertOnSubmit(order)
ctx.SubmitChanges()
frmOrder.txtOrderID.Text = order.OrderID.ToString
frmOrder.txtOrderDate.Text = order.OrderDate.Value.ToShortDateString
frmOrder.txtOrderCustomer.Text = order.Customer.CompanyName &
                                  " / " & order.Customer.ContactName
frmOrder.txtEmployee.Text = order.Employee.LastName & ", " &
                                 order.Employee.FirstName
' statement to display order on frmOrder auxiliary form
frmOrder.ShowDialog()

Since I'm only interested in showing you how to create a new order, the code selects a customer at random. It does so by skipping a random number of rows in the Customers table with the Skip method and then selecting the following one. Next, it selects a random number of products. Because of a constraint in the Orders table, the code must verify that each new product appears only once in the order. In other words, if the product chosen at random belongs to the order already, the code ignores it, and it does so by calling the Where extension method of the order collection with this statement:

If order.Order_Details.Where(Function(d) d.ProductID = prod.ProductID).Count = 0
Adding a new order to the Northwind database with LINQ to SQL

Figure 14.10. Adding a new order to the Northwind database with LINQ to SQL

The lambda expression passed to the method selects the row with the product ID you're about to add. If it doesn't exist on that particular order, then the code adds it to the order collection. Other than this detail, the code is straightforward. If you open the sample application and examine its code, you will see that it contains straightforward code that manipulates custom types and collections and only a couple of database-related statements. The new order is represented by an object of the Order type:

Dim order As New Order

To create the order, the code sets the properties of this object. To specify the customer, the code assigns a Customer object to the Customer property of the Order variable. The class generated by the wizard knows that it has to submit just the CustomerID field to the database.

Order.Order_Details is a collection of Order_Detail objects, one for each product in the order. The application creates and initializes the order's detail lines, one at a time, and adds them to the Order.Order_Details collection. When done, it submits the order to the database by calling the SubmitChanges method. LINQ to SQL knows how to submit all the items in the new order to the database in the proper order. Not only that, but it also retrieves the order's ID (a value that's generated by the database) and updates the order variable at the client. The code that displays the order on an auxiliary form has access to the order's ID without making another trip to the database.

LINQ to SQL is the most important component of LINQ, because it encapsulates the complexity of a database and allows us to work with the database tables as if they were collections of custom types. It bridges the gap between the object-oriented world of Visual Basic and the realm of relational databases. There's another similar component, LINQ to Entities, which is discussed in detail in Chapter 17, "Using the Entity Data Model." LINQ to Entities takes the same principles one step further by allowing you to create your own objects and map them to database tables. LINQ to Entities takes LINQ to SQL one step further in the direction of programming databases with the object-oriented features of modern languages like VB.

The Bottom Line

Perform simple LINQ queries.

A LINQ query starts with the structure From variable Incollection, where variable is a variable name and collection is any collection that implements the IEnumerable interface (such as an array, a typed collection, or any method that returns a collection of items). The second mandatory part of the query is the Select part, which determines the properties of the variable you want in the output. Quite often you select the same variable that you specify in the From keyword. In most cases, you apply a filtering expression with the Where keyword. Here's a typical LINQ query that selects filenames from a specific folder:

Dim files =
        From file In
          IO.Directory.GetFiles("C:Documents")
          Where file.EndsWith("doc")
        Select file
Master It

Write a LINQ query that calculates the sum of the squares of the values in an array.

Create and process XML files with LINQ to XML.

LINQ to XML allows you to create XML documents with the XElement and XAttribute classes. You simply create a new XElement object for each element in your document and create a new XAttribute object for each attribute in the current element. Alternatively, you can simply insert XML code in your VB code. To create an XML document dynamically, you can insert embedded expressions that will be evaluated by the compiler and replaced with their results.

Master It

How would you create an HTML document with the filenames in a specific folder?

Process relational data with LINQ to SQL.

LINQ to SQL allows you to query relational data from a database. To access the database, you must first create a DataContext object. Then you can call this object's GetTable method to retrieve a table's rows or the ExecuteQuery method to retrieve selected rows from one or more tables with a SQL query. The result is stored in a class designed specifically for the data you're retrieving via the DataContext object.

Master It

Explain the attributes you must use in designing a class for storing a table.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.51.153