Chapter 15. Programming with ADO.NET

With this chapter, we start exploring applications that manipulate large sets of data stored in a database. After a quick introduction to databases, you'll learn about the basic mechanisms of interacting with databases. As you will see, it's fairly straightforward to write a few VB statements to execute SQL queries against the database in order to either edit or retrieve selected rows. The real challenge is the design and implementation of functional interfaces that display the data requested by the user, allow the user to navigate through the data and edit it, and finally submit the changes to the database. You'll learn how to execute queries against the database, retrieve data, and submit modified or new data to the database.

In this chapter, you'll learn how to do the following:

  • Store data in relational databases

  • Query databases with SQL

  • Submit queries to the database using ADO.NET

What Is a Database?

A database is a container for storing relational, structured information. The same is true for a file or even for the file system on your hard disk. What makes a database unique is that it is designed to preserve relationships and make data easily retrievable. The purpose of a database is not so much the storage of information as its quick retrieval. In other words, you must structure your database so that it can be queried quickly and efficiently. It's fairly easy to create a database for storing products and invoices and add new invoices every day. In addition to just storing information, you should also be able to retrieve invoices by period, retrieve invoices by customer, or retrieve invoices that include specific products. Unless the database is designed properly, you won't be able to retrieve the desired information efficiently.

Databases are maintained by special programs, such as Microsoft Office Access and SQL Server. These programs are called database management systems (DBMSs), and they're among the most complicated applications. A fundamental characteristic of a DBMS is that it isolates much of the complexity of the database from the developer. Regardless of how each DBMS stores data on disk, you see your data organized in tables with relationships between tables. To access or update the data stored in the database, you use a special language, the Structured Query Language (SQL). Unlike other areas of programming, SQL is a truly universal language, and all major DBMSs support it.

The recommended DBMS for Visual Studio 2010 is SQL Server 2008. In fact, the Visual Studio 2008 setup program offers to install a developer version of SQL Server 2008 called SQL Server 2008 Express. However, you can use Access as well as non-Microsoft databases such as Oracle. Although this chapter was written with SQL Server 2008, most of the examples will work with Access as well.

Data is stored in tables, and each table contains entities of the same type. In a database that stores information about books, there could be a table with titles, another table with authors, and a table with publishers. The table with the titles contains information such as the title of the book, the number of pages, and the book's description. Author names are stored in a different table because each author might appear in multiple titles. If author information were stored along with each title, we'd be repeating author names. So, every time we wanted to change an author's name, we'd have to modify multiple entries in the titles table. Even retrieving a list of unique author names would be a challenge because you'd have to scan the entire table with the titles, retrieve all the authors, and then get rid of the duplicate entries. Of course, you need a mechanism to associate titles with their authors, and you'll see how this is done in the following section. The same is true for publishers. Publishers are stored in a separate table, and each title contains a pointer to the appropriate row in the publishers table.

The reason for breaking the information we want to store in a database into separate tables is to avoid duplication of information. This is a key point in database design. Duplication of information will sooner or later lead to inconsistencies in the database. The process of breaking the data into related tables that eliminate all possible forms of information duplication is called normalization, and there are rules for normalizing databases. The topic of database normalization is not discussed further in this book. However, all it really takes to design a functional database is common sense. In short, you identify the entities you want to store in the database (such as customers, products, hotels, books, and the like) and store them in separate tables. You also avoid duplicating information at all costs. If you design a table for storing books along with their authors, you'll soon realize that the same author names are repeated in multiple books. Data duplication means that you have combined entities, and you need to break the original table into one with books and another with authors. Of course, you'll have to establish a relationship between the two tables so you can locate a book's author(s) or an author's books. This is done through relations between the tables; hence the term relational databases. Don't worry if you haven't worked with databases before; the following sections demonstrate the structure of a database through examples. After you learn how to extract data from your database's tables with SQL statements, you'll develop a much better understanding of the way databases should be structured.

Using Relational Databases

The databases we're interested in are called relational because they are based on relationships among the data they contain. The data is stored in tables, and tables contain related data, or entities, such as people, products, orders, and so on. Of course, entities are not independent of each other. For example, orders are placed by specific customers, so the rows of the Customers table must be linked to the rows of the Orders table that stores the orders of the customers. Figure 15.1 shows a segment of a table with customers (top) and the rows of a table with orders that correspond to one of the customers (bottom).

As you can see in Figure 15.1, relationships are implemented by inserting columns with matching values in the two related tables; the CustomerID column is repeated in both tables. The rows with a common value in the CustomerID fields are related. In other words, the lines that connect the two tables simply indicate that there are two fields, one on each side of the relationship, with a common value. The customer with the ID value ALFKI has placed the orders 10643 and 10692 (among others). To find all the orders placed by a customer, we can scan the Orders table and retrieve the rows in which the CustomerID field has the same value as the ID of the specific customer in the Customers table. Likewise, you can locate customer information for each order by looking up the row of the Customers table that has the same ID as the one in the CustomerID field of the Orders table.

Linking customers and orders with relationships

Figure 15.1. Linking customers and orders with relationships

The two fields used in a relationship are called key fields. The CustomerID field of the Customers table is the primary key because it identifies a single customer. Each customer has a unique value in the CustomerID field. The CustomerID field of the Orders table is the foreign key of the relationship. A CustomerID value appears in a single row of the Customers table and identifies that row; it's the table's primary key. However, it might appear in multiple rows of the Orders table because the CustomerID field is the foreign key in this table. In fact, it will appear in as many rows of the Orders table as there are orders for the specific customer. Note that the primary and foreign keys need not have the same names, but it's convenient to use the same name because they both represent the same entity.

The concept of relationships between tables is pretty straightforward and very easy to implement through a pair of keys. Yet, this is the foundation of relational databases.

To help you understand relational databases, I will present the structure of the two sample databases used for the examples in this and the following chapters. If you're not familiar with the Northwind and Pubs databases, read the following two sections and you'll find it easier to follow the examples.

Obtaining the Northwind and Pubs Sample Databases

SQL Server 2008 developers will wonder where the Northwind and Pubs databases have gone. Microsoft has replaced both databases with a single new database called AdventureWorks. Microsoft made the change to demonstrate new SQL Server features in an environment that more closely matches large enterprise systems. Because the AdventureWorks database is extremely complex and not very friendly for teaching database principles, this book won't rely on it. However, you might want to look at the AdventureWorks database anyway to see what it provides and understand how complex databases can become.

Many developers are used to working with the Northwind and Pubs databases with other Microsoft products. These two databases have become so standard that many authors, including myself, rely on the presence of these databases to ensure that everyone can see example code without a lot of extra effort. Unfortunately, you won't find an option for installing them as part of the standard SQL Server 2008 installation. However, you can find scripts for creating these databases in SQL Server Express online at http://www.microsoft.com/downloads/details.aspx?FamilyID=06616212-0356-46A0-8DA2-EEBC53A68034&displaylang=en. The name of the file you'll receive is SQL2000SampleDb.MSI. Even though Microsoft originally created this file for SQL Server 2000, it works just fine with SQL Server 2008.

After you download the script files, you need to install them. Right-click the file, and choose Install from the context menu. You will see a Welcome dialog box, telling you that this file contains the sample databases for SQL Server 2000. Click Next, read the licensing agreement, and agree to it. Keep following the prompts until you install the sample database scripts in the appropriate directory.

At this point, you have two scripts for creating the sample databases. If you used the default installation settings, these files appear in the Program FilesMicrosoft SQL Server 2000 Sample Database Scripts folder of your machine. The InstNwnd.SQL file will create the Northwind database, and the InstPubs.SQL file will create the Pubs database.

Double-click the name of each SQL file, and each will open in SQL Server Management Studio. Then click the Execute button in the toolbar (it's the button with the icon of an exclamation mark) to run the script, which will install the appropriate database.

To install the databases for the Express version of SQL Server 2008, open a command prompt. Type OSQL -E -i InstNwnd.SQL, and press Enter. The OSQL utility will create the Northwind database for you (this process can take quite some time). After the Northwind database is complete, type OSQL -E -i InstPubs.SQL, and press Enter. The process will repeat itself.

If you try to run the OSQL utility and receive an error message at the command prompt, the SQL Server 2008 installation didn't modify the path information for your system as it should have. In some cases, this makes your installation suspect, and you should reinstall the product if you experience other problems. To use the installation scripts, copy them from the installation folder to the Program FilesMicrosoft SQL Server90Toolsinn folder. You can run the OSQL utility at the command prompt from this folder to create the two sample databases.

You'll want to test the installation to make sure it worked. Open Visual Studio, and choose View

Obtaining the Northwind and Pubs Sample Databases

In the Server Name field, type the name of your machine, or select one with the mouse. Click the down arrow in the Select Or Enter A Database Name field. You should see both the Northwind and Pubs databases, as shown in Figure 15.2. If you don't see these entries, it means that an error occurred. Try running the scripts a second time.

Use the Add Connection dialog box to check for the two databases.

Figure 15.2. Use the Add Connection dialog box to check for the two databases.

Exploring the Northwind Database

In this section, you'll explore the structure of the Northwind sample database. The Northwind database stores products, customers, and sales data, and many of you are already familiar with the structure of the database.

To view a table's contents, expand the Table section of the tree under the Northwind connection in Server Explorer, and locate the name of the table you want to examine. Right-click the name, and choose Show Table Data from the context menu. This will open the table, and you can view and edit its rows. If you choose the Open Table Definition command from the same menu, you will see the definitions of the table's columns. You can change the type of the columns (each column stores items of the same type), change their length, and set a few more properties that are discussed a little later in this chapter. To follow the description of the sample databases, open the tables in view mode.

If you have installed SQL Server 2008, you can use SQL Server Management Studio to explore the same database. Just right-click the Northwind database, and from the context menu select Open Table to view the data or select Design to change the table definition.

Products Table

The Products table stores information about the products of the fictional Northwind corporation. This information includes the product name, packaging information, price, and other relevant fields. Each product (or row) in the table is identified by a unique numeric ID. Because each ID is unique, the ProductID column is the table's primary key. This column is an Identity column: It's a numeric value, which is generated automatically by the database every time you insert a new row to the table. The rows of the Products table are referenced by invoices (the Order Details table, which is discussed later), so the product IDs appear in the Order Details table as well. The ProductID column, as well as most primary keys in any database, has a unique property: It's an Identity column. Every time you add a new product to the table, SQL Server assigns the next available value to this column. If the ID of the last row in the Products table is 72, the first product that will be added will take the primary key value of 73 automatically. SQL Server will always assign the proper value to this column, and it will always be unique.

Suppliers Table

Each product has a supplier, too. Because the same supplier can offer more than one product, the supplier information is stored in a different table, and a common field, the SupplierID field, is used to link each product to its supplier (as shown in Figure 15.3). For example, the products Chai, Chang, and Aniseed Syrup are purchased from the same supplier: Exotic Liquids. Its SupplierID fields all point to the same row in the Suppliers table.

Linking products to their suppliers and their categories

Figure 15.3. Linking products to their suppliers and their categories

Categories Table

In addition to having a supplier, each product belongs to a category. Categories are not stored along with product names; they are stored separately in the Categories table. Again, each category is identified by a numeric value (field CategoryID) and has a name (field CategoryName). In addition, the Categories table has two more columns: Description, which contains text, and Picture, which stores a bitmap. The CategoryID field in the Categories table is the primary key, and the field by the same name in the Products table is the corresponding foreign key.

Customers Table

The Customers table stores information about the company's customers. Each customer is stored in a separate row of this table, and customers are referenced by the Orders table. Unlike product IDs, customer IDs are five-character strings and are stored in the CustomerID column. This is an unusual choice for IDs, which are usually numeric values. The CustomerID column isn't an Identity column; the user must determine the key of each customer and submit it along with the other customer data. The database has a unique constraint for this column: The customer's ID must be unique, and the database won't accept any rows with a duplicate CustomerID value.

Orders Table

The Orders table stores information about the orders placed by Northwind's customers. The OrderID field, which is an integer value, identifies each order. Orders are numbered sequentially, so this field is also the order's number. Each time you append a new row to the Orders table, the value of the new OrderID field is generated automatically by the database. Not only is the OrderID column the table's primary key, but it's also an Identity column.

The Orders table is linked to the Customers table through the CustomerID column. By matching rows that have identical values in their CustomerID fields in the two tables, we can recombine customers with their orders. Refer back to Figure 15.1 to see how customers are linked to their orders.

Order Details Table

The Orders table doesn't store any details about the items ordered; this information is stored in the Order Details table (see Figure 15.4). Each order consists of one or more items, and each item has a price, a quantity, and a discount. In addition to these fields, the Order Details table contains an OrderID column, which holds the ID of the order to which the detail line belongs.

The reason why details aren't stored along with the order's header is that the Orders and Order Details tables store different entities. The order's header, which contains information about the customer who placed the order, the date of the order, and so on, is quite different from the information you must store for each item ordered. If you attempt to store the entire order into a single table, you'll end up repeating a lot of information. Notice also that the Order Details table stores the IDs of the products, not the product names.

Employees Table

This table holds employee information. Each employee is identified by a numeric ID, which appears in each order. When a sale is made, the ID of the employee who made the sale is recorded in the Orders table. An interesting technique was used in the design of the Employees table: Each employee has a manager, which is another employee. The employee's manager is identified by the ReportsTo field, which is set to the ID of the employee's manager. The rows of the Employees table contain references to the same table. This table contains a foreign key that points to the primary key of the same table, a relation that allows you to identify the hierarchy of employees in the corporation.

Customers, Orders, and Order Details tables and their relations

Figure 15.4. Customers, Orders, and Order Details tables and their relations

Shippers Table

Each order is shipped with one of the three shippers stored in the Shippers table. The appropriate shipper's ID is stored in the Orders table.

Exploring the Pubs Database

Before looking at SQL and more practical techniques for manipulating tables, let's look at the structure of another sample database I'm going to use in this chapter, the Pubs database. Pubs is a database for storing book, author, and publisher information, not unlike a database you might build for an online bookstore.

The Pubs database consists of really small tables, but it was carefully designed to demonstrate many of the features of SQL, so it's a prime candidate for sample code. Just about any book about SQL Server uses the Pubs database. In the examples of the following sections, I will use the Northwind database because it's closer to a typical business database, and the type of information stored in the Northwind database is closer to the needs of the average VB programmer than the Pubs database. Some of the fine points of SQL, however, can't be demonstrated with the data of the Northwind database, so in this section I'll show examples that use the ubiquitous Pubs database.

Titles Table

The Titles table contains information about individual books (the book's title, ID, price, and so on). Each title is identified by an ID, which is not a numeric value, that's stored in the title_id column. The IDs of the books look like this: BU2075.

Authors Table

The Authors table contains information about authors. Each author is identified by an ID, which is stored in the au_id field. This field is a string with a value such as 172-32-1176 (they resemble U.S. Social Security numbers).

TitleAuthor Table

The Titles and Authors tables are not directly related because they can't be joined via a one-to-many relationship; the relationship between the two tables is many-to-many. The relations you have seen so far are one-to-many because they relate one row in the table that has the primary key to one or more rows in the table that has the foreign key: One order contains many detail lines, one customer has many orders, one category contains many products, and so on.

The relation between titles and authors is many-to-many because each book may have multiple authors, and each author may have written multiple titles. If you stop to think about the relationship between these two tables, you'll realize that it can't be implemented with a primary key and a foreign key (like the Order-Customer relationship or the Order-Shipper relationship in the Northwind database). To establish a many-to-many relationship, you must create a join table between the other two, and this table must have a one-to-many relationship with both tables.

Figure 15.5 shows how the Titles and Authors tables of the Pubs database are related to one another. The table between them holds pairs of title IDs and author IDs. If a book was written by two authors, the TitleAuthor table contains two entries with the same title ID and different author IDs. The book with a title_id of BU1111 was written by two authors. The IDs of the authors appear in the TitleAuthor table along with the ID of the book. The IDs of these two authors are 267-41-2394 and 724-80-9391. Likewise, if an author has written more than one book, the author's ID will appear many times in the TitleAuthor table — each time paired with a different title ID.

The TitleAuthor table links titles to authors.

Figure 15.5. The TitleAuthor table links titles to authors.

At times you won't be able to establish the desired relationship directly between two tables because the relationship is many-to-many. When you discover a conflict between two tables, you must create a join table between them. A many-to-many relation is actually implemented as two one-to-many relations.

Publishers Table

The Publishers table contains information about publishers. Each title has a pub_id field, which points to the matching row of the Publishers table. Unlike the other major tables of the Pubs database, the Publishers table uses a numeric value to identify each publisher.

Understanding Relations

In a database, each table has a field with a unique value for every row. As indicated earlier in this chapter, this field is the table's primary key. The primary key does not have to be a meaningful entity because in most cases there's no single field that's unique for each row. Books can be identified by their ISBNs and employees by their SSNs, but these are exceptions to the rule. In general, you can't come up with a meaningful key that's universally unique. The primary key need not resemble the entity it identifies. The only requirement is that primary keys be unique in the entire table. In most designs, we use an integer as the primary key. To make sure they're unique, we even let the DBMS generate a new integer for each row added to the table. Each table can have one primary key only, and the DBMS can automatically generate an integer value for a primary key field every time a new row is added. SQL Server uses the term Identity for this data type, and there can be only one Identity field in each table.

Database designers sometimes use unique global identifiers, which are lengthy strings like 281167b3-7cbc-41a0-ad89-8c0995ac2d07 to identify the rows. These strings, which are guaranteed to be globally unique, are generated automatically by both Visual Basic and SQL Server. To generate a GUID value with Visual Basic, call the Guid.NewGuid method. GUIDs are not as easy to handle as integers, and they entail some performance penalty. If you need a unique identity across multiple databases, you have to use GUIDs.

The related rows in a table repeat the primary key of the row they are related to in another table. The copies of the primary keys in all other tables are called foreign keys. Foreign keys need not be unique (in fact, by definition they aren't), and any field can serve as a foreign key. What makes a field a foreign key is that it matches the primary key of another table. The CategoryID field is the primary key of the Categories table because it identifies each category. The CategoryID field in the Products table is the foreign key because the same value might appear in many rows (many products can belong to the same category). Whereas the primary key refers to a table, the foreign key refers to a relationship. The CategoryID column of the Products table is the foreign key in the relationship between the Categories and Products tables. The Products table contains another foreign key, the SupplierID column, which forms the relationship between the Suppliers and Products tables.

Referential Integrity

Maintaining the links between tables is not a trivial task. When you add an invoice line, for instance, you must make sure that the product ID that you insert in the Order Details table corresponds to a row in the Products table. An important aspect of a database is its integrity. To be specific, you must ensure that the relations are always valid, and this type of integrity is called referential integrity. There are other types of integrity (for example, setting a product's value to a negative value will compromise the integrity of the database), but this is not nearly as important as referential integrity. The wrong price can be easily fixed. But issuing an invoice to a customer who doesn't exist isn't easy (if even possible) to fix. Modern databases come with many tools to help ensure their integrity, especially referential integrity. These tools are constraints you enter when you design the database, and the DBMS makes sure that the constraints are not violated as the various programs manipulate the database.

When you relate the Products and Categories tables, for example, you must also ensure the following:

  • Every product added to the foreign table must point to a valid entry in the primary table. If you are not sure which category the product belongs to, you can leave the CategoryID field of the Products table empty (the field will have a null value). Or, you can create a generic category, the UNKNOWN or UNDECIDED category, and use this category if no information is available.

  • No rows in the Categories table should be removed if there are rows in the Products table pointing to the specific category. This situation would make the corresponding rows of the Products table point to an invalid category (the rows that have no matching row in the primary table are called orphan rows).

These two restrictions would be quite a burden on the programmer if the DBMS didn't protect the database against actions that could impair its integrity. The referential integrity of your database depends on the validity of the relations. Fortunately, all DBMSs can enforce rules to maintain their integrity, and you'll learn how to enforce rules that guarantee the integrity of your database later in this chapter. In fact, when you create the relationship, you can select a couple of check boxes that tell SQL Server to enforce the relationship (that is, not to accept any changes in the data that violate the relationship). If you leave these check boxes deselected, be ready to face a real disaster sooner or later.

Visual Database Tools

To simplify the development of database applications, Visual Studio 2010 comes with some visual tools, the most important of which are briefly described in Table 15.1 and then discussed in the following sections.

Table 15.1. Visual database tools

Name

Description

Server Explorer

This is the most prominent tool. Server Explorer is the toolbox for database applications, in the sense that it contains all the basic tools for connecting to databases and manipulating their objects.

Query Builder

This is a tool for creating SQL queries (statements that retrieve the data you want from a database or update the data in the database). SQL is a language in its own right, and we'll discuss it later in this chapter. Query Builder lets you specify the operations you want to perform on the tables of a database with point-and-click operations. In the background, Query Builder builds the appropriate SQL statement and executes it against the database.

Database Designer and Tables Designer

These tools allow you to work with an entire database or its tables. When you work with the database, you can add new tables, establish relationships between the tables, and so on. When you work with individual tables, you can manipulate the structure of the tables, edit their data, and add constraints. You can use these tools to manipulate a complicated object — the database — with point-and-click operations.

SQL: An Overview

SQL is a universal language for manipulating data in database tables. Every DBMS supports it, so you should invest the time and effort to learn it. You can generate SQL statements with point-and-click operations (Query Builder is a visual tool for generating SQL statements), but this is no substitute for understanding SQL and writing your own statements. The visual tools are nothing more than a user-friendly interface for specifying SQL statements. In the background, they generate the appropriate SQL statement, and you will get the most out of these tools if you understand the basics of SQL. I will start with an overview of SQL, and then I'll show you how to use the Query Builder utility to specify a few advanced queries. If you're familiar with SQL, you can skip this section or just glance through it and take a look at the examples.

By the way, the SQL version of SQL Server is called T-SQL, which stands for Transact-SQL. T-SQL is a superset of SQL and provides advanced programming features that are not available with SQL. I'm not going to discuss T-SQL in this book, but once you understand SQL, you'll find it easy to leverage this knowledge to T-SQL.

SQL is a nonprocedural language, which means that SQL doesn't provide traditional programming structures such as If statements or loops. Instead, it's a language for specifying the operation you want to perform against a database at a high level. The details of the implementation are left to the DBMS. SQL is an imperative language, like Language Integrated Query (LINQ), as opposed to a traditional programming language, such as VB. Traditional languages are declarative: The statements you write tell the compiler how to perform the desired actions. This is good news for nonprogrammers, but many programmers new to SQL might wish it had the structure of a more traditional language. You will get used to SQL and soon be able to combine the best of both worlds: the programming model of VB and the simplicity of SQL. Besides, there are many similarities between SQL and LINQ, and you'll be able to leverage your skills in any of the two areas.

To retrieve all the company names from the Customers table of the Northwind database, you issue a statement like this one:

SELECT CompanyName
FROM   Customers

To select customers from a specific country, you must use the WHERE clause to limit the selected rows, as in the following statement:

SELECT CompanyName
FROM   Customers
WHERE  Country = 'Germany'

The DBMS will retrieve and return the rows you requested. As you can see, this is not the way you'd retrieve rows with Visual Basic. With a procedural language such as VB, you'd have to write loops to scan the entire table, examine the value of the Country column, and either select or reject the row. Then you would display the selected rows. With SQL, you don't have to specify how the selection operation will take place; you simply specify what you want the database to do for you — not how to do it. As a reminder, the equivalent LINQ statement would be as follows (I'm using LINQ to SQL syntax):

Dim selectedCustomers = From cust In Customers
Where cust.Country = "Germany"
Select cust

SQL statements are divided into two major categories, which are actually considered separate languages: the statements for manipulating the data, which form the Data Manipulation Language (DML), and the statements for defining database objects, such as tables or their indexes, which form the Data Definition Language (DDL). The DDL is not of interest to every database developer, and I will not discuss it in this book. The DML is covered in depth because you'll use these statements to retrieve data, insert new data into the database, and edit or delete existing data.

The statements of the DML part of the SQL language are also known as queries, and there are two types of queries: selection queries and action queries. Selection queries retrieve information from the database. A selection query returns a set of rows with identical structure. The columns can come from different tables, but all the rows returned by the query have the same number of columns. Action queries modify the database's objects or create new objects and add them to the database (new tables, relationships, and so on).

Executing SQL Statements

If you are not familiar with SQL, I suggest that you follow the examples in this chapter and experiment with the sample databases. To follow the examples, you have two options: SQL Server Management Studio (SSMS) and Query Designer of Visual Studio. SSMS helps you manage databases in various ways, including creating queries to extract data. Query Designer is an editor for SQL statements that also allows you to execute them and see the results. In addition to Query Designer, you can also use Query Builder, which is part of SSMS and Visual Studio. Query Builder lets you build the statements with visual tools, and you don't have to know the syntax of SQL in order to create queries with Query Builder. After a quick overview of the SQL statements, I will describe Query Builder and show you how to use its interface to build fairly elaborate queries.

Using SQL Server Management Studio

One of the applications installed with SQL Server is SQL Server Management Studio. To start it, choose Start

Using SQL Server Management Studio
SSMS provides access to all the database engine objects, including databases.

Figure 15.6. SSMS provides access to all the database engine objects, including databases.

After you're connected, right-click the database you want to use, and choose New Query from the context menu. Enter the SQL statement you want to execute in the blank query that SSMS creates. The SQL statement will be executed against the selected database when you press Ctrl+E or click the Execute button (it's the button with the exclamation point icon). Alternatively, you can prefix the SQL statement with the USE statement, which specifies the database against which the statement will be executed. To retrieve all the Northwind customers located in Germany, enter this statement:

USE Northwind
SELECT CompanyName FROM Customers
WHERE  Country = 'Germany'

The USE statement isn't part of the query; it simply tells SSMS the database against which it must execute the query. I'm including the USE statement with all the queries so you know the database used for each example. If you're executing the sample code from within Visual Studio, you need not use the USE statement, because all queries are executed against the selected database. Actually, the statement isn't supported by Query Designer of Visual Studio.

The results of the query, known as the result set, will appear in a grid in the lower pane. An action query that updates a table (adds a new row, edits a row, or deletes an existing row) doesn't return any rows; it simply displays the number of rows affected on the Messages tab.

To execute another query, enter another statement in the upper pane, or edit the previous statement and press Ctrl+E again. You can also save SQL statements into files so that you won't have to type them again. To do so, open the File menu, choose Save As or Save, and enter the name of the file in which the contents of the Query pane will be stored. The statement will be stored in a text file with the extension .sql.

Using Visual Studio

To execute the same queries with Visual Studio, open the Server Explorer window, and right-click the name of the database against which you want to execute the query. From the context menu, choose New Query, and a new query window will open. You will also see a dialog box prompting you to select one or more tables. For the time being, close this dialog box, because you will supply the names of the tables in the query; later in this chapter, you'll learn how to use the visual tools to build queries.

Query Designer of Visual Studio consists of four panes (Figure 15.7). The upper pane (which is the Table Diagram pane) displays the tables involved in the query, their fields, and the relationships between the tables — if any. The next pane shows the fields that will be included in the output of the query. Here you specify the output of the query, as well as the selection criteria. This pane is Query Builder, the tool that lets you design queries visually. It's discussed later in this chapter. In the next pane, the SQL pane, you see the SQL statement produced by the visual tools. If you modify the query with the visual tools, the SQL statement is updated automatically; likewise, when you edit the query, the other two panes are updated automatically to reflect the changes. The last pane, the Results pane, contains a grid with the query's output. Every time you execute the query by clicking the button with the exclamation mark in the toolbar, the bottom pane is populated with the results of the query. For the examples in this section, ignore the top two panes. Just enter the SQL statements in the SQL pane, and execute them.

Executing queries with Visual Studio

Figure 15.7. Executing queries with Visual Studio

Selection Queries

We'll start our discussion of SQL with the SELECT statement. After you learn how to express the criteria for selecting the desired rows with the SELECT statement, you can apply this information to other data-manipulation statements. The simplest form of the SELECT statement is

SELECT fields
FROM   tables

where fields and tables are comma-separated lists of the fields you want to retrieve from the database and the tables they belong to. The list of fields following the SELECT statement is referred to as the selection list. To select the contact information from all the companies in the Customers table, use this statement:

USE Northwind
SELECT CompanyName, ContactName, ContactTitle
FROM   Customers

To retrieve all the fields, use the asterisk (*). The following statement selects all the fields from the Customers table:

SELECT * FROM Customers

As soon as you execute a statement that uses the asterisk to select all columns, Query Designer will replace the asterisk with the names of all columns in the table.

Limiting the Selection with WHERE

The unconditional form of the SELECT statement used in the previous section is quite trivial. You rarely retrieve data from all rows in a table. Usually you specify criteria, such as "all companies from Germany," "all customers who have placed three or more orders in the last six months," or even more-complicated expressions. To restrict the rows returned by the query, use the WHERE clause of the SELECT statement. The most common form of the SELECT statement is the following:

SELECT fields
FROM   tables
WHERE  condition

The fields and tables arguments are the same as before, and condition is an expression that limits the rows to be selected. The syntax of the WHERE clause can get quite complicated, so we'll start with the simpler forms of the selection criteria. The condition argument can be a relational expression, such as the ones you use in VB. To select all the customers from Germany, use the following condition:

WHERE Country = 'Germany'

To select customers from multiple countries, use the OR operator to combine multiple conditions:

WHERE Country = 'Germany' OR
      Country = 'Austria'

You can also combine multiple conditions with the AND operator.

Combining Data from Multiple Tables

It is possible to retrieve data from two or more tables by using a single statement. (This is the most common type of query, actually.) When you combine multiple tables in a query, you can use the WHERE clause to specify how the rows of the two tables will be combined. Let's say you want a list of all product names, along with their categories. For this query, you must extract the product names from the Products table and the category names from the Categories table and specify that the ProductID field in the two tables must match. The statement

USE Northwind
SELECT ProductName, CategoryName
FROM   Products, Categories
WHERE  Products.CategoryID = Categories.CategoryID

retrieves the names of all products, along with their category names. Here's how this statement is executed. For each row in the Products table, the SQL engine locates the matching row in the Categories table and then appends the ProductName and CategoryName fields to the result. In other words, it creates pairs of a product and the matching category. Products that don't belong to a category are not included in the result set.

Aliasing Table Names

To avoid typing long table names, you can alias them with a shorter name and use this shorthand notation in the rest of the query. The query that retrieves titles and publishers can be written as follows:

USE pubs
SELECT T.title
FROM   titles T, publishers P
WHERE  T.pub_id = P.pub_id

The table names are aliased in the FROM clause, and the alias is used in the rest of the query. There's one situation where table name aliasing is mandatory, and this is when you want to refer to the same table twice. This is a fairly advanced topic, but I'm including a typical example to demonstrate an interesting technique, because you may run into it. Some tables contain references to themselves, and the Employees table of the Northwind database belongs to this category. Each employee reports to a manager, who's also an employee. The manager is identified by the column ReportsTo, which contains the ID of an employee's manager. Here are three rows of the Employees table that demonstrate this hierarchy:

EmployeeID

LastName

FirstName

Title

ReportsTo

2

Fuller

Andrew

Vice President, Sales

NULL

5

Buchanan

Steven

Sales Manager

2

6

Suyama

Michael

Sales Representative

5

As you can see here, Suyama reports to Buchanan, and Buchanan in turn reports to Fuller. How would you retrieve each employee's name and title along with his manager's name? Here, we must make a query that involves two tables, one with the employee names and another one with the manager names. It just so happens that the two tables are in reality the same table. The trick is to treat the Employees table as two separate tables using aliases. Here's the query that retrieves employees and managers in a single result set:

SELECT  Employees.EmployeeID,
        Employees.LastName + ' ' + Employees.FirstName,
        Employees.Title,
        Managers.FirstName + ' ' + Managers.LastName,
        Managers.Title
FROM    Employees, Employees Managers
WHERE   Employees.ReportsTo = Managers.EmployeeID

If the database contained two different tables, one for employees and another one for managers, you'd have no problem coding the query. Because we want to use the same table for both managers and employees, we use aliases to create two virtual tables and associate them with the ReportsTo and EmployeeID columns. The result is a neat list of names that shows clearly the hierarchy of the Northwind corporation's employees:

1  Davolio Nancy    Sales Representative      Andrew Fuller
3  Leverling Janet  Sales Representative      Andrew Fuller
4  Peacock Margaret Sales Representative      Andrew Fuller
5  Buchanan Steven  Sales Manager             Andrew Fuller
6  Suyama Michael   Sales Representative      Steven Buchanan
7  King Robert      Sales Representative      Steven Buchanan
8  Callahan Laura   Inside Sales Coordinator  Andrew Fuller
9  Dodsworth Anne   Sales Representative      Steven Buchanan

I skipped the manager's title in the listing because it wouldn't fit on the printed page. Note that because the Last Name and First Name column names contain spaces, they're embedded in square brackets. The same is true for table names that contain spaces.

Aliasing Column Names

By default, each column of a query is labeled after the actual field name in the output. If a table contains two fields named CustLName and CustFName, you can display them with different labels by using the AS keyword. The following SELECT statement produces two columns labeled CustLName and CustFName:

SELECT CustLName, CustFName

The query's output looks much better if you change the labels of these two columns with a statement like the following:

SELECT CustLName AS [Last Name],
       CustFName AS [First Name]

It is also possible to concatenate two fields in the SELECT list with the concatenation operator. Concatenated fields are labeled automatically as Expr1, Expr2, and so on, so you must supply your own name for the combined field. The following statement creates a single column for the customer's name and labels it Customer Name:

SELECT CustLName + ', ' + CustFName AS [Customer Name]

Skipping Duplicates with DISTINCT

The DISTINCT keyword eliminates from the cursor any duplicates retrieved by the SELECT statement. Let's say you want a list of all countries with at least one customer. If you retrieve all country names from the Customers table, you'll end up with many duplicates. To eliminate them, use the DISTINCT keyword, as shown in the following statement:

USE Northwind
SELECT DISTINCT Country
FROM   Customers

The LIKE Operator

The LIKE operator uses pattern-matching characters like the ones you use to select multiple files in DOS. The LIKE operator recognizes several pattern-matching characters (or wildcard characters) to match one or more characters, numeric digits, ranges of letters, and so on. Table 15.2 describes these characters.

Table 15.2. SQL wildcard characters

Wildcard Character

Description

%

Matches any number of characters. The pattern program% will find program, programming, programmer, and so on. The pattern %program% will locate strings that contain the words program, programming, nonprogrammer. and so on.

_

(Underscore character.) Matches any single alphabetic character. The pattern b_y will find boy and bay. but not boysenberry.

[ ]

Matches any single character within the brackets. The pattern Santa [YI]nez will find both Santa Ynez and Santa Inez.

[ ˆ ]

Matches any character not in the brackets. The pattern %q[ˆu]% will find words that contain the character q not followed by u (they are misspelled words).

[ - ]

Matches any one of a range of characters. The characters must be consecutive in the alphabet and specified in ascending order (A to Z, not Z to A). The pattern [a-c]% will find all words that begin with a, b. or c (in lowercase or uppercase).

#

Matches any single numeric character. The pattern D1## will find D100 and D139. but not D1000 or D10.

You can use the LIKE operator to retrieve all titles about Windows from the Pubs database, by using a statement like the following one:

USE pubs
SELECT titles.title
FROM   titles
WHERE  titles.title LIKE '%Windows%'

The percent signs mean that any character(s) may appear in front of or after the word Windows in the title.

To include a wildcard character itself in your search argument, enclose it in square brackets. The pattern %50[%]% will match any field that contains the string 50%.

Null Values and the ISNULL Function

A common operation for manipulating and maintaining databases is to locate null values in fields. The expressions IS NULL and IS NOT NULL find field values that are (or are not) null. To locate the rows of the Customers table that have a null value in their CompanyName column, use the following WHERE clause:

WHERE CompanyName IS NULL

You can easily locate the products without prices and edit them. The following statement locates products without prices:

USE Northwind
SELECT * FROM Products WHERE UnitPrice IS NULL

A related function, the ISNULL() function, allows you to specify the value to be returned when a specific field is null. The ISNULL() SQL function accepts two arguments: a column name and a string. The function returns the value of the specified column, unless this value is null, in which case it returns the value of the second argument. To return the string *** for customers without a company name, use the following expression:

USE Northwind
SELECT  CustomerID,
        ISNULL(CompanyName, '***') AS Company, ContactName
FROM Customers

Sorting the Rows with ORDER BY

The rows of a query are not in any particular order. To request that the rows be returned in a specific order, use the ORDER BY clause, which has this syntax:

ORDER BY col1, col2, . . .

You can specify any number of columns in the ORDER BY list. The output of the query is ordered according to the values of the first column. If two rows have identical values in this column, they are sorted according to the second column, and so on. The following statement displays the customers ordered by country and then by city within each country:

USE      Northwind
SELECT   CompanyName, ContactName, Country, City
FROM     Customers
ORDER BY Country, City

Working with Calculated Fields

In addition to column names, you can specify calculated columns in the SELECT statement. The Order Details table contains a row for each invoice line. Invoice 10248, for instance, contains four lines (four items sold), and each detail line appears in a separate row in the Order Details table. Each row holds the number of items sold, the item's price, and the corresponding discount. To display the line's subtotal, you must multiply the quantity by the price minus the discount, as shown in the following statement:

USE Northwind
SELECT Orders.OrderID, [Order Details].ProductID,
       [Order Details].[Order Details].UnitPrice *
       [Order Details].Quantity *
       (1 - [Order Details].Discount) AS SubTotal
FROM   Orders INNER JOIN [Order Details]
  ON   Orders.OrderID = [Order Details].OrderID

Here the selection list contains an expression based on several fields of the Order Details table. This statement calculates the subtotal for each line in the invoices issued to all Northwind customers and displays them along with the order number. The order numbers are repeated as many times as there are products in the order (or lines in the invoice). In the following section, you will find out how to calculate totals too.

Calculating Aggregates

SQL supports some aggregate functions, which act on selected fields of all the rows returned by the query. The basic aggregate functions listed in Table 15.3 perform basic calculations such as summing, counting, and averaging numeric values. There are a few more aggregate functions for calculating statistics such as the variance and standard deviation, but I have omitted them from Table 15.3. Aggregate functions accept field names (or calculated fields) as arguments and return a single value, which is the sum (or average) of all values.

These functions operate on a single column (which could be a calculated column) and return a single value. The rows involved in the calculations are specified with the proper WHERE clause. The SUM() and AVG() functions can process only numeric values. The other three functions can process both numeric and text values.

The aggregate functions are used to summarize data from one or more tables. Let's say you want to know the number of Northwind database customers located in Germany. The following SQL statement returns the desired value:

USE Northwind
SELECT COUNT(CustomerID)
FROM   Customers
WHERE  Country = 'Germany'

Table 15.3. SQL's common aggregate functions

Function

Returns

COUNT()

The number (count) of values in a specified column

SUM()

The sum of values in a specified column

AVG()

The average of the values in a specified column

MIN()

The smallest value in a specified column

MAX()

The largest value in a specified column

The aggregate functions ignore the null values unless you specify the * argument. The following statement returns the count of all rows in the Customers table, even if some of them have a null value in the Country column:

USE Northwind
SELECT COUNT(*)
FROM   Customers

The SUM() function is used to total the values of a specific field in the specified rows. To find out how many units of the product with ID = 11 (queso Cabrales) have been sold, use the following statement:

USE Northwind
SELECT SUM(Quantity)
FROM   [Order Details]
WHERE  ProductID = 11

The SQL statement that returns the total revenue generated by a single product is a bit more complicated. To calculate it, you must multiply the quantities by their prices and then add the resulting products together, taking into consideration each invoice's discount:

USE Northwind
SELECT SUM(Quantity * UnitPrice * (1 - Discount))
FROM   [Order Details]
WHERE  ProductID = 11

Product QuesoCabrales generated a total revenue of $12,901.77. If you want to know the number of items of this product that were sold, add one more aggregate function to the query to sum the quantities of each row that refers to the specific product ID:

USE Northwind
SELECT SUM(Quantity),
       SUM(Quantity * UnitPrice * (1 - Discount))
FROM   [Order Details]
WHERE  ProductID = 11

If you add the ProductID column in the selection list and delete the WHERE clause to retrieve the totals for all products, the query will generate an error message to the effect that the columns haven't been grouped. You will learn how to group the results a little later in this chapter.

Using SQL Joins

Joins specify how you connect multiple tables in a query. There are four types of joins:

  • Left outer, or left, join

  • Right outer, or right, join

  • Full outer, or full, join

  • Inner join

A join operation combines all the rows of one table with the rows of another table. Joins are usually followed by a condition that determines which records on either side of the join appear in the result. The WHERE clause of the SELECT statement is similar to a join, but there are some fine points that will be explained momentarily.

The left, right, and full joins are sometimes called outer joins to differentiate them from an inner join. Left join and left outer join mean the same thing, as do right join and right outer join.

Left Joins

The left join displays all the records in the left table and only those records of the table on the right that match certain user-supplied criteria. This join has the following syntax:

FROM (primary table) LEFT JOIN (secondary table) ON
(primary table).(field) = (secondary table).(field)

The left outer join retrieves all rows in the primary table and the matching rows from a secondary table. The following statement retrieves all the titles from the Pubs database along with their publisher. If some titles have no publisher, they will be included in the result:

USE pubs
SELECT  title, pub_name
FROM    titles LEFT JOIN publishers
           ON titles.pub_id = publishers.pub_id

Right Joins

The right join is similar to the left outer join, except that it selects all rows in the table on the right and only the matching rows from the left table. This join has the following syntax:

FROM (secondary table) RIGHT JOIN (primary table)
ON (secondary table).(field) = (primary table).(field)

The following statement retrieves all the publishers from the Pubs database along with their titles. If a publisher has no titles, the publisher name will be included in the result set. Notice that this statement is almost the same as the example of the left outer join entry. I changed only LEFT to RIGHT:

USE pubs
SELECT  title, pub_name
FROM    titles RIGHT JOIN publishers
           ON titles.pub_id = publishers.pub_id

Full Joins

The full join returns all the rows of the two tables, regardless of whether there are matching rows. In effect, it's a combination of left and right joins. To retrieve all titles and all publishers and to match publishers to their titles, use the following join:

USE pubs
SELECT  title, pub_name
FROM    titles FULL JOIN publishers
           ON titles.pub_id = publishers.pub_id

This query will include titles without a publisher, as well as publishers without a title.

Inner Joins

The inner join returns the matching rows of both tables, similar to the WHERE clause, and has the following syntax:

FROM (primary table) INNER JOIN (secondary table)
ON (primary table).(field) = (secondary table).(field)

The following SQL statement combines records from the Titles and Publishers tables of the Pubs database if their pub_id fields match. It returns all the titles and their publishers. Titles without publishers, or publishers without titles, will not be included in the result.

USE pubs
SELECT titles.title, publishers.pub_name FROM titles, publishers
WHERE  titles.pub_id = publishers.pub_id

You can retrieve the same rows by using an inner join, as follows:

USE pubs
SELECT titles.title, publishers.pub_name
FROM   titles INNER JOIN publishers ON titles.pub_id = publishers.pub_id

Grouping Rows

Sometimes you need to group the results of a query so that you can calculate subtotals. Let's say you need not only the total revenues generated by a single product but also a list of all products and the revenues they generated. The example in the earlier section "Working with Calculated Fields" calculates the total revenue generated by a single product. It is possible to use the SUM() function to break the calculations at each new product ID, as demonstrated in the following statement. To do so, you must group the product IDs together with the GROUP BY clause:

USE Northwind
SELECT   ProductID,
         SUM(Quantity * UnitPrice *(1 - Discount)) AS [Total Revenues]
FROM     [Order Details]
GROUP BY ProductID
ORDER BY ProductID

The preceding statement produces the following output:

ProductID   Total Revenues
1           12788.10
2           16355.96
3            3044.0
4            8567.89
5            5347.20
6            7137.0
7           22044.29

The aggregate functions work in tandem with the GROUP BY clause (when there is one) to produce subtotals. The GROUP BY clause groups all the rows with the same values in the specified column and forces the aggregate functions to act on each group separately. SQL Server sorts the rows according to the column specified in the GROUP BY clause and starts calculating the aggregate functions. Every time it runs into a new group, it generates a new row and resets the aggregate function(s).

If you use the GROUP BY clause in a SQL statement, you must be aware of the following rule:

All the fields included in the SELECT list must be either part of an aggregate function or part of the GROUP BY clause.

Let's say you want to change the previous statement to display the names of the products rather than their IDs. The following statement does just that. Notice that the ProductName field doesn't appear as an argument to an aggregate function, so it must be part of the GROUP BY clause:

USE Northwind
SELECT   ProductName,
         SUM(Quantity * [Order Details].UnitPrice * (1 - Discount))
            AS [Total Revenues]
FROM     [Order Details], Products
WHERE    Products.ProductID = [Order Details].ProductID
GROUP BY ProductName
ORDER BY ProductName

These are the first few lines of the output produced by this statement:

ProductName            Total Revenues
Alice Mutton            32698.38
Aniseed Syrup           3044.0
Boston Crab Meat        17910.63
Camembert Pierrot       46927.48
Carnarvon Tigers        29171.87

If you omit the GROUP BY clause, the query will generate an error message indicating that the ProductName column in the selection list is not involved in an aggregate or a GROUP BY clause.

You can also combine multiple aggregate functions in the selection list. The following statement calculates the total number of items sold for each product, along with the revenue generated and the number of invoices that contain the specific product:

USE Northwind
SELECT   ProductID AS Product,
         COUNT(ProductID) AS Invoices,
         SUM(Quantity) AS [Units Sold],
         SUM(Quantity * UnitPrice *(1 - Discount)) AS Revenue
FROM     [Order Details]
GROUP BY ProductID
ORDER BY ProductID

Here are the first few lines returned by the preceding query:

Product   Invoices    Units Sold   Revenue
1           38           828       12788.1000595092
2           44          1057       16355.9600448608

You should try to revise the preceding statement so that it displays product names instead of IDs, by adding another join to the query as explained already.

Limiting Groups with HAVING

The HAVING clause limits the groups that will appear at the cursor. In a way, it is similar to the WHERE clause, but the HAVING clause is used with aggregate functions and the GROUP BY clause, and the expression used with the HAVING clause usually involves one or more aggregates. The following statement returns the IDs of the products whose sales exceed 1,000 units:

USE NORTHWIND
SELECT   ProductID, SUM(Quantity)
FROM     [Order Details]
GROUP BY ProductID
HAVING   SUM(Quantity) > 1000

You can't use the WHERE clause here, because no aggregates may appear in the WHERE clause. However, you can use the WHERE clause as usual to limit the number of rows involved in the query (for example, limit the aggregate to the products of a specific category, or the products sold to customers in Germany). To see product names instead of IDs, join the Order Details table to the Products table by matching their ProductID columns. Note that the expression in the HAVING clause need not be included in the selection list. You can change the previous statement to retrieve the total quantities sold with a discount of 10 percent or more with the following HAVING clause:

HAVING Discount >= 0.1

However, the Discount column must be included in the GROUP BY clause as well, because it's not part of an aggregate.

Action Queries

In addition to the selection queries we examined so far, you can also execute queries that alter the data in the database's tables. These queries are called action queries, and they're quite simple compared with the selection queries. There are three types of actions you can perform against a database: insertions of new rows, deletions of existing rows, and updates (edits) of existing rows. For each type of action, there's a SQL statement, appropriately named INSERT, DELETE, and UPDATE. Their syntax is very simple, and the only complication is how you specify the affected rows (for deletions and updates). As you can guess, the rows to be affected are specified with a WHERE clause, followed by the criteria discussed with selection queries.

The first difference between action and selection queries is that action queries don't return any rows. They return the number of rows affected, but you can disable this feature by calling the following statement:

SET NOCOUNT ON

This statement can be used when working with a SQL Server database. Let's look at the syntax of the three action SQL statements, starting with the simplest: the DELETE statement.

Deleting Rows

The DELETE statement deletes one or more rows from a table; its syntax is as follows:

DELETE table_name WHERE criteria

The WHERE clause specifies the criteria that the rows must meet in order to be deleted. The criteria expression is no different from the criteria you specify in the WHERE clause of the selection query. To delete the orders placed before 1998, use a statement like this one:

USE Northwind
DELETE Orders
WHERE  OrderDate < '1/1/1998'

Of course, the specified rows will be deleted only if the Orders table allows cascade deletions or if the rows to be deleted are not linked to related rows. If you attempt to execute the preceding query, you'll get an error with the following description:

The DELETE statement conflicted with the REFERENCE
constraint "FK_Order_Details_Orders". The conflict
occurred in database "Northwind",
table "dbo.Order Details", column 'OrderID'.

This error message tells you that you can't delete rows from the Orders table that are referenced by rows in the Order Details table. If you were allowed to delete rows from the Orders table, some rows in the related table would remain orphaned (they would refer to an order that doesn't exist). To delete rows from the Orders table, you must first delete the related rows from the Order Details table and then delete the same rows from the Orders table. Here are the statements that will delete orders placed before 1998. (Do not execute this query unless you're willing to reinstall the Northwind database; there's no undo feature when executing SQL statements against a database.)

USE Northwind
DELETE [Order Details]
WHERE  (OrderID IN
        (  SELECT OrderID
                  FROM   Orders
                  WHERE  (OrderDate < '1/1/1998')))

DELETE Orders WHERE OrderDate < '1/1/1998'

As you can see, the operation takes two action queries: one to delete rows from the Order Details table and another to delete the corresponding rows from the Orders table.

The DELETE statement returns the number of rows deleted. You can retrieve a table with the deleted rows by using the OUTPUT clause:

DELETE Customers
OUTPUT DELETED.*
WHERE  Country IS NULL

To test the OUTPUT clause, insert a few fake rows in the Customers table:

INSERT Customers (CustomerID, CompanyName)
VALUES ('AAAAA', 'Company A')
INSERT Customers (CustomerID, CompanyName)
VALUES ('BBBBB', 'Company B')

And then delete them with the following statement:

DELETE Customers
OUTPUT DELETED.*
WHERE  Country IS NULL

If you execute the preceding statements, the deleted rows will be returned as the output of the query. If you want to be safe, you can insert the deleted rows into a temporary table so you can insert them back into the database (should you delete more rows than intended). My suggestion is that you first execute a selection query that returns the rows you plan to delete, examine the output of this query, and, if you see only the rows you want to delete and no more, write a DELETE statement with the same WHERE clause. To insert the deleted rows to a temporary table, use the INSERT INTO statement, which is described in the following section.

Inserting New Rows

The INSERT statement inserts new rows in a table; its syntax is as follows:

INSERT table_name (column_names) VALUES (values)

column_names and values are comma-separated lists of columns and their respective values. Values are mapped to their columns by the order in which they appear in the two lists.

Notice that you don't have to specify values for all columns in the table, but the values list must contain as many items as there are column names in the first list. To add a new row to the Customers table, use a statement like the following:

INSERT Customers (CustomerID, CompanyName) VALUES ('FRYOG', 'Fruit & Yogurt')

This statement inserts a new row, provided that the FRYOG key isn't already in use. Only two of the new row's columns are set, and they're the columns that can't accept null values.

If you want to specify values for all the columns of the new row, you can omit the list of columns. The following statement retrieves a number of rows from the Products table and inserts them into the SelectedProducts table, which has the same structure:

INSERT INTO SelectedProducts VALUES (values)

If the values come from a table, you can replace the VALUES keyword with a SELECT statement:

INSERT INTO SelectedProducts
       SELECT * FROM Products WHERE CategoryID = 4

The INSERT INTO statement allows you to select columns from one table and insert them into another one. The second table must have the same structure as the output of the selection query. Note that you need not create the new table ahead of time; you can create a new table with the CREATE TABLE statement. The following statement creates a new table to accept the CustomerID, CompanyName, and ContactName columns of the Customers table:

DECLARE @tbl table
    (ID char(5),
     name varchar(100),
     contact varchar(100))

After the table has been created, you can populate it with the appropriate fields of the deleted rows:

DELETE Customers
OUTPUT DELETED.CustomerID,
       DELETED.CompanyName, DELETED.ContactName
INTO @tbl
WHERE Country IS NULL
SELECT * FROM @tbl

Execute these statements, and you will see in the Results pane the two rows that were inserted momentarily into the Customers table and then immediately deleted.

Editing Existing Rows

The UPDATE statement edits a row's fields; its syntax is the following:

UPDATE table_name SET field1 = value1, field2 = value2,. . .
WHERE criteria

The criteria expression is no different from the criteria you specify in the WHERE clause of selection query. To change the country from UK to United Kingdom in the Customers table, use the following statement:

UPDATE Customers SET Country='United Kingdom'
WHERE  Country = 'UK'

This statement will locate all the rows in the Customers table that meet the specified criteria (their Country field is UK) and change this field's value to United Kingdom.

Before you execute a DELETE or UPDATE statement, use a SELECT statement to see the rows that will be affected. Verify that these are the rows you intend to change and then delete or update them. Once an action query has been executed against the database, there's no way back (sorry, no Undo here).

This concludes our overview of SQL, and we're (at last) ready to explore the data access mechanisms of the Framework. In the following section, you'll learn how to submit queries to a database from within your VB application and how to retrieve the results of a query.

Stream- versus Set-Based Data Access

The component of the Framework we use to access databases is known as ADO.NET (ADO stands for Active Data Objects) and it provides two basic methods of accessing data: stream-based data access, which establishes a stream to the database and retrieves the data from the server, and set-based data access, which creates a special data structure at the client and fills it with data. This structure is the DataSet, which resembles a section of the database: It contains one or more DataTable objects, which correspond to tables and are made up of DataRow objects. These DataRow objects have the same structure as the rows in their corresponding tables. DataSets are populated by retrieving data from one or more database tables into the corresponding DataTables. As for submitting the data to the database with the stream-based approach, you must create the appropriate INSERT/UPDATE/DELETE statements and then execute them against the database.

The stream-based approach relies on the DataReader object, which makes the data returned by the database available to your application. The client application reads the data returned by a query through the DataReader object and must store it somehow at the client. Quite frequently, we use custom objects to store the data at the client.

The set-based approach uses the same objects as the stream-based approach behind the scenes, and it abstracts most of the grunt work required to set up a link to the database, retrieve the data, and store it in the client computer's memory. So, it makes sense to start by exploring the stream-based approach and the basic objects provided by ADO.NET for accessing databases. After you understand the nature of ADO.NET and how to use it, you'll find it easy to see the abstraction introduced by the set-based approach and how to make the most of DataSets. As you will see in the following chapter, you can create DataSets and the supporting objects with the visual tools of the IDE.

The Basic Data-Access Classes

A data-driven application should be able to connect to a database and execute queries against it. The selected data is displayed on the appropriate interface, where the user can examine it or edit it. Finally, the edited data is submitted to the database. This is the cycle of a data-driven application:

  1. Retrieve data from the database.

  2. Present data to the user.

  3. Allow the user to edit the data.

  4. Submit changes to the database.

Of course, many issues are not obvious from this outline. Designing the appropriate interface for navigating through the data (going from customers to their orders and from the selected order to its details) can be quite a task. Developing a functional interface for editing the data at the client is also a challenge, especially if several related tables are involved. We must also take into consideration that there are other users accessing the same database. What will happen if the product we're editing has been removed in the meantime by another user? Or what if a user has edited the same customer's data since our application read it? Do we overwrite the changes made by the other user, or do we reject the edits of the user who submits the edits last? I'll address these issues in this chapter and in Chapter 18, but we need to start with the basics: the classes for accessing the database.

To connect to a database, you must create a Connection object, initialize it, and then call its Open method to establish a connection to the database. The Connection object is the channel between your application and the database; every command you want to execute against the same database must use this Connection object. When you're finished, you must close the connection by calling the Connection object's Close method. Because ADO.NET maintains a pool of Connection objects that are reused as needed, it's imperative that you keep connections open for the shortest possible time.

The object that will actually execute the command against the database is the Command object, which you must configure with the statement you want to execute and associate with a Connection object. To execute the statement, you can call one of the Command object's methods. The ExecuteReader method returns a DataReader object that allows you to read the data returned by the selection query, one row at a time. To execute a statement that updates a database table but doesn't return a set of rows, use the ExecuteNonQuery method, which executes the specified command and returns an integer, which is the number of rows affected by the statement. The following sections describe the Connection, Command, and DataReader classes in detail.

To summarize, ADO.NET provides three core classes for accessing databases: the Connection, Command, and DataReader classes. There are more data access–related classes, but they're all based on these three basic classes. After you understand how to interact with a database by using these classes, you'll find it easy to understand the additional classes, as well as the code generated by the visual data tools that come with Visual Studio.

The Connection Class

The Connection class is an abstract one, and you can't use it directly. Instead, you must use one of the classes that derive from the Connection class. Currently, there are three derived classes: SqlConnection, OracleConnection, and OleDbConnection. Likewise, the Command class is an abstract class with three derived classes: SqlCommand, OracleCommand, and OleDbCommand.

The SqlConnection and SqlCommand classes belong to the SqlClient namespace, which you must import into your project via the following statement:

Imports System.Data.SqlClient

The examples in this book use the SQL Server 2008 DBMS, and it's implied that the SqlClient namespace is imported into every project that uses SQL Server.

To connect the application to a database, the Connection object must know the name of the server on which the database resides, the name of the database itself, and the credentials that will allow it to establish a connection to the database. These credentials are either a username and password or a Windows account that has been granted rights to the database. You obviously know what type of DBMS you're going to connect to so you can select the appropriate Connection class. The most common method of initializing a Connection object in your code is the following:

Dim CN As New SqlConnection("Data Source = localhost;" &
           "Initial Catalog = Northwind; uid = user_name;" &
           "password = user_password")

localhost is a universal name for the local machine, Northwind is the name of the database, and user_name and user_password are the username and password of an account configured by the database administrator. The Northwind sample database isn't installed along with SQL Server 2008, but you can download it from MSDN and install it yourself. The process was described in the section "Obtaining the Northwind and Pubs Sample Databases" earlier in this chapter. I'm assuming that you're using the same computer both for SQL Server and to write your VB applications. If SQL Server resides on a different computer in the network, use the server computer's name (or IP address) in place of the localhost name. If SQL Server is running on another machine on the network, use a setting like the following for the Data Source key:

Data Source = \PowerServer

If the database is running on a remote machine, use the remote machine's IP address. If you're working from home, for example, you can establish a connection to your company's server with a connection string like the following:

Data Source = "213.16.178.100; Initial Catalog = BooksDB; uid = xxx; password
   = xxx"

The uid and password keys are those of an account created by the database administrator, and not a Windows account. If you want to connect to the database by using each user's Windows credentials, you should omit the uid and password keys and instead use the Integrated Security key. If your network is based on a domain controller, you should use integrated security so that users can log in to SQL Server with their Windows account. This way you won't have to store any passwords in your code or even have an auxiliary file with the application settings.

If you're using an IP address to specify the database server, you may also have to include SQL Server's port by specifying an address such as 213.16.178.100, 1433. The default port for SQL Server is 1433, and you can omit it. If the administrator has changed the default port or has hidden the server's IP address behind another IP address for security purposes, you should contact the administrator to get the server's address. If you're connecting over a local network, you shouldn't have to use an IP address. If you want to connect to the company server remotely, you will probably have to request the server's IP address and the proper credentials from the server's administrator.

The basic property of the Connection object is the ConnectionString property, which is a semicolon-separated string of key-value pairs that specifies the information needed to establish a connection to the desired database. It's basically the same information you provide in various dialog boxes when you open SQL Server Management Studio and select a database to work with. An alternate method of setting up a Connection object is to set its ConnectionString property:

Dim CN As New SqlConnection
CN.ConnectionString =
        "Data Source = localhost; Initial Catalog = Northwind; " &
        "Integrated Security = True"

One of the Connection class's properties is the State property, which returns the state of a connection; its value is a member of the ConnectionState enumeration: Connecting, Open, Executing, Fetching, Broken, and Closed. If you call the Close method on a Connection object that's already closed or the Open method on a Connection object that's already open, an exception will be thrown. To avoid the exception, you must examine the Connection's State property and act accordingly.

The following code segment outlines the process of opening a connection to a database:

Dim CNstring As String =
         "Data Source=localhost;Initial " &
         "Catalog=Northwind;Integrated Security=True"
CNstring = InputBox(
        "Please enter a Connection String",
        "CONNECTION STRING", CNstring)
If CNstring.Trim = "" Then Exit Sub
Dim CN As New SqlConnection(CNstring)
Try
    CN.Open()
    If CN.State = ConnectionState.Open Then
        MsgBox("Workstation " & CN.WorkstationId &
               " connected to database " & CN.Database &
               " on the " & CN.DataSource & " server")
    End If
Catch ex As Exception
    MsgBox( _
      "FAILED TO OPEN CONNECTION TO DATABASE DUE TO THE FOLLOWING ERROR" &
            vbCrLf & ex.Message)
End Try
' use the Connection object to execute statements
' against the database and then close the connection
If CN.State = ConnectionState.Open Then CN.Close()

The Command Class

The second major component of the ADO.NET model is the Command class, which allows you to execute SQL statements against the database. The two basic parameters of the Command object are a Connection object that specifies the database where the command will be executed and specifies the actual SQL command. To execute a SQL statement against a database, you must initialize a Command object and set its Connection property to the appropriate Connection object. It's the Connection object that knows how to connect to the database; the Command object simply submits a SQL statement to the database and retrieves the results.

The Command object exposes a number of methods for executing SQL statements against the database, depending on the type of statement you want to execute. The ExecuteNonQuery method executes INSERT/DELETE/UPDATE statements that do not return any rows, just an integer value, which is the number of rows affected by the query. The ExecuteScalar method returns a single value, which is usually the result of an aggregate operation, such as the count of rows meeting some criteria, the sum or average of a column over a number of rows, and so on. Finally, the ExecuteReader method is used with SELECT statements that return rows from one or more tables.

To execute an UPDATE statement, for example, you must create a new Command object and associate the appropriate SQL statement with it. One overloaded form of the constructor of the Command object allows you to specify the statement to be executed against the database, as well as a Connection object that points to the desired database as arguments:

Dim CMD As New SqlCommand(
           "UPDATE Products SET UnitPrice = UnitPrice * 1.07 " &
           "WHERE CategoryID = 3", CN)
CN.Open
Dim rows As Integer
rows = CMD.ExecuteNonQuery
If rows = 1 Then
    MsgBox("Table Products updated successfully")
Else
    MsgBox("Failed to update the Products table")
End If
If CN.State = ConnectionState.Open Then CN.Close

The ExecuteNonQuery method returns the number of rows affected by the query, and it's the same value that appears in the Output window of SQL Server Management Studio when you execute an action query. The preceding statements mark up the price of all products in the Confections category by 7 percent. You can use the same structure to execute INSERT and DELETE statements; all you have to change is the actual SQL statement in the SqlCommand object's constructor. You can also set up a Command object by setting its Connection and CommandText properties:

Command.Connection = Connection
Command.CommandText = "SELECT COUNT(*) FROM Customers"

After you're finished with the Command object, you should close the Connection object. Although you can initialize a Connection object anywhere in your code, you should call its Open method as late as possible (that is, just before executing a statement) and its Close method as early as possible (that is, as soon as you have retrieved the results of the statement you executed).

The ExecuteScalar method executes the SQL statement associated with the Command object and returns a single value, which is the first value that the SQL statement would print in the Output window of SQL Server Management Studio. The following statements read the number of rows in the Customers table of the Northwind database and store the result in the count variable:

Dim CMD As New SqlCommand(
               "SELECT COUNT(*) FROM Customers", CN)
Dim count As Integer
CN.Open
count = CMD.ExecuteScalar
If CN.State = ConnectionState.Open Then CN.Close

If you want to execute a SELECT statement that retrieves multiple rows, you must use the ExecuteReader method of the Command object, as shown here:

Dim CMD As New SqlCommand(
               "SELECT * FROM Customers", CN)
CN.Open
Dim Reader As SqlDataReader
Reader = CMD.ExecuteReader
While Reader.Read
   ' process the current row in the result set
End While
If CN.State = ConnectionState.Open Then CN.Close

You'll see shortly how to access the fields of each row returned by the ExecuteReader method through the properties of the SqlDataReader class.

Executing Stored Procedures

The command to be executed through the Command object is not always a SQL statement; it could be the name of a stored procedure, or the name of a table, in which case it retrieves all the rows of the table. You can specify the type of statement you want to execute with the CommandType property, whose value is a member of the CommandType enumeration: Text (for SQL statements), StoredProcedure (for stored procedures), and TableDirect (for a table). You don't have to specify the type of the command you want to execute, but then the Command object will have to figure it out, a process that will take a few moments, and you can avoid this unnecessary delay. The Northwind database comes with the Ten Most Expensive Products stored procedure. To execute this stored procedure, set up a Command object with the following statements:

Dim CMD As New SqlCommand
CMD.Connection = CN
CMD.CommandText = "[Ten Most Expensive Products]"
CMD.CommandType = CommandType.StoredProcedure

Finally, you can retrieve all the rows of the Customers table by setting up a Command object like the following:

Dim CMD As New SqlCommand
CMD.Connection = CN
CMD.CommandText = "Customers"
CMD.CommandType = CommandType.TableDirect

Executing Selection Queries

The most common SQL statements, the SELECT statements, retrieve a set of rows from one or more joined tables, the result set. These statements are executed with the ExecuteReader method, which returns a DataReader object — a SqlDataReader object for statements executed against SQL Server databases. The DataReader class provides the members for reading the results of the query in a forward-only manner. The connection remains open while you read the rows returned by the query, so it's imperative to read the rows and store them in a structure in the client computer's memory as soon as possible and then close the connection. The DataReader object is read-only (you can't use it to update the underlying rows), so there's no reason to keep it open for long periods. Let's execute the following SELECT statement to retrieve selected columns of the rows of the Employees table of the Northwind database:

SELECT LastName + ' ' + FirstName AS Name,
       Title, Extension, HomePhone
FROM   Employees

Here are the VB statements that set up the appropriate Command object and retrieve the SqlDataReader object with the result set:

Dim CMD As New SqlCommand
Dim CN As New SqlConnection("Data Source = localhost;Initial Catalog=Northwind;" &
                   "Integrated Security=True")
CMD.Connection = CN
CMD.CommandText =
     "SELECT LastName + ' ' + FirstName AS Name, " &
     "Title, Extension, HomePhone FROM Employees"
CN.Open()
Dim Reader As SqlDataReader
Reader = Command.ExecuteReader
Dim str As String = ""
Dim cont As Integer = 0
While Reader.Read
    str &= Convert.ToString(Reader.Item("Name")) & vbTab
    str &= Convert.ToString(Reader.Item("Title")) & vbTab
    str &= Convert.ToString(Reader.Item("Extension")) & vbTab
    str &= Convert.ToString(Reader.Item("HomePhone")) & vbTab
    str &= vbCrLf
    count += 1
End While
Debug.WriteLine(vbCrLf & vbCrLf & "Read " & count.ToString &
                " rows: " & vbCrLf & vbCrLf & str)
CN.Close()

The DataReader class provides the Read method, which advances the current pointer to the next row in the result set. To read the individual columns of the current row, you use the Item property, which allows you to specify the column by name and returns an object variable. It's your responsibility to cast the object returned by the Item property to the appropriate type. Initially, the DataReader is positioned in front of the first line in the result set, and you must call its Read method to advance to the first row. If the query returns no rows, the Read method will return False, and the While loop won't be executed at all. In the preceding sample code, the fields of each row are concatenated to form the str string, which is printed in the Immediate window; it looks something like this:

Davolio Nancy     Sales Representative    5467  (206) 555-9857
Fuller Andrew     Vice President, Sales   3457  (206) 555-9482
Leverling Janet   Sales Representative    3355  (206) 555-3412

Using Commands with Parameters

Most SQL statements and stored procedures accept parameters, and you should pass values for each parameter before executing the query. Consider a simple statement that retrieves the customers from a specific country, whose name is passed as an argument:

SELECT * FROM Customers WHERE Country = @country

The @country parameter must be set to a value, or an exception will be thrown as you attempt to execute this statement. Stored procedures also accept parameters. The Sales By Year stored procedure of the Northwind database, for example, expects two Date values and returns sales between the two dates. To accommodate the passing of parameters to a parameterized query or stored procedure, the Command object exposes the Parameters property, which is a collection of Parameter objects. To pass parameter values to a command, you must set up a Parameter object for each parameter; set its name, type, and value; and then add the Parameter object to the Parameters collection of the Command object. The following statements set up a Command object with a parameter of the varchar type with a maximum size of 15 characters:

Dim Command As New SqlCommand
Command.CommandText = "SELECT * FROM Customers WHERE Country = @country"
Command.Parameters.Add("@country", SqlDbType.VarChar, 15)
Command.Parameters("@country").Value = "Italy"

At this point, you're ready to execute the SELECT statement with the ExecuteReader method and retrieve the customers from Italy. You can also configure the Parameter object in its constructor:

Dim param As New SqlParameter(paramName, paramType, paramSize)

Here's the constructor of the @country parameter of the preceding example:

Dim param As New SqlParameter("@country", SqlDbType.VarChar, 15)
param.Value = "Italy"

CMD.Parameters.Add param

Finally, you can combine all these statements into a single one:

CMD.Parameters.Add("@country", SqlDbType.VarChar, 15).Value = "Italy"

In the last statement, I initialize the parameter as I add it to the Parameters collection and then set its value to the string Italy. Oddly, there's no overloaded form of the Add method that allows you to specify the parameter's value, but there is an AddWithValue method, which adds a new parameter and sets its value. This method accepts two arguments: a string with the parameter's name and an object with the parameter's value. The actual type of the value is determined by the type of the query or stored procedure's argument, and it's resolved at runtime. The simplest method of adding a new parameter to the CMD.Parameters collection is the following:

CMD.Parameters.Add("@country", "Italy")

After the parameter has been set up, you can call the ExecuteReader method to retrieve the customers from the country specified by the argument and then read the results through an instance of the DataReader class.

Retrieving Multiple Values from a Stored Procedure

Another property of the Parameter class is the Direction property, which determines whether the stored procedure can alter the value of the parameter. The Direction property's setting is a member of the ParameterDirection enumeration: Input, Output, InputOutput, and ReturnValue. A parameter that's set by the procedure should have its Direction property set to Output: The parameter's value is not going to be used by the procedure, but the procedure's code can set it to return information to the calling application. If the parameter is used to pass information to the procedure, as well as to pass information back to the calling application, its Direction property should be set to InputOutput.

Let's look at a stored procedure that returns the total of all orders, as well as the total number of items ordered by a specific customer. This stored procedure accepts as a parameter the ID of a customer, obviously, and it returns two values: the total of all orders placed by the specified customer and the number of items ordered. A procedure (be it a SQL Server stored procedure or a regular VB function) can't return two or more values. The only way to retrieve multiple results from a single stored procedure is to pass output parameters so that the stored procedure can set their value. To make the stored procedure a little more interesting, we'll add a return value, which will be the number of orders placed by the customer. Listing 15.1 shows the implementation of the CustomerTotals stored procedure.

Example 15.1. The CustomerTotals stored procedure

CREATE PROCEDURE CustomerTotals
@customerID varchar(5),
@customerTotal money OUTPUT,
@customerItems int OUTPUT
AS
SELECT @customerTotal = SUM(UnitPrice * Quantity * (1 - Discount))
FROM [Order Details] INNER JOIN Orders
ON [Order Details].OrderID = Orders.OrderID
WHERE Orders.CustomerID = @customerID
SELECT @customerItems = SUM(Quantity)
FROM [Order Details] INNER JOIN Orders
ON [Order Details].OrderID = Orders.OrderID
WHERE Orders.CustomerID = @customerID

DECLARE @customerOrders int
SELECT @customerOrders = COUNT(*) FROM Orders
WHERE Orders.CustomerID = @customerID

RETURN @customerOrders

To attach the CustomerTotals stored procedure to the database, create a new stored procedure, paste the preceding statements in the code window, and press F5 to execute it. Make sure the database is Northwind (not master). The stored procedure calculates three totals for the specified customer and stores them to three local variables. The @customerTotal and @customerItems variables are output parameters, which the calling application can read after executing the stored procedure. The @customerOrders variable is the procedure's return value. We can return the number of orders for the customer through the stored procedure's return value, because this variable happens to be an integer, and the return value is always an integer. In more-complex stored procedures, we'd use output parameters for all the values we want to return to the calling application, and the procedure would return a value to indicate the execution status: 0 or 1 if the procedure completed its execution successfully and a negative value to indicate the error, should the procedure fail to execute.

Before using the CustomerTotals stored procedure with our VB application, let's test it in SQL Server Management Studio. We must declare a variable for each of the output parameters: the @Total, @Items, and @Orders variables. These three variables must be passed to the stored procedure with the OUTPUT attribute, as shown here:

DECLARE @Total money
DECLARE @Items int
DECLARE @Orders int
DECLARE @custID varchar(5)
DECLARE @CustomerTotal decimal
DECLARE @CustomerItems int
SET @custID = 'BLAUS'
EXEC @orders = CustomerTotals @custID,
          @customerTotal OUTPUT, @customerItems OUTPUT
PRINT 'Customer ' + @custId + ' has placed a total of ' +
      CAST(@orders AS varchar(8)) + ' orders ' +
      ' totaling $' + CAST(ROUND(@customerTotal, 2) AS varchar(12)) +
      ' and ' + CAST(@customerItems AS varchar(4)) + ' items.'

Open a new query window in SQL Server Management Studio, and enter the preceding statements. Press F5 to execute them, and you will see the following message printed in the Output window:

Customer BLAUS has placed a total of 8 orders totaling $10355.45 and 653 items.

The customer's ID is an INPUT parameter, and we could pass it to the procedure as a literal. You can omit the declaration of the @custID variable and call the stored procedure with the following statement:

DECLARE @CustomerTotal decimal
DECLARE @CustomerItems int
EXEC @orders = _
          CustomerTotals 'BLAUS', @customerTotal OUTPUT, @customerItems OUTPUT

Now that we've tested our stored procedure and know how to call it, we'll do the same from within our sample application. To execute the CustomerTotals stored procedure, we must set up a Command object, create the appropriate Parameter objects (one Parameter object per stored procedure parameter plus another Parameter object for the stored procedure's return value), and then call the Command.ExecuteNonQuery method. Upon return, we'll read the values of the output parameters and the stored procedure's return value. Listing 15.2 shows the code that executes the stored procedure (see the SimpleQueries sample project available for download from www.sybex.com/go/masteringvb2010).

Example 15.2. Executing a stored procedure with output parameters

Private Sub bttnExecSP_Click(...) Handles bttnExecSP.Click
    Dim customerID As String = InputBox("Please enter a customer ID",
          "CustomerTotals Stored Procedure", "ALFKI")
    If customerID.Trim.Length = 0 Then Exit Sub
    Dim CMD As New SqlCommand
    CMD.Connection = CN
    CMD.CommandText = "CustomerTotals"
    CMD.CommandType = CommandType.StoredProcedure
    CMD.Parameters.Add(
          "@customerID", SqlDbType.VarChar, 5).Value = customerID
    CMD.Parameters.Add("@customerTotal", SqlDbType.Money)
    CMD.Parameters("@customerTotal").Direction = ParameterDirection.Output
    CMD.Parameters.Add("@customerItems", SqlDbType.Int)
    CMD.Parameters("@customerItems").Direction = ParameterDirection.Output
    CMD.Parameters.Add("@orders", SqlDbType.Int)
    CMD.Parameters("@orders").Direction = ParameterDirection.ReturnValue
    CN.Open()
    CMD.ExecuteNonQuery()
    CN.Close()
    Dim items As Integer
    Items = Convert.ToInt32(CMD.Parameters("@customerItems").Value
    Dim orders As Integer
    Orders = Convert.ToInt32(CMD.Parameters("@orders").Value
    Dim ordersTotal As Decimal
    ordersTotal = Convert.ToDouble(
         CMD.Parameters("@customerTotal").Value
    MsgBox("Customer BLAUS has placed " &
           orders.ToString & " orders " &
           "totaling $" & Math.Round(ordersTotal, 2).ToString("#,###.00") &
           " and " & items.ToString & " items")
End Sub

In most applications, the same Command object will be reused again and again with different parameter values, so it's common to add the parameters to a Command object's Parameters collection and assign values to them every time we want to execute the command. Let's say you've designed a form with text boxes, where users can edit the values of the various fields; here's how you'd set the values of the UPDATECMD variable's parameters:

UPDATECMD.Parameters("@CustomerID").Value = txtID.Text.Trim
UPDATECMD.Parameters("@CompanyName").Value = txtCompany.Text.Trim

After setting the values of all parameters, you can call the ExecuteNonQuery method to submit the changes to the database. To update another customer, just assign different values to the existing parameters, and call the UPDATECMD object's ExecuteNonQuery method.

Handling Special Characters

Another problem you will avoid with parameterized queries and stored procedures is that of handling single quotes, which are used to delimit literals in T-SQL. Consider the following UPDATE statement, which picks up the company name from a TextBox control and updates a single row in the Customers table:

CMD.CommandText =
       "UPDATE Customers SET CompanyName = '" &
       txtCompany.Text & "'" &
       "WHERE CustomerID = "' & txtID.Text & "'"

If the user enters a company name that contains a single quote, such as B's Beverages, the command will become the following:

UPDATE Customers SET CompanyName = 'B's Beverages' WHERE CustomerID = 'BSBEV'

If you attempt to execute this statement, SQL Server will reject it because it contains a syntax error (you should be able to figure out the error easily by now). The exact error message is as follows:

Msg 102, Level 15, State 1, Line 1
Incorrect syntax near 's'.
Msg 105, Level 15, State 1, Line 1
Unclosed quotation mark after the character string ".

The single quote is used to delimit literals, and there should be an even number of single quotes in the statement. The compiler determines that there's an unclosed quotation mark in the statement and doesn't execute it. If the same statement was written as a parameterized query, such as the following, you could pass the same company name to the statement as an argument without a hitch:

CMD.CommandText =
    "UPDATE Customers SET CompanyName = @CompanyName " &
    "WHERE CustomerID = @ID"
CMD.Parameters.Add("@CompanyName",
         SqlDbType.VarChar, 40).Value = "B's Beverages"
CMD.Parameters.Add("@ID",
         SqlDbType.Char, 5).Value = "BSBEV"
CMD.ExecuteNonQuery

The same is true for other special characters, such as the percentage symbol. It's possible to escape the special symbols; you can replace the single-quote mark with two consecutive single quotes, but the most elegant method of handling special characters, such as quotation marks, percent signs, and so on, is to use parameterized queries or stored procedures. You just assign a string to the parameter and don't have to worry about escaping any characters; the Command object will take care of all necessary substitutions.

Empty Strings versus Null Values

The values you assign to the arguments of a query or stored procedure usually come from controls on a Windows form and in most cases from TextBox controls. The following statement reads the text in the txtFax TextBox control and assigns it to the @FAX parameter of a Command object:

Command.Parameters("@FAX").Value = txtFax.Text

But what if the user has left the txtFax TextBox blank? Should we pass to the INSERT statement an empty string or a null value? If you collect the values from various controls on a form and use them as parameter values, you'll never send null values to the database. If you want to treat empty strings as null values, you must pass a null value to the appropriate parameter explicitly. Let's say that the txtFax TextBox control on the form corresponds to the @FAX parameter. You can use the IIf() statement of Visual Basic to assign the proper value to the corresponding parameter as follows:

UPDATECommand.Parameters("@FAX").Value =
     IIf(txtFax.Text.Trim.Length = 0,
             System.DBNull.Value, txtFax.Text)

This is a lengthy statement, but here's how it works: The IIf() function evaluates the specified expression. If the length of the text in the txtFax control is zero, it returns the value specified by its second argument, which is the null value. If not — in other words, if the TextBox control isn't empty — it returns the text on the control. This value is then assigned to the @FAX parameter of the UPDATECommand object.

The DataReader Class

To read the rows returned by a selection query, you must call the Command object's ExecuteReader method, which returns a DataReader object (a SqlDataReader object for queries executed against SQL Server). The DataReader is a stream to the data retrieved by the query, and it provides many methods for reading the data sent to the client by the database. The underlying Connection object remains open while you read the data off the DataReader, so you must read it as quickly as possible, store it at the client, and close the connection as soon as possible.

To read a set of rows with the DataReader, you must call its Read method, which advances the pointer to the next row in the set. Initially, the pointer is in front of the first row, so you must call the Read method before accessing the first row. Despite its name, the Read method doesn't actually fetch any data; to read individual fields, you must use the various Get methods of the DataReader object, described next (GetDecimal, GetString, and so on). After reading the fields of the current row, call the Read method again to advance to the next row. There's no method to move to a previous row, so make sure you've read all the data of the current row before moving to the next one. Table 15.4 explains the basic properties and methods of the DataReader object.

Table 15.4. Properties and methods of a DataReader object

Name

Description

HasRows

This is a Boolean property that specifies whether there's a result set to read data from. If the query selected no rows at all, the HasRows property will return False.

FieldCount

This property returns the number of columns in the current result set. Note that the DataReader object doesn't know the number of rows returned by the query. Because it reads the rows in a forward-only fashion, you must iterate through the entire result set to find out the number of rows returned by the query.

Read

This method moves the pointer in front of the next row in the result set. Use this method to read the rows of the result set, usually from within a While loop.

Get<type>

There are many versions of the Get method with different names, depending on the type of column you want to read. To read a Decimal value, use the GetDecimal method; to retrieve a string, use the GetString method; to retrieve an integer, call one of the GetInt16, GetInt32, or GetSqlInt64 methods; and so on. To specify the column you want to read, use an integer index value that represents the column's ordinal, such as Reader.GetString(2). The index of the first column in the result set is zero.

GetSql<type>

There are many versions of the GetSql method with different names, depending on the SQL type of the column you want to read. To read a Decimal value, use the GetSqlDecimal method; to retrieve a string, use the GetSqlString method; to retrieve an integer, call one of the GetSqlInt16, GetSqlInt32, or GetSqlInt64 methods; and so on. To specify the column you want to read, use an integer index value that represents the column's ordinal, such as Reader.GetSqlString(2). The index of the first column in the result set is zero.

GetValue

If you can't be sure about the type of a column, use the GetValue method, which returns a value of the Object type. This method accepts as an argument the ordinal of the column you want to read.

GetValues

This method reads all the columns of the current row and stores them into an array of objects, which is passed to the method as an argument. This method returns an integer value, which is the number of columns read from the current row.

GetName

Use this method to retrieve the name of a column, which must be specified by its order in the result set. To retrieve the name of the first column, use the expression Reader.GetName(0). The column's name in the result set is the original column name, unless the SELECT statement used an alias to return a column with a different name.

GetOrdinal

This is the counterpart of the GetName method, and it returns the ordinal of a specific column from its name. To retrieve the ordinal of the CompanyName column, use the expression Reader.GetName("CompanyName").

IsDbNull

This method returns True if the column specified by its ordinal in the current row is null. If you attempt to assign a null column to a variable, a runtime exception will be thrown, so you should use this method to determine whether a column has a value and handle the null values from within your code.

Note that you can't reset the DataReader object and reread the same result set. To go through the same rows, you must execute the query again. However, there's no guarantee that the same query executed a few moments later will return the same result set. Rows may have been added to, or removed from, the database, so your best bet is to go through the result set once and store all the data to a structure at the client computer's memory. Moreover, while you're using the DataReader, the connection to the server remains open. This means that you shouldn't process the data as you read it, unless it is a trivial form of processing, such as keeping track of sums and counts. If you need to perform some substantial processing on your data, read the data into an ArrayList or other structure in the client computer's memory, close the connection, and then access the data in the ArrayList. In the following chapter, you'll learn about the DataSet object, which was designed to maintain relational data at the client. The DataSet is a great structure for storing relational data at the client; it's almost like a small database that resides in the client computer's memory. However, the DataSet is not ideal for all situations.

Listing 15.3 shows the code that retrieves all products along with category names and supplier names and populates a ListView control. The ListView control's columns aren't specified at design time; the code adds the appropriate columns at runtime (as long as the View property has been set to Details). The code goes through the columns of the result set and adds a new column to the ListView control for each data column. Then it reads the rows returned by the query and displays them on the control. The statements in Listing 15.3 are part of the SimpleQueries sample project.

Example 15.3. Displaying product information on a ListView control

Dim Command As New SqlCommand
Command.Connection = CN
' a simple SELECT query
Command.CommandText =
   "SELECT ProductName AS Product, " &
   "CategoryName AS Category, " &
   "CompanyName AS Supplier, UnitPrice AS Price " &
   "FROM Products LEFT JOIN Categories " &
   "ON Products.CategoryID = Categories.CategoryID " &
   "LEFT JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID"
Connection.Open()
Dim count As Integer = 0
Dim Reader As SqlDataReader
Reader = Command.ExecuteReader
ListView1.Clear()
Dim i As Integer
' setup ListView control to display the headers
' of the columns read from the database
For i = 0 To Reader.FieldCount - 1
    ListView1.Columns.Add(Reader.GetName(i), 130)
Next
While Reader.Read
    Dim LI As New ListViewItem
    LI.Text = Convert.ToString(Reader.Item("Product"))
    LI.SubItems.Add(Convert.ToString(
                     Reader.Item("Category")))
    LI.SubItems.Add(Convert.ToString(
                     Reader.Item("Supplier")))
    LI.SubItems.Add(Convert.ToString(
                     Reader.Item("Price")))
    ListView1.Items.Add(LI)
    Count += 1
End While
MsgBox("Read " & count.ToString & " Product rows")
Connection.Close()

Reading Multiple Result Sets

Another interesting aspect of the DataReader object is that you can use it to read multiple result sets, such as the ones returned by multiple queries. You can execute a batch query such as the following with a single Command object:

Command.CommandText = "SELECT * FROM Customers; SELECT * FROM Employees"
Dim Reader As SqlDataReader = Command.ExecuteReader

We'll use the same DataReader object to read the rows of both tables, but we need to know when we're finished with the first result set (the customers) and start reading the second result set. The NextResult property of the DataReader does exactly that: After exhausting the first result set (by iterating through its rows with the Read method), we can request the NextResult property to find out whether the DataReader contains additional result sets. If so, we can start reading the next result set with the Read method. Here's the outline of the code for reading two result sets from the same DataReader:

While Reader.Read
   ' read the fields of the current row in the 1st result set
End While
If Reader.NextResult
    While Reader.Read
       ' read the fields of the current row in the 2nd result set
    End While
End If

VB 2010 at Work: Building a Simple Data-Driven Application

In this section I'll put together all the information presented so far in this chapter to build a data-driven application — an application that actually talks to a database, retrieves data, and displays it on a Windows form. The application is not new to you. In Chapter 5, "Basic Windows Controls," you created an application for maintaining a list of contacts, based on a ListBox control, where you stored the names of the contacts. The same ListBox control was also used as a navigational tool, because users could select a contact and view their details, as shown in Figure 15.8.

The Contacts application's interface

Figure 15.8. The Contacts application's interface

The contacts were stored in a List of Contact objects, and they were persisted to a file. Now let's revise this application so that it works with a database, namely, the Customers table of the Northwind database. The columns of the Customers table are almost identical to the fields of the Contact object, so you'll use the same data type to store the data. Instead of persisting the data to a file, you'll read your contacts from the database and submit the edited rows back to the same database. Because you'll use a central data storage, the revised application can be used by multiple users at the same time.

Start by copying the Contacts project folder to a new folder with a different name. Then edit the menu, as follows:

  1. Delete the New and Save items, because the new application doesn't maintain a copy of the contacts in memory. Instead, it retrieves the requested contact from the Customer table as requested and submits the updates to the database as soon as the user edits an existing contact or inserts a new one. The New command is meaningless with a database. You'll never have to remove all rows from a table, unless you're reinitializing a test database, in which you will probably run a script that initializes all tables.

  2. Change the Open command of the File menu to Load Customers. Instead of loading the contacts from a file, we'll load them from the Customers table of the Northwind database.

  3. Add the CustomerID field to the main form, because the Customers table uses user-supplied strings as primary keys.

Most of the code remains the same. You only need to replace the routines that perform the basic operations against the database. We no longer read the data from an external file. Instead, we must execute a query against the database and retrieve the company names along with their IDs. The Load command's code follows in Listing 15.4.

Example 15.4. Loading the Customers table

Private Sub LoadToolStripMenuItem_Click(...) Handles
            LoadToolStripMenuItem.Click
    CMD.CommandText = "SELECT * FROM Customers"
    CN.Open()
    Dim RDR As SqlDataReader
    RDR = CMD.ExecuteReader
    ListBox1.Items.Clear()
    While RDR.Read
        Dim C As New Contact
        C.CustomerID = RDR.Item("CustomerID")
        C.CompanyName = RDR.Item("CompanyName")
        C.ContactName = Convert.ToString(
                  IIf(RDR.IsDBNull(
                             RDR.GetOrdinal("ContactName")),
                             "", RDR.Item("ContactName")))
        C.Address1 = Convert.ToString(
                  IIf(RDR.IsDBNull(
                             RDR.GetOrdinal("Address")),
                             "", RDR.Item("Address")))
        ' Similar statements for the remaining fields
        ListBox1.Items.Add(C)
End While
    CN.Close()
    currentContact = 0
    ShowContact()
End Sub

The code executes a simple SELECT query against the database and then loads the ListBox control with Contact objects as before. The only difference is that now the data comes from the DataReader. You will also notice the lengthy expressions that assign the values read from the database to the TextBox controls on the form. The expression RDR.Item(field_name) reads the corresponding column's value from the current row. This row, however, may be null, and you can't assign a null value to the Text property. So, the code uses the IIf() function to determine whether the current field's value is null. If the method IsDBNull returns True, then the current field is null, and the IIf() function returns an empty string. If not, it returns the actual field value, which is the expression RDR.Item(field_name). But what about the call to the GetOrdinal method? Well, the IsDBNull method doesn't accept a column name as an argument, only an integer that is the order of the column in the result. Instead of hard-coding numeric values to our code (and make it impossible to maintain later), we can retrieve the ordinal of a column with the GetOrdinal method. The expression GetOrdinal("CustomerID") returns 0 because the CustomerID column is the first one, GetOrdinal("CompanyName") returns 1, and so on. Finally, because the IIf() function returns an object, we must cast it to a string before assigning it to the Text property.

Now, this is the type of code that can be streamlined, and there are tools that do it for you. Even better, there are tools that convert each row to a custom object (a Contact object, in our example), so you can handle the data you retrieve from the database with the object-oriented techniques you have learned in this book. The new data access technologies, including LINQ to SQL that was discussed in Chapter 14, "An Introduction to LINQ," bridge the gap between databases and object-oriented programming. As you have already noticed, there's a mismatch between objects and the way we access databases. Having to set up a DataReader, execute commands directly against the database, and then having to worry about null and converting the fields to the proper type takes a lot of code, and the entire approach just isn't elegant. You'll see better techniques for accessing databases in the following chapter, but I wanted to show you the basic data access mechanisms first (actually you already saw how to access databases with LINQ to SQL queries in the preceding chapter).This may not be so elegant, but they're the foundation on which more elaborate tools were built. Moreover, they're the fastest way to get data out of a database and back. That said, let's continue with our sample application.

Next, we must implement the procedures for adding a new contact (the SubmitContact subroutine), for updating an existing contact (the UpdateContact subroutine), and for deleting a contact (the RemoveContact subroutine). The first two subroutines accept as an argument a variable of the Contact type, form the appropriate SQL statement, and execute it against the database. The RemoveContact subroutine accepts the ID of the row to be deleted, forms the appropriate DELETE statement, and executes it likewise. You may wonder why I haven't implemented these routines as functions that return a True/False value indicating whether the operation completed successfully. The reason is that a simple indication about the success of an operation won't suffice; we need to display a more specific error message to the user, so I've decided to throw an exception with the error description, as shown in Listings 15.5, 15.6, and 15.7.

Example 15.5. Adding a new row to the Customers table

Private Sub SubmitContact(ByVal C As Contact)
    CMD.CommandText = "INSERT Customers " &
              "(CustomerID, CompanyName, ContactName, Address, " &
              " City, Region, PostalCode, Country) " &
              "VALUES (@CustomerID, @CompanyName, @ContactName, " &
              "@Address, @City, @Region, @PostalCode, @Country) "
    CMD.Parameters.Clear()
    CMD.Parameters.AddWithValue("@CustomerID", C.CustomerID)
    CMD.Parameters.AddWithValue("@CompanyName", C.CompanyName)
    CMD.Parameters.AddWithValue("@ContactName", C.ContactName)
    CMD.Parameters.AddWithValue("@Address", C.Address1)
    CMD.Parameters.AddWithValue("@City", C.City)
    CMD.Parameters.AddWithValue("@Region", C.State)
    CMD.Parameters.AddWithValue("@PostalCode", C.ZIP)
    CMD.Parameters.AddWithValue("@Country", C.ZIP)
    CN.Open()
    Try
        CMD.ExecuteNonQuery()
    Catch ex As Exception
        Throw New Exception(
                 "Failed to update contact in database. " &
                 vbCrLf & "ERROR MESSAGE: " & vbCrLf & ex.Message)
    Finally
        CN.Close()
    End Try
    CN.Close()
End Sub

Example 15.6. Updating a row in the Customers table

Private Sub UpdateContact(ByVal C As Contact)
    CMD.CommandText = "UPDATE Customers " &
                "SET CompanyName = @CompanyName, " &
                "    ContactName = @ContactName, " &
                "    Address = @Address, " &
                "    City = @City, " &
                "    Region = @Region, " &
                "    PostalCode = PostalCode" &
                "    Country = @Country " &
                "WHERE CustomerID = @CustomerID"
    CMD.Parameters.Clear()
    CMD.Parameters.AddWithValue("@CustomerID", C.CustomerID)
    CMD.Parameters.AddWithValue("@CompanyName", C.CompanyName)
    CMD.Parameters.AddWithValue("@ContactName", C.ContactName)
CMD.Parameters.AddWithValue("@Address", C.Address1)
    CMD.Parameters.AddWithValue("@City", C.City)
    CMD.Parameters.AddWithValue("@Region", C.State)
    CMD.Parameters.AddWithValue("@PostalCode", C.ZIP)
    CN.Open()
    Try
        CMD.ExecuteNonQuery()
    Catch ex As Exception
        Throw New Exception(
             "Failed to update contact in database. " & vbCrLf &
             "ERROR MESSAGE: " & vbCrLf & ex.Message)
    End Try
    CN.Close()
End Sub

Example 15.7. Removing a row from the Customers table

Private Sub RemoveContact(ByVal ContactID As String)
    CMD.CommandText = "DELETE Customers WHERE CustomerID=@contactID "
    CMD.Parameters.Clear()
    CMD.Parameters.AddWithValue("@contactID", ContactID)
    CN.Open()
    Try
        CMD.ExecuteNonQuery()
    Catch ex As Exception
        Throw New Exception(
                "Failed to delete contact in database. " & vbCrLf &
                "ERROR MESSAGE: " & vbCrLf & ex.Message)
    Finally
        CN.Close()
    End Try
End Sub

The rest of the code is basically the same. We started with an application that manipulates Contact objects in a List and converted it to an application that manipulates the rows of a database table. We changed the routines that read data and submit data to the database. The routines that read the data from the database create Contact objects that are stored to the ListBox control as with the original application. Likewise, every time the user inserts a new contact or updates an existing one, instead of modifying an item in the List control, we submit it to the database and update the underlying source (the Customers table). Finally, when a contact is removed, the application removes it directly from the Customers table. The deletion operation may fail if the customer has placed an order. As you realize, we're no longer dealing with an isolated table but with a larger system with related tables, and the DBMS maintains the integrity of our data.

This application has a serious flaw, though. What if the Customers table has thousands of customers and there are a few dozen users who may need access the database? Would it make sense to download the entire table to the client and maintain such an enormous list of customers at every client? The first improvement I'd suggest is to download only the customer names and display them on the ListBox control. Every time the user clicks a different customer's name, your code should execute a SELECT statement for the specific customer and bring the entire row to the client and display its fields.

Even so, the table may grow so large that it wouldn't make sense to display all company names in a list. In this case, you should provide some selection mechanism to force users to download the names of selected companies only. For example, you could display a list of countries and download only customers from the selected customer every time users selected another country. You could also filter by company or contact name or provide combinations of criteria. If you decide to modify the application, here are the SELECT statements that limit customers by country and company name:

SELECT CustomerID, CompanyName FROM Customers WHERE Country = @country
SELECT CustomerID, CompanyName FROM Customers WHERE CompanyName LIKE @name + '%'

(The percent symbol at the end of the string is a wildcard, indicating that the company name should start with the string stored in the @name variable, followed by any other combination of characters.)

One last hitch in this application is the following: What will happen if other users delete a contact you're editing or someone else updates the contact you're editing? Issues of concurrency are very important in designing data-driven applications, and we'll look at this in the following chapter. The Contacts project assumes that the same row won't be edited by multiple users at once. Even when this happens, the last user to save the edits overwrites the edits of the other users. It's a bit of a crude approach, but it's quite appropriate in many situations. In other situations, most notably in reservation applications, it's unacceptable.

The Bottom Line

Store data in relational databases.

Relational databases store their data in tables and are based on relationships between these tables. The data are stored in tables, and tables contain related data, or entities, such as persons, products, orders, and so on. Relationships are implemented by inserting columns with matching values in the two related tables.

Master It

How will you relate two tables with a many-to-many relationship?

Query databases with SQL.

Structured Query Language (SQL) is a universal language for manipulating tables. SQL is a nonprocedural language, which specifies the operation you want to perform against a database at a high level, unlike a traditional language such as Visual Basic, which specifies how to perform the operation. The details of the implementation are left to the DBMS. SQL consists of a small number of keywords and is optimized for selecting, inserting, updating, and deleting data.

Master It

How would you write a SELECT statement to retrieve selected data from multiple tables?

Submit queries to the database using ADO.NET.

ADO.NET is the component of the Framework that provides the mechanism to contact a database, submit queries, and retrieve the results of a query. There are two types of queries you can execute against a database: selection queries that retrieve data from the database based on certain criteria you supply and action queries that manipulate the data at the database. Action queries do not return any results, except for the number of rows that were affected by the query.

Master It

Describe the basic mechanism for submitting a selection query to the database and reading the results returned by the query.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.145.35.194