With this chapter, we start exploring applications that manipulate large sets of data stored in a database. After a quick introduction to databases, you'll learn about the basic mechanisms of interacting with databases. As you will see, it's fairly straightforward to write a few VB statements to execute SQL queries against the database in order to either edit or retrieve selected rows. The real challenge is the design and implementation of functional interfaces that display the data requested by the user, allow the user to navigate through the data and edit it, and finally submit the changes to the database. You'll learn how to execute queries against the database, retrieve data, and submit modified or new data to the database.
In this chapter, you'll learn how to do the following:
Store data in relational databases
Query databases with SQL
Submit queries to the database using ADO.NET
A database is a container for storing relational, structured information. The same is true for a file or even for the file system on your hard disk. What makes a database unique is that it is designed to preserve relationships and make data easily retrievable. The purpose of a database is not so much the storage of information as its quick retrieval. In other words, you must structure your database so that it can be queried quickly and efficiently. It's fairly easy to create a database for storing products and invoices and add new invoices every day. In addition to just storing information, you should also be able to retrieve invoices by period, retrieve invoices by customer, or retrieve invoices that include specific products. Unless the database is designed properly, you won't be able to retrieve the desired information efficiently.
Databases are maintained by special programs, such as Microsoft Office Access and SQL Server. These programs are called database management systems (DBMSs), and they're among the most complicated applications. A fundamental characteristic of a DBMS is that it isolates much of the complexity of the database from the developer. Regardless of how each DBMS stores data on disk, you see your data organized in tables with relationships between tables. To access or update the data stored in the database, you use a special language, the Structured Query Language (SQL). Unlike other areas of programming, SQL is a truly universal language, and all major DBMSs support it.
The recommended DBMS for Visual Studio 2010 is SQL Server 2008. In fact, the Visual Studio 2008 setup program offers to install a developer version of SQL Server 2008 called SQL Server 2008 Express. However, you can use Access as well as non-Microsoft databases such as Oracle. Although this chapter was written with SQL Server 2008, most of the examples will work with Access as well.
Data is stored in tables, and each table contains entities of the same type. In a database that stores information about books, there could be a table with titles, another table with authors, and a table with publishers. The table with the titles contains information such as the title of the book, the number of pages, and the book's description. Author names are stored in a different table because each author might appear in multiple titles. If author information were stored along with each title, we'd be repeating author names. So, every time we wanted to change an author's name, we'd have to modify multiple entries in the titles table. Even retrieving a list of unique author names would be a challenge because you'd have to scan the entire table with the titles, retrieve all the authors, and then get rid of the duplicate entries. Of course, you need a mechanism to associate titles with their authors, and you'll see how this is done in the following section. The same is true for publishers. Publishers are stored in a separate table, and each title contains a pointer to the appropriate row in the publishers table.
The reason for breaking the information we want to store in a database into separate tables is to avoid duplication of information. This is a key point in database design. Duplication of information will sooner or later lead to inconsistencies in the database. The process of breaking the data into related tables that eliminate all possible forms of information duplication is called normalization, and there are rules for normalizing databases. The topic of database normalization is not discussed further in this book. However, all it really takes to design a functional database is common sense. In short, you identify the entities you want to store in the database (such as customers, products, hotels, books, and the like) and store them in separate tables. You also avoid duplicating information at all costs. If you design a table for storing books along with their authors, you'll soon realize that the same author names are repeated in multiple books. Data duplication means that you have combined entities, and you need to break the original table into one with books and another with authors. Of course, you'll have to establish a relationship between the two tables so you can locate a book's author(s) or an author's books. This is done through relations between the tables; hence the term relational databases. Don't worry if you haven't worked with databases before; the following sections demonstrate the structure of a database through examples. After you learn how to extract data from your database's tables with SQL statements, you'll develop a much better understanding of the way databases should be structured.
The databases we're interested in are called relational because they are based on relationships among the data they contain. The data is stored in tables, and tables contain related data, or entities, such as people, products, orders, and so on. Of course, entities are not independent of each other. For example, orders are placed by specific customers, so the rows of the Customers table must be linked to the rows of the Orders table that stores the orders of the customers. Figure 15.1 shows a segment of a table with customers (top) and the rows of a table with orders that correspond to one of the customers (bottom).
As you can see in Figure 15.1, relationships are implemented by inserting columns with matching values in the two related tables; the CustomerID
column is repeated in both tables. The rows with a common value in the CustomerID
fields are related. In other words, the lines that connect the two tables simply indicate that there are two fields, one on each side of the relationship, with a common value. The customer with the ID value ALFKI
has placed the orders 10643 and 10692 (among others). To find all the orders placed by a customer, we can scan the Orders table and retrieve the rows in which the CustomerID
field has the same value as the ID of the specific customer in the Customers table. Likewise, you can locate customer information for each order by looking up the row of the Customers table that has the same ID as the one in the CustomerID
field of the Orders table.
The two fields used in a relationship are called key fields. The CustomerID
field of the Customers table is the primary key because it identifies a single customer. Each customer has a unique value in the CustomerID
field. The CustomerID
field of the Orders table is the foreign key of the relationship. A CustomerID
value appears in a single row of the Customers table and identifies that row; it's the table's primary key. However, it might appear in multiple rows of the Orders table because the CustomerID
field is the foreign key in this table. In fact, it will appear in as many rows of the Orders table as there are orders for the specific customer. Note that the primary and foreign keys need not have the same names, but it's convenient to use the same name because they both represent the same entity.
The concept of relationships between tables is pretty straightforward and very easy to implement through a pair of keys. Yet, this is the foundation of relational databases.
To help you understand relational databases, I will present the structure of the two sample databases used for the examples in this and the following chapters. If you're not familiar with the Northwind and Pubs databases, read the following two sections and you'll find it easier to follow the examples.
SQL Server 2008 developers will wonder where the Northwind and Pubs databases have gone. Microsoft has replaced both databases with a single new database called AdventureWorks. Microsoft made the change to demonstrate new SQL Server features in an environment that more closely matches large enterprise systems. Because the AdventureWorks database is extremely complex and not very friendly for teaching database principles, this book won't rely on it. However, you might want to look at the AdventureWorks database anyway to see what it provides and understand how complex databases can become.
Many developers are used to working with the Northwind and Pubs databases with other Microsoft products. These two databases have become so standard that many authors, including myself, rely on the presence of these databases to ensure that everyone can see example code without a lot of extra effort. Unfortunately, you won't find an option for installing them as part of the standard SQL Server 2008 installation. However, you can find scripts for creating these databases in SQL Server Express online at http://www.microsoft.com/downloads/details.aspx?FamilyID=06616212-0356-46A0-8DA2-EEBC53A68034&displaylang=en
. The name of the file you'll receive is SQL2000SampleDb.MSI
. Even though Microsoft originally created this file for SQL Server 2000, it works just fine with SQL Server 2008.
After you download the script files, you need to install them. Right-click the file, and choose Install from the context menu. You will see a Welcome dialog box, telling you that this file contains the sample databases for SQL Server 2000. Click Next, read the licensing agreement, and agree to it. Keep following the prompts until you install the sample database scripts in the appropriate directory.
At this point, you have two scripts for creating the sample databases. If you used the default installation settings, these files appear in the Program FilesMicrosoft SQL Server 2000 Sample Database Scripts
folder of your machine. The InstNwnd.SQL
file will create the Northwind database, and the InstPubs.SQL
file will create the Pubs database.
Double-click the name of each SQL file, and each will open in SQL Server Management Studio. Then click the Execute button in the toolbar (it's the button with the icon of an exclamation mark) to run the script, which will install the appropriate database.
To install the databases for the Express version of SQL Server 2008, open a command prompt. Type OSQL -E -i InstNwnd.SQL, and press Enter. The OSQL utility will create the Northwind database for you (this process can take quite some time). After the Northwind database is complete, type OSQL -E -i InstPubs.SQL, and press Enter. The process will repeat itself.
If you try to run the OSQL utility and receive an error message at the command prompt, the SQL Server 2008 installation didn't modify the path information for your system as it should have. In some cases, this makes your installation suspect, and you should reinstall the product if you experience other problems. To use the installation scripts, copy them from the installation folder to the Program FilesMicrosoft SQL Server90Toolsinn
folder. You can run the OSQL utility at the command prompt from this folder to create the two sample databases.
You'll want to test the installation to make sure it worked. Open Visual Studio, and choose View
In the Server Name field, type the name of your machine, or select one with the mouse. Click the down arrow in the Select Or Enter A Database Name field. You should see both the Northwind and Pubs databases, as shown in Figure 15.2. If you don't see these entries, it means that an error occurred. Try running the scripts a second time.
In this section, you'll explore the structure of the Northwind sample database. The Northwind database stores products, customers, and sales data, and many of you are already familiar with the structure of the database.
To view a table's contents, expand the Table section of the tree under the Northwind connection in Server Explorer, and locate the name of the table you want to examine. Right-click the name, and choose Show Table Data from the context menu. This will open the table, and you can view and edit its rows. If you choose the Open Table Definition command from the same menu, you will see the definitions of the table's columns. You can change the type of the columns (each column stores items of the same type), change their length, and set a few more properties that are discussed a little later in this chapter. To follow the description of the sample databases, open the tables in view mode.
If you have installed SQL Server 2008, you can use SQL Server Management Studio to explore the same database. Just right-click the Northwind database, and from the context menu select Open Table to view the data or select Design to change the table definition.
The Products table stores information about the products of the fictional Northwind corporation. This information includes the product name, packaging information, price, and other relevant fields. Each product (or row) in the table is identified by a unique numeric ID. Because each ID is unique, the ProductID
column is the table's primary key. This column is an Identity column: It's a numeric value, which is generated automatically by the database every time you insert a new row to the table. The rows of the Products table are referenced by invoices (the Order Details table, which is discussed later), so the product IDs appear in the Order Details table as well. The ProductID
column, as well as most primary keys in any database, has a unique property: It's an Identity column. Every time you add a new product to the table, SQL Server assigns the next available value to this column. If the ID of the last row in the Products table is 72, the first product that will be added will take the primary key value of 73 automatically. SQL Server will always assign the proper value to this column, and it will always be unique.
Each product has a supplier, too. Because the same supplier can offer more than one product, the supplier information is stored in a different table, and a common field, the SupplierID
field, is used to link each product to its supplier (as shown in Figure 15.3). For example, the products Chai, Chang, and Aniseed Syrup are purchased from the same supplier: Exotic Liquids. Its SupplierID
fields all point to the same row in the Suppliers table.
In addition to having a supplier, each product belongs to a category. Categories are not stored along with product names; they are stored separately in the Categories table. Again, each category is identified by a numeric value (field CategoryID
) and has a name (field CategoryName
). In addition, the Categories table has two more columns: Description
, which contains text, and Picture
, which stores a bitmap. The CategoryID
field in the Categories table is the primary key, and the field by the same name in the Products table is the corresponding foreign key.
The Customers table stores information about the company's customers. Each customer is stored in a separate row of this table, and customers are referenced by the Orders table. Unlike product IDs, customer IDs are five-character strings and are stored in the CustomerID
column. This is an unusual choice for IDs, which are usually numeric values. The CustomerID
column isn't an Identity column; the user must determine the key of each customer and submit it along with the other customer data. The database has a unique constraint for this column: The customer's ID must be unique, and the database won't accept any rows with a duplicate CustomerID
value.
The Orders table stores information about the orders placed by Northwind's customers. The OrderID
field, which is an integer value, identifies each order. Orders are numbered sequentially, so this field is also the order's number. Each time you append a new row to the Orders table, the value of the new OrderID
field is generated automatically by the database. Not only is the OrderID
column the table's primary key, but it's also an Identity column.
The Orders table is linked to the Customers table through the CustomerID
column. By matching rows that have identical values in their CustomerID
fields in the two tables, we can recombine customers with their orders. Refer back to Figure 15.1 to see how customers are linked to their orders.
The Orders table doesn't store any details about the items ordered; this information is stored in the Order Details table (see Figure 15.4). Each order consists of one or more items, and each item has a price, a quantity, and a discount. In addition to these fields, the Order Details table contains an OrderID
column, which holds the ID of the order to which the detail line belongs.
The reason why details aren't stored along with the order's header is that the Orders and Order Details tables store different entities. The order's header, which contains information about the customer who placed the order, the date of the order, and so on, is quite different from the information you must store for each item ordered. If you attempt to store the entire order into a single table, you'll end up repeating a lot of information. Notice also that the Order Details table stores the IDs of the products, not the product names.
This table holds employee information. Each employee is identified by a numeric ID, which appears in each order. When a sale is made, the ID of the employee who made the sale is recorded in the Orders table. An interesting technique was used in the design of the Employees table: Each employee has a manager, which is another employee. The employee's manager is identified by the ReportsTo
field, which is set to the ID of the employee's manager. The rows of the Employees table contain references to the same table. This table contains a foreign key that points to the primary key of the same table, a relation that allows you to identify the hierarchy of employees in the corporation.
Before looking at SQL and more practical techniques for manipulating tables, let's look at the structure of another sample database I'm going to use in this chapter, the Pubs database. Pubs is a database for storing book, author, and publisher information, not unlike a database you might build for an online bookstore.
The Pubs database consists of really small tables, but it was carefully designed to demonstrate many of the features of SQL, so it's a prime candidate for sample code. Just about any book about SQL Server uses the Pubs database. In the examples of the following sections, I will use the Northwind database because it's closer to a typical business database, and the type of information stored in the Northwind database is closer to the needs of the average VB programmer than the Pubs database. Some of the fine points of SQL, however, can't be demonstrated with the data of the Northwind database, so in this section I'll show examples that use the ubiquitous Pubs database.
The Titles table contains information about individual books (the book's title, ID, price, and so on). Each title is identified by an ID, which is not a numeric value, that's stored in the title_id
column. The IDs of the books look like this: BU2075.
The Authors table contains information about authors. Each author is identified by an ID, which is stored in the au_id
field. This field is a string with a value such as 172-32-1176 (they resemble U.S. Social Security numbers).
The Titles and Authors tables are not directly related because they can't be joined via a one-to-many relationship; the relationship between the two tables is many-to-many. The relations you have seen so far are one-to-many because they relate one row in the table that has the primary key to one or more rows in the table that has the foreign key: One order contains many detail lines, one customer has many orders, one category contains many products, and so on.
The relation between titles and authors is many-to-many because each book may have multiple authors, and each author may have written multiple titles. If you stop to think about the relationship between these two tables, you'll realize that it can't be implemented with a primary key and a foreign key (like the Order-Customer relationship or the Order-Shipper relationship in the Northwind database). To establish a many-to-many relationship, you must create a join table between the other two, and this table must have a one-to-many relationship with both tables.
Figure 15.5 shows how the Titles and Authors tables of the Pubs database are related to one another. The table between them holds pairs of title IDs and author IDs. If a book was written by two authors, the TitleAuthor table contains two entries with the same title ID and different author IDs. The book with a title_id
of BU1111 was written by two authors. The IDs of the authors appear in the TitleAuthor table along with the ID of the book. The IDs of these two authors are 267-41-2394 and 724-80-9391. Likewise, if an author has written more than one book, the author's ID will appear many times in the TitleAuthor table — each time paired with a different title ID.
At times you won't be able to establish the desired relationship directly between two tables because the relationship is many-to-many. When you discover a conflict between two tables, you must create a join table between them. A many-to-many relation is actually implemented as two one-to-many relations.
In a database, each table has a field with a unique value for every row. As indicated earlier in this chapter, this field is the table's primary key. The primary key does not have to be a meaningful entity because in most cases there's no single field that's unique for each row. Books can be identified by their ISBNs and employees by their SSNs, but these are exceptions to the rule. In general, you can't come up with a meaningful key that's universally unique. The primary key need not resemble the entity it identifies. The only requirement is that primary keys be unique in the entire table. In most designs, we use an integer as the primary key. To make sure they're unique, we even let the DBMS generate a new integer for each row added to the table. Each table can have one primary key only, and the DBMS can automatically generate an integer value for a primary key field every time a new row is added. SQL Server uses the term Identity
for this data type, and there can be only one Identity field in each table.
Database designers sometimes use unique global identifiers, which are lengthy strings like 281167b3-7cbc-41a0-ad89-8c0995ac2d07 to identify the rows. These strings, which are guaranteed to be globally unique, are generated automatically by both Visual Basic and SQL Server. To generate a GUID value with Visual Basic, call the Guid.NewGuid
method. GUIDs are not as easy to handle as integers, and they entail some performance penalty. If you need a unique identity across multiple databases, you have to use GUIDs.
The related rows in a table repeat the primary key of the row they are related to in another table. The copies of the primary keys in all other tables are called foreign keys. Foreign keys need not be unique (in fact, by definition they aren't), and any field can serve as a foreign key. What makes a field a foreign key is that it matches the primary key of another table. The CategoryID
field is the primary key of the Categories table because it identifies each category. The CategoryID
field in the Products table is the foreign key because the same value might appear in many rows (many products can belong to the same category). Whereas the primary key refers to a table, the foreign key refers to a relationship. The CategoryID
column of the Products table is the foreign key in the relationship between the Categories and Products tables. The Products table contains another foreign key, the SupplierID
column, which forms the relationship between the Suppliers and Products tables.
Maintaining the links between tables is not a trivial task. When you add an invoice line, for instance, you must make sure that the product ID that you insert in the Order Details table corresponds to a row in the Products table. An important aspect of a database is its integrity. To be specific, you must ensure that the relations are always valid, and this type of integrity is called referential integrity. There are other types of integrity (for example, setting a product's value to a negative value will compromise the integrity of the database), but this is not nearly as important as referential integrity. The wrong price can be easily fixed. But issuing an invoice to a customer who doesn't exist isn't easy (if even possible) to fix. Modern databases come with many tools to help ensure their integrity, especially referential integrity. These tools are constraints you enter when you design the database, and the DBMS makes sure that the constraints are not violated as the various programs manipulate the database.
When you relate the Products and Categories tables, for example, you must also ensure the following:
Every product added to the foreign table must point to a valid entry in the primary table. If you are not sure which category the product belongs to, you can leave the CategoryID
field of the Products table empty (the field will have a null value). Or, you can create a generic category, the UNKNOWN or UNDECIDED category, and use this category if no information is available.
No rows in the Categories table should be removed if there are rows in the Products table pointing to the specific category. This situation would make the corresponding rows of the Products table point to an invalid category (the rows that have no matching row in the primary table are called orphan rows).
These two restrictions would be quite a burden on the programmer if the DBMS didn't protect the database against actions that could impair its integrity. The referential integrity of your database depends on the validity of the relations. Fortunately, all DBMSs can enforce rules to maintain their integrity, and you'll learn how to enforce rules that guarantee the integrity of your database later in this chapter. In fact, when you create the relationship, you can select a couple of check boxes that tell SQL Server to enforce the relationship (that is, not to accept any changes in the data that violate the relationship). If you leave these check boxes deselected, be ready to face a real disaster sooner or later.
To simplify the development of database applications, Visual Studio 2010 comes with some visual tools, the most important of which are briefly described in Table 15.1 and then discussed in the following sections.
Table 15.1. Visual database tools
Name | Description |
---|---|
Server Explorer | This is the most prominent tool. Server Explorer is the toolbox for database applications, in the sense that it contains all the basic tools for connecting to databases and manipulating their objects. |
Query Builder | This is a tool for creating SQL queries (statements that retrieve the data you want from a database or update the data in the database). SQL is a language in its own right, and we'll discuss it later in this chapter. Query Builder lets you specify the operations you want to perform on the tables of a database with point-and-click operations. In the background, Query Builder builds the appropriate SQL statement and executes it against the database. |
Database Designer and Tables Designer | These tools allow you to work with an entire database or its tables. When you work with the database, you can add new tables, establish relationships between the tables, and so on. When you work with individual tables, you can manipulate the structure of the tables, edit their data, and add constraints. You can use these tools to manipulate a complicated object — the database — with point-and-click operations. |
SQL is a universal language for manipulating data in database tables. Every DBMS supports it, so you should invest the time and effort to learn it. You can generate SQL statements with point-and-click operations (Query Builder is a visual tool for generating SQL statements), but this is no substitute for understanding SQL and writing your own statements. The visual tools are nothing more than a user-friendly interface for specifying SQL statements. In the background, they generate the appropriate SQL statement, and you will get the most out of these tools if you understand the basics of SQL. I will start with an overview of SQL, and then I'll show you how to use the Query Builder utility to specify a few advanced queries. If you're familiar with SQL, you can skip this section or just glance through it and take a look at the examples.
By the way, the SQL version of SQL Server is called T-SQL, which stands for Transact-SQL. T-SQL is a superset of SQL and provides advanced programming features that are not available with SQL. I'm not going to discuss T-SQL in this book, but once you understand SQL, you'll find it easy to leverage this knowledge to T-SQL.
SQL is a nonprocedural language, which means that SQL doesn't provide traditional programming structures such as If
statements or loops. Instead, it's a language for specifying the operation you want to perform against a database at a high level. The details of the implementation are left to the DBMS. SQL is an imperative language, like Language Integrated Query (LINQ), as opposed to a traditional programming language, such as VB. Traditional languages are declarative: The statements you write tell the compiler how to perform the desired actions. This is good news for nonprogrammers, but many programmers new to SQL might wish it had the structure of a more traditional language. You will get used to SQL and soon be able to combine the best of both worlds: the programming model of VB and the simplicity of SQL. Besides, there are many similarities between SQL and LINQ, and you'll be able to leverage your skills in any of the two areas.
To retrieve all the company names from the Customers table of the Northwind database, you issue a statement like this one:
SELECT CompanyName FROM Customers
To select customers from a specific country, you must use the WHERE
clause to limit the selected rows, as in the following statement:
SELECT CompanyName FROM Customers WHERE Country = 'Germany'
The DBMS will retrieve and return the rows you requested. As you can see, this is not the way you'd retrieve rows with Visual Basic. With a procedural language such as VB, you'd have to write loops to scan the entire table, examine the value of the Country
column, and either select or reject the row. Then you would display the selected rows. With SQL, you don't have to specify how the selection operation will take place; you simply specify what you want the database to do for you — not how to do it. As a reminder, the equivalent LINQ statement would be as follows (I'm using LINQ to SQL syntax):
Dim selectedCustomers = From cust In Customers Where cust.Country = "Germany" Select cust
SQL statements are divided into two major categories, which are actually considered separate languages: the statements for manipulating the data, which form the Data Manipulation Language (DML), and the statements for defining database objects, such as tables or their indexes, which form the Data Definition Language (DDL). The DDL is not of interest to every database developer, and I will not discuss it in this book. The DML is covered in depth because you'll use these statements to retrieve data, insert new data into the database, and edit or delete existing data.
The statements of the DML part of the SQL language are also known as queries, and there are two types of queries: selection queries and action queries. Selection queries retrieve information from the database. A selection query returns a set of rows with identical structure. The columns can come from different tables, but all the rows returned by the query have the same number of columns. Action queries modify the database's objects or create new objects and add them to the database (new tables, relationships, and so on).
If you are not familiar with SQL, I suggest that you follow the examples in this chapter and experiment with the sample databases. To follow the examples, you have two options: SQL Server Management Studio (SSMS) and Query Designer of Visual Studio. SSMS helps you manage databases in various ways, including creating queries to extract data. Query Designer is an editor for SQL statements that also allows you to execute them and see the results. In addition to Query Designer, you can also use Query Builder, which is part of SSMS and Visual Studio. Query Builder lets you build the statements with visual tools, and you don't have to know the syntax of SQL in order to create queries with Query Builder. After a quick overview of the SQL statements, I will describe Query Builder and show you how to use its interface to build fairly elaborate queries.
One of the applications installed with SQL Server is SQL Server Management Studio. To start it, choose Start
After you're connected, right-click the database you want to use, and choose New Query from the context menu. Enter the SQL statement you want to execute in the blank query that SSMS creates. The SQL statement will be executed against the selected database when you press Ctrl+E or click the Execute button (it's the button with the exclamation point icon). Alternatively, you can prefix the SQL statement with the USE
statement, which specifies the database against which the statement will be executed. To retrieve all the Northwind customers located in Germany, enter this statement:
USE Northwind SELECT CompanyName FROM Customers WHERE Country = 'Germany'
The USE
statement isn't part of the query; it simply tells SSMS the database against which it must execute the query. I'm including the USE
statement with all the queries so you know the database used for each example. If you're executing the sample code from within Visual Studio, you need not use the USE
statement, because all queries are executed against the selected database. Actually, the statement isn't supported by Query Designer of Visual Studio.
The results of the query, known as the result set, will appear in a grid in the lower pane. An action query that updates a table (adds a new row, edits a row, or deletes an existing row) doesn't return any rows; it simply displays the number of rows affected on the Messages tab.
To execute another query, enter another statement in the upper pane, or edit the previous statement and press Ctrl+E again. You can also save SQL statements into files so that you won't have to type them again. To do so, open the File menu, choose Save As or Save, and enter the name of the file in which the contents of the Query pane will be stored. The statement will be stored in a text file with the extension .sql
.
To execute the same queries with Visual Studio, open the Server Explorer window, and right-click the name of the database against which you want to execute the query. From the context menu, choose New Query, and a new query window will open. You will also see a dialog box prompting you to select one or more tables. For the time being, close this dialog box, because you will supply the names of the tables in the query; later in this chapter, you'll learn how to use the visual tools to build queries.
Query Designer of Visual Studio consists of four panes (Figure 15.7). The upper pane (which is the Table Diagram pane) displays the tables involved in the query, their fields, and the relationships between the tables — if any. The next pane shows the fields that will be included in the output of the query. Here you specify the output of the query, as well as the selection criteria. This pane is Query Builder, the tool that lets you design queries visually. It's discussed later in this chapter. In the next pane, the SQL pane, you see the SQL statement produced by the visual tools. If you modify the query with the visual tools, the SQL statement is updated automatically; likewise, when you edit the query, the other two panes are updated automatically to reflect the changes. The last pane, the Results pane, contains a grid with the query's output. Every time you execute the query by clicking the button with the exclamation mark in the toolbar, the bottom pane is populated with the results of the query. For the examples in this section, ignore the top two panes. Just enter the SQL statements in the SQL pane, and execute them.
We'll start our discussion of SQL with the SELECT
statement. After you learn how to express the criteria for selecting the desired rows with the SELECT
statement, you can apply this information to other data-manipulation statements. The simplest form of the SELECT
statement is
SELECT fields FROM tables
where fields
and tables
are comma-separated lists of the fields you want to retrieve from the database and the tables they belong to. The list of fields following the SELECT
statement is referred to as the selection list. To select the contact information from all the companies in the Customers table, use this statement:
USE Northwind SELECT CompanyName, ContactName, ContactTitle FROM Customers
To retrieve all the fields, use the asterisk (*
). The following statement selects all the fields from the Customers table:
SELECT * FROM Customers
As soon as you execute a statement that uses the asterisk to select all columns, Query Designer will replace the asterisk with the names of all columns in the table.
The unconditional form of the SELECT
statement used in the previous section is quite trivial. You rarely retrieve data from all rows in a table. Usually you specify criteria, such as "all companies from Germany," "all customers who have placed three or more orders in the last six months," or even more-complicated expressions. To restrict the rows returned by the query, use the WHERE
clause of the SELECT
statement. The most common form of the SELECT
statement is the following:
SELECT fields FROM tables WHERE condition
The fields
and tables
arguments are the same as before, and condition
is an expression that limits the rows to be selected. The syntax of the WHERE
clause can get quite complicated, so we'll start with the simpler forms of the selection criteria. The condition
argument can be a relational expression, such as the ones you use in VB. To select all the customers from Germany, use the following condition:
WHERE Country = 'Germany'
To select customers from multiple countries, use the OR
operator to combine multiple conditions:
WHERE Country = 'Germany' OR Country = 'Austria'
You can also combine multiple conditions with the AND
operator.
It is possible to retrieve data from two or more tables by using a single statement. (This is the most common type of query, actually.) When you combine multiple tables in a query, you can use the WHERE
clause to specify how the rows of the two tables will be combined. Let's say you want a list of all product names, along with their categories. For this query, you must extract the product names from the Products table and the category names from the Categories table and specify that the ProductID
field in the two tables must match. The statement
USE Northwind SELECT ProductName, CategoryName FROM Products, Categories WHERE Products.CategoryID = Categories.CategoryID
retrieves the names of all products, along with their category names. Here's how this statement is executed. For each row in the Products table, the SQL engine locates the matching row in the Categories table and then appends the ProductName
and CategoryName
fields to the result. In other words, it creates pairs of a product and the matching category. Products that don't belong to a category are not included in the result set.
To avoid typing long table names, you can alias them with a shorter name and use this shorthand notation in the rest of the query. The query that retrieves titles and publishers can be written as follows:
USE pubs SELECT T.title FROM titles T, publishers P WHERE T.pub_id = P.pub_id
The table names are aliased in the FROM
clause, and the alias is used in the rest of the query. There's one situation where table name aliasing is mandatory, and this is when you want to refer to the same table twice. This is a fairly advanced topic, but I'm including a typical example to demonstrate an interesting technique, because you may run into it. Some tables contain references to themselves, and the Employees table of the Northwind database belongs to this category. Each employee reports to a manager, who's also an employee. The manager is identified by the column ReportsTo, which contains the ID of an employee's manager. Here are three rows of the Employees table that demonstrate this hierarchy:
EmployeeID | LastName | FirstName | Title | ReportsTo |
2 | Fuller | Andrew | Vice President, Sales | NULL |
5 | Buchanan | Steven | Sales Manager | 2 |
6 | Suyama | Michael | Sales Representative | 5 |
As you can see here, Suyama reports to Buchanan, and Buchanan in turn reports to Fuller. How would you retrieve each employee's name and title along with his manager's name? Here, we must make a query that involves two tables, one with the employee names and another one with the manager names. It just so happens that the two tables are in reality the same table. The trick is to treat the Employees table as two separate tables using aliases. Here's the query that retrieves employees and managers in a single result set:
SELECT Employees.EmployeeID, Employees.LastName + ' ' + Employees.FirstName, Employees.Title, Managers.FirstName + ' ' + Managers.LastName, Managers.Title FROM Employees, Employees Managers WHERE Employees.ReportsTo = Managers.EmployeeID
If the database contained two different tables, one for employees and another one for managers, you'd have no problem coding the query. Because we want to use the same table for both managers and employees, we use aliases to create two virtual tables and associate them with the ReportsTo
and EmployeeID
columns. The result is a neat list of names that shows clearly the hierarchy of the Northwind corporation's employees:
1 Davolio Nancy Sales Representative Andrew Fuller 3 Leverling Janet Sales Representative Andrew Fuller 4 Peacock Margaret Sales Representative Andrew Fuller 5 Buchanan Steven Sales Manager Andrew Fuller 6 Suyama Michael Sales Representative Steven Buchanan 7 King Robert Sales Representative Steven Buchanan 8 Callahan Laura Inside Sales Coordinator Andrew Fuller 9 Dodsworth Anne Sales Representative Steven Buchanan
I skipped the manager's title in the listing because it wouldn't fit on the printed page. Note that because the Last Name
and First Name
column names contain spaces, they're embedded in square brackets. The same is true for table names that contain spaces.
By default, each column of a query is labeled after the actual field name in the output. If a table contains two fields named CustLName
and CustFName
, you can display them with different labels by using the AS
keyword. The following SELECT
statement produces two columns labeled CustLName
and CustFName
:
SELECT CustLName, CustFName
The query's output looks much better if you change the labels of these two columns with a statement like the following:
SELECT CustLName AS [Last Name], CustFName AS [First Name]
It is also possible to concatenate two fields in the SELECT
list with the concatenation operator. Concatenated fields are labeled automatically as Expr1, Expr2
, and so on, so you must supply your own name for the combined field. The following statement creates a single column for the customer's name and labels it Customer Name
:
SELECT CustLName + ', ' + CustFName AS [Customer Name]
The DISTINCT
keyword eliminates from the cursor any duplicates retrieved by the SELECT
statement. Let's say you want a list of all countries with at least one customer. If you retrieve all country names from the Customers table, you'll end up with many duplicates. To eliminate them, use the DISTINCT
keyword, as shown in the following statement:
USE Northwind SELECT DISTINCT Country FROM Customers
The LIKE
operator uses pattern-matching characters like the ones you use to select multiple files in DOS. The LIKE
operator recognizes several pattern-matching characters (or wildcard characters) to match one or more characters, numeric digits, ranges of letters, and so on. Table 15.2 describes these characters.
Table 15.2. SQL wildcard characters
Wildcard Character | Description |
---|---|
| Matches any number of characters. The pattern |
| (Underscore character.) Matches any single alphabetic character. The pattern |
| Matches any single character within the brackets. The pattern |
| Matches any character not in the brackets. The pattern |
| Matches any one of a range of characters. The characters must be consecutive in the alphabet and specified in ascending order (A to Z, not Z to A). The pattern |
| Matches any single numeric character. The pattern |
You can use the LIKE
operator to retrieve all titles about Windows from the Pubs database, by using a statement like the following one:
USE pubs SELECT titles.title FROM titles WHERE titles.title LIKE '%Windows%'
The percent signs mean that any character(s) may appear in front of or after the word Windows in the title.
To include a wildcard character itself in your search argument, enclose it in square brackets. The pattern %50[%]%
will match any field that contains the string 50%.
A common operation for manipulating and maintaining databases is to locate null values in fields. The expressions IS NULL
and IS NOT NULL
find field values that are (or are not) null. To locate the rows of the Customers table that have a null value in their CompanyName
column, use the following WHERE
clause:
WHERE CompanyName IS NULL
You can easily locate the products without prices and edit them. The following statement locates products without prices:
USE Northwind SELECT * FROM Products WHERE UnitPrice IS NULL
A related function, the ISNULL()
function, allows you to specify the value to be returned when a specific field is null. The ISNULL()
SQL function accepts two arguments: a column name and a string. The function returns the value of the specified column, unless this value is null, in which case it returns the value of the second argument. To return the string ***
for customers without a company name, use the following expression:
USE Northwind SELECT CustomerID, ISNULL(CompanyName, '***') AS Company, ContactName FROM Customers
The rows of a query are not in any particular order. To request that the rows be returned in a specific order, use the ORDER BY
clause, which has this syntax:
ORDER BY col1, col2, . . .
You can specify any number of columns in the ORDER BY
list. The output of the query is ordered according to the values of the first column. If two rows have identical values in this column, they are sorted according to the second column, and so on. The following statement displays the customers ordered by country and then by city within each country:
USE Northwind SELECT CompanyName, ContactName, Country, City FROM Customers ORDER BY Country, City
In addition to column names, you can specify calculated columns in the SELECT
statement. The Order Details table contains a row for each invoice line. Invoice 10248, for instance, contains four lines (four items sold), and each detail line appears in a separate row in the Order Details table. Each row holds the number of items sold, the item's price, and the corresponding discount. To display the line's subtotal, you must multiply the quantity by the price minus the discount, as shown in the following statement:
USE Northwind SELECT Orders.OrderID, [Order Details].ProductID, [Order Details].[Order Details].UnitPrice * [Order Details].Quantity * (1 - [Order Details].Discount) AS SubTotal FROM Orders INNER JOIN [Order Details] ON Orders.OrderID = [Order Details].OrderID
Here the selection list contains an expression based on several fields of the Order Details table. This statement calculates the subtotal for each line in the invoices issued to all Northwind customers and displays them along with the order number. The order numbers are repeated as many times as there are products in the order (or lines in the invoice). In the following section, you will find out how to calculate totals too.
SQL supports some aggregate functions, which act on selected fields of all the rows returned by the query. The basic aggregate functions listed in Table 15.3 perform basic calculations such as summing, counting, and averaging numeric values. There are a few more aggregate functions for calculating statistics such as the variance and standard deviation, but I have omitted them from Table 15.3. Aggregate functions accept field names (or calculated fields) as arguments and return a single value, which is the sum (or average) of all values.
These functions operate on a single column (which could be a calculated column) and return a single value. The rows involved in the calculations are specified with the proper WHERE
clause. The SUM()
and AVG()
functions can process only numeric values. The other three functions can process both numeric and text values.
The aggregate functions are used to summarize data from one or more tables. Let's say you want to know the number of Northwind database customers located in Germany. The following SQL statement returns the desired value:
USE Northwind SELECT COUNT(CustomerID) FROM Customers WHERE Country = 'Germany'
The aggregate functions ignore the null values unless you specify the *
argument. The following statement returns the count of all rows in the Customers table, even if some of them have a null value in the Country
column:
USE Northwind SELECT COUNT(*) FROM Customers
The SUM()
function is used to total the values of a specific field in the specified rows. To find out how many units of the product with ID = 11 (queso Cabrales) have been sold, use the following statement:
USE Northwind SELECT SUM(Quantity) FROM [Order Details] WHERE ProductID = 11
The SQL statement that returns the total revenue generated by a single product is a bit more complicated. To calculate it, you must multiply the quantities by their prices and then add the resulting products together, taking into consideration each invoice's discount:
USE Northwind SELECT SUM(Quantity * UnitPrice * (1 - Discount)) FROM [Order Details] WHERE ProductID = 11
Product QuesoCabrales generated a total revenue of $12,901.77. If you want to know the number of items of this product that were sold, add one more aggregate function to the query to sum the quantities of each row that refers to the specific product ID:
USE Northwind SELECT SUM(Quantity), SUM(Quantity * UnitPrice * (1 - Discount))
FROM [Order Details] WHERE ProductID = 11
If you add the ProductID
column in the selection list and delete the WHERE
clause to retrieve the totals for all products, the query will generate an error message to the effect that the columns haven't been grouped. You will learn how to group the results a little later in this chapter.
Joins specify how you connect multiple tables in a query. There are four types of joins:
Left outer, or left, join
Right outer, or right, join
Full outer, or full, join
Inner join
A join operation combines all the rows of one table with the rows of another table. Joins are usually followed by a condition that determines which records on either side of the join appear in the result. The WHERE
clause of the SELECT
statement is similar to a join, but there are some fine points that will be explained momentarily.
The left, right, and full joins are sometimes called outer joins to differentiate them from an inner join. Left join and left outer join mean the same thing, as do right join and right outer join.
The left join displays all the records in the left table and only those records of the table on the right that match certain user-supplied criteria. This join has the following syntax:
FROM (primary table) LEFT JOIN (secondary table) ON (primary table).(field) = (secondary table).(field)
The left outer join retrieves all rows in the primary table and the matching rows from a secondary table. The following statement retrieves all the titles from the Pubs database along with their publisher. If some titles have no publisher, they will be included in the result:
USE pubs SELECT title, pub_name FROM titles LEFT JOIN publishers ON titles.pub_id = publishers.pub_id
The right join is similar to the left outer join, except that it selects all rows in the table on the right and only the matching rows from the left table. This join has the following syntax:
FROM (secondary table) RIGHT JOIN (primary table) ON (secondary table).(field) = (primary table).(field)
The following statement retrieves all the publishers from the Pubs database along with their titles. If a publisher has no titles, the publisher name will be included in the result set. Notice that this statement is almost the same as the example of the left outer join entry. I changed only LEFT
to RIGHT
:
USE pubs SELECT title, pub_name FROM titles RIGHT JOIN publishers ON titles.pub_id = publishers.pub_id
The full join returns all the rows of the two tables, regardless of whether there are matching rows. In effect, it's a combination of left and right joins. To retrieve all titles and all publishers and to match publishers to their titles, use the following join:
USE pubs SELECT title, pub_name FROM titles FULL JOIN publishers ON titles.pub_id = publishers.pub_id
This query will include titles without a publisher, as well as publishers without a title.
The inner join returns the matching rows of both tables, similar to the WHERE
clause, and has the following syntax:
FROM (primary table) INNER JOIN (secondary table) ON (primary table).(field) = (secondary table).(field)
The following SQL statement combines records from the Titles and Publishers tables of the Pubs database if their pub_id
fields match. It returns all the titles and their publishers. Titles without publishers, or publishers without titles, will not be included in the result.
USE pubs SELECT titles.title, publishers.pub_name FROM titles, publishers WHERE titles.pub_id = publishers.pub_id
You can retrieve the same rows by using an inner join, as follows:
USE pubs SELECT titles.title, publishers.pub_name FROM titles INNER JOIN publishers ON titles.pub_id = publishers.pub_id
Sometimes you need to group the results of a query so that you can calculate subtotals. Let's say you need not only the total revenues generated by a single product but also a list of all products and the revenues they generated. The example in the earlier section "Working with Calculated Fields" calculates the total revenue generated by a single product. It is possible to use the SUM()
function to break the calculations at each new product ID, as demonstrated in the following statement. To do so, you must group the product IDs together with the GROUP BY
clause:
USE Northwind SELECT ProductID, SUM(Quantity * UnitPrice *(1 - Discount)) AS [Total Revenues] FROM [Order Details] GROUP BY ProductID ORDER BY ProductID
The preceding statement produces the following output:
ProductID Total Revenues
1 12788.10
2 16355.96
3 3044.0
4 8567.89
5 5347.20
6 7137.0
7 22044.29
The aggregate functions work in tandem with the GROUP BY
clause (when there is one) to produce subtotals. The GROUP BY
clause groups all the rows with the same values in the specified column and forces the aggregate functions to act on each group separately. SQL Server sorts the rows according to the column specified in the GROUP BY
clause and starts calculating the aggregate functions. Every time it runs into a new group, it generates a new row and resets the aggregate function(s).
If you use the GROUP BY
clause in a SQL statement, you must be aware of the following rule:
All the fields included in the SELECT
list must be either part of an aggregate function or part of the GROUP BY
clause.
Let's say you want to change the previous statement to display the names of the products rather than their IDs. The following statement does just that. Notice that the ProductName
field doesn't appear as an argument to an aggregate function, so it must be part of the GROUP BY
clause:
USE Northwind SELECT ProductName, SUM(Quantity * [Order Details].UnitPrice * (1 - Discount)) AS [Total Revenues] FROM [Order Details], Products WHERE Products.ProductID = [Order Details].ProductID GROUP BY ProductName ORDER BY ProductName
These are the first few lines of the output produced by this statement:
ProductName Total Revenues
Alice Mutton 32698.38
Aniseed Syrup 3044.0
Boston Crab Meat 17910.63
Camembert Pierrot 46927.48
Carnarvon Tigers 29171.87
If you omit the GROUP BY
clause, the query will generate an error message indicating that the ProductName
column in the selection list is not involved in an aggregate or a GROUP BY
clause.
You can also combine multiple aggregate functions in the selection list. The following statement calculates the total number of items sold for each product, along with the revenue generated and the number of invoices that contain the specific product:
USE Northwind SELECT ProductID AS Product, COUNT(ProductID) AS Invoices, SUM(Quantity) AS [Units Sold], SUM(Quantity * UnitPrice *(1 - Discount)) AS Revenue FROM [Order Details] GROUP BY ProductID ORDER BY ProductID
Here are the first few lines returned by the preceding query:
Product Invoices Units Sold Revenue
1 38 828 12788.1000595092
2 44 1057 16355.9600448608
You should try to revise the preceding statement so that it displays product names instead of IDs, by adding another join to the query as explained already.
The HAVING
clause limits the groups that will appear at the cursor. In a way, it is similar to the WHERE
clause, but the HAVING
clause is used with aggregate functions and the GROUP BY
clause, and the expression used with the HAVING
clause usually involves one or more aggregates. The following statement returns the IDs of the products whose sales exceed 1,000 units:
USE NORTHWIND SELECT ProductID, SUM(Quantity) FROM [Order Details] GROUP BY ProductID HAVING SUM(Quantity) > 1000
You can't use the WHERE
clause here, because no aggregates may appear in the WHERE
clause. However, you can use the WHERE clause as usual to limit the number of rows involved in the query (for example, limit the aggregate to the products of a specific category, or the products sold to customers in Germany). To see product names instead of IDs, join the Order Details table to the Products table by matching their ProductID
columns. Note that the expression in the HAVING
clause need not be included in the selection list. You can change the previous statement to retrieve the total quantities sold with a discount of 10 percent or more with the following HAVING
clause:
HAVING Discount >= 0.1
However, the Discount
column must be included in the GROUP BY
clause as well, because it's not part of an aggregate.
In addition to the selection queries we examined so far, you can also execute queries that alter the data in the database's tables. These queries are called action queries, and they're quite simple compared with the selection queries. There are three types of actions you can perform against a database: insertions of new rows, deletions of existing rows, and updates (edits) of existing rows. For each type of action, there's a SQL statement, appropriately named INSERT, DELETE
, and UPDATE
. Their syntax is very simple, and the only complication is how you specify the affected rows (for deletions and updates). As you can guess, the rows to be affected are specified with a WHERE
clause, followed by the criteria discussed with selection queries.
The first difference between action and selection queries is that action queries don't return any rows. They return the number of rows affected, but you can disable this feature by calling the following statement:
SET NOCOUNT ON
This statement can be used when working with a SQL Server database. Let's look at the syntax of the three action SQL statements, starting with the simplest: the DELETE
statement.
The DELETE
statement deletes one or more rows from a table; its syntax is as follows:
DELETE table_name WHERE criteria
The WHERE
clause specifies the criteria that the rows must meet in order to be deleted. The criteria expression is no different from the criteria you specify in the WHERE
clause of the selection query. To delete the orders placed before 1998, use a statement like this one:
USE Northwind DELETE Orders WHERE OrderDate < '1/1/1998'
Of course, the specified rows will be deleted only if the Orders table allows cascade deletions or if the rows to be deleted are not linked to related rows. If you attempt to execute the preceding query, you'll get an error with the following description:
The DELETE statement conflicted with the REFERENCE constraint "FK_Order_Details_Orders". The conflict occurred in database "Northwind", table "dbo.Order Details", column 'OrderID'.
This error message tells you that you can't delete rows from the Orders table that are referenced by rows in the Order Details table. If you were allowed to delete rows from the Orders table, some rows in the related table would remain orphaned (they would refer to an order that doesn't exist). To delete rows from the Orders table, you must first delete the related rows from the Order Details table and then delete the same rows from the Orders table. Here are the statements that will delete orders placed before 1998. (Do not execute this query unless you're willing to reinstall the Northwind database; there's no undo feature when executing SQL statements against a database.)
USE Northwind DELETE [Order Details] WHERE (OrderID IN ( SELECT OrderID FROM Orders WHERE (OrderDate < '1/1/1998'))) DELETE Orders WHERE OrderDate < '1/1/1998'
As you can see, the operation takes two action queries: one to delete rows from the Order Details table and another to delete the corresponding rows from the Orders table.
The DELETE
statement returns the number of rows deleted. You can retrieve a table with the deleted rows by using the OUTPUT
clause:
DELETE Customers OUTPUT DELETED.* WHERE Country IS NULL
To test the OUTPUT
clause, insert a few fake rows in the Customers table:
INSERT Customers (CustomerID, CompanyName) VALUES ('AAAAA', 'Company A') INSERT Customers (CustomerID, CompanyName) VALUES ('BBBBB', 'Company B')
And then delete them with the following statement:
DELETE Customers OUTPUT DELETED.* WHERE Country IS NULL
If you execute the preceding statements, the deleted rows will be returned as the output of the query. If you want to be safe, you can insert the deleted rows into a temporary table so you can insert them back into the database (should you delete more rows than intended). My suggestion is that you first execute a selection query that returns the rows you plan to delete, examine the output of this query, and, if you see only the rows you want to delete and no more, write a DELETE
statement with the same WHERE
clause. To insert the deleted rows to a temporary table, use the INSERT INTO
statement, which is described in the following section.
The INSERT
statement inserts new rows in a table; its syntax is as follows:
INSERT table_name (column_names) VALUES (values)
column_names
and values
are comma-separated lists of columns and their respective values. Values are mapped to their columns by the order in which they appear in the two lists.
Notice that you don't have to specify values for all columns in the table, but the values
list must contain as many items as there are column names in the first list. To add a new row to the Customers table, use a statement like the following:
INSERT Customers (CustomerID, CompanyName) VALUES ('FRYOG', 'Fruit & Yogurt')
This statement inserts a new row, provided that the FRYOG
key isn't already in use. Only two of the new row's columns are set, and they're the columns that can't accept null values.
If you want to specify values for all the columns of the new row, you can omit the list of columns. The following statement retrieves a number of rows from the Products table and inserts them into the SelectedProducts table, which has the same structure:
INSERT INTO SelectedProducts VALUES (values)
If the values come from a table, you can replace the VALUES
keyword with a SELECT
statement:
INSERT INTO SelectedProducts SELECT * FROM Products WHERE CategoryID = 4
The INSERT INTO
statement allows you to select columns from one table and insert them into another one. The second table must have the same structure as the output of the selection query. Note that you need not create the new table ahead of time; you can create a new table with the CREATE TABLE
statement. The following statement creates a new table to accept the CustomerID, CompanyName
, and ContactName
columns of the Customers table:
DECLARE @tbl table (ID char(5), name varchar(100), contact varchar(100))
After the table has been created, you can populate it with the appropriate fields of the deleted rows:
DELETE Customers OUTPUT DELETED.CustomerID, DELETED.CompanyName, DELETED.ContactName INTO @tbl WHERE Country IS NULL SELECT * FROM @tbl
Execute these statements, and you will see in the Results pane the two rows that were inserted momentarily into the Customers table and then immediately deleted.
The UPDATE
statement edits a row's fields; its syntax is the following:
UPDATE table_name SET field1 = value1, field2 = value2,. . . WHERE criteria
The criteria
expression is no different from the criteria you specify in the WHERE
clause of selection query. To change the country from UK to United Kingdom in the Customers table, use the following statement:
UPDATE Customers SET Country='United Kingdom' WHERE Country = 'UK'
This statement will locate all the rows in the Customers table that meet the specified criteria (their Country
field is UK) and change this field's value to United Kingdom.
Before you execute a DELETE or UPDATE statement, use a SELECT statement to see the rows that will be affected. Verify that these are the rows you intend to change and then delete or update them. Once an action query has been executed against the database, there's no way back (sorry, no Undo here).
This concludes our overview of SQL, and we're (at last) ready to explore the data access mechanisms of the Framework. In the following section, you'll learn how to submit queries to a database from within your VB application and how to retrieve the results of a query.
The component of the Framework we use to access databases is known as ADO.NET (ADO stands for Active Data Objects) and it provides two basic methods of accessing data: stream-based data access, which establishes a stream to the database and retrieves the data from the server, and set-based data access, which creates a special data structure at the client and fills it with data. This structure is the DataSet, which resembles a section of the database: It contains one or more DataTable objects, which correspond to tables and are made up of DataRow objects. These DataRow objects have the same structure as the rows in their corresponding tables. DataSets are populated by retrieving data from one or more database tables into the corresponding DataTables. As for submitting the data to the database with the stream-based approach, you must create the appropriate INSERT
/UPDATE
/DELETE
statements and then execute them against the database.
The stream-based approach relies on the DataReader object, which makes the data returned by the database available to your application. The client application reads the data returned by a query through the DataReader object and must store it somehow at the client. Quite frequently, we use custom objects to store the data at the client.
The set-based approach uses the same objects as the stream-based approach behind the scenes, and it abstracts most of the grunt work required to set up a link to the database, retrieve the data, and store it in the client computer's memory. So, it makes sense to start by exploring the stream-based approach and the basic objects provided by ADO.NET for accessing databases. After you understand the nature of ADO.NET and how to use it, you'll find it easy to see the abstraction introduced by the set-based approach and how to make the most of DataSets. As you will see in the following chapter, you can create DataSets and the supporting objects with the visual tools of the IDE.
A data-driven application should be able to connect to a database and execute queries against it. The selected data is displayed on the appropriate interface, where the user can examine it or edit it. Finally, the edited data is submitted to the database. This is the cycle of a data-driven application:
Retrieve data from the database.
Present data to the user.
Allow the user to edit the data.
Submit changes to the database.
Of course, many issues are not obvious from this outline. Designing the appropriate interface for navigating through the data (going from customers to their orders and from the selected order to its details) can be quite a task. Developing a functional interface for editing the data at the client is also a challenge, especially if several related tables are involved. We must also take into consideration that there are other users accessing the same database. What will happen if the product we're editing has been removed in the meantime by another user? Or what if a user has edited the same customer's data since our application read it? Do we overwrite the changes made by the other user, or do we reject the edits of the user who submits the edits last? I'll address these issues in this chapter and in Chapter 18, but we need to start with the basics: the classes for accessing the database.
To connect to a database, you must create a Connection object, initialize it, and then call its Open
method to establish a connection to the database. The Connection object is the channel between your application and the database; every command you want to execute against the same database must use this Connection object. When you're finished, you must close the connection by calling the Connection object's Close
method. Because ADO.NET maintains a pool of Connection objects that are reused as needed, it's imperative that you keep connections open for the shortest possible time.
The object that will actually execute the command against the database is the Command object, which you must configure with the statement you want to execute and associate with a Connection object. To execute the statement, you can call one of the Command object's methods. The ExecuteReader
method returns a DataReader object that allows you to read the data returned by the selection query, one row at a time. To execute a statement that updates a database table but doesn't return a set of rows, use the ExecuteNonQuery
method, which executes the specified command and returns an integer, which is the number of rows affected by the statement. The following sections describe the Connection, Command, and DataReader classes in detail.
To summarize, ADO.NET provides three core classes for accessing databases: the Connection, Command, and DataReader classes. There are more data access–related classes, but they're all based on these three basic classes. After you understand how to interact with a database by using these classes, you'll find it easy to understand the additional classes, as well as the code generated by the visual data tools that come with Visual Studio.
The Connection class is an abstract one, and you can't use it directly. Instead, you must use one of the classes that derive from the Connection class. Currently, there are three derived classes: SqlConnection, OracleConnection, and OleDbConnection. Likewise, the Command class is an abstract class with three derived classes: SqlCommand, OracleCommand, and OleDbCommand.
The SqlConnection and SqlCommand classes belong to the SqlClient
namespace, which you must import into your project via the following statement:
Imports System.Data.SqlClient
The examples in this book use the SQL Server 2008 DBMS, and it's implied that the SqlClient
namespace is imported into every project that uses SQL Server.
To connect the application to a database, the Connection object must know the name of the server on which the database resides, the name of the database itself, and the credentials that will allow it to establish a connection to the database. These credentials are either a username and password or a Windows account that has been granted rights to the database. You obviously know what type of DBMS you're going to connect to so you can select the appropriate Connection class. The most common method of initializing a Connection object in your code is the following:
Dim CN As New SqlConnection("Data Source = localhost;" & "Initial Catalog = Northwind; uid = user_name;" & "password = user_password")
localhost
is a universal name for the local machine, Northwind
is the name of the database, and user_name
and user_password
are the username and password of an account configured by the database administrator. The Northwind sample database isn't installed along with SQL Server 2008, but you can download it from MSDN and install it yourself. The process was described in the section "Obtaining the Northwind and Pubs Sample Databases" earlier in this chapter. I'm assuming that you're using the same computer both for SQL Server and to write your VB applications. If SQL Server resides on a different computer in the network, use the server computer's name (or IP address) in place of the localhost
name. If SQL Server is running on another machine on the network, use a setting like the following for the Data Source
key:
Data Source = \PowerServer
If the database is running on a remote machine, use the remote machine's IP address. If you're working from home, for example, you can establish a connection to your company's server with a connection string like the following:
Data Source = "213.16.178.100; Initial Catalog = BooksDB; uid = xxx; password = xxx"
The uid
and password
keys are those of an account created by the database administrator, and not a Windows account. If you want to connect to the database by using each user's Windows credentials, you should omit the uid
and password
keys and instead use the Integrated Security
key. If your network is based on a domain controller, you should use integrated security so that users can log in to SQL Server with their Windows account. This way you won't have to store any passwords in your code or even have an auxiliary file with the application settings.
If you're using an IP address to specify the database server, you may also have to include SQL Server's port by specifying an address such as 213.16.178.100, 1433
. The default port for SQL Server is 1433, and you can omit it. If the administrator has changed the default port or has hidden the server's IP address behind another IP address for security purposes, you should contact the administrator to get the server's address. If you're connecting over a local network, you shouldn't have to use an IP address. If you want to connect to the company server remotely, you will probably have to request the server's IP address and the proper credentials from the server's administrator.
The basic property of the Connection object is the ConnectionString
property, which is a semicolon-separated string of key-value pairs that specifies the information needed to establish a connection to the desired database. It's basically the same information you provide in various dialog boxes when you open SQL Server Management Studio and select a database to work with. An alternate method of setting up a Connection object is to set its ConnectionString
property:
Dim CN As New SqlConnection CN.ConnectionString = "Data Source = localhost; Initial Catalog = Northwind; " & "Integrated Security = True"
One of the Connection class's properties is the State
property, which returns the state of a connection; its value is a member of the ConnectionState
enumeration: Connecting
, Open
, Executing
, Fetching
, Broken
, and Closed
. If you call the Close
method on a Connection object that's already closed or the Open
method on a Connection object that's already open, an exception will be thrown. To avoid the exception, you must examine the Connection's State
property and act accordingly.
The following code segment outlines the process of opening a connection to a database:
Dim CNstring As String = "Data Source=localhost;Initial " & "Catalog=Northwind;Integrated Security=True" CNstring = InputBox( "Please enter a Connection String", "CONNECTION STRING", CNstring) If CNstring.Trim = "" Then Exit Sub Dim CN As New SqlConnection(CNstring) Try CN.Open() If CN.State = ConnectionState.Open Then MsgBox("Workstation " & CN.WorkstationId & " connected to database " & CN.Database & " on the " & CN.DataSource & " server") End If Catch ex As Exception MsgBox( _ "FAILED TO OPEN CONNECTION TO DATABASE DUE TO THE FOLLOWING ERROR" & vbCrLf & ex.Message) End Try ' use the Connection object to execute statements ' against the database and then close the connection If CN.State = ConnectionState.Open Then CN.Close()
The second major component of the ADO.NET model is the Command class, which allows you to execute SQL statements against the database. The two basic parameters of the Command object are a Connection object that specifies the database where the command will be executed and specifies the actual SQL command. To execute a SQL statement against a database, you must initialize a Command object and set its Connection property to the appropriate Connection object. It's the Connection object that knows how to connect to the database; the Command object simply submits a SQL statement to the database and retrieves the results.
The Command object exposes a number of methods for executing SQL statements against the database, depending on the type of statement you want to execute. The ExecuteNonQuery
method executes INSERT
/DELETE
/UPDATE
statements that do not return any rows, just an integer value, which is the number of rows affected by the query. The ExecuteScalar
method returns a single value, which is usually the result of an aggregate operation, such as the count of rows meeting some criteria, the sum or average of a column over a number of rows, and so on. Finally, the ExecuteReader
method is used with SELECT
statements that return rows from one or more tables.
To execute an UPDATE
statement, for example, you must create a new Command object and associate the appropriate SQL statement with it. One overloaded form of the constructor of the Command object allows you to specify the statement to be executed against the database, as well as a Connection object that points to the desired database as arguments:
Dim CMD As New SqlCommand( "UPDATE Products SET UnitPrice = UnitPrice * 1.07 " & "WHERE CategoryID = 3", CN) CN.Open Dim rows As Integer rows = CMD.ExecuteNonQuery If rows = 1 Then MsgBox("Table Products updated successfully") Else MsgBox("Failed to update the Products table") End If If CN.State = ConnectionState.Open Then CN.Close
The ExecuteNonQuery
method returns the number of rows affected by the query, and it's the same value that appears in the Output window of SQL Server Management Studio when you execute an action query. The preceding statements mark up the price of all products in the Confections category by 7 percent. You can use the same structure to execute INSERT
and DELETE
statements; all you have to change is the actual SQL statement in the SqlCommand
object's constructor. You can also set up a Command object by setting its Connection
and CommandText
properties:
Command.Connection = Connection Command.CommandText = "SELECT COUNT(*) FROM Customers"
After you're finished with the Command object, you should close the Connection object. Although you can initialize a Connection object anywhere in your code, you should call its Open
method as late as possible (that is, just before executing a statement) and its Close
method as early as possible (that is, as soon as you have retrieved the results of the statement you executed).
The ExecuteScalar
method executes the SQL statement associated with the Command object and returns a single value, which is the first value that the SQL statement would print in the Output window of SQL Server Management Studio. The following statements read the number of rows in the Customers table of the Northwind database and store the result in the count
variable:
Dim CMD As New SqlCommand( "SELECT COUNT(*) FROM Customers", CN) Dim count As Integer CN.Open count = CMD.ExecuteScalar If CN.State = ConnectionState.Open Then CN.Close
If you want to execute a SELECT
statement that retrieves multiple rows, you must use the ExecuteReader
method of the Command object, as shown here:
Dim CMD As New SqlCommand( "SELECT * FROM Customers", CN) CN.Open Dim Reader As SqlDataReader Reader = CMD.ExecuteReader While Reader.Read ' process the current row in the result set End While If CN.State = ConnectionState.Open Then CN.Close
You'll see shortly how to access the fields of each row returned by the ExecuteReader
method through the properties of the SqlDataReader class.
The command to be executed through the Command object is not always a SQL statement; it could be the name of a stored procedure, or the name of a table, in which case it retrieves all the rows of the table. You can specify the type of statement you want to execute with the CommandType
property, whose value is a member of the CommandType
enumeration: Text
(for SQL statements), StoredProcedure
(for stored procedures), and TableDirect
(for a table). You don't have to specify the type of the command you want to execute, but then the Command object will have to figure it out, a process that will take a few moments, and you can avoid this unnecessary delay. The Northwind database comes with the Ten Most Expensive Products
stored procedure. To execute this stored procedure, set up a Command object with the following statements:
Dim CMD As New SqlCommand CMD.Connection = CN CMD.CommandText = "[Ten Most Expensive Products]" CMD.CommandType = CommandType.StoredProcedure
Finally, you can retrieve all the rows of the Customers table by setting up a Command object like the following:
Dim CMD As New SqlCommand CMD.Connection = CN CMD.CommandText = "Customers" CMD.CommandType = CommandType.TableDirect
The most common SQL statements, the SELECT
statements, retrieve a set of rows from one or more joined tables, the result set. These statements are executed with the ExecuteReader
method, which returns a DataReader object — a SqlDataReader
object for statements executed against SQL Server databases. The DataReader class provides the members for reading the results of the query in a forward-only manner. The connection remains open while you read the rows returned by the query, so it's imperative to read the rows and store them in a structure in the client computer's memory as soon as possible and then close the connection. The DataReader object is read-only (you can't use it to update the underlying rows), so there's no reason to keep it open for long periods. Let's execute the following SELECT
statement to retrieve selected columns of the rows of the Employees table of the Northwind database:
SELECT LastName + ' ' + FirstName AS Name, Title, Extension, HomePhone FROM Employees
Here are the VB statements that set up the appropriate Command object and retrieve the SqlDataReader object with the result set:
Dim CMD As New SqlCommand Dim CN As New SqlConnection("Data Source = localhost;Initial Catalog=Northwind;" & "Integrated Security=True") CMD.Connection = CN CMD.CommandText = "SELECT LastName + ' ' + FirstName AS Name, " & "Title, Extension, HomePhone FROM Employees" CN.Open() Dim Reader As SqlDataReader Reader = Command.ExecuteReader Dim str As String = "" Dim cont As Integer = 0 While Reader.Read str &= Convert.ToString(Reader.Item("Name")) & vbTab str &= Convert.ToString(Reader.Item("Title")) & vbTab str &= Convert.ToString(Reader.Item("Extension")) & vbTab str &= Convert.ToString(Reader.Item("HomePhone")) & vbTab str &= vbCrLf count += 1 End While Debug.WriteLine(vbCrLf & vbCrLf & "Read " & count.ToString & " rows: " & vbCrLf & vbCrLf & str) CN.Close()
The DataReader class provides the Read
method, which advances the current pointer to the next row in the result set. To read the individual columns of the current row, you use the Item
property, which allows you to specify the column by name and returns an object variable. It's your responsibility to cast the object returned by the Item
property to the appropriate type. Initially, the DataReader is positioned in front of the first line in the result set, and you must call its Read
method to advance to the first row. If the query returns no rows, the Read
method will return False, and the While
loop won't be executed at all. In the preceding sample code, the fields of each row are concatenated to form the str
string, which is printed in the Immediate window; it looks something like this:
Davolio Nancy Sales Representative 5467 (206) 555-9857 Fuller Andrew Vice President, Sales 3457 (206) 555-9482 Leverling Janet Sales Representative 3355 (206) 555-3412
Most SQL statements and stored procedures accept parameters, and you should pass values for each parameter before executing the query. Consider a simple statement that retrieves the customers from a specific country, whose name is passed as an argument:
SELECT * FROM Customers WHERE Country = @country
The @country
parameter must be set to a value, or an exception will be thrown as you attempt to execute this statement. Stored procedures also accept parameters. The Sales By Year
stored procedure of the Northwind database, for example, expects two Date values and returns sales between the two dates. To accommodate the passing of parameters to a parameterized query or stored procedure, the Command object exposes the Parameters
property, which is a collection of Parameter
objects. To pass parameter values to a command, you must set up a Parameter object for each parameter; set its name, type, and value; and then add the Parameter object to the Parameters collection of the Command object. The following statements set up a Command object with a parameter of the varchar type with a maximum size of 15 characters:
Dim Command As New SqlCommand Command.CommandText = "SELECT * FROM Customers WHERE Country = @country" Command.Parameters.Add("@country", SqlDbType.VarChar, 15) Command.Parameters("@country").Value = "Italy"
At this point, you're ready to execute the SELECT
statement with the ExecuteReader
method and retrieve the customers from Italy. You can also configure the Parameter object in its constructor:
Dim param As New SqlParameter(paramName, paramType, paramSize)
Here's the constructor of the @country
parameter of the preceding example:
Dim param As New SqlParameter("@country", SqlDbType.VarChar, 15) param.Value = "Italy" CMD.Parameters.Add param
Finally, you can combine all these statements into a single one:
CMD.Parameters.Add("@country", SqlDbType.VarChar, 15).Value = "Italy"
In the last statement, I initialize the parameter as I add it to the Parameters collection and then set its value to the string Italy
. Oddly, there's no overloaded form of the Add
method that allows you to specify the parameter's value, but there is an AddWithValue
method, which adds a new parameter and sets its value. This method accepts two arguments: a string with the parameter's name and an object with the parameter's value. The actual type of the value is determined by the type of the query or stored procedure's argument, and it's resolved at runtime. The simplest method of adding a new parameter to the CMD.Parameters
collection is the following:
CMD.Parameters.Add("@country", "Italy")
After the parameter has been set up, you can call the ExecuteReader
method to retrieve the customers from the country specified by the argument and then read the results through an instance of the DataReader class.
Another property of the Parameter class is the Direction
property, which determines whether the stored procedure can alter the value of the parameter. The Direction
property's setting is a member of the ParameterDirection
enumeration: Input, Output, InputOutput
, and ReturnValue
. A parameter that's set by the procedure should have its Direction
property set to Output
: The parameter's value is not going to be used by the procedure, but the procedure's code can set it to return information to the calling application. If the parameter is used to pass information to the procedure, as well as to pass information back to the calling application, its Direction
property should be set to InputOutpu
t.
Let's look at a stored procedure that returns the total of all orders, as well as the total number of items ordered by a specific customer. This stored procedure accepts as a parameter the ID of a customer, obviously, and it returns two values: the total of all orders placed by the specified customer and the number of items ordered. A procedure (be it a SQL Server stored procedure or a regular VB function) can't return two or more values. The only way to retrieve multiple results from a single stored procedure is to pass output parameters so that the stored procedure can set their value. To make the stored procedure a little more interesting, we'll add a return value, which will be the number of orders placed by the customer. Listing 15.1 shows the implementation of the CustomerTotals
stored procedure.
Example 15.1. The CustomerTotals stored procedure
CREATE PROCEDURE CustomerTotals @customerID varchar(5), @customerTotal money OUTPUT, @customerItems int OUTPUT AS SELECT @customerTotal = SUM(UnitPrice * Quantity * (1 - Discount)) FROM [Order Details] INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID WHERE Orders.CustomerID = @customerID SELECT @customerItems = SUM(Quantity) FROM [Order Details] INNER JOIN Orders ON [Order Details].OrderID = Orders.OrderID WHERE Orders.CustomerID = @customerID DECLARE @customerOrders int SELECT @customerOrders = COUNT(*) FROM Orders WHERE Orders.CustomerID = @customerID RETURN @customerOrders
To attach the CustomerTotals
stored procedure to the database, create a new stored procedure, paste the preceding statements in the code window, and press F5 to execute it. Make sure the database is Northwind (not master). The stored procedure calculates three totals for the specified customer and stores them to three local variables. The @customerTotal
and @customerItems
variables are output parameters, which the calling application can read after executing the stored procedure. The @customerOrders
variable is the procedure's return value. We can return the number of orders for the customer through the stored procedure's return value, because this variable happens to be an integer, and the return value is always an integer. In more-complex stored procedures, we'd use output parameters for all the values we want to return to the calling application, and the procedure would return a value to indicate the execution status: 0 or 1 if the procedure completed its execution successfully and a negative value to indicate the error, should the procedure fail to execute.
Before using the CustomerTotals
stored procedure with our VB application, let's test it in SQL Server Management Studio. We must declare a variable for each of the output parameters: the @Total
, @Items
, and @Orders
variables. These three variables must be passed to the stored procedure with the OUTPUT
attribute, as shown here:
DECLARE @Total money DECLARE @Items int DECLARE @Orders int DECLARE @custID varchar(5) DECLARE @CustomerTotal decimal DECLARE @CustomerItems int SET @custID = 'BLAUS' EXEC @orders = CustomerTotals @custID, @customerTotal OUTPUT, @customerItems OUTPUT PRINT 'Customer ' + @custId + ' has placed a total of ' + CAST(@orders AS varchar(8)) + ' orders ' + ' totaling $' + CAST(ROUND(@customerTotal, 2) AS varchar(12)) + ' and ' + CAST(@customerItems AS varchar(4)) + ' items.'
Open a new query window in SQL Server Management Studio, and enter the preceding statements. Press F5 to execute them, and you will see the following message printed in the Output window:
Customer BLAUS has placed a total of 8 orders totaling $10355.45 and 653 items.
The customer's ID is an INPUT
parameter, and we could pass it to the procedure as a literal. You can omit the declaration of the @custID
variable and call the stored procedure with the following statement:
DECLARE @CustomerTotal decimal DECLARE @CustomerItems int EXEC @orders = _ CustomerTotals 'BLAUS', @customerTotal OUTPUT, @customerItems OUTPUT
Now that we've tested our stored procedure and know how to call it, we'll do the same from within our sample application. To execute the CustomerTotals
stored procedure, we must set up a Command object, create the appropriate Parameter objects (one Parameter object per stored procedure parameter plus another Parameter object for the stored procedure's return value), and then call the Command.ExecuteNonQuery
method. Upon return, we'll read the values of the output parameters and the stored procedure's return value. Listing 15.2 shows the code that executes the stored procedure (see the SimpleQueries sample project available for download from www.sybex.com/go/masteringvb2010
).
Example 15.2. Executing a stored procedure with output parameters
Private Sub bttnExecSP_Click(...) Handles bttnExecSP.Click Dim customerID As String = InputBox("Please enter a customer ID", "CustomerTotals Stored Procedure", "ALFKI") If customerID.Trim.Length = 0 Then Exit Sub Dim CMD As New SqlCommand CMD.Connection = CN CMD.CommandText = "CustomerTotals" CMD.CommandType = CommandType.StoredProcedure CMD.Parameters.Add( "@customerID", SqlDbType.VarChar, 5).Value = customerID CMD.Parameters.Add("@customerTotal", SqlDbType.Money) CMD.Parameters("@customerTotal").Direction = ParameterDirection.Output CMD.Parameters.Add("@customerItems", SqlDbType.Int) CMD.Parameters("@customerItems").Direction = ParameterDirection.Output CMD.Parameters.Add("@orders", SqlDbType.Int) CMD.Parameters("@orders").Direction = ParameterDirection.ReturnValue CN.Open() CMD.ExecuteNonQuery() CN.Close() Dim items As Integer Items = Convert.ToInt32(CMD.Parameters("@customerItems").Value Dim orders As Integer Orders = Convert.ToInt32(CMD.Parameters("@orders").Value Dim ordersTotal As Decimal ordersTotal = Convert.ToDouble( CMD.Parameters("@customerTotal").Value MsgBox("Customer BLAUS has placed " & orders.ToString & " orders " & "totaling $" & Math.Round(ordersTotal, 2).ToString("#,###.00") & " and " & items.ToString & " items") End Sub
In most applications, the same Command object will be reused again and again with different parameter values, so it's common to add the parameters to a Command object's Parameters collection and assign values to them every time we want to execute the command. Let's say you've designed a form with text boxes, where users can edit the values of the various fields; here's how you'd set the values of the UPDATECMD
variable's parameters:
UPDATECMD.Parameters("@CustomerID").Value = txtID.Text.Trim UPDATECMD.Parameters("@CompanyName").Value = txtCompany.Text.Trim
After setting the values of all parameters, you can call the ExecuteNonQuery
method to submit the changes to the database. To update another customer, just assign different values to the existing parameters, and call the UPDATECMD object's ExecuteNonQuery
method.
Another problem you will avoid with parameterized queries and stored procedures is that of handling single quotes, which are used to delimit literals in T-SQL. Consider the following UPDATE
statement, which picks up the company name from a TextBox control and updates a single row in the Customers table:
CMD.CommandText = "UPDATE Customers SET CompanyName = '" & txtCompany.Text & "'" & "WHERE CustomerID = "' & txtID.Text & "'"
If the user enters a company name that contains a single quote, such as B's Beverages, the command will become the following:
UPDATE Customers SET CompanyName = 'B's Beverages' WHERE CustomerID = 'BSBEV'
If you attempt to execute this statement, SQL Server will reject it because it contains a syntax error (you should be able to figure out the error easily by now). The exact error message is as follows:
Msg 102, Level 15, State 1, Line 1 Incorrect syntax near 's'. Msg 105, Level 15, State 1, Line 1 Unclosed quotation mark after the character string ".
The single quote is used to delimit literals, and there should be an even number of single quotes in the statement. The compiler determines that there's an unclosed quotation mark in the statement and doesn't execute it. If the same statement was written as a parameterized query, such as the following, you could pass the same company name to the statement as an argument without a hitch:
CMD.CommandText = "UPDATE Customers SET CompanyName = @CompanyName " & "WHERE CustomerID = @ID" CMD.Parameters.Add("@CompanyName", SqlDbType.VarChar, 40).Value = "B's Beverages" CMD.Parameters.Add("@ID", SqlDbType.Char, 5).Value = "BSBEV" CMD.ExecuteNonQuery
The same is true for other special characters, such as the percentage symbol. It's possible to escape the special symbols; you can replace the single-quote mark with two consecutive single quotes, but the most elegant method of handling special characters, such as quotation marks, percent signs, and so on, is to use parameterized queries or stored procedures. You just assign a string to the parameter and don't have to worry about escaping any characters; the Command object will take care of all necessary substitutions.
The values you assign to the arguments of a query or stored procedure usually come from controls on a Windows form and in most cases from TextBox controls. The following statement reads the text in the txtFax
TextBox control and assigns it to the @FAX
parameter of a Command object:
Command.Parameters("@FAX").Value = txtFax.Text
But what if the user has left the txtFax
TextBox blank? Should we pass to the INSERT
statement an empty string or a null value? If you collect the values from various controls on a form and use them as parameter values, you'll never send null values to the database. If you want to treat empty strings as null values, you must pass a null value to the appropriate parameter explicitly. Let's say that the txtFax
TextBox control on the form corresponds to the @FAX
parameter. You can use the IIf()
statement of Visual Basic to assign the proper value to the corresponding parameter as follows:
UPDATECommand.Parameters("@FAX").Value = IIf(txtFax.Text.Trim.Length = 0, System.DBNull.Value, txtFax.Text)
This is a lengthy statement, but here's how it works: The IIf()
function evaluates the specified expression. If the length of the text in the txtFax
control is zero, it returns the value specified by its second argument, which is the null value. If not — in other words, if the TextBox control isn't empty — it returns the text on the control. This value is then assigned to the @FAX
parameter of the UPDATECommand
object.
To read the rows returned by a selection query, you must call the Command object's ExecuteReader
method, which returns a DataReader object (a SqlDataReader object for queries executed against SQL Server). The DataReader is a stream to the data retrieved by the query, and it provides many methods for reading the data sent to the client by the database. The underlying Connection object remains open while you read the data off the DataReader, so you must read it as quickly as possible, store it at the client, and close the connection as soon as possible.
To read a set of rows with the DataReader, you must call its Read
method, which advances the pointer to the next row in the set. Initially, the pointer is in front of the first row, so you must call the Read
method before accessing the first row. Despite its name, the Read
method doesn't actually fetch any data; to read individual fields, you must use the various Get
methods of the DataReader object, described next (GetDecimal, GetString
, and so on). After reading the fields of the current row, call the Read
method again to advance to the next row. There's no method to move to a previous row, so make sure you've read all the data of the current row before moving to the next one. Table 15.4 explains the basic properties and methods of the DataReader object.
Table 15.4. Properties and methods of a DataReader object
Name | Description |
---|---|
| This is a Boolean property that specifies whether there's a result set to read data from. If the query selected no rows at all, the |
| This property returns the number of columns in the current result set. Note that the |
| This method moves the pointer in front of the next row in the result set. Use this method to read the rows of the result set, usually from within a |
| There are many versions of the |
| There are many versions of the |
| If you can't be sure about the type of a column, use the |
This method reads all the columns of the current row and stores them into an array of objects, which is passed to the method as an argument. This method returns an integer value, which is the number of columns read from the current row. | |
| Use this method to retrieve the name of a column, which must be specified by its order in the result set. To retrieve the name of the first column, use the expression |
| This is the counterpart of the |
| This method returns True if the column specified by its ordinal in the current row is null. If you attempt to assign a null column to a variable, a runtime exception will be thrown, so you should use this method to determine whether a column has a value and handle the null values from within your code. |
Note that you can't reset the DataReader object and reread the same result set. To go through the same rows, you must execute the query again. However, there's no guarantee that the same query executed a few moments later will return the same result set. Rows may have been added to, or removed from, the database, so your best bet is to go through the result set once and store all the data to a structure at the client computer's memory. Moreover, while you're using the DataReader, the connection to the server remains open. This means that you shouldn't process the data as you read it, unless it is a trivial form of processing, such as keeping track of sums and counts. If you need to perform some substantial processing on your data, read the data into an ArrayList or other structure in the client computer's memory, close the connection, and then access the data in the ArrayList. In the following chapter, you'll learn about the DataSet object, which was designed to maintain relational data at the client. The DataSet is a great structure for storing relational data at the client; it's almost like a small database that resides in the client computer's memory. However, the DataSet is not ideal for all situations.
Listing 15.3 shows the code that retrieves all products along with category names and supplier names and populates a ListView control. The ListView control's columns aren't specified at design time; the code adds the appropriate columns at runtime (as long as the View
property has been set to Details
). The code goes through the columns of the result set and adds a new column to the ListView control for each data column. Then it reads the rows returned by the query and displays them on the control. The statements in Listing 15.3 are part of the SimpleQueries sample project.
Example 15.3. Displaying product information on a ListView control
Dim Command As New SqlCommand Command.Connection = CN ' a simple SELECT query Command.CommandText = "SELECT ProductName AS Product, " & "CategoryName AS Category, " & "CompanyName AS Supplier, UnitPrice AS Price " & "FROM Products LEFT JOIN Categories " & "ON Products.CategoryID = Categories.CategoryID " & "LEFT JOIN Suppliers ON Products.SupplierID = Suppliers.SupplierID" Connection.Open() Dim count As Integer = 0 Dim Reader As SqlDataReader Reader = Command.ExecuteReader ListView1.Clear() Dim i As Integer ' setup ListView control to display the headers ' of the columns read from the database For i = 0 To Reader.FieldCount - 1 ListView1.Columns.Add(Reader.GetName(i), 130) Next While Reader.Read Dim LI As New ListViewItem LI.Text = Convert.ToString(Reader.Item("Product")) LI.SubItems.Add(Convert.ToString( Reader.Item("Category"))) LI.SubItems.Add(Convert.ToString( Reader.Item("Supplier"))) LI.SubItems.Add(Convert.ToString( Reader.Item("Price"))) ListView1.Items.Add(LI) Count += 1 End While MsgBox("Read " & count.ToString & " Product rows") Connection.Close()
Another interesting aspect of the DataReader
object is that you can use it to read multiple result sets, such as the ones returned by multiple queries. You can execute a batch query such as the following with a single Command object:
Command.CommandText = "SELECT * FROM Customers; SELECT * FROM Employees" Dim Reader As SqlDataReader = Command.ExecuteReader
We'll use the same DataReader object to read the rows of both tables, but we need to know when we're finished with the first result set (the customers) and start reading the second result set. The NextResult property of the DataReader does exactly that: After exhausting the first result set (by iterating through its rows with the Read
method), we can request the NextResult
property to find out whether the DataReader contains additional result sets. If so, we can start reading the next result set with the Read
method. Here's the outline of the code for reading two result sets from the same DataReader:
While Reader.Read ' read the fields of the current row in the 1st result set End While If Reader.NextResult While Reader.Read ' read the fields of the current row in the 2nd result set End While End If
In this section I'll put together all the information presented so far in this chapter to build a data-driven application — an application that actually talks to a database, retrieves data, and displays it on a Windows form. The application is not new to you. In Chapter 5, "Basic Windows Controls," you created an application for maintaining a list of contacts, based on a ListBox control, where you stored the names of the contacts. The same ListBox control was also used as a navigational tool, because users could select a contact and view their details, as shown in Figure 15.8.
The contacts were stored in a List of Contact objects, and they were persisted to a file. Now let's revise this application so that it works with a database, namely, the Customers table of the Northwind database. The columns of the Customers table are almost identical to the fields of the Contact object, so you'll use the same data type to store the data. Instead of persisting the data to a file, you'll read your contacts from the database and submit the edited rows back to the same database. Because you'll use a central data storage, the revised application can be used by multiple users at the same time.
Start by copying the Contacts project folder to a new folder with a different name. Then edit the menu, as follows:
Delete the New and Save items, because the new application doesn't maintain a copy of the contacts in memory. Instead, it retrieves the requested contact from the Customer table as requested and submits the updates to the database as soon as the user edits an existing contact or inserts a new one. The New command is meaningless with a database. You'll never have to remove all rows from a table, unless you're reinitializing a test database, in which you will probably run a script that initializes all tables.
Change the Open command of the File menu to Load Customers. Instead of loading the contacts from a file, we'll load them from the Customers table of the Northwind database.
Add the CustomerID field to the main form, because the Customers table uses user-supplied strings as primary keys.
Most of the code remains the same. You only need to replace the routines that perform the basic operations against the database. We no longer read the data from an external file. Instead, we must execute a query against the database and retrieve the company names along with their IDs. The Load command's code follows in Listing 15.4.
Example 15.4. Loading the Customers table
Private Sub LoadToolStripMenuItem_Click(...) Handles LoadToolStripMenuItem.Click CMD.CommandText = "SELECT * FROM Customers" CN.Open() Dim RDR As SqlDataReader RDR = CMD.ExecuteReader ListBox1.Items.Clear() While RDR.Read Dim C As New Contact C.CustomerID = RDR.Item("CustomerID") C.CompanyName = RDR.Item("CompanyName") C.ContactName = Convert.ToString( IIf(RDR.IsDBNull( RDR.GetOrdinal("ContactName")), "", RDR.Item("ContactName"))) C.Address1 = Convert.ToString( IIf(RDR.IsDBNull( RDR.GetOrdinal("Address")), "", RDR.Item("Address"))) ' Similar statements for the remaining fields ListBox1.Items.Add(C)
End While CN.Close() currentContact = 0 ShowContact() End Sub
The code executes a simple SELECT
query against the database and then loads the ListBox control with Contact objects as before. The only difference is that now the data comes from the DataReader. You will also notice the lengthy expressions that assign the values read from the database to the TextBox controls on the form. The expression RDR.Item(field_name)
reads the corresponding column's value from the current row. This row, however, may be null, and you can't assign a null value to the Text
property. So, the code uses the IIf()
function to determine whether the current field's value is null. If the method IsDBNull
returns True, then the current field is null, and the IIf()
function returns an empty string. If not, it returns the actual field value, which is the expression RDR.Item(field_name)
. But what about the call to the GetOrdinal
method? Well, the IsDBNull
method doesn't accept a column name as an argument, only an integer that is the order of the column in the result. Instead of hard-coding numeric values to our code (and make it impossible to maintain later), we can retrieve the ordinal of a column with the GetOrdinal
method. The expression GetOrdinal("CustomerID")
returns 0 because the CustomerID
column is the first one, GetOrdinal("CompanyName")
returns 1, and so on. Finally, because the IIf()
function returns an object, we must cast it to a string before assigning it to the Text property.
Now, this is the type of code that can be streamlined, and there are tools that do it for you. Even better, there are tools that convert each row to a custom object (a Contact object, in our example), so you can handle the data you retrieve from the database with the object-oriented techniques you have learned in this book. The new data access technologies, including LINQ to SQL that was discussed in Chapter 14, "An Introduction to LINQ," bridge the gap between databases and object-oriented programming. As you have already noticed, there's a mismatch between objects and the way we access databases. Having to set up a DataReader, execute commands directly against the database, and then having to worry about null and converting the fields to the proper type takes a lot of code, and the entire approach just isn't elegant. You'll see better techniques for accessing databases in the following chapter, but I wanted to show you the basic data access mechanisms first (actually you already saw how to access databases with LINQ to SQL queries in the preceding chapter).This may not be so elegant, but they're the foundation on which more elaborate tools were built. Moreover, they're the fastest way to get data out of a database and back. That said, let's continue with our sample application.
Next, we must implement the procedures for adding a new contact (the SubmitContact
subroutine), for updating an existing contact (the UpdateContact
subroutine), and for deleting a contact (the RemoveContact
subroutine). The first two subroutines accept as an argument a variable of the Contact type, form the appropriate SQL statement, and execute it against the database. The RemoveContact
subroutine accepts the ID of the row to be deleted, forms the appropriate DELETE
statement, and executes it likewise. You may wonder why I haven't implemented these routines as functions that return a True/False value indicating whether the operation completed successfully. The reason is that a simple indication about the success of an operation won't suffice; we need to display a more specific error message to the user, so I've decided to throw an exception with the error description, as shown in Listings 15.5, 15.6, and 15.7.
Example 15.5. Adding a new row to the Customers table
Private Sub SubmitContact(ByVal C As Contact) CMD.CommandText = "INSERT Customers " & "(CustomerID, CompanyName, ContactName, Address, " & " City, Region, PostalCode, Country) " & "VALUES (@CustomerID, @CompanyName, @ContactName, " & "@Address, @City, @Region, @PostalCode, @Country) " CMD.Parameters.Clear() CMD.Parameters.AddWithValue("@CustomerID", C.CustomerID) CMD.Parameters.AddWithValue("@CompanyName", C.CompanyName) CMD.Parameters.AddWithValue("@ContactName", C.ContactName) CMD.Parameters.AddWithValue("@Address", C.Address1) CMD.Parameters.AddWithValue("@City", C.City) CMD.Parameters.AddWithValue("@Region", C.State) CMD.Parameters.AddWithValue("@PostalCode", C.ZIP) CMD.Parameters.AddWithValue("@Country", C.ZIP) CN.Open() Try CMD.ExecuteNonQuery() Catch ex As Exception Throw New Exception( "Failed to update contact in database. " & vbCrLf & "ERROR MESSAGE: " & vbCrLf & ex.Message) Finally CN.Close() End Try CN.Close() End Sub
Example 15.6. Updating a row in the Customers table
Private Sub UpdateContact(ByVal C As Contact) CMD.CommandText = "UPDATE Customers " & "SET CompanyName = @CompanyName, " & " ContactName = @ContactName, " & " Address = @Address, " & " City = @City, " & " Region = @Region, " & " PostalCode = PostalCode" & " Country = @Country " & "WHERE CustomerID = @CustomerID" CMD.Parameters.Clear() CMD.Parameters.AddWithValue("@CustomerID", C.CustomerID) CMD.Parameters.AddWithValue("@CompanyName", C.CompanyName) CMD.Parameters.AddWithValue("@ContactName", C.ContactName)
CMD.Parameters.AddWithValue("@Address", C.Address1) CMD.Parameters.AddWithValue("@City", C.City) CMD.Parameters.AddWithValue("@Region", C.State) CMD.Parameters.AddWithValue("@PostalCode", C.ZIP) CN.Open() Try CMD.ExecuteNonQuery() Catch ex As Exception Throw New Exception( "Failed to update contact in database. " & vbCrLf & "ERROR MESSAGE: " & vbCrLf & ex.Message) End Try CN.Close() End Sub
Example 15.7. Removing a row from the Customers table
Private Sub RemoveContact(ByVal ContactID As String) CMD.CommandText = "DELETE Customers WHERE CustomerID=@contactID " CMD.Parameters.Clear() CMD.Parameters.AddWithValue("@contactID", ContactID) CN.Open() Try CMD.ExecuteNonQuery() Catch ex As Exception Throw New Exception( "Failed to delete contact in database. " & vbCrLf & "ERROR MESSAGE: " & vbCrLf & ex.Message) Finally CN.Close() End Try End Sub
The rest of the code is basically the same. We started with an application that manipulates Contact objects in a List and converted it to an application that manipulates the rows of a database table. We changed the routines that read data and submit data to the database. The routines that read the data from the database create Contact objects that are stored to the ListBox control as with the original application. Likewise, every time the user inserts a new contact or updates an existing one, instead of modifying an item in the List control, we submit it to the database and update the underlying source (the Customers table). Finally, when a contact is removed, the application removes it directly from the Customers table. The deletion operation may fail if the customer has placed an order. As you realize, we're no longer dealing with an isolated table but with a larger system with related tables, and the DBMS maintains the integrity of our data.
This application has a serious flaw, though. What if the Customers table has thousands of customers and there are a few dozen users who may need access the database? Would it make sense to download the entire table to the client and maintain such an enormous list of customers at every client? The first improvement I'd suggest is to download only the customer names and display them on the ListBox control. Every time the user clicks a different customer's name, your code should execute a SELECT
statement for the specific customer and bring the entire row to the client and display its fields.
Even so, the table may grow so large that it wouldn't make sense to display all company names in a list. In this case, you should provide some selection mechanism to force users to download the names of selected companies only. For example, you could display a list of countries and download only customers from the selected customer every time users selected another country. You could also filter by company or contact name or provide combinations of criteria. If you decide to modify the application, here are the SELECT
statements that limit customers by country and company name:
SELECT CustomerID, CompanyName FROM Customers WHERE Country = @country SELECT CustomerID, CompanyName FROM Customers WHERE CompanyName LIKE @name + '%'
(The percent symbol at the end of the string is a wildcard, indicating that the company name should start with the string stored in the @name
variable, followed by any other combination of characters.)
One last hitch in this application is the following: What will happen if other users delete a contact you're editing or someone else updates the contact you're editing? Issues of concurrency are very important in designing data-driven applications, and we'll look at this in the following chapter. The Contacts project assumes that the same row won't be edited by multiple users at once. Even when this happens, the last user to save the edits overwrites the edits of the other users. It's a bit of a crude approach, but it's quite appropriate in many situations. In other situations, most notably in reservation applications, it's unacceptable.
Relational databases store their data in tables and are based on relationships between these tables. The data are stored in tables, and tables contain related data, or entities, such as persons, products, orders, and so on. Relationships are implemented by inserting columns with matching values in the two related tables.
How will you relate two tables with a many-to-many relationship?
Structured Query Language (SQL) is a universal language for manipulating tables. SQL is a nonprocedural language, which specifies the operation you want to perform against a database at a high level, unlike a traditional language such as Visual Basic, which specifies how to perform the operation. The details of the implementation are left to the DBMS. SQL consists of a small number of keywords and is optimized for selecting, inserting, updating, and deleting data.
How would you write a SELECT
statement to retrieve selected data from multiple tables?
ADO.NET is the component of the Framework that provides the mechanism to contact a database, submit queries, and retrieve the results of a query. There are two types of queries you can execute against a database: selection queries that retrieve data from the database based on certain criteria you supply and action queries that manipulate the data at the database. Action queries do not return any results, except for the number of rows that were affected by the query.
Describe the basic mechanism for submitting a selection query to the database and reading the results returned by the query.
18.224.54.136