D. Accessing Data with ADO.NET

An important part of any real-world application is obtaining and managing the data that drives the application. The application could be something as simple as a game that is retrieving and storing user preferences and scoring data. At the other end of the spectrum, it could be a complex business application that retrieves, manipulates, and updates data from multiple data sources over a network or the Web and within the scope of a distributed transaction using asynchronous message-based communications. The spectrum of what constitutes data access is extremely broad and complicated, and is the topic of many other books.

This book, on the other hand, is focused on the presentation of data using Windows Forms, and this appendix isn’t intended to teach you everything there is to know about data access. What I do want to accomplish is to give you a solid foundation in how to get data into and out of your Windows Forms application architecture. I also want to ensure that you understand enough about ADO.NET data access to follow the samples in the chapters of the book and, more importantly, how the different forms of data affect the presentation of that data.

So basically I am going to show you just enough to make you dangerous. You should be able to develop data-bound UI prototypes without needing a data access expert to come in and build you a data layer before you can get started designing and coding the presentation tier. On the other hand, you shouldn’t build a client application that is tightly bound to the data tier. When it comes to data access, there are a lot of different ways to accomplish the same thing. This appendix covers data access at the lower levels—the basics of data access with ADO.NET. Chapter 2 describes how to work with typed data sets and how to do the kind of data access that you will want to use most often in your data applications.

If you have a solid foundation in ADO.NET, you can feel free to skip this appendix or just use it as a quick review of core concepts. If you will be coding up a data access layer, I’d suggest you pick up some data access books to learn all the various options and best practices for designing a data layer and handling more complex scenarios. (Pragmatic ADO.NET by Shawn Wildermuth [Addison-Wesley, 2003] is an outstanding place to start.)

Data can take on many forms and come from many sources when coding with ADO.NET. This appendix only focuses on two sources of data: relational data coming from a database like Microsoft SQL Server, and XML coming from a file or stream. The two worlds are certainly not mutually exclusive. SQL Server includes a lot of functionality for generating and consuming XML, and SQL Server 2005 includes a native data type for storing XML. You can load XML into a DataSet and work with it as relational data, and you can transfer relational data into an XmlDataDocument and work with it as an XML document. You might also load XML and iterate through the nodes in a document to extract data that you put into business objects that you then use for data binding.

How you code against relational and XML data is very different, as is how you might use them for data binding. This appendix focuses on the various ways to get data into business entities and data sets that you can use in your Windows Forms applications to present that data, and how to take any updates made to the data and push them back to the underlying data store.

Because this is such a dense topic, I am going to have to make a few assumptions about your knowledge. First, I assume that you understand the basic concepts behind storing data in a database and executing queries to retrieve and update it. Second, I will assume you have some basic knowledge of XML storage, the structure of a document, and the kinds of nodes that a document can contain. You certainly don’t need to be an expert in these things, but you should at least have enough background to understand the terminology and what the basic processes and capabilities of those technologies are.

Relational Data Access

When dealing with relational data access, you generally follow the same process each time you need to touch the underlying data store to retrieve data or update it. You need

1.   A connection to the data store.

2.   A way to express the query that you are going to perform.

3.   Something to contain any returned results from the query.

The classes defined in ADO.NET are used to encapsulate the state and behavior associated with each of these steps.

A key concept with relational data access in .NET is that it is based on a disconnected data model. Obviously you can’t execute a query if you aren’t really connected to the database. Disconnected data access means that you are expected to open a connection, use that connection to issue a query or a set of queries, and then close the connection. You may continue to work with the data resulting from the query for some time on the client. At some point in the future, you may reconnect to the database to either issue update queries or to refresh the data you have, but you don’t keep a connection open to the database the entire time you are working with the data. This makes it a very different data model to work with than legacy database APIs, which typically favored a connected model. This is also different from the concept of offline operations of a smart client application. Offline capabilities are focused on being able to cache the operational data on the client to allow it to continue to support certain use cases even when not connected to a network. Disconnected data access generally assumes you have a network connection available, but that you just aren’t keeping database connections open throughout the functioning of your application.

When dealing with data types in ADO.NET, there are two kinds of types that you come across: generic data classes and provider-specific classes. The generic relational data classes and interfaces implement abstract constructs in the relational model—tables, rows, relations, constraints—and can be used to contain data coming from a variety of data sources. These generic data classes are all part of the System.Data namespace, which is the root namespace for ADO.NET class library constructs.

For the generic classes to be useful, though, you usually need to be able to connect to a data store and perform the queries that populate the generic data containers or update the underlying data store based on changes in them. The classes you use to make that connection and execute queries are the provider-specific classes that reside in child namespaces of System.Data and are specific to each provider. Table D.1 lists the providers that ship with the .NET Framework.

TABLE D.1: Managed Data Providers in the .NET Framework

Provider      Namespace                    Used For
SQL Server    System.Data.SqlClient        Direct access to Microsoft SQL Server
OLE DB        System.Data.OleDb            Any data source exposing an OLE DB provider
ODBC          System.Data.Odbc             Data sources exposed through ODBC drivers
Oracle        System.Data.OracleClient     Direct access to Oracle databases

The managed provider classes are expected to implement a common set of interfaces that are defined in the System.Data namespace. These interfaces define the basic methods and properties that every managed provider is expected to implement. These include classes for establishing data source connections, creating commands to execute queries, creating data adapters used to fill data sets, and creating parameters that you associate with a command to pass to a query. In ADO.NET 2.0, there is also a set of abstract base classes for data providers (such as DbConnection and DbCommand) in the System.Data.Common namespace that lets you program against different data sources in a provider-agnostic way. This allows you to switch out the underlying data source without needing to change any of your application code. This topic is a little beyond the scope of what I cover here, but you should be aware of the capability.

However, if you need to tap into provider-specific features such as XML handling in SQL Server, then you will need to program against the specific provider classes instead of the interfaces or generic classes for those capabilities. In general, you should pick a consistent approach in coding against the provider objects instead of switching back and forth between sometimes using the interface and sometimes using the provider-specific classes. You will have to figure out what the best approach is based on your application needs and the degree to which you want your code to be portable across multiple providers.

As a simple example of working with a specific provider, the following code shows how to open a connection to a SQL Server database and execute a command to fill a DataSet with a SqlDataAdapter:

private DataSet GetCustomersFromDb( )
{
   // Create the disconnected data store
   DataSet ds = new DataSet( );

   // Create the connection object
   SqlConnection conn = new SqlConnection(
   "server=localhost;database=Northwind;trusted_connection=true");

   // Create the command that wraps the query
   SqlCommand cmd = new SqlCommand(
      "SELECT CustomerID, CompanyName FROM Customers", conn);

   // Create the DataAdapter that bridges the provider-
   // agnostic DataSet and the provider-specific connection
   // and command
   SqlDataAdapter adapter = new SqlDataAdapter(cmd);

   // Opens the connection, executes the query,
   // puts the result in the DataSet, closes the connection
   adapter.Fill(ds, "Customers");

   // Return the result
   return ds;
}

I’ll get into more details on data sets and data adapters shortly, but the intent here is to start with the basics. The process is:

1.   Create a data set object that you are going to populate.

2.   Create a connection object for the data store that you are going to operate against. Note that creating an instance of a connection object doesn’t open the connection.

3.   Create a command object that encapsulates the query that you are going to execute and associate it with the connection in its constructor.

4.   Create a data adapter and associate it with the command that it will use to populate the data set.

5.   Call the Fill method to populate the data set and return the results.

I’ll get into more detail on all the magic that is happening in that Fill method later in this appendix. At this point I just want you to get an overview.

When you need to get an entire set of data into memory, use a data set so you can deal with it as a whole on the client or in the middle tier. Each managed provider also implements another way to retrieve the data from a query, called a data reader. For example, the SQL Server managed provider implements a class called SqlDataReader. You can think of a data reader as a fast, forward-only, fire hose-like cursor that you can use to iterate through the results of a query. There are ways to bind a data reader to Windows Forms controls through a BindingSource component, but it isn’t a practice I would encourage, so I won’t go into a lot of detail on it. Binding a data reader to Windows Forms controls tightly couples your presentation tier with your data tier. If it is a very small-scale application, and you are looking for the fastest way to get data presented to the user, that may be okay. However, for enterprise-scale business applications, that is something you should avoid. Using data readers in the presentation tier potentially keeps the connection open longer than necessary, limiting scalability, and also tightly couples the client to the data schema of the data tier. However, data readers are the fastest way to iterate through the rows returned from a query, so you may want to use them if you are going to push the results of a query into a collection of business object classes instead of using data sets. I’ll show an example of this later in this appendix.

The Ubiquitous DataSet

The most common data type you will deal with in Windows Forms data binding is the DataSet, or a derived typed data set class. Data binding to custom objects and collections is also very easy in .NET 2.0, so that will be a very common approach as well and is covered in more detail in several places in the book. But data sets are specifically designed for data binding in .NET, and this appendix is focused on relational data access, so let’s get our hands dirty there first.

A data set basically provides you with an in-memory data collection that can be used for everything from containing a single row of data to a complex schema of many tables with relations and constraints between them. Some would even go so far as to call it an in-memory database, but there are hazards in thinking of it that way if you start bringing too much data into memory. Figure D.1 shows the structure of a data set and the objects that are contained within it.

FIGURE D.1: DataSet Structure

For simple tabular access to data returned from a query, you will usually only deal with a single data table in a data set, and that will be the target of your data binding. When you do so, the data set just becomes a fancy wrapper around a set of rows. In fact, .NET 2.0 lets you create and populate DataTable objects on their own, without a containing data set.

There are also many scenarios where you may want to retrieve rows from a table that have a parent-child relationship with rows from another table. In those cases, you will start to deal with data sets containing multiple data table objects, one or more data relation objects that maintain the parent-child relationship between the tables, and constraint objects that enforce the integrity of the contained data. A data table object contains a collection of data columns that describe the name and type of each column in a table (the metadata of the column), and it also contains a collection of data row objects that contain the actual data itself.

You can populate a data set in one of three primary ways: by loading it from a file, programmatically, or through a query to a database. In .NET 2.0, you can also work with DataTable objects in isolation from a data set, and you can load a data table directly from a data reader using its Load method. Let’s go from the simplest to the most complex by starting with loading and saving data sets from files, specifically XML files.

Loading Data Sets from a File

The DataSet class supports two methods for saving to and loading from a file: WriteXml and ReadXml. If you have data in a data set and you want to save it to disk or a stream, you call WriteXml. When you want to read it back in, you call ReadXml. Because the underlying file format is XML, you could create an XML document some other way that is readable by a data set. It actually doesn’t have to come from a file at all; the ReadXml method will accept a stream, which could represent a file, network stream, or memory stream. For example, you might receive an XML document over a Web service that you could read into a data set and then display it in a DataGridView control. There is also a GetXml method that lets you get back the contents of a data set as an XML string.
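
For example, here is a minimal sketch of saving a data set and grabbing its contents as a string (the file path is purely illustrative):

private void SaveCustomers(DataSet ds)
{
   // Write the contents to disk as XML; including the schema
   // lets ReadXml reconstruct the same relational structure later
   ds.WriteXml(@"C:\Data\Customers.xml", XmlWriteMode.WriteSchema);

   // Or grab the same content as an XML string
   string xml = ds.GetXml( );
}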

The ReadXml method is fairly adept at inferring an appropriate schema from a simple XML file and being able to transform that XML into a data set. It will treat the root element as the container for the data set. It will then treat each element type that contains other elements as a row from a table, and each element that contains only text as a column in that row. It can also handle schemas where the elements represent rows from a table, but the column values are encoded as attributes on that element instead of child elements. For example, the straightforward XML in Listing D.1 will be loaded into a data set that contains a table of Customers and a table of Orders, and will have foreign key constraints and relationships between the two tables representing the nested nature. The corresponding relational schema is shown in Figure D.2.

LISTING D.1: Customers Data in XML


<?xml version="1.0" standalone="yes"?>
<CustomersDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Orders>
      <OrderID>10643</OrderID>
    </Orders>
    <Orders>
      <OrderID>10692</OrderID>
    </Orders>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
    <Orders>
      <OrderID>10308</OrderID>
    </Orders>
    <Orders>
      <OrderID>10625</OrderID>
    </Orders>
  </Customers>
</CustomersDataSet>


FIGURE D.2: Relational Schema for XML Customers and Orders Document

If the XML contains an embedded XML schema, the data set will try to use it to establish the relational schema of the tables and relationships, and will then load the data into that schema. If the file was created by saving an existing data set, then things should work out fine when you read it back in. However, not every XML file is suitable for loading into a data set. If you have a schema where the same element type can exist at multiple levels in the hierarchy, you will get an ArgumentException if you try to load it into a data set, because the data set will fail to coerce it into a relational schema. For example, if you added a Customers element as a child of an Orders element in the XML in Listing D.1 and tried to read it in with ReadXml, an ArgumentException would be thrown. Additionally, if you have malformed XML, an XmlException will be thrown when you call ReadXml, so you will definitely want to wrap the call to ReadXml in an exception handler.

To read a data set in from XML, the code is quite simple:

private void LoadXmlDataSet(string xmlFilePath)
{
   try
   {
      DataSet ds = new DataSet( );
      ds.ReadXml(xmlFilePath);
   }
   catch (ArgumentException ex)
   {
      // Thrown when the XML cannot be coerced into a relational schema
      MessageBox.Show("Error reading XML into DataSet: " + ex.Message);
   }
   catch (XmlException ex)
   {
      // Thrown when the XML itself is malformed
      MessageBox.Show("Malformed XML: " + ex.Message);
   }
}

The ReadXml method can take a path to a file, an open stream object, or an XML or text reader object. It also optionally takes an XmlReadMode enumeration value that controls how the schema is inferred and how the XML is loaded.
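
For example, a minimal sketch of reading from a stream (a System.IO.Stream of any kind) while forcing schema inference:

private DataSet LoadFromStream(Stream xmlStream)
{
   DataSet ds = new DataSet( );
   // InferSchema derives the relational schema from the shape
   // of the XML, ignoring any embedded schema information
   ds.ReadXml(xmlStream, XmlReadMode.InferSchema);
   return ds;
}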

Creating a Data Set Programmatically

To create and populate a data set programmatically, you need nothing more than to understand the object model and how to call methods and properties on those objects. Programmatically constructing data sets and populating them is fairly unusual, because it requires tedious and error-prone coding that will need to be updated any time the schema of your data changes. About the only time you will probably want to create data sets programmatically is if you have some hard-coded or nonstructured information in your program that you want to present in a tabular form; then you might stuff it into a data set so that you can bind the data set to a DataGridView control. But in that case it would be better to separate that structure into an XML file and load it from there so that you don’t have to change your source code to change the structure or contents of the presented data. Another case where you might programmatically deal with a data set in this way is if you want to add a computed column to a table within a data set. The code in Listing D.2 shows how to create a simple data set, including a computed column, and load some data into it.

LISTING D.2: Programmatic DataSet Creation and Population


private DataSet CreateProgrammaticDataSet( )
{
   // Create the DataSet
   DataSet ds = new DataSet( );

   // Create a table with three columns
   DataTable dt = new DataTable("Customers");
   DataColumn col1 = new DataColumn("CustomerID", typeof(int));
   DataColumn col2 = new DataColumn("CompanyName", typeof(string));
   // The third constructor argument is an expression evaluated
   // at runtime; Convert handles the int-to-string conversion
   DataColumn col3 = new DataColumn("Computed Column", typeof(string),
      "Convert(CustomerID, 'System.String') + ':' + CompanyName");
   dt.Columns.Add(col1);
   dt.Columns.Add(col2);
   dt.Columns.Add(col3);

   // Create a primary key constraint on the ID column
   // and add it to the table's constraint collection
   UniqueConstraint pk = new UniqueConstraint("PK_Customers", col1, true);
   dt.Constraints.Add(pk);

   // Add the table to the DataSet
   ds.Tables.Add(dt);

   // Add a row to the DataSet
   DataRow row = dt.NewRow( );
   row["CustomerID"] = 1;
   row["CompanyName"] = "Fred";
   dt.Rows.Add(row);

   // Return the results
   return ds;
}


The code in Listing D.2 constructs a data table and adds the desired columns to it, as well as a primary key constraint. Note that the computed column uses a third argument to the DataColumn constructor, which is the expression to use to compute the value of that column at runtime. It adds the new table to the data set and then creates a new row. Once the values of the columns in the new row are set, the row is added to the table’s Rows collection. Note that you have to create the row from the table so that it will have the correct schema (columns) to be part of the table, but it doesn’t actually become part of the table until it is added to the Rows collection.

If you care about maintainability, you will probably cringe at the code that sets the values of the two columns through the row indexer. This indexer exposes an object reference, into which you can attempt to stuff any object. The compiler won’t be able to help you because it doesn’t have enough type information to figure out what the actual type of the column is, so you won’t know until runtime if you have tried to place an inappropriate value into the column. The other problem with this code is the hard-coded strings for the column names. Using column names in this case is more maintainable than the alternative, which is to pass the ordinal position of the column to the row indexer. However, if your schema changes, you are going to have to root out all the places in your code where you hard-coded column names, which is error prone at best. Typed data sets (described in Chapter 2) fix both of these problems, which is why you should favor them for most scenarios involving data sets.

Loading Data Sets from a Database

You may occasionally load data into a data set from an XML file that has been cached on the client machine, and you may even (more rarely) have occasion to programmatically create a data set as shown in the previous section. However, the vast majority of the time you will be using data sets in conjunction with some form of relational data store. When that is the case, you will be loading the data into a data set using one of the managed providers introduced earlier, or you will be using typed data sets, where the schema is built into the type instead of needing to be determined dynamically.

I am going to focus on the use of the SQL Server managed provider, as it is probably the most common database used in .NET applications. If you need to use the OLE DB, Oracle, or ODBC providers, the coding patterns are virtually identical, thanks to the fact that they are all based on the same interfaces. Keep in mind that the SQL Server managed provider can be used with either a full-up instance of SQL Server or with a SQL Server 2005 Express instance, which is just a scaled-down free version of the SQL Server engine.

I recommend that you use SQL Express for creating client-side or small application databases, rather than Microsoft Access databases. The SQL Express engine is much more robust for client-server applications, and you can use many features of the SQL Server engine that aren’t available in Access, such as stored procedures and triggers. You can easily establish a connection to a SQL Express database just by specifying a file path to the MDF file that contains the database.

You have already seen one case of loading a data set from a SQL Server table earlier in the section Relational Data Access. You can also use a SQL Server 2005 Express database connection, as shown in Listing D.3.

LISTING D.3: Filling a DataSet Through a SQL 2005 Express Connection


private DataSet GetCustomersFromDb( )
{
   // Create the DataSet
   DataSet ds = new DataSet( );

   // Create the connection to the SQL Express DB
   SqlConnection conn = new SqlConnection(
     @"server=.SQLEXPRESS;
     AttachDbFileName=|DataDirectory|SimpleDatabase.mdf;
     trusted_connection=true");

   // Create the command and the adapter that uses it
   SqlCommand cmd = new SqlCommand("SELECT * FROM Customers", conn);
   SqlDataAdapter adapter = new SqlDataAdapter(cmd);

   // Opens the connection, executes the query,
   // puts the result in the DataSet, closes the connection
   adapter.Fill(ds, "Customers");

   // Return the result
   return ds;
}


Let’s dissect the code in Listing D.3. First, to create a connection, you construct an instance of the SqlConnection class. The constructor for this class takes a connection string that can contain any number of parameters, as defined by the SQL Server managed provider. The set of allowable parameters in the connection string are determined by each managed provider and will be different for each. There are a number of different parameter names for some of the typical parameters; consult the documentation for the ConnectionString property of the SqlConnection class for a comprehensive list.

The most common parameters to use for a SQL Server database are the server, database, trusted_connection, user id, and password parameters. There are other parameter names that accomplish the same thing as these (such as pwd instead of password and Integrated Security instead of trusted_connection), so you can choose which ones you want to use for common connection scenarios. You can see from this example that with SQL Express 2005, you can use the AttachDBFilename parameter to specify the database based on a file path. The |DataDirectory| keyword is a special syntax that instructs the connection object to look in the working directory of the executing application. The connection string also specifies the server instance to be the default SQLEXPRESS instance, and trusted_connection=true uses integrated security, which will log you into the database using the credentials of the Windows account under which the application is running. You can also pass an explicit username and password in the connection string if you want to use a SQL Server login.

Normally, you wouldn’t want to hard-code connection strings in your application code as shown in Listing D.3. A better way is to place them into your configuration file, and both Visual Studio 2005 and .NET 2.0 support a rich new model for placing connection strings and other custom user and application settings in your configuration file, while still allowing you to programmatically access those settings in a strongly typed way. These features are discussed in detail in Chapters 2 and 5.

Once you have a connection to work with, you create a SqlCommand object, passing the SQL text query to execute and the connection to use to the constructor. A command object can be provided with SQL statements or with the name of a stored procedure to use. For real data access layers, I recommend you wrap your data access at the database level in stored procedures and only consume those stored procedures from your data layer. Doing so decouples your application code from the specific schema of your tables, preventing small changes to the underlying schema from affecting application code. Stored procedures also let you use SQL Server’s own security mechanisms to prevent direct access to the tables, and they sometimes perform better than dynamic SQL statements like the one in Listing D.3 (although the performance difference is insignificant for most queries in .NET). I have broken this guidance in many of the code samples in this book so that the samples can use the Northwind database with minimal modification, and so that you can clearly see what is being retrieved by the queries.

NOTE

When performing SQL text queries, you should specify the exact columns that you want returned from your query. Specifying * for the columns may return a lot more information than you need, affecting application performance. It also requires the command object to do more work to determine the exact schema of the result set that will be returned by the query.

Next, the code in Listing D.3 creates a SqlDataAdapter object, which is the bridge between the source-neutral data set and the specific data source that it will work with. In this case, I construct it by passing in the command object. Finally, I call Fill on the data adapter, which performs a number of steps. First, it sees if the connection in the underlying command object is already open. If not, the Fill method will open the connection. Next, it executes the query against the database. The adapter then takes the returned rows, determines if a DataTable with the required schema already exists in the data set, and if not creates one, and then places the returned rows into it. If the data adapter opened the connection, it will close it before returning from the method call. Unless you specify a table name when you fill the data set, the data table instance name will simply be Table, and as you add additional tables, they will be named Table1, Table2, and so on. Figure D.3 shows the entire stack typically used for working with data sets and relational data stores.

FIGURE D.3: The Relational Data Access Stack

Typically a query or stored procedure will only return a single table of results, but you could return multiple tables with a single query or stored procedure as shown in Listing D.4.

LISTING D.4: Loading Multiple Tables in a Single Query


private DataSet LoadMultipleDatasets( )
{
   // Create the DataSet and connection
   DataSet ds = new DataSet( );
   SqlConnection conn = new SqlConnection( );
   conn.ConnectionString =
    "server=localhost;database=Northwind;trusted_connection=true";

   // Create the command and the adapter
   SqlCommand cmd = new SqlCommand( );
   cmd.Connection = conn;
   cmd.CommandText = "SELECT * FROM Customers; SELECT * FROM Orders";

   SqlDataAdapter adapter = new SqlDataAdapter( );
   adapter.SelectCommand = cmd;

   // Opens the connection, executes the query,
   // puts the result in the DataSet, closes the connection
   adapter.Fill(ds);

   // Name the tables correctly
   ds.Tables[0].TableName = "Customers";
   ds.Tables[1].TableName = "Orders";

   // Return the result
   return ds;
}


In Listing D.4, two tables of results are returned in the Fill method, and the data set is populated with two tables, one for Customers and one for Orders. The TableName property of each of the resulting tables is then set to the appropriate string, because the default naming convention would just have them named Table and Table1. Note also the different syntax in creating the data objects in this method. Instead of specifying the various parameters for each object in their constructors, they are supplied through properties after construction.

Another way of getting the tables named correctly, as well as to handle the case when the name of a column in the database is different from the name of a column in the target data set, is to use the TableMappings property on the data adapter. The TableMappings property holds a collection of DataTableMapping objects, a class defined in the System.Data.Common namespace. Each DataTableMapping object lets you specify a mapping between the schema of a result set returned from the database and the schema of the data table into which the table adapter will place the results. The following code shows an example of using TableMappings to modify the table name and column names when the data adapter fills the data set.

public DataSet GetCustomersSchemaDiff( )
{
   // Create the data set
   DataSet data = new DataSet( );

   // Create the connection to the SQL Express DB
   SqlConnection conn = new SqlConnection(
     "server=localhost;database=Northwind;trusted_connection=true");

   // Create the command and the adapter that uses it
   SqlCommand cmd = new SqlCommand(
     "SELECT CustomerID, CompanyName FROM Customers", conn);
   SqlDataAdapter adapter = new SqlDataAdapter(cmd);

   // Set up the table mapping to name the table, and to
   // change CustomerID to ID, CompanyName column to Name
   DataColumnMapping[ ] columnMappings = new DataColumnMapping[2];
   columnMappings[0] = new DataColumnMapping("CustomerID", "ID");
   columnMappings[1] = new DataColumnMapping("CompanyName", "Name");
   DataTableMapping mapping = new DataTableMapping(
     "Table", "Customers", columnMappings);
   adapter.TableMappings.Add(mapping);

   // Opens the connection, executes the query,
   // puts the result in the DataSet, closes the connection
   adapter.Fill(data);

   return data; // one table named Customers, with columns ID, Name
}

In this code, the data coming in from the SELECT statement would have been placed in a data table named Table with columns CustomerID and CompanyName. The TableMapping makes it so the table will be named Customers instead, and it will have columns ID and Name with the respective columns from the database mapped into them.

Most of the data classes in ADO.NET give you the option of using overloaded constructors and methods to specify all of the parameters for the object to do its work, or you can set them through properties. It is really just a matter of preference. The properties approach shown earlier requires more lines of code, but some find it a little easier to read, and it is easier to debug because you can step through each line and inspect the property values as they are set. I have favored the constructor approach in this book to avoid killing too many trees in its production. I’ll leave it up to you to decide the best approach for your code.

Loading a DataTable with a DataReader

I’m going to show how to use a data reader to load a custom object model later on, but while we are on the subject of loading data sets, I thought I should highlight a new capability in .NET 2.0. If all you are dealing with is a single table at a time, the need to create a whole data set and index into the tables just to get to your data can be a lot of unnecessary overhead. In .NET 2.0, you can create, populate, and use a DataTable on its own without ever needing to deal with the complexities of a data set. You can also fill it in a quick and efficient way using a data reader instead of having to go through a data adapter or table adapter (table adapters are discussed in Chapter 2). The following code demonstrates this in action.

public DataTable GetCustomers( )
{
   string connStr =
     "server=localhost;database=Northwind;trusted_connection=true";
   DataTable customersTable = new DataTable("Customers");
   using (SqlConnection conn = new SqlConnection(connStr))
   {
     SqlCommand selectCmd = new SqlCommand(
       "SELECT CustomerID, CompanyName FROM Customers", conn);
     conn.Open( );
     SqlDataReader reader = selectCmd.ExecuteReader( );
     customersTable.Load(reader);
   }
   return customersTable;
}

There are a number of new concepts in this one block of code, so let’s step through them one at a time. The first thing the code does is to create an empty data table and name it Customers. Any time you need to execute a query, you need a connection object, so that is created next within a using statement. If you execute a command on your own (not through a data adapter), you will have to explicitly open the connection first. Any time you open a connection, you need to make sure you close it when you are done, and you need to make sure that happens even if an exception is thrown in the code that precedes the call to Close. One way to do that is to put the code that opens the connection and uses it in a try block, with the call to Close in a finally block. Another way that is used in this code is the using statement in C#.

In this code, the connection object is created in a using block, which causes the Dispose method to be called on the connection when the scope of the using block is left. The Dispose method on a connection object calls Close, and a using block will always set up a try-finally block for you, calling Dispose in the finally block. So the using block approach gives you a clean, readable, safe way to use your connection objects and make sure they are closed before the scope is left.
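
Conceptually, the using block in the preceding code expands to a try-finally like this (a sketch of what the compiler generates for you):

SqlConnection conn = new SqlConnection(connStr);
try
{
   // ... open and use the connection ...
}
finally
{
   if (conn != null)
   {
      ((IDisposable)conn).Dispose( ); // Dispose calls Close
   }
}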

The next thing that is created is the command object that will execute the SELECT statement used to fill the data reader. That is created just like in the previous examples—by passing the SQL statement and the connection object. The connection is then opened, and the command is executed with the ExecuteReader method. This method returns a SqlDataReader object containing the results of the query. The data reader is then passed to the new Load method on the data table, which will iterate over the data reader to pull its contents into the data table.

Master-Details DataSets

As mentioned earlier, you can do a lot more with multiple tables in a single data set than just containing them as a collection of tables. You can also create parent-child relationships between them, and add primary, unique, and foreign key constraints. Data sets containing this kind of parent-child relationship are often called master-details data sets, because they are typically used in data-binding scenarios where the selection of a row in the parent table (the master) results in the display of all of the associated child rows (the details). The method in Listing D.5 again retrieves both the Customers and Orders tables, this time using two separate data adapters, and creates a master-details or parent-child relationship between them.

LISTING D.5: Creating a Master-Details DataSet


private DataSet GetCustomersAndOrders( )
{
   DataSet ds = new DataSet( );

   SqlConnection conn = new SqlConnection(
    "server=localhost;database=Northwind;trusted_connection=true");

   SqlCommand customersCmd = new SqlCommand(
     "SELECT CustomerID, CompanyName, ContactName FROM Customers",
     conn);

   SqlDataAdapter customersAdapter = new SqlDataAdapter(customersCmd);

   // Set the adapter to add primary key information
   customersAdapter.MissingSchemaAction =
     MissingSchemaAction.AddWithKey;

   SqlCommand ordersCmd = new SqlCommand(
     "SELECT OrderID, CustomerID, OrderDate FROM Orders", conn);
   SqlDataAdapter ordersAdapter = new SqlDataAdapter(ordersCmd);

   // Set the adapter to add primary key information
   ordersAdapter.MissingSchemaAction = MissingSchemaAction.AddWithKey;

   // Add the customers to the DataSet
   customersAdapter.Fill(ds,"Customers");

   // Add the orders to the DataSet
   ordersAdapter.Fill(ds, "Orders");

   // Create the parent-child relation
   DataColumn parentCol = ds.Tables["Customers"].Columns["CustomerID"];
   DataColumn childCol = ds.Tables["Orders"].Columns["CustomerID"];
   DataRelation dr = new DataRelation(
     "Customers_Orders",parentCol,childCol);

   // Set the relation as nested for output XML
   // (Orders are child elements of Customers)
   dr.Nested = true;

   // Add the relation to the DataSet's collection
   // Adds a corresponding foreign key constraint as well
   ds.Relations.Add(dr);

   // Return the result
   return ds;
}


Setting the data adapter’s MissingSchemaAction property to AddWithKey before calling Fill will direct the data adapter to also determine the column(s) that comprise the primary key, if any, when the data adapter executes the query, and it will create corresponding unique constraints in the resulting table. To create the data relation, you need to specify what the primary key column in the parent table is, as well as the foreign key column in the child table. The easiest way to do this is using the DataColumn objects representing those columns. This will also implicitly specify which table is the parent table and which table is the child table, because columns can only belong to one table. Alternatively, you could specify the parent and child tables and columns by name. Adding the data relation also creates a corresponding foreign key constraint to enforce the rule that a corresponding row in the parent table must exist when a row in the child table refers to it. Notice that the code also sets the Nested property on the data relation to true. This only affects the XML that is output if you save the data set with WriteXml or get it as a string with GetXml. It would then show child rows as nested elements under the parent row elements.

Retrieving Data with Stored Procedures

When you work with stored procedures, you often need to provide parameters to them that affect their execution. The stored procedure might simply wrap a query that will return rows of data, or it might update, insert, or delete data in the database. In any of these cases, you usually have to tell the stored procedure which sets of data to operate on, and you do this through parameters.

Let’s start with a simple example. In the Northwind database, there is a stored procedure defined named CustOrderHist. You provide it with a customer ID, and it will return a result set containing a summary of products and their quantity ordered by that customer. To call a stored procedure using ADO.NET, you use the SqlCommand class, as demonstrated earlier in this appendix. However, you need to tell the command object that you are calling a stored procedure instead of providing a textual query. You also need to provide the command object with any parameters that need to be passed to the stored procedure that it will execute. You do this as shown in Listing D.6.

LISTING D.6: Retrieving Data Through a Stored Procedure


private DataSet GetCustomerOrderHistory(string custId)
{
   // Validate the input
   if (custId.Length != 5)
      throw new ArgumentException(
        "The customer ID must be a 5 character string.","custId");

   DataSet ds = new DataSet("CustomerOrderHistoryDataSet");

   SqlConnection conn = new SqlConnection(
     "server=localhost;database=Northwind;trusted_connection=true");

   // Create the stored proc command to get customer history
   SqlCommand custHistCmd = new SqlCommand("CustOrderHist", conn);
   custHistCmd.CommandType = CommandType.StoredProcedure;

   // Create the parameter for the customer ID
   SqlParameter custIdParam = new SqlParameter(
     "@CustomerID",SqlDbType.NChar,5);
   custIdParam.Value = custId;

   // Add the parameter to the command
   custHistCmd.Parameters.Add(custIdParam);

   SqlDataAdapter custHistAdapter = new SqlDataAdapter(custHistCmd);

   custHistAdapter.Fill(ds,"CustomerHistory");
   return ds;

}


In this case, you create the command object by specifying the name of the stored procedure you want to call instead of supplying a SQL query string, and you set the CommandType property on the command to StoredProcedure (the default is CommandType.Text). You then need to create a SqlParameter object (discussed later in this appendix) to encapsulate each of the parameters needed by the stored procedure and add them to the command’s Parameters collection. Once everything is set up, you call the Fill method on the data adapter as before, and it will call the specified stored procedure, placing the result set returned into a table in the data set.

Updating the Database Using Data Sets

So far we have only been looking at retrieving data from a database. What happens when you have modified the data and want to save those changes back to the database? Well, another powerful capability of the DataSet class is the ability to keep track of changes made to the data contained in it. Changes come in three forms. You can

•    Add new rows to a table

•    Delete rows from a table

•    Change the values in one or several of the columns of existing rows in a table

The data set handles this through a combination of holding two copies of each row, and through a RowState property on each row, which takes a value from the DataRowState enumeration. When you first retrieve a set of rows from the database, they are added to their table with a RowState value of Unchanged. When you modify a column value within an existing row, the state is set to Modified. When you add a new row or delete an existing one, that row’s state will be marked Added or Deleted, respectively. And if you create a new row from a table, but have not yet added it to the Rows collection, the row’s state will be Detached.

The row state lets a data adapter detect which rows should be used to perform inserts, updates, or deletes in the database table. When you change the values of an existing row, the current values for the columns are kept in one copy of the row, but the original values of that row from when the data was retrieved are also kept in a second copy. This allows the original values to be used for optimistic concurrency detection (discussed in the next section). The RowState values and their effects are described in Table D.2.

TABLE D.2: DataRowState Enumeration Values

Value         Description
Unchanged     The row has not changed since it was retrieved or since
              AcceptChanges was last called.
Added         The row has been added to the table's Rows collection,
              and AcceptChanges has not yet been called.
Deleted       The row has been deleted from the table.
Modified      One or more column values in the row have been changed.
Detached      The row has been created but has not been added to any
              Rows collection.
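
To see the transitions in action, here is a small sketch (the table and column names follow the earlier Customers examples, and the row values are made up):

private void ShowRowStates(DataSet ds)
{
   DataTable dt = ds.Tables["Customers"];

   DataRow row = dt.NewRow( );      // RowState == Detached
   row["CustomerID"] = "WOLZA";
   row["CompanyName"] = "Wolski Zajazd";
   dt.Rows.Add(row);                // RowState == Added

   ds.AcceptChanges( );             // RowState == Unchanged
   row["CompanyName"] = "Wolski";   // RowState == Modified
   row.Delete( );                   // RowState == Deleted
}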

You update the database with a data set by using a data adapter. Keep in mind that when I say “update” the database, I mean execute any update, insert, or delete queries needed. Try not to confuse this with the specific SQL UPDATE queries, which only affect modified rows and are only one of three forms of “updates.”

So far, when constructing and using a data adapter to retrieve data, we have been setting its SelectCommand property to an instance of a SqlCommand that wraps a SQL text query or a stored procedure that returns the rows used to initially populate the table. To perform updates using a data adapter, you also need to provide command objects for the adapter’s InsertCommand, DeleteCommand, and UpdateCommand properties, depending on which of those you expect to perform. As you might expect, the InsertCommand will be used for any rows marked as Added, the DeleteCommand for any rows marked as Deleted, and the UpdateCommand for any rows marked as Modified. If a data set doesn’t contain any rows with a particular row state, the corresponding command doesn’t have to be supplied since it won’t be used. Generally speaking, you will want to provide all three because you cannot be sure that no rows will be in the modified, added, or deleted state when an update is performed.

Just like the select command used to fill a data set, the commands used for updating the database with a data set can use either SQL text queries or stored procedures, passing the columns of the updated rows as parameters. For simple applications that need to work directly against a single table in the database at a time, you can use parameterized SQL text queries. For larger scale applications, you may want to use stored procedures.
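
As a sketch of the parameterized text approach, the following wires up an update command for the Region table used later in this appendix (the conn and adapter variables are assumed to be set up as in the earlier listings). The fourth argument to Parameters.Add names the source column in the data table whose value should be bound for each row, and setting SourceVersion to Original binds the value as it was when the row was retrieved:

SqlCommand updateCmd = new SqlCommand(
   "UPDATE Region SET RegionDescription = @RegionDescription " +
   "WHERE RegionID = @Original_RegionID", conn);

// Bind @RegionDescription to each row's current column value
updateCmd.Parameters.Add(
   "@RegionDescription", SqlDbType.NChar, 50, "RegionDescription");

// Bind @Original_RegionID to the value originally retrieved,
// so the WHERE clause targets the right row
SqlParameter idParam = updateCmd.Parameters.Add(
   "@Original_RegionID", SqlDbType.Int, 0, "RegionID");
idParam.SourceVersion = DataRowVersion.Original;

adapter.UpdateCommand = updateCmd;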

Either way, the commands you associate with the data adapter will be executed once for each row in the data set that needs to perform an update. This can result in a lot of network round-trips to the database for a large data set that has a lot of changes in it. To address this problem, in ADO.NET 2.0 the SqlDataAdapter class includes a batch updating feature that will group all of the queries and send them all to the database in one or several chunks. To use this feature, you set the UpdateBatchSize property on the SqlDataAdapter to some value other than 1 (the default). If you set the value to 0 (zero), then all of the updates will be batched into a single bunch. If you set some positive value greater than 1, the data adapter will batch sets of that many update queries and send them in chunks until all of the updates have completed. You do need to keep in mind, though, that when you send a batch of update queries using this approach, they will all execute as a single transaction in SQL Server. So if any one row in a batch fails to perform its update, the entire batch will fail.
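
Enabling batching is just a property setting. One caveat: batched commands cannot return rows back to the data set, so their UpdatedRowSource must be set to None first (a sketch, continuing with the adapter from above):

// Batch all pending update queries into a single round-trip
adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None;
adapter.UpdateBatchSize = 0; // 0 = everything in one batch
adapter.Update(ds, "Region");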

To automatically formulate the update queries for working directly against the tables, the SqlCommandBuilder class can examine the select command that is in use by a data adapter and dynamically build insert, update, and delete commands for you. It will then populate the command properties of the data adapter with the generated queries. The command builder works well for simple data sets that contain tables generated from SELECT statements against a single table with a primary key constraint. This all happens by constructing a command builder object and passing in the data adapter that it is to create commands for. You won’t actually call any of its methods or properties directly. The constructor for the command builder reaches into the data adapter and sets its InsertCommand, UpdateCommand, and DeleteCommand properties to suitable queries based on the columns that are populated with the SelectCommand. The following code demonstrates the use of a command builder to generate the update queries and use them to push the changes from a data set to the database. In this code, the data set and the data adapter are members on the enclosing class because they are used in a different method to populate the data set with a SELECT query.

private void btnSaveChanges_Click(object sender, System.EventArgs e)
{
   // Get the connection off the existing select command
   // of the adapter
   SqlConnection conn = m_Adapter.SelectCommand.Connection;

   // Create the insert, update, and delete commands
   // Simply constructing the command builder generates
   // and populates those commands on the adapter
   SqlCommandBuilder cmdBuilder = new SqlCommandBuilder(m_Adapter);

   // Call the update method on the adapter to save all changes
   // in the data set to the database
   m_Adapter.Update(m_CustomersDataSet,"Customers");

   // Refill the dataset to make sure any triggered changes
   // in the DB are reflected
   m_Adapter.Fill(m_CustomersDataSet,"Customers");
}

Using the SqlCommandBuilder is very handy for simple cases. However, it has a number of downsides, and you should avoid it in general for production code in favor of explicitly writing your own SQL statements or using stored procedures. The queries generated by the command builder always compare the values of all columns that were retrieved by the SELECT query to detect optimistic concurrency violations, as described in the next section. However, this is usually not the best approach. This also means the queries carry a lot more data along with them for each update than may be necessary. For larger scale systems, working with stored procedures is a better approach for reasons of security, maintainability, and performance. See the later section “Updating with Data Sets and Stored Procedures” for more details.

Handling Concurrency

One fundamental problem that almost always needs to be addressed when performing updates to a database is how to handle concurrent updates to the same data. Figure D.4 depicts a simple example of how concurrent updates can cause problems. In the figure, client 1 retrieves a row from the table. Client 2 comes along next and retrieves the same row. Client 2 is a little quicker in processing the row and commits its updates back to the database before client 1. The question that has to be addressed for concurrency is: What should happen when client 1 subsequently tries to write its changes to that same row?

FIGURE D.4: The Making of a Concurrency Violation

You could allow a “last-in-wins” strategy, and always blindly write the values of an update to the row without considering its current contents. This is effectively no concurrency control. If you did this with the scenario in Figure D.4, the changes made by client 2 would be lost. If you are working with a connected data model, there are a number of ways to handle this situation with server-side locks on tables or rows that prevent concurrent access or updates depending on the level of isolation you want to provide. That is a very complex topic of its own that isn’t really relevant in the disconnected data model employed by ADO.NET.

To handle this in ADO.NET, the usual approach is to employ something called optimistic concurrency, which basically means that you are going to assume that between the time you retrieve a row of data and the time you go back to the data source to update it, you don’t expect any other query or client to come along and modify that row. If that assumption is violated, which could happen quite often in a highly concurrent system, your code needs to be able to deal with it in some way.

To deal with a concurrency violation, you first have to detect that it has occurred. This is done in ADO.NET by comparing the current contents of database columns (at the time an update is attempted) to those that were originally retrieved from the database. If the current contents of the database row are different from those that the ADO.NET client code has cached, then the optimistic concurrency assumption has been violated, and something needs to be done about it. As discussed earlier, the data set maintains the original values for any row that gets modified. This makes it very easy to use those original values to make optimistic concurrency decisions once a conflict has been detected.

Depending on the situation, you might want to compare the contents of every column in the row to the original values retrieved by ADO.NET. This is the approach used by the queries generated by the SqlCommandBuilder shown in the last section, as well as by the queries generated by Visual Studio .NET in a table adapter when you use the data set designer discussed in Chapter 2. For example, a parameterized update query for the Region table in Northwind might look like this:

UPDATE Region
SET RegionID = @RegionID, RegionDescription = @RegionDescription
WHERE (RegionID = @Original_RegionID) AND (
       RegionDescription = @Original_RegionDescription)

The @RegionID and @RegionDescription parameters would be populated by the current values in the row being updated, and the @Original_RegionID and @Original_RegionDescription would be populated with the original row values maintained by the data set.

However, for a large table, comparing the values of every column means carrying a lot of data back and forth across the connection to the database. A good compromise if you have control of the database schema is to include a timestamp, rowversion, or last-modified datetime column that gets updated every time any of the columns in the row gets updated. You can then just compare the value of this single column to detect concurrency violations (see Chapter 10 for an example of this approach). Another option is to use some subset of the columns in the row to determine whether a violation has occurred. For example, if a table storing large images also contained their file name, size, and a hash of the image contents, you could compare those smaller columns to decide whether an image had been updated concurrently.

The SqlDataAdapter class has built-in logic for detecting when optimistic concurrency has been violated, based on the @@ROWCOUNT SQL variable that indicates the number of affected rows when a query is executed. Each update, insert, or delete query issued by the adapter is expected to affect exactly one row; if the WHERE clause fails to match because the underlying row has changed, zero rows are affected, and the data adapter will raise a DBConcurrencyException. You can tap into this detection logic both when you are using SQL text queries like the one just shown and when you are doing your updates through a stored procedure. As long as whatever query gets executed by the command affects zero rows, the data adapter will throw the exception, which you can catch and handle appropriately.
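
Catching the exception is straightforward; a minimal sketch, reusing the Customers data set and adapter from the earlier examples:

try
{
   adapter.Update(ds, "Customers");
}
catch (DBConcurrencyException ex)
{
   // ex.Row is the DataRow whose update affected zero rows
   MessageBox.Show("Concurrency violation updating customer " +
      ex.Row["CustomerID"]);
}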

However, throwing an exception may not always be the right thing to do. What if the data set contains 100 rows to update, and the violation happens on the 42nd row? The first 41 updates will have already been persisted to the database unless the update commands are associated with a single transaction (discussed later in this appendix), and the remaining 58 updates won’t happen because the execution of the data adapter Update method is interrupted when the DBConcurrencyException is thrown. Instead, you may want to allow updating to continue even if an error occurs, and then decide what to do about any errors after all the rows have been processed.

The data set and data adapter support this scenario as well. Each DataRow in the tables of the data set carries error information exposed through its HasErrors and RowError properties. If a problem occurs performing an update on a row, that error information is populated from the exception that was thrown. That could include not only concurrency exceptions, but other built-in or custom SQL Server errors raised by the query or stored procedure execution. You can set the ContinueUpdateOnError property of the data adapter to true, and then if any exceptions are thrown in the processing of a row, the error information will still be populated for that row but processing will continue with the remaining rows. You are then expected to iterate through all the rows in the table being updated after completion of the Update method, inspect each row that has errors, and react accordingly. If you are doing batch updating in ADO.NET 2.0, the entire batch will be committed or aborted by the database within the scope of a transaction. If an exception is raised in the processing of a batch, all the rows of that batch will fail.
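
A minimal sketch of this continue-on-error pattern:

adapter.ContinueUpdateOnError = true;
adapter.Update(ds, "Customers");

// Inspect any rows that failed after the fact
foreach (DataRow row in ds.Tables["Customers"].Rows)
{
   if (row.HasErrors)
   {
      // RowError carries the message from the exception or
      // SQL Server error that caused the failure
      Console.WriteLine(row["CustomerID"] + ": " + row.RowError);
   }
}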

How you handle the presence of errors, or a thrown exception if you don’t set ContinueUpdateOnError equal to true, is going to be application dependent. You may want to just log the problem, or prompt users with a list of failed updates and the current database values, and provide them with some kind of data merging user interface to resolve the conflict. If you are binding the data to a Windows Forms DataGridView or DataGrid, the grid itself will detect the presence of errors and will depict the problem with an icon and tooltip help.

Updating with Data Sets and Stored Procedures

Not surprisingly, in order to talk about updates with stored procedures, we need some stored procedures that perform updates. If you run the script in Listing D.7 against the Northwind database, you will get simple SELECT, UPDATE, INSERT, and DELETE stored procedures for the Region table, suitable for use with a data set that will contain the Region table data.

LISTING D.7: Update Stored Procedures for Regions


CREATE PROCEDURE GetRegions
AS
SELECT RegionID, RegionDescription FROM Region
RETURN
GO

CREATE PROCEDURE UpdateRegion
@RegionID int,
@RegionDescription nchar(50)
AS
UPDATE Region
SET RegionDescription = @RegionDescription
WHERE RegionID = @RegionID
RETURN
GO

CREATE PROCEDURE InsertRegion
@RegionID int,
@RegionDescription nchar(50)
AS
INSERT INTO Region
(RegionID, RegionDescription)
VALUES (@RegionID, @RegionDescription)
RETURN
GO

CREATE PROCEDURE DeleteRegion
@RegionID int
AS
DELETE FROM Region WHERE RegionID = @RegionID
RETURN
GO


To keep things simple here and focused on calling stored procedures for updates, these stored procedures have no concurrency protection. As mentioned earlier when discussing command builders, the queries required to do optimistic concurrency checking based on every column in the row get messy and are very inefficient. A better way is to use a timestamp or rowversion column in your table, or a column that gets updated with the current date and time every time a row is changed. You can then use that column for concurrency violation detection; you only need to check a single column and carry around one extra parameter to see whether anyone has updated the row since it was retrieved.

Note that in this case, the RegionID column in the Region table isn't an identity (autonumbered) column, so you have to pass in the ID as a parameter even for inserts. If you have an identity column whose value is generated on the server side when the insert is performed, you will want to make the identity column an output parameter on the stored procedure, and set that parameter within the stored procedure to the @@IDENTITY value (or SCOPE_IDENTITY(), which is safer if triggers might also perform inserts). That way the new identity value gets passed back out of the stored procedure and is placed in the row that originated the insert in the data table on the client side.

CREATE PROCEDURE InsertSimpleTable
    (
    @Val nvarchar(50),
    @ID int OUTPUT
    )

AS

    INSERT INTO SimpleTable (Val) Values (@Val)
    SET @ID = @@IDENTITY

Once you have the stored procedures to call, you need to write a little code to wire up all the commands needed to retrieve and update the data set using those stored procedures. The following code shows a simple method that calls the GetRegions stored procedure to populate a data set that is a member of the containing class.

private void GetRegions( )
{
   // Clear the current contents
   m_RegionsDataSet.Clear( );

   SqlConnection conn = new SqlConnection(
    "server=localhost;database=Northwind;trusted_connection=true");
   SqlCommand selectCmd = new SqlCommand("GetRegions",conn);
   selectCmd.CommandType = CommandType.StoredProcedure;

   // Set the select command on the adapter
   m_Adapter.MissingSchemaAction = MissingSchemaAction.AddWithKey;
   m_Adapter.SelectCommand = selectCmd;

   // Fill the member data set
   m_Adapter.Fill(m_RegionsDataSet,"Regions");
}

The GetRegions method first clears the current contents of a data set that is held as a member variable of the class containing this method. It then sets up a connection object and passes that to a command object’s constructor along with the name of the stored procedure that will be called. Any time you are going to call a stored procedure, you have to set the CommandType property on the command object to the StoredProcedure enumerated value. The adapter is set up to retrieve the primary key information with the AddWithKey enumerated value on the MissingSchemaAction property, and its SelectCommand property is set to the command object just created. Finally, Fill is called, which will execute the stored procedure, placing the returned result set in a table named Regions within the data set.

The code in Listing D.8 shows an update method that uses a data set containing the Regions table of the Northwind database to make updates if any inserts, deletes, or modifications have been made to the table. The data set and the data adapter in this case are members of the containing class; this way, they can be accessed by both the method that populates the data set and by the method that performs updates and returns the number of rows updated.

LISTING D.8: Calling Update Stored Procedures


private int UpdateRegions( )
{
   // Get the connection from the adapter's select command
   SqlConnection conn = m_Adapter.SelectCommand.Connection;

   // Create the insert command
   SqlCommand insertCmd = new SqlCommand("InsertRegion",conn);
   insertCmd.CommandType = CommandType.StoredProcedure;
   insertCmd.Parameters.Add(
      new SqlParameter("@RegionID",SqlDbType.Int,4,"RegionID"));

   insertCmd.Parameters.Add(
      new SqlParameter("@RegionDescription", SqlDbType.NChar, 50,
                        "RegionDescription"));

   // Create the update command
   SqlCommand updateCmd = new SqlCommand("UpdateRegion", conn);
   updateCmd.CommandType = CommandType.StoredProcedure;
   updateCmd.Parameters.Add(
      new SqlParameter("@RegionID",SqlDbType.Int,4,"RegionID"));
   updateCmd.Parameters.Add(
      new SqlParameter("@RegionDescription", SqlDbType.NChar, 50,
                        "RegionDescription"));

   // Create the delete command
   SqlCommand deleteCmd = new SqlCommand("DeleteRegion", conn);
   deleteCmd.CommandType = CommandType.StoredProcedure;
   deleteCmd.Parameters.Add(
      new SqlParameter("@RegionID",SqlDbType.Int,4,"RegionID"));

   // Associate each command with the adapter
   m_Adapter.InsertCommand = insertCmd;
   m_Adapter.UpdateCommand = updateCmd;
   m_Adapter.DeleteCommand = deleteCmd;

   // Perform the update
   int count = m_Adapter.Update(m_RegionsDataSet,"Regions");

   // Refill the dataset to reflect any concurrent changes
   m_Adapter.Fill(m_RegionsDataSet,"Regions");

   return count;

}


This code is a little more verbose than the other examples in this appendix, because the update, insert, and delete stored procedures each take some parameters to decide what updates to perform. Let’s step through it line by line.

The method first retrieves the connection to be used from the select command associated with the data adapter that is a member of the class. It then creates each of the stored procedure commands for the update, insert, and delete procedures. It creates SqlParameter objects to wrap each parameter to each of the commands and adds them to the Parameters collection on the command. The method then associates each of the commands with the data adapter and calls the Update method on the data adapter to execute the commands for each row in the data set that contains changes. The Update method on a data adapter returns the number of rows that were inserted, deleted, or updated using the data set. The method also calls Fill again on the data adapter before returning, so that any concurrent changes made in the database, or any columns that are computed by triggers, are refreshed in the data cached in the data set.

The SqlParameter class encapsulates the passing of parameters to stored procedures and parameterized queries. It is responsible for managing the translation of parameters between two different type systems: the type system of .NET and the type system of the database. The .NET type system contains types such as Int32, Single, Double, DateTime, and String, as well as every other class in the .NET Framework and every custom class and structure that you write. Values in a data set are stored using this type system, because a data set is just another .NET type that contains other .NET types through object references. SQL Server (or any other data source) has its own notion of a type system, and that system is used to store the data in the database.

So the SqlParameter class takes care of wrapping the values in the type system of .NET and passing them to the database in a form that is compatible with the type system in play there. You construct each SqlParameter object by specifying the name of the query parameter that it will provide, the database type of the query parameter, and the size of that parameter if it is variable length in the database (such as a varchar in SQL Server). There are different overloads of the SqlParameter constructor that let you skip specifying a size, and you can always set this information through properties as well. The name parameter passed to the constructor needs to match the name of the query parameter exactly, including the @ symbol for SQL parameters. This is specific to the SQL Server managed provider. If you use the OLE DB managed provider, you will use the OleDbParameter class. However, OLE DB parameters are always passed by position (order) in the statement, so the name is ignored.

When you are using a data set to perform updates through a data adapter, you also need to provide a column mapping between the column in the table and the parameter that it will be passed to. As mentioned before, these SqlParameter objects could be initialized using the properties exposed by the class, but this example instead uses one of the overloaded constructors that takes all of the values needed in one line of code. The column name that you specify for that last parameter allows the data adapter to automatically extract the current column value for each row and set it as the parameter value at the time the update occurs.

If there is any type mismatch between the value in the table and the type of the parameter, you won’t know until runtime. Just as the query can throw exceptions, the setup code that precedes the query can throw exceptions as well. You will want to make sure that you have proper exception handling code wrapping any data access methods or the code that calls them to make sure your application doesn’t blow up on you.
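A minimal sketch of such a wrapper around the UpdateRegions method shown earlier (the specific exception types you catch, and how you report them, are application decisions):

try
{
   int count = UpdateRegions( );
   MessageBox.Show(count + " row(s) updated");
}
catch (DBConcurrencyException ex)
{
   // Another client changed or deleted a row being updated
   MessageBox.Show("Concurrency violation: " + ex.Message);
}
catch (SqlException ex)
{
   // Database-level failure: bad parameter types, constraint violations, etc.
   MessageBox.Show("Update failed: " + ex.Message);
}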

If you are working with stored procedures that have output parameters, such as the INSERT stored procedure shown earlier that returned the value of an identity column, you would need to specify the parameter’s direction when you add it to the command object:

SqlParameter idParam = insertCmd.Parameters.Add(
      new SqlParameter("@ID",SqlDbType.Int,4,"ID"));
idParam.Direction = ParameterDirection.Output;

This would map the ID column in the data table to the output @ID parameter on the stored procedure. After the data adapter called the InsertCommand to perform the insert, it would take the value that was in the idParam after the command was complete and would put it into the respective column in the data table.

Searching Data Sets

Once you get data into a data set, you often need to search and select data within it, either to perform updates or processing on some subset of the data, or for display purposes. Keep in mind that the DataSet class and its related classes aren't designed to be high-efficiency query engines. Although you can select and filter the data in a data table on the client, in many cases it may be faster to simply execute a new, more specialized query against the database to get the specific results you are looking for. This is especially true when working with a local database instance, such as a SQL Server Express database.

However, if you already have the data in memory in a DataSet and want to select items without incurring a round-trip to the database, there are several approaches you can use. The simplest is to use the Select method of the DataTable class to perform SQL-like queries against the data contained within a table. You can also use data views to filter the data based on selection criteria, or you can use the XmlDataDocument class to load a data set into an XML document container and perform XPath queries against it (data views and the XmlDataDocument are discussed in subsequent sections).

The Select method has a number of overloads that let you search based on string criteria resembling a WHERE clause in SQL, optionally providing a sort order or a DataViewRowState enumeration value to search among records in a particular state or states. The Select method returns an array of DataRow objects containing references to the rows that matched the selection criteria. Once you have that array of rows, you can iterate over them and perform any processing you need on them. You are working against live references to the actual rows in the data set, so any changes made to them immediately affect the data in the data set. The following code shows a Select operation on a Customers data set using two of the overloads of the method.

private DataRow[ ] SelectRows(string selString)
{
   DataRow[ ] rows = null;
   // Perform the selection
   if (selString == string.Empty || selString == "*")
   {
      // Select all
      rows = m_Customers.Tables["Customers"].Select( );
   }
   else
   {
      // Select based on the criteria
      rows = m_Customers.Tables["Customers"].Select(selString);
   }
   return rows;
}

Merging Data from Multiple Data Sets

The DataSet class also supports a couple of ways of moving data between different data sets. Your first instinct might be to simply get a DataRow reference from one data set, perhaps from a Select operation or from iterating through the rows, and then add that row to a table in another data set. Unfortunately, this isn't allowed and will throw an exception. When a data row is added to a table, it becomes a part of that table, and it cannot belong to more than one table at a time. Only a row that has been created with the DataTable.NewRow method can be added to that table with the Add method on the table's Rows collection.

So how do you get a row or set of rows of data that belong to one data set into another? One way is to use the Merge method. The behavior of this method can get fairly complex, because it has a number of overloads, and because the outcome depends on the state of the rows (Added, Modified, Deleted) in the data set they are coming from (the source) and on whether there are corresponding rows in the data set they are being merged into (the target). For simple cases, though, things are pretty straightforward.

Consider a situation where you have the Customers table in one data set and the Orders table in another data set. You want to get them both into a single data set so that you can create a parent-child relationship between them based on the CustomerID foreign key that exists in the Orders table. To do this, you would simply call Merge on the data set containing the Customers table, passing in the Orders data set. The Merge method would create an Orders table in the Customers data set and copy over all the rows from the Orders data set. You could then construct the relation between the two tables in code as follows:

private DataSet MergeCustomersAndOrders( )
{
   // Code to create and fill the two data sets...
   // customersDS contains just Customers,
   // ordersDS contains just Orders

   // Merge the orders into the customers
   customersDS.Merge(ordersDS);

   // Create the relation between the two tables
   DataColumn custIDParentCol =
      customersDS.Tables["Customers"].Columns["CustomerID"];
   DataColumn custIDChildCol =
      customersDS.Tables["Orders"].Columns["CustomerID"];
   DataRelation custOrders = new DataRelation(
      "CustomersOrders", custIDParentCol, custIDChildCol);
   // Add the relation to the data set so it takes effect
   customersDS.Relations.Add(custOrders);

   return customersDS; // Contains both tables and relation
}

If you want to add a row from a table in one data set into the same table in another data set with the row state of Added, you can pass the ItemArray property of the source row to an overload of the Add method on the Rows collection of the target table. This transfers the values of the cells as an array of objects into the target table, creating a new row to contain them. If you want to add a row from a source table into a target and preserve its state, you can use the ImportRow method on the DataTable class. The following shows both of these approaches.

private void AddAndImport(DataSet dsTarget, DataSet dsSource)
{
   // Add the values from a single row from the source
   // to the target with the row state set to Added
   dsTarget.Tables["Customers"].Rows.Add(
      dsSource.Tables["Customers"].Rows[0].ItemArray);

   // Import a single row preserving row state
   dsTarget.Tables["Customers"].ImportRow(
      dsSource.Tables["Customers"].Rows[0]);
}

A common use of Merge is in conjunction with the Select method to extract a set of rows and place them in another data set, either for display or transfer to another method or client.

private void OnSelect(object sender, EventArgs e)
{
   // Select the rows using helper method
   DataRow[ ] rows = SelectRows(m_SelectTextBox.Text);
   // Show the results
   if (rows.Length > 0)
   {
      // Create a new data set
      DataSet localDS = new DataSet( );
      // Merge the selected rows into it
      localDS.Merge(rows);
      // Bind the grid to this
      m_ResultsGrid.DataSource = localDS.Tables["Customers"];
   }
}

Another way to get a row into a table as a new row is to use the LoadDataRow method on the table, passing in an array of objects that contain the column values. Typically you will get this object array from the ItemArray on the source row.
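Here is a quick sketch of LoadDataRow in use, with the same source and target data sets as in the preceding example:

// Copy the column values of the first source row into the target table.
// Passing false leaves the new row with a row state of Added;
// passing true calls AcceptChanges so the row ends up Unchanged.
// (If the target table has a primary key and a matching row already
// exists, that row is updated instead of a new one being added.)
object[ ] values = dsSource.Tables["Customers"].Rows[0].ItemArray;
dsTarget.Tables["Customers"].LoadDataRow(values, false);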

Because there isn’t any way to directly set the row state of rows in a data table, if you want to get a row into a particular state, such as modified or deleted, there are some tricks you can play. For either of those states, you can first set the row state to unmodified by calling the AcceptChanges method on the row. To get the state set to modified, set the value of one of the columns to its current value by setting the column value against itself as shown next. To set the deleted state, just call Delete on the row:

DataSet data = new DataSet( );
// Code to fill data set with Customers ...

DataRow modRow = data.Tables["Customers"].Rows[0];
modRow.AcceptChanges( ); // Set state to unchanged
// Set state to modified:
modRow["CompanyName"] = modRow["CompanyName"];

DataRow delRow = data.Tables["Customers"].Rows[1];
delRow.AcceptChanges( ); // Set state to unchanged
delRow.Delete( ); // Set state to deleted

Working with Data Views

Another way to manipulate the data contained in a data set for processing or viewing is using a data view. A data view doesn’t contain any data itself; as its name implies, it is just a view into the data in the table underneath it. You can think of it as a lens that you place over a table that makes it look different to the consumer of the view, even though the underlying data is still whatever data is stored in the table that the view represents. You can modify the data exposed through a data view, and when you do, you are actually modifying the data in the underlying table directly.

Every instance of a DataTable already has a default DataView instance associated with it, and it is actually this view that is used when you bind a table to a grid. You can make modifications to the default data view, which will affect any controls that are bound to the underlying data table, or you can construct new instances of a DataView object to wrap a table for the purposes of sorting or filtering the data.

The DataView class implements Sort, RowFilter, and RowStateFilter properties that let you modify what data the view exposes. You set the Sort property to an expression that specifies what column(s) to sort on and in what order. You can set the RowFilter expression using the same syntax supported by the DataTable.Select method to restrict the rows presented by the view based on some criteria that is like a WHERE clause in a SQL statement. Finally, you can use the RowStateFilter to only look at rows in a particular state, such as all Deleted or Added rows. You can use all of these in combination with one another to specify fairly complex filtering and sorting criteria to expose data contained within a table in just about any way you need to:

private void OnView(object sender, EventArgs e)
{
   // Create the view wrapping the customers table
   DataView view = new DataView(m_Customers.Tables["Customers"]);

   // Sort by CompanyName ascending, ContactName descending
   view.Sort = "CompanyName ASC, ContactName DESC";

   // Only show German customers
   view.RowFilter = "Country = 'Germany'";

   // Only show inserted or edited rows
   view.RowStateFilter = DataViewRowState.Added |
      DataViewRowState.ModifiedCurrent;

   // Bind the view to a grid for viewing
   m_ResultsGrid.DataSource = view;
}

Working with Transactions

Sometimes you need to be able to execute multiple commands within the scope of a single database transaction, so that if any one of the commands fails, they all fail, and no changes are made to the database. You can do this inside of SQL Server and other databases by beginning a transaction within a stored procedure, and committing it or rolling it back based on the outcome of all the queries encapsulated within the stored procedure. However, sometimes you will want to control the transaction from within your ADO.NET code, wrapping the execution of multiple commands within a single transaction.

It is quite easy to do this. You can use the SqlTransaction class (or corresponding classes for other managed providers) in conjunction with the connection to manage a transaction from your data access code. For example, say you wanted to write code that first checked whether a row existed, and if it did, then make some modification to that row. Let’s also say that you needed to do this with two explicit SQL text queries from within your managed code. You want to write bulletproof code, so you want to ensure that the row that will be affected by the update cannot be modified or deleted by another client or query between the time you check for its existence and when you update it. Using a transaction with an appropriate isolation level will work just fine for this scenario. Take a look at the code in Listing D.9.

LISTING D.9: Executing Multiple Queries Within a Transaction


private void OnTransactionalUpdate(object sender, System.EventArgs e)
{
   // Create the connection
   SqlConnection conn = new SqlConnection(m_ConnString);

   // Create the two commands you want to wrap in a transaction
   SqlCommand selectCmd = new SqlCommand(
      "SELECT CustomerID FROM Customers WHERE CustomerID = 'ALFKI'",
      conn);
   SqlCommand updateCmd = new SqlCommand(
      "UPDATE Customers SET CompanyName = 'FooBros' " +
      "WHERE CustomerID = 'ALFKI'", conn);

   // Declare the transaction
   SqlTransaction trans = null;

   try
   {
      // Open the connection
      conn.Open( );

      // Start a transaction with the repeatable read isolation level
      trans = conn.BeginTransaction(IsolationLevel.RepeatableRead);

      // Associate the transaction with the commands
      selectCmd.Transaction = trans;
      updateCmd.Transaction = trans;

      // Check for the existence of the customer
      string custID = (string)selectCmd.ExecuteScalar( );
      if (custID == null)
      {
         throw new ApplicationException("Customer not found");
      }
      // Update the customer
      updateCmd.ExecuteNonQuery( );
      // Commit the transaction if you got to here
      trans.Commit( );
   }
   catch (SqlException ex)
   {
      if (trans != null)
      {
         trans.Rollback( );
      }
      MessageBox.Show(ex.Message);
   }
   finally
   {
      conn.Close( );
   }
}


There are a number of things to discuss here. The first is how to create and employ the transaction. The code creates the connection and command objects the way you have seen before. The transaction is created by calling BeginTransaction on the connection, optionally specifying an isolation level (see Table D.3). The default isolation level is ReadCommitted, but to ensure that no one can modify the record once it has been read with a SELECT statement, you need RepeatableRead.

TABLE D.3: Transaction Isolation Levels

[The table appears as an image in the original and is not reproduced here; it summarizes the IsolationLevel enumeration values, such as ReadUncommitted, ReadCommitted, RepeatableRead, Serializable, and Snapshot.]

Once the transaction is created, you need to associate it with any commands you want to enlist within the transaction scope. After that, you can execute the commands. If everything turns out the way you want, you should call Commit on the transaction object to make all the changes done within the transaction permanent. If anything goes wrong or an exception gets thrown, you should call Rollback on the transaction to prevent any changes made by the commands enlisted in the transaction from becoming permanent. The best way to make sure this pattern is followed is with a try-catch block, as shown in Listing D.9.

TIP   Explicitly roll back transactions

If you close a connection on which you have begun a transaction and you haven’t called Commit on that transaction yet, the transaction will be automatically rolled back. However, I recommend that you always make it explicit by calling Rollback yourself (typically in an exception-handling catch block) to make it clear when and where a Rollback is occurring.

In addition to the try-catch, you can see that because the connection was explicitly opened in Listing D.9, I made sure to close it in the finally block, so that no matter what, the connection gets closed before I leave the method. A couple of other things appear here that haven't been discussed yet: the ExecuteScalar and ExecuteNonQuery methods on the SqlCommand objects. ExecuteScalar is a convenience method for queries that are expected to return a single row with a single column as their result set. The method extracts the value in that column and passes it back as the return value of the method. It is returned as an object reference, so you will have to cast it to the expected type, as shown in Listing D.9. The ExecuteNonQuery method is for executing commands for which you don't expect any returned rows, such as update, insert, and delete queries.

Scoping Transactions with System.Transactions

A whole new approach to writing transactional code was designed into .NET 2.0 in the System.Transactions namespace. Through the classes defined in that namespace, you can now easily start or join transactions in your code without being tied directly to the transactional capabilities of the database or other specific transactional resource managers. The transactions that are created through the System.Transactions classes can be either lightweight transactions against a single resource manager, such as a single SQL Server 2005 database, or they can be distributed transactions involving multiple resource managers, such as a SQL 2000 database, an Oracle database, and an MSMQ message queue. Another great feature of the new transaction capabilities is that a transaction will automatically promote itself from a lightweight transaction to a distributed transaction whenever it sees that a new resource manager is accessed within a transaction scope that requires a distributed transaction, so you don’t have to worry about keeping straight the different models in your code.

To use System.Transactions, you first have to add a reference to your project to the System.Transactions.dll assembly from the .NET Framework. You then need to include the System.Transactions namespace in the code files where you will be using it. When you want to execute some code within a transaction, you set up a transactional scope by creating an instance of the TransactionScope class. You then execute the code that you want to be transactional, and if everything succeeds, you call Complete on the transaction scope object. The transaction won’t actually be committed until you dispose of the scope object, and if Dispose is called without first calling Complete on the scope object, the transaction will be aborted.

The easiest way to do this is with a using block in C#. When you use the using statement in C#, you pass an object reference to the using clause that is a disposable object. When the using block is exited, Dispose will be called on that object automatically, even if an exception is propagating. This is because the code that is generated by the compiler for a using block includes a try-finally block, and Dispose is called on the using clause argument in the finally block.

So if you create a TransactionScope object and use it in a using block, when you exit that using block, the transaction that the scope represents will either commit or roll back, depending on whether the Complete method was called.

Here is an example of this in action:

using System.Transactions;

public partial class Form1 : Form
{
   private void OnExecute(object sender, EventArgs e)
   {
       NorthwindDataAccess dac = new NorthwindDataAccess( );
       using (TransactionScope scope = new TransactionScope( ))
       {
           DataSet data = dac.GetCustomerData( );
           data.Tables["Customers"].Rows[0]["Phone"] = "030-0074321";
           dac.UpdateCustomers(data);
           scope.Complete( );
       }
    }
}

In this code, a data access component named NorthwindDataAccess is used to retrieve some records from the database, make a modification to one of them, and then push the updates back to the database. If you want both the retrieval and the update to be part of a single transaction without having to worry about connection and transaction management yourself, you simply create a TransactionScope instance to bracket the data access component calls. You pass that instance to the using statement, and call Complete on the scope object at the very end of the using block after all of the query methods have been called. If you reach the call to Complete, the data access calls must have succeeded, because otherwise an exception would be propagating. When the point of execution hits the end of the using block, the scope object will be disposed of. If the Complete method was called, the transaction will be committed, and if not (such as when an error occurs in the update query), then the transaction will be rolled back.

You can use this approach to wrap multiple queries to one or more databases in a single transaction, ensuring that all of the queries either succeed or fail as one. The big benefit for using a TransactionScope to create and manage a transaction is that it requires very little code, and it is simple to declare and use. This code can also be several layers up the call stack from the actual data access code, so you don’t need access to the database connection to set up the transaction. The code looks the same if the code uses one or many connections under the covers. There are a lot of other capabilities and ways to use a transaction using the classes in the System.Transactions namespace that are beyond the scope of this book. (“Introducing System.Transactions in the Microsoft .NET Framework, Version 2.0” by Juval Löwy, available on MSDN Downloads, provides comprehensive coverage of the capabilities of the System.Transactions namespace classes. See www.microsoft.com/downloads/details.aspx?familyid=AAC3D722-444C-4E27-8B2E-C6157ED16B15&displaylang=en.)

If the code that sets up a transaction scope as shown in this example is called from other code that already has a transaction in progress, the new transaction scope will become a nested transaction to the calling code’s transaction and will only commit if the containing transaction commits. There are a number of complex combinations you can achieve using nested transaction scopes, cloning of transactions, and directly managing the underlying Transaction object yourself if you need to address more advanced scenarios.
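For instance, a minimal sketch of a nested scope looks like this; by default, the inner scope simply joins the ambient transaction created by the outer one:

using (TransactionScope outer = new TransactionScope( ))
{
   // Queries executed here run in the outer transaction

   using (TransactionScope inner = new TransactionScope( ))
   {
      // By default this scope joins the ambient (outer) transaction
      inner.Complete( );
   }

   // Work done in the inner scope only commits if the outer
   // scope completes as well
   outer.Complete( );
}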

The one downside to using a transaction scope as shown here is that unless you are working against a SQL Server 2005 database, the transaction that is created will be a distributed transaction, even if you are simply accessing a single database such as SQL Server 2000. This can have some negative performance impact on your application compared to managing a transaction yourself, as shown earlier in Listing D.9. You will have to weigh the performance impact against the cleaner, easier-to-write code that results from the System.Transactions approach. I would recommend starting with the System.Transactions approach and only reverting to low-level transaction management if you need to address a performance problem in a particular part of your application that is using transactions.

Client-Side Transactions

Sometimes you may need to work with data on the client side in the form of a data set and apply logic similar to what was just presented for server transactions, so you can modify the data and back out the changes if something goes wrong. You can simulate a transaction while working with client-side data by using the AcceptChanges and RejectChanges methods. As discussed earlier, any changes made to a data set are maintained through a combination of row state and the current and original values for each row.

If you have made any changes to the rows in your data set and call AcceptChanges, the original values for all modified rows will be replaced with the current values, and the state of all rows will be changed to Unchanged. If you call RejectChanges, the current values for any modified rows will be replaced with the original values, inserted rows will be discarded, and the state of all rows will also be set to Unchanged. The AcceptChanges and RejectChanges methods are defined on the DataSet, DataTable, and DataRow classes, so you can perform these transaction-like operations at any level of granularity within the data set that you need.
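Here is a minimal sketch of the client-side pattern, using the Customers data set from the earlier snippet (keep in mind the caveat below about data adapter updates):

DataTable customers = data.Tables["Customers"];
try
{
   customers.Rows[0]["CompanyName"] = "New Name";
   customers.Rows[1].Delete( );
   // ... more edits, including validation logic that can throw ...
   customers.AcceptChanges( );  // "commit" the edits locally
}
catch
{
   customers.RejectChanges( );  // "roll back" to the original values
}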

Using a combination of AcceptChanges and RejectChanges, you can code logic that accepts or rejects all changes to a table or data set based on some criteria. Be aware, however, that you generally want to avoid these methods if the data in the data set will be used to update a database through a data adapter. As mentioned earlier, the data adapter figures out which commands to execute (update, insert, or delete) for which rows based on their row state, and rows with a row state of Unchanged don't get any command executed against them when performing an update. Because both AcceptChanges and RejectChanges set the state of rows to Unchanged, if you later need to push changes from that data set to the database, you will have to use the workaround discussed earlier to get rows back into the Modified or Deleted state.

NOTE

Do not call AcceptChanges on a data set if it contains changes and you need to update the database with those changes. Calling AcceptChanges sets the row state of every row in the data set to Unchanged and replaces the original values with the current values. Since all of the rows will have a state of Unchanged, no changes can be propagated to the database through a data adapter because it won’t see any rows to perform updates with.

Data Set and Data Adapter Events

So far I have just discussed programming against the methods and properties of data sets and data adapters, and handling exceptions when something goes wrong. However, sometimes you need to be more in the loop as the processing is going on. There are a number of events that you can tap into from the DataSet, DataTable, and SqlDataAdapter classes that are useful for letting your code be notified when changes are occurring.

The DataSet class has two events:

•    MergeFailed, which will be fired if you are performing a Merge operation and there is a primary key conflict between the target and source tables (a minimal handler sketch follows this list).

•    Initialized, which will be fired when the DataSet has completed initialization of its contained objects and state.
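
A minimal MergeFailed handler might look like the following, assuming m_Customers is a DataSet member as in earlier examples:

m_Customers.MergeFailed += delegate(object sender, MergeFailedEventArgs e)
{
   // e.Table identifies the table involved; e.Conflict describes the problem
   MessageBox.Show("Merge failed on " + e.Table.TableName + ": " + e.Conflict);
};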

The DataTable has the most useful events, which are described in Table D.4.

TABLE D.4: DataTable Events

[The table appears as an image in the original and is not reproduced here; it describes the paired DataTable change events, such as ColumnChanging/ColumnChanged, RowChanging/RowChanged, and RowDeleting/RowDeleted.]

The events on the DataTable follow a common pattern for events that relate to modifications of some entity: there is a pair of events for each form of modification. One event fires before the change has occurred, and another event fires after the change is complete. This lets you code any pre- and post-processing of the change that you need to, including preventing the change from occurring.

The ColumnChanging/ColumnChanged events pass a parameter of type DataColumnChangeEventArgs to your event handler. This parameter contains several useful values for controlling the update of a column. A read/write property named ProposedValue contains the value that the column will be changed to when the event handler completes. You can inspect this value in your ColumnChanging handler, and in combination with the Row and Column properties on the event arguments parameter, make validation decisions about changing that value.

If the value being proposed doesn't meet your validation constraints, you have two choices. First, you can change it to some acceptable value by assigning a different value of the appropriate type to the ProposedValue property on the event arguments parameter. Whatever value you assign to that property will be the one that is actually set for the column when the change is made. You could also set ProposedValue back to the original value, extracted through the Row and Column properties as shown here, to effectively prevent the change:

private void OnCustomersColumnChanging(object sender,
                        DataColumnChangeEventArgs e)
{
   if (m_ConfirmChangesCheckBox.Checked)
   {
       DialogResult res = MessageBox.Show(
       "Column " + e.Column.ColumnName + " changing to value: " +
       e.ProposedValue.ToString( ) + ". Allow change?",
       "ColumnChanging Event",MessageBoxButtons.YesNo);
       if (res == DialogResult.No)
       {
          e.ProposedValue = e.Row[e.Column.ColumnName];
       }
    }
}

The one thing to be aware of with this code is that even though setting the ProposedValue back to the original value prevents the value from changing, the row will still be marked as modified, and the row will be updated in the database if you use it for an update with a data adapter. You could work around this by calling AcceptChanges on the row, but then if any other columns have been changed to new values in that row, their changes won't get sent to the database with a data adapter update. This is usually only a problem in a data-bound scenario, and in that case there are events on the data-bound control that you can handle to get into the loop sooner and cancel the change. However, this is something to keep in mind if you are trying to cancel a change through the DataTable events.

You could also throw an exception from within the ColumnChanging event handler. This would effectively abort the event processing, and the value would get changed back to its original value. However, throwing an exception has a lot of overhead with it, and the code that is causing the change better have an exception handler to deal with that scenario. In the case of a DataGrid control, the Changing event handler will be called again as the value changes back to the original, so the original value better pass your validation logic or you could end up with your program blowing up from unhandled exceptions. In general, you should try to avoid the approach of throwing an exception.

There are three events on the SqlDataAdapter class to be aware of as well. The FillError event will be fired if an error occurs while filling a data set with the adapter, which gives you insight into what went wrong, and you can make programmatic decisions based on the error and the values that caused it. The RowUpdating and RowUpdated events work like those described for the DataTable, except these fire as each row is updated (UPDATE, INSERT, or DELETE query) in the database. The event arguments let you determine the cause of the error and tell the adapter whether to continue updating other rows or to stop update processing.
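As a sketch, a RowUpdated handler that records the error on the row and keeps the update loop going might look like this:

m_Adapter.RowUpdated += delegate(object sender, SqlRowUpdatedEventArgs e)
{
   if (e.Status == UpdateStatus.ErrorsOccurred)
   {
      // Note the failure on the row itself and continue with the rest
      e.Row.RowError = e.Errors.Message;
      e.Status = UpdateStatus.SkipCurrentRow;
   }
};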

Reading Data into Business Objects

I have covered the use of data sets pretty extensively in this appendix because they are the richest and most often used relational data container in both Windows and Web client and middle-tier applications. There is one other important data access class that was briefly described earlier: the data reader. A data reader is a fast, forward-only, read-only cursor into a returned set of data from a database. Each managed provider is responsible for including a provider-specific data reader class. For SQL Server, it is named, not surprisingly, SqlDataReader.

A SqlDataReader lets you execute a query and quickly iterate through each row of the returned results to perform processing on those rows. Note that I didn’t say the data reader is like a cursor into the database itself. You do need to maintain an open connection associated with the data reader as long as you are iterating through the results. But the data reader will buffer the results of the query on the client side, and you cannot modify the underlying data. All you can do is read the rows returned one at a time and act upon them.

When you are done using the reader, you need to make sure the connection gets closed as you do for any other query. It is not uncommon for a data access layer to pass a data reader back as the return value from some data access method.

public static SqlDataReader GetCustomers( )
{
   SqlConnection conn = new SqlConnection(
      "server=localhost;database=Northwind;trusted_connection=true");
   SqlCommand cmd = new SqlCommand(
      "SELECT CustomerID, CompanyName, ContactName FROM Customers",
      conn);
   conn.Open( );
   return cmd.ExecuteReader(CommandBehavior.CloseConnection);
}

This code allows a client to take that reader, iterate through the results quickly, and do with them what it will. But if you take a look at the code, you will notice that in order to do this, you need to open the connection to the database and leave it open so the client can iterate through the contents of the data reader. However, the connection is declared and opened down in the data access layer, and is not directly accessible to the client code when it is done with the reader. So how can the client make sure the connection gets closed? You will see in this code that a CommandBehavior enumerated value of CloseConnection was passed to the ExecuteReader method on the command. As a result, when the client calls the Close method on the data reader (or disposes of it), the underlying connection will be closed as well.
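On the client side, the simplest way to guarantee that cleanup is to wrap the reader in a using block; disposing of the reader closes it, and the CloseConnection behavior then closes the connection too:

using (SqlDataReader reader = GetCustomers( ))
{
   while (reader.Read( ))
   {
      // Process each row here
   }
}  // Dispose closes the reader and, via CloseConnection, the connection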

Returning a data reader is significantly faster than returning a data set, because you can’t directly operate on the contents of a data reader; all you can do is use it to quickly pull data out of the reader and do something with it. Often the thing you want to do with the data is use it to construct and populate some object model, which is what the data set and data adapter have to do at the time you fill the data set. That is why the data set takes a lot longer to fill than a data reader does to return. With the data set, you have a full object model in memory that you have read/write and random access into. To achieve the same thing with a data reader, you will have to pull the data out of the rows from the data reader and stuff them into your own object model before you can do something meaningful with them.

In larger-scale, object-oriented systems, this may be exactly what you want to do in the first place. Typed data sets (covered in Chapter 2) make a great choice for fairly lightweight, easy to construct and maintain data transfer objects that you can pass around the middle tier for processing or pass to the client for presentation. However, often your middle-tier objects need to not only encapsulate data—they need to encapsulate behavior as well. In that case, you will probably be designing your own object model, where many of the objects might encapsulate state that is stored in persistent form in the database. The data reader works great in this case for quickly reading data into an object or set of objects resulting from a retrieval query against the database.

For example, let’s say you have a Customer class defined in your business object model that encapsulates the state and behavior associated with customers. When you want to act on a collection of customers, you retrieve the data from the database like the previous code, getting back a data reader that lets you get the data associated with the customers. Now you need to pull that data into your customer objects.

The data reader exposes an interface similar to other reader classes in the .NET Framework. A reader acts like a cursor into a collection of items. The cursor is initially positioned just before the first item. To move the cursor to the next item, you call the Read method on the reader. The Read method will return false when there are no more items to be read. Once the cursor is positioned on an item, you can access the contents of that item through whatever access methods and properties the specific type of reader exposes. In the case of a SqlDataReader, the items are rows, and you can access the row contents through a combination of an indexer, the Item property, or GetXXX methods, where XXX is the type of the column you are requesting. The following code shows both the use of indexers and the GetString method.

private void OnGetCustomers(object sender, EventArgs e)
{
   // Create an empty collection of customer objects
   BindingList<Customer> custs = new BindingList<Customer>( );
   // Call the data access method that returns a reader
   SqlDataReader reader = CustomersDataAccess.GetCustomers( );
   // Loop through each row of data
   while (reader.Read( ))
   {
      // Create a new customer object
      Customer c = new Customer( );
      // Extract the values from the columns
      c.Id = (string)reader["CustomerID"];
      c.Name = (string)reader["CompanyName"];
      c.Contact = reader.GetString(reader.GetOrdinal("ContactName"));
      // Add the object to the collection
      custs.Add(c);
   }
   // Data bind the results to the grid
   m_CustomersGrid.DataSource = custs;
}

The indexer on the data reader returns an object reference, so you will need to cast the value to the expected type of the column you are retrieving. The indexer is overloaded so that you can pass it an integer index into the row representing the position of the column in the returned rows, or you can pass it the name of the column as shown here. Using column names is far easier to read and maintain, even though it is a tiny bit less efficient for retrieval.

The GetXXX methods are defined to extract the column value for each of the built-in .NET types. Each takes a column index, or ordinal, identifying the position of the column to retrieve, and returns the column's value as the specified type. Be aware that the SqlDataReader GetXXX methods don't perform conversions: if you call GetString on a column that actually contains an integer, an InvalidCastException will be thrown, so the requested type must match the actual type of the column. (If you need a conversion, retrieve the value as an object through the indexer or the GetValue method and convert it yourself.) To determine the position of a column based on its name, this code uses the data reader's GetOrdinal method.

You can see that what the code does is to create an instance of your Customer business class for each row of data in the data reader. It stuffs the values from the reader that it cares about (I simplified the Customer class for brevity here) into the customer object, and then adds the object to a collection. In this case it just data binds the collection to a data grid, but if you were just data binding, you probably wouldn’t go to this trouble. Presumably you would do this in a middle-tier business service where those Customer objects were going to be used in the processing of the system and might include behavior, validation, and other capabilities that go beyond what a data set could do for you.

XML Data Access

XML data access in .NET is a huge topic unto itself and is mostly separable from relational data access. There are a number of things that make it a difficult topic to cover quickly. For starters, there is all the underlying standards knowledge that is a prerequisite for working with XML, such as the XML standard itself, XPath, XML Schema, XSLT, and XQuery. For another, there are multiple models for dealing with XML in .NET. Finally, the XML capabilities in the .NET Framework have been significantly enhanced in version 2.0.

There is no way to briefly cover all the ways you might program with XML in .NET, so I’ll just focus on those things that will let you understand some of the ways that you might get XML data into your Windows application so you can display the data through Windows Forms controls. There is no way to directly data bind XML data to any of the Windows Forms controls that ship with the .NET Framework. To use XML data for data binding, you will need to either get the data into a data set, or you will need to read the data into custom business objects that are suitable for data binding.

The first thing to get your arms around with XML in .NET is all the different flavors of XML object models that exist. There is the XmlDocument class, which provides an implementation of the W3C XML Document Object Model (DOM) standard. This class lets you deal with XML in the form of documents that are read into memory in their entirety into a heavyweight object model. There is also the XmlDataDocument class, which is really just a derived class from XmlDocument. In addition to providing storage for XML using the DOM, this class encapsulates a data set that will let you deal with a document’s contents as relational data. The XmlDataDocument will also let you place relational data into the object through a data set, and then you can manipulate the data as XML.

The XmlReader and XmlWriter classes provide a very lightweight, stream-based approach to reading and writing XML. These classes, and the classes derived from them, can be used with a number of other capabilities in .NET, such as XML serialization, XML data access from relational stores, and raw access to XML streams from disk or over a network. Finally, there is an object model introduced in .NET 1.0 that is based on the XPathDocument and XPathNavigator classes. These classes form the basis for the preferred method of working with XML in .NET 2.0.

The following sections describe how to read XML data into an XmlDataDocument and access it in relational form through its encapsulated data set, and then discuss loading data into an XPathDocument and how to query and navigate data within that document. Then you’ll have enough data access tools in your belt to understand all the data access that is done in support of the samples in this book, as well as enough background material to get data in many forms into your applications for prototyping and developing your Windows Forms applications.

Working with the XmlDataDocument Class

The XmlDataDocument class bridges the world of relational and hierarchical data. The class is derived from XmlDocument, so it “is an” XmlDocument in the purest sense of object-oriented inheritance (where “is an” is used to describe objects related through inheritance), and thus exposes all the capabilities of the XmlDocument class for storing and manipulating data as XML. It also encapsulates a DataSet as a property, which lets you access all or part of the contents of the XmlDataDocument as relational data.

There are two common ways to use an XmlDataDocument. The first is to load XML into the document, and then access its DataSet property for data binding, iterating through the contents with the DataSet object model, or synchronizing the contents with a database. The second is that you can take a DataSet that already contains data, construct an XmlDataDocument from it, and use XML processing or constructs to program against the data (that is, perform an XPath query against the data).

For loading XML into an XmlDataDocument and accessing it through its DataSet property to work, you have to set up the data set schema within the XmlDataDocument before you load the XML. The XmlDataDocument won't infer a data set schema when reading in XML from a file or string the way the DataSet.ReadXml method does. However, if you first supply its contained data set with a schema and then read in XML, the XmlDataDocument will associate the XML elements and attributes that it reads with the data set schema, and that data can then be accessed through the DataSet property on the XmlDataDocument.

Take a look at the XML in Listing D.10.

LISTING D.10: Customer XML Data


<?xml version="1.0" encoding="utf-8"?>
<Customers xmlns="urn:AW-Windows-Forms-Data-SimpleCustomers.xsd">
   <Customer>
      <Name>Fred Smith</Name>
      <Email>[email protected]</Email>
      <Address>
          <Street>123 Nowhere St.</Street>
          <City>Middletown</City>
          <State>OH</State>
         <Zip>54321</Zip>
      </Address>
   </Customer>
   <Customer>
      <Name>Edith Jones</Name>
      <Email>[email protected]</Email>
      <Address>
         <Street>939 TakeMeBack Road</Street>
         <City>Fern</City>
         <State>TX</State>
          <Zip>86950</Zip>
      </Address>
   </Customer>
</Customers>


You can see that this data can be viewed as containing two kinds of data, Customers and Addresses, and that there is a parent-child relationship between the two. In a data set, this would be represented by two data tables and a data relation between the two. To load this data into an XmlDataDocument and access it through the DataSet property, you first have to tell the XmlDataDocument what data set schema to use when reading in the XML. There are a few ways you could accomplish this. One would be to access the DataSet property and add tables and relations to it programmatically, as shown earlier in the “Creating a Data Set Programmatically” section. Alternatively, you could construct an empty XmlDataDocument, and then read the XML data in through the DataSet property using the ReadXml method. Another approach would be to construct an empty data set, initialize its schema, and provide that to the XmlDataDocument constructor to set up the schema of the encapsulated data set, as shown here:

private void OnLoadXmlDataDoc(object sender, EventArgs e)
{
   DataSet custData = new DataSet( );
   custData.ReadXmlSchema(@"..\..\SimpleCustomers.xsd");
   XmlDataDocument dataDoc = new XmlDataDocument(custData);
   dataDoc.Load(@"..\..\SimpleCustomers.xml");
   DataRow row = custData.Tables["Customer"].NewRow( );
   row["Name"] = "FooFooFoo";
   custData.Tables["Customer"].Rows.Add(row);
   m_ResultsGrid.DataSource = dataDoc.DataSet;
}

In this code, custData is a data set that is loaded with an XML schema that matches the XML shown earlier. The custData data set is passed into the constructor for the XmlDataDocument, and it will encapsulate the reference to this data set instead of creating its own. Once it has a data set with the appropriate schema, you can load the XML into the XmlDataDocument, and the data set will be populated with the XML from the document loaded that matches the schema. The SimpleCustomers.xsd schema was created by loading the XML from Listing D.10 into the Visual Studio editor (minus the xmlns namespace declaration) and selecting Generate Schema from the XML menu.

An even simpler approach is to just read the XML in through the DataSet property:

private void OnLoadXmlDataDoc(object sender, System.EventArgs e)
{
   XmlDataDocument dataDoc = new XmlDataDocument( );
   dataDoc.DataSet.ReadXml(@"..\..\SimpleCustomers.xml");
   m_dgResults.DataSource = dataDoc.DataSet;
}

This uses the default XmlDataDocument constructor, which doesn't take a data set as a parameter; it creates an empty data set internally. The code then calls ReadXml on the contained data set, passing in the path to the XML data file. The contained data set infers the schema from the XML as it reads it in. Once the XML is read into the XmlDataDocument, it can be treated as an XML node set and accessed through the XmlDocument base class methods as well as those of the contained data set.

Now let’s tackle the angle of loading an existing data set containing data into an XmlDataDocument for the purposes of navigating the data as XML. You might want to do this if you have a complex data set and want to perform selections or queries across multiple tables in the data set, or if the data is hierarchical and navigating with the XML object model makes more sense for the situation. In this case, you again use the fact that the XmlDataDocument constructor can take a data set as a parameter, and it will hold a reference to that data set internally instead of constructing its own. The following code demonstrates creating a data set from a relational source, then constructing an XmlDataDocument and using the XML object model to perform operations on the data.

private void OnGetData(object sender, EventArgs e)
{
   string connStr =
     "server=localhost;database=Northwind;trusted_connection=true";
   // Fill a data set with the first ten customers
   SqlDataAdapter adapter = new SqlDataAdapter(
     "SELECT TOP 10 * FROM Customers",
     new SqlConnection(connStr));
   DataSet custData = new DataSet( );
   adapter.Fill(custData, "Customers");
   // Wrap the populated data set so it can be queried as XML
   XmlDataDocument dataDoc = new XmlDataDocument(custData);
   XmlNodeList custNames =
     dataDoc.SelectNodes("//Customers/CompanyName");
   string names = string.Empty;
   foreach (XmlNode node in custNames)
   {
     names += node.InnerText + " ";
   }
   MessageBox.Show(names);
}

This code first retrieves the customer data into a data set, as shown earlier in the appendix. If you were really retrieving data solely for the purpose of manipulating it as XML, you would be better off using the XML query capability of SQL Server, returning the results as XML, and stuffing the results into an XPathDocument.

Once that data set is populated with the appropriate schema and the data, it is passed to the constructor of the XmlDataDocument. An XPath query is then performed against the XML document using the SelectNodes method, which returns a list of XML nodes that the code iterates through to construct a simple display of the customer names.

An important thing to realize about an XmlDataDocument is that it can load and hold a lot of XML content that isn't exposed through the DataSet property, and that additional content can be accessed through the normal document object model of the base XmlDocument class. You could load a document containing a set of data that matches the data set schema you want to expose for data binding, even though the document also contains a bunch of other nodes that don't match the schema. The parts of the XML document that match the data set schema are added to the data set; the parts that don't match are simply held within the XmlDocument base class' object model and can be accessed through normal XML navigation of the document. So if you need to jump back and forth between dealing with a set of data as relational data and dealing with it as an XML object hierarchy, the XmlDataDocument is one of the first places to look.

Working with the XPathDocument Class

As the name implies, an XPathDocument is an object model that is based on the hierarchical model exposed by the XPath specification. It was introduced in .NET 1.0 as a read-only store for XML content that is lighter weight and faster than the XmlDocument DOM implementation. This class and its related classes all reside in the System.Xml.XPath namespace.

You generally don’t work directly against the XPathDocument except for a few common operations. The first thing you will want to do is load data into the document. The data can come from a number of sources, including a file on disk, a network stream, a Web services call, or a database query. Once you have loaded the data into the document, the next step is to get an XPathNavigator for the document, which is the primary API for accessing the data within an XML document. You can also obtain XPathNavigator objects for XmlDocuments and XmlDataDocuments so that you can use a consistent programming model for XML across all three document types. Additionally, with XmlDocument and XmlDataDocument objects, you can edit the contained data directly through the XPathNavigator.

Other than those operations, most of what you will care about when working with an XPathDocument will involve programming against the XPathNavigator that you use to perform queries against the document and to navigate through sets of nodes.

Loading Data into an XPathDocument

There are two common ways of getting XML data into an XPathDocument. The first is from a file, which is a simple matter of passing the file path or URL to the constructor of the XPathDocument class:

XPathDocument doc = new XPathDocument("doc.xml");

The constructor can also take a stream, an XmlReader, or a TextReader. Whichever you use, the contents of the document will be loaded into memory and parsed into the underlying object model of the XPathDocument.
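For example, if you already have the XML in a string, perhaps returned from a Web services call, you can load it through a TextReader. This fragment is purely illustrative, using System.IO's StringReader:

string xml = "<Customers><Customer><Name>Fred Smith</Name>" +
             "</Customer></Customers>";
using (StringReader reader = new StringReader(xml))
{
   // StringReader derives from TextReader, so the constructor accepts it
   XPathDocument doc = new XPathDocument(reader);
}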

To get the data out of SQL Server, you can issue a FOR XML query, and load the XPathDocument with the XmlReader returned from a call to ExecuteXmlReader on the SqlCommand object.


private void OnLoadFromDB(object sender, EventArgs e)
{
   SqlConnection conn = new SqlConnection(
     "server=localhost;database=Northwind;trusted_connection=true");
   SqlCommand cmd = new SqlCommand(
      "SELECT * FROM Customers FOR XML AUTO, Elements", conn);
   try
   {
      conn.Open( );
      // The FOR XML results stream back through an XmlReader,
      // which the XPathDocument constructor consumes directly
      XmlReader reader = cmd.ExecuteXmlReader( );
      m_XPathDoc = new XPathDocument(reader);
   }
   finally
   {
      conn.Close( );
   }
   DumpDocument( );
}

Querying XML Data

Once you have data in memory in an XML document of some form (XmlDocument, XmlDataDocument, or XPathDocument), you are likely to want to query it to select some subset of the nodes contained in the document based on some criteria you want to match against. For example, you may want to select all Customers from the state of California, get all sales data for the last six months, or look up the airspeed velocity of a laden swallow. Whatever you need to look up, there are easy and powerful ways to perform queries against XML content in .NET.

To perform a query against XML data, you have to:

1.   Load the data into a document object.

2.   Select a set of nodes in that document by specifying the query (in XPath).

3.   Iterate through the results.

Let’s just focus on the one query method that all of the document types support: XPathNavigator queries. The XPathNavigator class is the front end to the underlying query engine for any of the XML document types in .NET. In the last section you saw an example where the SelectNodes method of the XmlDocument was called. That is a streamlined method for node selection that really just uses an XPathNavigator under the covers. The XPathNavigator uses a cursor-style navigation through a node set that you can use to iterate through the document. It also exposes a number of selection methods that let you execute queries against the contents of the document or node it is pointing to. This section just focuses on querying; the next goes into a little more about navigation.

To get an XPathNavigator for any of the XML document types, you call CreateNavigator against the document or one of its nodes. CreateNavigator passes back an instance of an XPathNavigator with the cursor positioned on the node from which it was created. Once you have the navigator, you call one of the query methods shown in Table D.5 to obtain an XPathNodeIterator containing the matching nodes.

TABLE D.5: XPathNavigator Query Methods

Method               Description

Select               Executes an XPath expression (passed as a string or a compiled XPathExpression) and returns an XPathNodeIterator over the matching nodes
SelectChildren       Returns an iterator over the children of the current node that match a specified name or node type
SelectAncestors      Returns an iterator over the ancestors of the current node that match a specified name or node type
SelectDescendants    Returns an iterator over the descendants of the current node that match a specified name or node type
Evaluate             Evaluates an XPath expression and returns the typed result (a Boolean, number, string, or node set)
Matches              Determines whether the current node matches the specified XPath expression
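Most of these methods hand back an XPathNodeIterator; Evaluate is the exception, returning a scalar result when the expression calls for one. As a quick illustration (assuming a Customers.xml file like the one used in the next example), a count() expression comes back boxed as a double:

XPathDocument doc = new XPathDocument("Customers.xml");
XPathNavigator nav = doc.CreateNavigator( );
// XPath numbers map to System.Double, so cast the boxed result
double count = (double)nav.Evaluate("count(//CompanyName)");
Console.WriteLine("Found {0} company names", count);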

The XPathNodeIterator class follows the pattern of other iterators in the .NET Framework. It starts out positioned just before the first item in the collection it contains. You repeatedly call the MoveNext method, usually in a while loop, and inspect the Boolean return value. If the value is true, then there was another node to move to and the iterator will be positioned on that node. If it returns false, there are no more nodes to iterate through. The following code demonstrates the process of loading a document, performing a query, and iterating through the results.

private void NodeIteration( )
{
   // Load the document
   XPathDocument doc = new XPathDocument("Customers.xml");

   // Get the navigator
   XPathNavigator nav = doc.CreateNavigator( );

   // Perform the query
   XPathNodeIterator iter = nav.Select("//CompanyName");

   // Iterate through the results
   while (iter.MoveNext( ))
   {
      Console.WriteLine(iter.Current.Value);
   }
}

The Current property on the iterator returns a reference to an XPathNavigator positioned on the current node, which you can then use to access the node contents or perform navigation or further queries based on that node. This example accesses the Value property of the current node, which in the case of an element or attribute is just the contained text content. There is also a Name property, which returns the name of the node.

The complexity of the results returned depends on the complexity of the query that you issue. For something simple like this query, which just returns a set of elements containing text nodes with the values you are after, the code is pretty straightforward. However, the results of an XPath query can include any type of node, so you will often need some conditional logic that checks the node type of each node as you iterate through them and varies the processing based on that type. The node type is exposed through the NodeType property on the XPathNavigator, which returns a value of the XPathNodeType enumeration.
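Here is a minimal sketch of that pattern, again assuming the Customers.xml file; the "//node()" query is deliberately broad so that elements, text, and comments can all come back:

private void NodeTypeDispatch( )
{
   XPathDocument doc = new XPathDocument("Customers.xml");
   XPathNavigator nav = doc.CreateNavigator( );
   // node() matches elements, text, comments, and other node types
   XPathNodeIterator iter = nav.Select("//node()");
   while (iter.MoveNext( ))
   {
      switch (iter.Current.NodeType)
      {
         case XPathNodeType.Element:
            Console.WriteLine("Element: " + iter.Current.Name);
            break;
         case XPathNodeType.Text:
            Console.WriteLine("Text: " + iter.Current.Value);
            break;
         case XPathNodeType.Comment:
            Console.WriteLine("Comment: " + iter.Current.Value);
            break;
      }
   }
}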

Navigating an XML Document

Each of the different XML document types has a variety of specialized navigation methods that depend on the specific object model that they expose. However, all of the XML document types let you obtain an XPathNavigator for the document, which allows you to query and navigate those documents in a consistent way using the preferred object model for working with XML in .NET. In the previous section, you saw how to perform queries against a document, returning an iterator that you could use to step through each of the nodes that matched the query and perform processing on it. To work with the node, you obtained an XPathNavigator reference to the node from the iterator’s Current property.

Once you have an XPathNavigator to a particular node, you will often need to perform some navigation through the object model based on your current position within it. For example, if you have selected a set of Order elements, you may need to navigate through the attributes on that element to extract their values. Or perhaps you will perform a query to obtain an element representing a Customer, and then want to navigate all the child elements to extract their values without performing individual queries for each element.

The XPathNavigator exposes a set of MoveToXXX methods that let you navigate the object model it exposes, where the XXX portion of the method name indicates the node to which it moves the navigator’s cursor (see Table D.6). Each of the MoveToXXX methods returns a Boolean indicating whether the move was successful, meaning there was a node in the intended location to move to. Note that attributes and namespaces are treated as special kinds of nodes because they can only be contained within the opening tag of an element node. To navigate attributes or namespaces, you use the separate set of MoveToXXX methods shown in Table D.7.

TABLE D.6: XPathNavigator Navigation Methods

Method               Description

MoveTo               Moves the navigator to the position of another XPathNavigator
MoveToRoot           Moves to the root node of the document
MoveToFirstChild     Moves to the first child of the current node
MoveToParent         Moves to the parent of the current node
MoveToFirst          Moves to the first sibling of the current node
MoveToNext           Moves to the next sibling of the current node
MoveToPrevious       Moves to the previous sibling of the current node
MoveToId             Moves to the node with the specified ID attribute value

TABLE D.7: XPathNavigator Attribute and Namespace Navigation Methods

Method                  Description

MoveToAttribute         Moves to the attribute with the specified name
MoveToFirstAttribute    Moves to the first attribute of the current element
MoveToNextAttribute     Moves to the next attribute of the current element
MoveToNamespace         Moves to the namespace node with the specified prefix
MoveToFirstNamespace    Moves to the first namespace node of the current element
MoveToNextNamespace     Moves to the next namespace node of the current element
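Putting a few of these methods together, the following sketch walks a set of Order elements like those described earlier and dumps their attributes. The Orders.xml file name and its attribute-based layout are assumptions made for the illustration:

private void WalkOrders( )
{
   // Orders.xml is a hypothetical file whose document element
   // contains a list of Order elements carrying attributes
   XPathDocument doc = new XPathDocument("Orders.xml");
   XPathNavigator nav = doc.CreateNavigator( );

   nav.MoveToFirstChild( );        // position on the document element
   if (nav.MoveToFirstChild( ))    // position on the first Order
   {
      do
      {
         Console.WriteLine("Order element: " + nav.Name);
         if (nav.MoveToFirstAttribute( ))
         {
            do
            {
               Console.WriteLine("   {0} = {1}", nav.Name, nav.Value);
            } while (nav.MoveToNextAttribute( ));
            // Return from the attribute axis to the Order element
            nav.MoveToParent( );
         }
      } while (nav.MoveToNext( )); // advance to the next sibling Order
   }
}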

Where Are We?

In this appendix, I have given you a high-speed introduction to data access with ADO.NET, both for relational data and for XML data. You learned about the various capabilities of the DataSet class for containing relational data, which lets you retrieve data into the data set, make modifications to the data, and save that data back to a data source. The data set is the richest relational object for containing and manipulating data in the client, and it supports the widest range of data-binding scenarios. You also saw how to get data into a custom object collection to which you can bind Windows Forms controls, and some of the different ways of interacting both with client-side data and with the data sources from which you obtain it. The appendix stepped through some basics of dealing with data in XML form, showing how to load data into a document, work with that data as both hierarchical and relational data with an XmlDataDocument, and navigate and query the data using the XPathNavigator class.

Data access is a deep and important topic that is covered in much more detail by a number of other books. The intent of this appendix was simply to provide you with enough information to get data into your client applications for prototyping and to understand where the data comes from in the chapters of this book that deal with manipulating and presenting that data in Windows Forms applications. By now your head is probably spinning if you haven't been exposed to data access in .NET before. If you have, then hopefully this provided a good refresher and reference for the basics while you work with the rest of the book.
