An important part of any real-world application is obtaining and managing the data that drives the application. The application could be something as simple as a game that is retrieving and storing user preferences and scoring data. At the other end of the spectrum, it could be a complex business application that retrieves, manipulates, and updates data from multiple data sources over a network or the Web and within the scope of a distributed transaction using asynchronous message-based communications. The spectrum of what constitutes data access is extremely broad and complicated, and is the topic of many other books.
This book, on the other hand, is focused on the presentation of data using Windows Forms, and this appendix isn’t intended to teach you everything there is to know about data access. What I do want to accomplish is to give you a solid foundation in how to get data into and out of your Windows Forms application architecture. I also want to ensure that you understand enough about ADO.NET data access to follow the samples in the chapters of the book and, more importantly, how the different forms of data affect the presentation of that data.
So basically I am going to show you just enough to make you dangerous. You should be able to develop data-bound UI prototypes without needing a data access expert to come in and build you a data layer before you can get started designing and coding the presentation tier. On the other hand, you shouldn’t build a client application that is tightly bound to the data tier. When it comes to data access, there are a lot of different ways to accomplish the same thing. This appendix covers data access at the lower levels—the basics of data access with ADO.NET. Chapter 2 describes how to work with typed data sets and how to do the kind of data access that you will want to use most often in your data applications.
If you have a solid foundation in ADO.NET, you can feel free to skip this appendix or just use it as a quick review of core concepts. If you will be coding up a data access layer, I’d suggest you pick up some data access books to learn all the various options and best practices for designing a data layer and handling more complex scenarios. (Pragmatic ADO.NET by Shawn Wildermuth [Addison-Wesley, 2003] is an outstanding place to start.)
Data can take on many forms and come from many sources when coding with ADO.NET. This appendix focuses on only two sources of data: relational data coming from a database like Microsoft SQL Server, and XML coming from a file or stream. The two worlds are certainly not mutually exclusive. SQL Server includes a lot of functionality for generating and consuming XML, and SQL Server 2005 includes a native data type for storing XML. You can load XML into a DataSet and work with it as relational data, and you can transfer relational data into an XmlDataDocument and work with it as an XML document. You might also load XML and iterate through the nodes in a document to extract data that you put into business objects that you then use for data binding.
How you code against relational and XML data is very different, as is how you might use them for data binding. This appendix focuses on the various ways to get data into business entities and data sets that you can use in your Windows Forms applications to present that data, and how to take any updates made to the data and push them back to the underlying data store.
Because this is such a dense topic, I am going to have to make a few assumptions about your knowledge. First, I assume that you understand the basic concepts behind storing data in a database and executing queries to retrieve and update it. Second, I will assume you have some basic knowledge of XML storage, the structure of a document, and the kinds of nodes that a document can contain. You certainly don’t need to be an expert in these things, but you should at least have enough background to understand the terminology and what the basic processes and capabilities of those technologies are.
When dealing with relational data access, you generally follow the same process each time you need to touch the underlying data store to retrieve data or update it. You need
1. A connection to the data store.
2. A way to express the query that you are going to perform.
3. Something to contain any returned results from the query.
The classes defined in ADO.NET are used to encapsulate the state and behavior associated with each of these steps.
A key concept with relational data access in .NET is that it is based on a disconnected data model. Obviously you can’t execute a query if you aren’t really connected to the database. Disconnected data access means that you are expected to open a connection, use that connection to issue a query or a set of queries, and then close the connection. You may continue to work with the data resulting from the query for some time on the client. At some point in the future, you may reconnect to the database to either issue update queries or to refresh the data you have, but you don’t keep a connection open to the database the entire time you are working with the data. This makes it a very different data model to work with than legacy database APIs, which typically favored a connected model. This is also different from the concept of offline operations of a smart client application. Offline capabilities are focused on being able to cache the operational data on the client to allow it to continue to support certain use cases even when not connected to a network. Disconnected data access generally assumes you have a network connection available, but that you just aren’t keeping database connections open throughout the functioning of your application.
When dealing with data types in ADO.NET, there are two kinds of types that you come across: generic data classes and provider-specific classes. The generic relational data classes and interfaces implement abstract constructs in the relational model—tables, rows, relations, constraints—and can be used to contain data coming from a variety of data sources. These generic data classes are all part of the System.Data namespace, which is the root namespace for ADO.NET class library constructs.
For the generic classes to be useful, though, you usually need to be able to connect to a data store and perform the queries that populate the generic data containers or update the underlying data store based on changes in them. The classes you use to make that connection and execute queries are the provider-specific classes, which reside in child namespaces of System.Data and are specific to each provider. Table D.1 lists the providers that ship with the .NET Framework.
The managed provider classes are expected to implement a common set of interfaces that are defined in the System.Data namespace. These interfaces define the basic methods and properties for a set of classes that every managed provider is expected to implement. These include classes for establishing data source connections, creating commands to execute queries, creating data adapters used to fill data sets, and creating parameters that you associate with a command to pass to a query. In ADO.NET 2.0, there is also a set of abstract base classes for data providers in the System.Data.Common namespace that lets you program against different data sources in a provider-agnostic way. This allows you to switch out the underlying data source without needing to change any of your application code. This topic is a little beyond the scope of what I cover here, but you should be aware of the capability.
However, if you need to tap into provider-specific features such as XML handling in SQL Server, then you will need to program against the specific provider classes instead of the interfaces or generic classes for those capabilities. In general, you should pick a consistent approach in coding against the provider objects instead of switching back and forth between sometimes using the interface and sometimes using the provider-specific classes. You will have to figure out what the best approach is based on your application needs and the degree to which you want your code to be portable across multiple providers.
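To make the interface-based approach concrete, here is a minimal sketch of coding against the System.Data interfaces so that only one factory method knows the concrete provider. The method names and the query are my own invention for illustration; it assumes the usual using directives for System.Data and System.Data.SqlClient.

```csharp
// Hypothetical factory: the only place that names a concrete provider.
// Swap in OleDbConnection, OracleConnection, etc. without touching callers.
private IDbConnection CreateConnection(string connString)
{
    return new SqlConnection(connString);
}

private int CountCustomers(string connString)
{
    using (IDbConnection conn = CreateConnection(connString))
    {
        // CreateCommand returns a command already tied to this connection
        IDbCommand cmd = conn.CreateCommand();
        cmd.CommandText = "SELECT COUNT(*) FROM Customers";
        conn.Open();
        // ExecuteScalar returns the first column of the first row
        return (int)cmd.ExecuteScalar();
    }
}
```

Because every call goes through IDbConnection and IDbCommand, swapping the provider is a one-line change in the factory method.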
As a simple example of working with a specific provider, the following code shows how to open a connection to a SQL Server database and execute a command to fill a DataSet with a SqlDataAdapter:
private DataSet GetCustomersFromDb( )
{
    // Create the disconnected data store
    DataSet ds = new DataSet( );
    // Create the connection object
    SqlConnection conn = new SqlConnection(
        "server=localhost;database=Northwind;trusted_connection=true");
    // Create the command that wraps the query
    SqlCommand cmd = new SqlCommand(
        "SELECT CustomerID, CompanyName FROM Customers", conn);
    // Create the DataAdapter that bridges the provider-
    // agnostic DataSet and the provider-specific connection
    // and command
    SqlDataAdapter adapter = new SqlDataAdapter(cmd);
    // Opens the connection, executes the query,
    // puts the result in the DataSet, closes the connection
    adapter.Fill(ds, "Customers");
    // Return the result
    return ds;
}
I’ll get into more details on data sets and data adapters shortly, but the intent here is to start with the basics. The process is:
1. Create a data set object that you are going to populate.
2. Create a connection object for the data store that you are going to operate against. Note that creating an instance of a connection object doesn’t open the connection.
3. Create a command object that encapsulates the query that you are going to execute and associate it with the connection in its constructor.
4. Create a data adapter and associate it with the command that will be used to populate it.
5. Call the Fill method to populate the data set and return the results.
I’ll get into more detail on all the magic that is happening in that Fill method later in this appendix. At this point I just want you to get an overview.
When you need to get an entire set of data into memory, use a data set so you can deal with it as a whole on the client or in the middle tier. Each managed provider also implements another way to retrieve the data from a query, called a data reader. For example, the SQL Server managed provider implements a class called SqlDataReader. You can think of a data reader as a fast, forward-only, fire hose-like cursor that you can use to iterate through the results of a query. There are ways to bind a data reader to Windows Forms controls through a BindingSource component, but it isn’t a practice I would encourage, so I won’t go into a lot of detail on it. Binding a data reader to Windows Forms controls tightly couples your presentation tier with your data tier. If it is a very small-scale application, and you are looking for the fastest way to get data presented to the user, that may be okay. However, for enterprise-scale business applications, that is something you should avoid. Using data readers in the presentation tier potentially keeps the connection open longer than necessary, limiting scalability, and it also tightly couples the client to the data schema of the data tier. However, data readers are the fastest way to iterate through the rows returned from a query, so you may want to use them if you are going to push the results of a query into a collection of business object classes instead of using data sets. I’ll show an example of this later in this appendix.
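As a small preview of that pattern, the following sketch iterates a SqlDataReader and copies each row into a business object. The Customer class and method name are my own for illustration; it assumes using directives for System.Collections.Generic and System.Data.SqlClient.

```csharp
public class Customer
{
    public string CustomerID;
    public string CompanyName;
}

private List<Customer> GetCustomerObjects(string connString)
{
    List<Customer> customers = new List<Customer>();
    using (SqlConnection conn = new SqlConnection(connString))
    {
        SqlCommand cmd = new SqlCommand(
            "SELECT CustomerID, CompanyName FROM Customers", conn);
        conn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            // Read advances to the next row, returning false at the end
            while (reader.Read())
            {
                Customer c = new Customer();
                c.CustomerID = reader.GetString(0);
                c.CompanyName = reader.GetString(1);
                customers.Add(c);
            }
        }
    }
    return customers;
}
```

The connection is open only for the duration of the loop, and the resulting List&lt;Customer&gt; is what you would hand to the presentation tier, keeping the client decoupled from the data access code.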
The most common data type you will deal with in Windows Forms data binding is the DataSet, or a derived typed data set class. Data binding to custom objects and collections is also very easy in .NET 2.0, so that will be a very common approach as well and is covered in more detail in several places in the book. But data sets are specifically designed for data binding in .NET, and this appendix is focused on relational data access, so let’s get our hands dirty there first.
A data set basically provides you with an in-memory data collection that can be used for everything from containing a single row of data to a complex schema of many tables with relations and constraints between them. Some would even go so far as to call it an in-memory database, but there are hazards in thinking of it that way if you start bringing too much data into memory. Figure D.1 shows the structure of a data set and the objects that are contained within it.
For simple tabular access to data returned from a query, you will usually only deal with a single data table in a data set, and that will be the target of your data binding. When you do so, the data set just becomes a fancy wrapper around a set of rows. You can actually just create a data table and populate it, and .NET 2.0 lets you create DataTable objects on their own, without a containing data set.
There are also many scenarios where you may want to retrieve rows from a table that have a parent-child relationship with rows from another table. In those cases, you will start to deal with data sets containing multiple data table objects, one or more data relation objects that maintain the parent-child relationship between the tables, and constraint objects that enforce the integrity of the contained data. A data table object contains a collection of data columns that describe the name and type of each column in a table (the metadata of the column), and it also contains a collection of data row objects that contain the actual data itself.
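As a rough sketch of what that parent-child wiring looks like in code (assuming a data set ds that already contains filled Customers and Orders tables with a shared CustomerID column), you create a DataRelation and add it to the data set:

```csharp
// Relate Orders rows to their parent Customers row.
// The final argument also creates the foreign key constraint.
DataRelation relation = new DataRelation(
    "FK_Orders_Customers",
    ds.Tables["Customers"].Columns["CustomerID"],  // parent column
    ds.Tables["Orders"].Columns["CustomerID"],     // child column
    true);
ds.Relations.Add(relation);

// Navigate from each parent row to its child rows
foreach (DataRow customer in ds.Tables["Customers"].Rows)
{
    DataRow[] orders = customer.GetChildRows(relation);
}
```

Once the relation is in place, the constraint keeps you from adding an order for a nonexistent customer, and GetChildRows gives you the related rows without writing any lookup code.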
You can populate a data set in one of three primary ways: by loading it from a file, programmatically, or through a query to a database. In .NET 2.0, you can also work with DataTable objects in isolation from a data set, and you can load a data table using a data reader with the Load method. Let’s go from the simplest to the most complex by starting with loading and saving data sets from files, specifically XML files.
The DataSet class supports two methods for saving to and loading from a file: WriteXml and ReadXml. If you have data in a data set and you want to save it to disk or a stream, you call WriteXml. When you want to read it back in, you call ReadXml. Because the underlying file format is XML, you could create an XML document some other way that is readable by a data set. It actually doesn’t have to come from a file at all; the ReadXml method will accept a stream, which could represent a file, network stream, or memory stream. For example, you might receive an XML document over a Web service that you could read into a data set and then display it in a DataGridView control. There is also a GetXml method that lets you get back the contents of a data set as an XML string.
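A round trip through a file takes only a couple of lines; the file name here is just a placeholder:

```csharp
private void RoundTripDataSet(DataSet ds)
{
    // Persist the data set, embedding the schema so the
    // relational structure survives the round trip
    ds.WriteXml("Customers.xml", XmlWriteMode.WriteSchema);

    // Read it back into a fresh data set
    DataSet copy = new DataSet();
    copy.ReadXml("Customers.xml");
}
```

Without XmlWriteMode.WriteSchema, only the data is written, and ReadXml has to infer the schema from the shape of the XML as described next.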
The ReadXml method is fairly adept at inferring an appropriate schema from a simple XML file and transforming that XML into a data set. It will treat the root element as the container for the data set. It will then treat each element type that contains other elements as a row from a table, and each element that contains only text as a column in that row. It can also handle schemas where the elements represent rows from a table, but the column values are encoded as attributes on that element instead of child elements. For example, the straightforward XML in Listing D.1 will be loaded into a data set that contains a table of Customers and a table of Orders, and will have foreign key constraints and relationships between the two tables representing the nested nature of the data. The corresponding relational schema is shown in Figure D.2.
LISTING D.1: Customers Data in XML
<?xml version="1.0" standalone="yes"?>
<CustomersDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Orders>
      <OrderID>10643</OrderID>
    </Orders>
    <Orders>
      <OrderID>10692</OrderID>
    </Orders>
  </Customers>
  <Customers>
    <CustomerID>ANATR</CustomerID>
    <CompanyName>Ana Trujillo Emparedados y helados</CompanyName>
    <Orders>
      <OrderID>10308</OrderID>
    </Orders>
    <Orders>
      <OrderID>10625</OrderID>
    </Orders>
  </Customers>
</CustomersDataSet>
If the XML in Listing D.1 contains an embedded XML schema, the data set will try to use it to establish the relational schema of the tables and relationships, and then will load the data into that schema. If the file was created by saving an existing data set, then things should work out fine when you read it back in. However, not every XML file is suitable for loading into a data set. If you have a schema where the same element type can exist at multiple levels in the hierarchy, you will get an ArgumentException if you try to load it into a data set, because the data set will fail to coerce it into a relational schema. For example, if you added a Customers element as a child of an Orders element in the XML in Listing D.1 and tried to read it in with ReadXml, an ArgumentException would be thrown. Additionally, if you have malformed XML, an XmlException will be thrown when you call ReadXml, so you will definitely want to have the call to ReadXml wrapped in an exception handler.
To read a data set in from XML, the code is quite simple:
private void LoadXmlDataSet(string xmlFilePath)
{
    try
    {
        DataSet ds = new DataSet( );
        ds.ReadXml(xmlFilePath);
    }
    catch (ArgumentException ex)
    {
        MessageBox.Show("Error reading XML into DataSet: " +
            ex.GetType( ).ToString( ) + " - " + ex.Message);
    }
}
The ReadXml method can take a path to a file, an open stream object, or an XML or text reader object. It also optionally takes an XmlReadMode enumeration value that controls how the schema inference and XML loading operate.
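For example, if you know the file carries an embedded schema, you can make that explicit rather than relying on inference; the file path variable here is assumed to point at a valid file:

```csharp
DataSet ds = new DataSet();
// ReadSchema uses the inline schema if present;
// InferSchema would instead derive the schema from
// the shape of the data itself
ds.ReadXml(xmlFilePath, XmlReadMode.ReadSchema);
```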
To create and populate a data set programmatically, you need nothing more than an understanding of the object model and how to call methods and properties on those objects. Programmatically constructing and populating data sets is fairly unusual, because it requires tedious and error-prone coding that will need to be updated any time the schema of your data changes. About the only time you will want to create data sets programmatically is if you have some hard-coded or nonstructured information in your program that you want to present in a tabular form; then you might stuff it into a data set so that you can bind the data set to a DataGridView control. But in that case it would be better to separate that structure into an XML file and load it from there, so that you don’t have to change your source code to change the structure or contents of the presented data. Another case where you might programmatically deal with a data set in this way is if you want to add a computed column to a table within a data set. The code in Listing D.2 shows how to create a simple data set, including a computed column, and load some data into it.
LISTING D.2: Programmatic DataSet Creation and Population
private DataSet CreateProgrammaticDataSet( )
{
    // Create the DataSet
    DataSet ds = new DataSet( );
    // Create a table with three columns
    DataTable dt = new DataTable("Customers");
    DataColumn col1 = new DataColumn("CustomerID", typeof(int));
    DataColumn col2 = new DataColumn("CompanyName", typeof(string));
    DataColumn col3 = new DataColumn("Computed Column",
        typeof(string), "CustomerID + ':' + CompanyName");
    dt.Columns.Add(col1);
    dt.Columns.Add(col2);
    dt.Columns.Add(col3);
    // Create a primary key constraint on the ID column
    UniqueConstraint pk = new UniqueConstraint("PK_Customers", col1, true);
    // Add the constraint to the table so it is enforced
    dt.Constraints.Add(pk);
    // Add the table to the DataSet
    ds.Tables.Add(dt);
    // Add a row to the DataSet
    DataRow row = dt.NewRow( );
    row["CustomerID"] = 1;
    row["CompanyName"] = "Fred";
    dt.Rows.Add(row);
    // Return the results
    return ds;
}
The code in Listing D.2 constructs a data table and adds the desired columns to it, as well as a primary key constraint. Note that the computed column uses a third argument to the DataColumn constructor, which is the expression used to compute the value of that column at runtime. The code adds the new table to the data set and then creates a new row. Once the values of the columns in the new row are set, the row is added to the table’s Rows collection. Note that you have to create the row from the table so that it will have the correct schema (columns) to be part of the table, but it doesn’t actually become part of the table until it is added to the Rows collection.
If you care about maintainability, you will probably cringe at the code that sets the values of the two columns through the row indexer. This indexer exposes an object reference, into which you can attempt to stuff any object. The compiler won’t be able to help you because it doesn’t have enough type information to figure out what the actual type of the column is, so you won’t know until runtime if you have tried to place an inappropriate value into the column. The other problem with this code is the hard-coded strings for the column names. Using column names in this case is more maintainable than the alternative, which is to pass the ordinal position of the column to the row indexer. However, if your schema changes, you are going to have to root out all the places in your code that you hard-coded column names, which is error prone at best. Typed data sets (described in Chapter 2) fix both of these problems, which is why you should favor them for most scenarios involving data sets.
You may occasionally load data into a data set from an XML file that has been cached on the client machine, and you may even (more rarely) have occasion to programmatically create a data set as shown in the previous section. However, the vast majority of the time you will be using data sets in conjunction with some form of relational data store. When that is the case, you will be loading the data into a data set using one of the managed providers introduced earlier, or you will be using typed data sets, where the schema is built into the type instead of needing to be determined dynamically.
I am going to focus on the use of the SQL Server managed provider, as it is probably the most common database used in .NET applications. If you need to use the OLE DB, Oracle, or ODBC providers, the coding patterns are virtually identical, thanks to the fact that they are all based on the same interfaces. Keep in mind that the SQL Server managed provider can be used with either a full-up instance of SQL Server or with a SQL Server 2005 Express instance, which is just a scaled-down free version of the SQL Server engine.
I recommend that you use SQL Express for creating client-side or small application databases, rather than Microsoft Access databases. The SQL Express engine is much more robust for client-server applications, and you can use many features of the SQL Server engine that aren’t available in Access, such as stored procedures and triggers. You can easily establish a connection to a SQL Express database just by specifying a file path to the MDF file that contains the database.
You have already seen one case of loading a data set from a SQL Server table earlier in the section Relational Data Access. You can also use a SQL Server 2005 Express database connection, as shown in Listing D.3.
LISTING D.3: Filling a DataSet Through a SQL 2005 Express Connection
private DataSet GetCustomersFromDb( )
{
    // Create the DataSet
    DataSet ds = new DataSet( );
    // Create the connection to the SQL Express DB
    SqlConnection conn = new SqlConnection(
        @"server=.\SQLEXPRESS;
        AttachDbFilename=|DataDirectory|\SimpleDatabase.mdf;
        trusted_connection=true");
    // Create the command and the adapter that uses it
    SqlCommand cmd = new SqlCommand("SELECT * FROM Customers", conn);
    SqlDataAdapter adapter = new SqlDataAdapter(cmd);
    // Opens the connection, executes the query,
    // puts the result in the DataSet, closes the connection
    adapter.Fill(ds, "Customers");
    // Return the result
    return ds;
}
Let’s dissect the code in Listing D.3. First, to create a connection, you construct an instance of the SqlConnection class. The constructor for this class takes a connection string that can contain any number of parameters, as defined by the SQL Server managed provider. The set of allowable parameters in the connection string is determined by each managed provider and will be different for each. There are a number of different parameter names for some of the typical parameters; consult the documentation for the ConnectionString property of the SqlConnection class for a comprehensive list.
The most common parameters to use for a SQL Server database are the server, database, trusted_connection, user id, and password parameters. There are other parameter names that accomplish the same thing as these (such as pwd instead of password and Integrated Security instead of trusted_connection), so you can choose which ones you want to use for common connection scenarios. You can see from this example that with SQL Server 2005 Express, you can use the AttachDbFilename parameter to specify the database based on a file path. The |DataDirectory| keyword is a special syntax that instructs the connection object to look in the working directory of the executing application. The connection string also specifies the server instance to be the default SQLEXPRESS instance, and trusted_connection=true uses integrated security, which will log you into the database using the credentials of the Windows account under which the application is running. You can also pass an explicit username and password in the connection string if you want to use a SQL Server login.
Normally, you wouldn’t want to hard-code connection strings in your application code as shown in Listing D.3. A better way is to place them into your configuration file, and both Visual Studio 2005 and .NET 2.0 support a rich new model for placing connection strings and other custom user and application settings in your configuration file, while still allowing you to programmatically access those settings in a strongly typed way. These features are discussed in detail in Chapters 2 and 5.
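As a quick sketch of what retrieving a configured connection string looks like (the "Northwind" name is my own choice, and it assumes a matching entry in the &lt;connectionStrings&gt; section of the config file and a reference to System.Configuration.dll):

```csharp
using System.Configuration;
using System.Data.SqlClient;

class ConnectionFactory
{
    public static SqlConnection CreateNorthwindConnection()
    {
        // Look up the named connection string from App.config
        string connStr = ConfigurationManager
            .ConnectionStrings["Northwind"].ConnectionString;
        return new SqlConnection(connStr);
    }
}
```

Changing the target database then means editing the config file, with no recompile of the application.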
Once you have a connection to work with, you create a SqlCommand object, passing the SQL text query to execute and the connection to use to the constructor. A command object can be provided with SQL statements or with the name of a stored procedure to execute. For real data access layers, I recommend you wrap your data access at the database level in stored procedures and only consume those stored procedures from your data layer. Doing so decouples your application code from the specific schema of your tables, preventing small changes to the underlying schema from affecting application code. Stored procedures also let you use SQL Server’s own security mechanisms to prevent direct access to the tables, and stored procedures sometimes have higher performance than dynamic SQL statements like the one executed by the code in Listing D.3 (although the performance difference is insignificant for most queries in .NET). I have broken this guidance in many of the code samples in this book to be able to use the Northwind database with minimal modification, and so that you can clearly see what is being retrieved by the queries.
Next, the code in Listing D.3 creates a SqlDataAdapter object, which is the bridge between the source-neutral data set and the specific data source that it will work with. In this case, I construct it by passing in the command object. Finally, I call Fill on the data adapter, which performs a number of steps. First, it sees if the connection in the underlying command object is already open. If not, the Fill method will open the connection. Next, it executes the query against the database. The adapter then takes the returned rows, determines if a DataTable with the required schema already exists in the data set (creating one if not), and places the returned rows into it. If the data adapter opened the connection, it will close it before returning from the method call. Unless you specify a table name when you fill the data set, the data table instance will simply be named Table, and as you add additional tables, they will be named Table1, Table2, and so on. Figure D.3 shows the entire stack typically used for working with data sets and relational data stores.
Typically a query or stored procedure will only return a single table of results, but you could return multiple tables with a single query or stored procedure as shown in Listing D.4.
LISTING D.4: Loading Multiple Tables in a Single Query
private DataSet LoadMultipleDatasets( )
{
    // Create the DataSet and connection
    DataSet ds = new DataSet( );
    SqlConnection conn = new SqlConnection( );
    conn.ConnectionString =
        "server=localhost;database=Northwind;trusted_connection=true";
    // Create the command and the adapter
    SqlCommand cmd = new SqlCommand( );
    cmd.Connection = conn;
    cmd.CommandText = "SELECT * FROM Customers; SELECT * FROM Orders";
    SqlDataAdapter adapter = new SqlDataAdapter( );
    adapter.SelectCommand = cmd;
    // Opens the connection, executes the query,
    // puts the result in the DataSet, closes the connection
    adapter.Fill(ds);
    // Name the tables correctly
    ds.Tables[0].TableName = "Customers";
    ds.Tables[1].TableName = "Orders";
    // Return the result
    return ds;
}
In Listing D.4, two tables of results are returned by the Fill method, and the data set is populated with two tables, one for Customers and one for Orders. The TableName property of each of the resulting tables is then set to the appropriate string, because the default naming convention would just have them named Table and Table1. Note also the different syntax used to create the data objects in this method: instead of specifying the various parameters for each object in its constructor, they are supplied through properties after construction.
Another way of getting the tables named correctly, as well as handling the case when the name of a column in the database is different from the name of a column in the target data set, is to use the TableMappings property on the data adapter. The TableMappings property holds a collection of DataTableMapping objects, a class defined in the System.Data.Common namespace. Each DataTableMapping object lets you specify a mapping between the schema of a result set returned from the database and the schema of the data table into which the data adapter will place the results. The following code shows an example of using TableMappings to modify the table name and column names when the data adapter fills the data set.
public DataSet GetCustomersSchemaDiff( )
{
    // Create the data set
    DataSet data = new DataSet( );
    // Create the connection
    SqlConnection conn = new SqlConnection(
        "server=localhost;database=Northwind;trusted_connection=true");
    // Create the command and the adapter that uses it
    SqlCommand cmd = new SqlCommand(
        "SELECT CustomerID, CompanyName FROM Customers", conn);
    SqlDataAdapter adapter = new SqlDataAdapter(cmd);
    // Set up the table mapping to name the table, and to
    // change CustomerID to ID and CompanyName to Name
    DataColumnMapping[] columnMappings = new DataColumnMapping[2];
    columnMappings[0] = new DataColumnMapping("CustomerID", "ID");
    columnMappings[1] = new DataColumnMapping("CompanyName", "Name");
    DataTableMapping mapping =
        new DataTableMapping("Table", "Customers", columnMappings);
    adapter.TableMappings.Add(mapping);
    // Opens the connection, executes the query,
    // puts the result in the DataSet, closes the connection
    adapter.Fill(data);
    return data; // one table named Customers, with columns ID, Name
}
In this code, the data coming in from the SELECT statement would have been placed in a data table named Table with columns CustomerID and CompanyName. The table mapping makes it so the table will be named Customers instead, and it will have columns ID and Name with the respective columns from the database mapped into them.
Most of the data classes in ADO.NET give you the option of using overloaded constructors and methods to specify all of the parameters for the object to do its work, or you can set them through properties. It is really just a matter of preference. The properties approach shown earlier requires more lines of code, but it is a little easier for some to read, and it is easier to debug because you can step through each line and inspect the property values as they are set. I have favored the constructor approach in this book to avoid killing too many trees. I’ll leave it up to you to decide the best approach for your code.
I’m going to show how to use a data reader to load a custom object model later on, but while we are on the subject of loading data sets, I thought I should highlight a new capability in .NET 2.0. If all you are dealing with is a single table at a time, the need to create a whole data set and index into the tables just to get to your data can be a lot of unnecessary overhead. In .NET 2.0, you can create, populate, and use a DataTable on its own without ever needing to deal with the complexities of a data set. You can also fill it in a quick and efficient way using a data reader, instead of having to go through a data adapter or table adapter (table adapters are discussed in Chapter 2). The following code demonstrates this in action.
public DataTable GetCustomers ( )
{
string connStr =
"server=localhost;database=Northwind;trusted_connection=true";
DataTable customersTable = new DataTable("Customers");
using (SqlConnection conn = new SqlConnection(connStr))
{
SqlCommand selectCmd = new SqlCommand(
"SELECT CustomerID, CompanyName FROM Customers", conn);
conn.Open( );
SqlDataReader reader = selectCmd.ExecuteReader( );
customersTable.Load(reader);
}
return customersTable;
}
There are a number of new concepts in this one block of code, so let’s step through them one at a time. The first thing the code does is to create an empty data table and name it Customers. Any time you need to execute a query, you need a connection object, so that is created next within a using statement. If you execute a command on your own (not through a data adapter), you will have to explicitly open the connection first. Any time you open a connection, you need to make sure you close it when you are done, and you need to make sure that happens even if an exception is thrown in the code that precedes the call to Close. One way to do that is to put the code that opens the connection and uses it in a try block, with the call to Close in a finally block. Another way, used in this code, is the using statement in C#.
In this code, the connection object is created in a using block, which causes the Dispose method to be called on the connection when the scope of the using block is left. The Dispose method on a connection object calls Close, and a using block will always set up a try-finally block for you, calling Dispose in the finally block. So the using block approach gives you a clean, readable, safe way to use your connection objects and make sure they are closed before the scope is left.
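To make that equivalence concrete, here is a sketch of the try-finally pattern that the using block generates for you behind the scenes (using the same connection string as the chapter’s examples):

```csharp
SqlConnection conn = new SqlConnection(
    "server=localhost;database=Northwind;trusted_connection=true");
try
{
    conn.Open();
    // ... execute commands against conn here ...
}
finally
{
    // Dispose calls Close, so the connection is released
    // even if an exception was thrown above
    conn.Dispose();
}
```

The using block is simply a more compact, less error-prone way to write exactly this.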
The next thing that is created is the command object that will execute the SELECT statement used to fill the data reader. That is created just like in the previous examples, by passing the SQL statement and the connection object. The connection is then opened, and the command is executed with the ExecuteReader method. This method returns a SqlDataReader object containing the results of the query. The data reader is then passed to the new Load method on the data table, which will iterate over the data reader to pull its contents into the data table.
As mentioned earlier, you can do a lot more with multiple tables in a single data set than just containing them as a collection of tables. You can also create parent-child relationships between them, and add primary, unique, and foreign key constraints. Data sets containing this kind of parent-child relationship are often called master-details data sets, because they are typically used in data-binding scenarios where the selection of a row in the parent table (the master) results in the display of all of the associated child rows (the details). The method in Listing D.5 again retrieves both the Customers and Orders tables, this time using two separate data adapters, and creates a master-details or parent-child relationship between them.
LISTING D.5: Creating a Master-Details DataSet
private DataSet GetCustomersAndOrders( )
{
DataSet ds = new DataSet( );
SqlConnection conn = new SqlConnection(
"server=localhost;database=Northwind;trusted_connection=true");
SqlCommand customersCmd = new SqlCommand(
"SELECT CustomerID, CompanyName, ContactName FROM Customers", conn);
SqlDataAdapter customersAdapter = new SqlDataAdapter(customersCmd);
// Set the adapter to add primary key information
customersAdapter.MissingSchemaAction =
MissingSchemaAction.AddWithKey;
SqlCommand ordersCmd = new SqlCommand(
"SELECT OrderID, CustomerID, OrderDate FROM Orders", conn);
SqlDataAdapter ordersAdapter = new SqlDataAdapter(ordersCmd);
// Set the adapter to add primary key information
ordersAdapter.MissingSchemaAction = MissingSchemaAction.AddWithKey;
// Add the customers to the DataSet
customersAdapter.Fill(ds,"Customers");
// Add the orders to the DataSet
ordersAdapter.Fill(ds, "Orders");
// Create the parent-child relation
DataColumn parentCol = ds.Tables["Customers"].Columns["CustomerID"];
DataColumn childCol = ds.Tables["Orders"].Columns["CustomerID"];
DataRelation dr = new DataRelation(
"Customers_Orders",parentCol,childCol);
// Set the relation as nested for output XML
// (Orders are child elements of Customers)
dr.Nested = true;
// Add the relation to the DataSet's collection
// Adds a corresponding foreign key constraint as well
ds.Relations.Add(dr);
// Return the result
return ds;
}
Setting the data adapter’s MissingSchemaAction property to AddWithKey before calling Fill will direct the data adapter to also determine the column(s) that comprise the primary key, if any, when it executes the query, and it will create corresponding unique constraints in the resulting table. To create the data relation, you need to specify the primary key column in the parent table as well as the foreign key column in the child table. The easiest way to do this is with the DataColumn objects representing those columns. This also implicitly specifies which table is the parent table and which is the child table, because columns can only belong to one table. Alternatively, you could specify the parent and child tables and columns by name. Adding the data relation also creates a corresponding foreign key constraint to enforce the rule that a corresponding row in the parent table must exist when a row in the child table refers to it. Notice that the code also sets the Nested property on the data relation to true. This only affects the XML that is output if you save the data set with WriteXml or get it as a string with GetXml; child rows are then shown as nested elements under the parent row elements.
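For instance, with Nested set to true, a call to GetXml on the returned data set would produce output along these lines (abbreviated to one customer and one order, with illustrative row values):

```xml
<NewDataSet>
  <Customers>
    <CustomerID>ALFKI</CustomerID>
    <CompanyName>Alfreds Futterkiste</CompanyName>
    <Orders>
      <OrderID>10643</OrderID>
      <CustomerID>ALFKI</CustomerID>
      <OrderDate>1997-08-25T00:00:00</OrderDate>
    </Orders>
  </Customers>
</NewDataSet>
```

With Nested left at its default of false, the Orders elements would instead appear as siblings of the Customers elements at the top level of the document.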
When you work with stored procedures, you often need to provide parameters to them that affect their execution. The stored procedure might simply wrap a query that will return rows of data, or it might update, insert, or delete data in the database. In any of these cases, you usually have to tell the stored procedure which sets of data to operate on, and you do this through parameters.
Let’s start with a simple example. In the Northwind database, there is a stored procedure defined named CustOrderHist. You provide it with a customer ID, and it will return a result set containing a summary of products and their quantity ordered by that customer. To call a stored procedure using ADO.NET, you use the SqlCommand class, as demonstrated earlier in this appendix. However, you need to tell the command object that you are calling a stored procedure instead of providing a textual query. You also need to provide the command object with any parameters that need to be passed to the stored procedure that it will execute. You do this as shown in Listing D.6.
LISTING D.6: Retrieving Data Through a Stored Procedure
private DataSet GetCustomerOrderHistory(string custId)
{
// Validate the input
if (custId.Length != 5)
throw new ArgumentException(
"The customer ID must be a 5 character string.","custId");
DataSet ds = new DataSet("CustomerOrderHistoryDataSet");
SqlConnection conn = new SqlConnection(
"server=localhost;database=Northwind;trusted_connection=true");
// Create the stored proc command to get customer history
SqlCommand custHistCmd = new SqlCommand("CustOrderHist", conn);
custHistCmd.CommandType = CommandType.StoredProcedure;
// Create the parameter for the customer ID
SqlParameter custIdParam = new SqlParameter(
"@CustomerID",SqlDbType.NChar,5);
custIdParam.Value = custId;
// Add the parameter to the command
custHistCmd.Parameters.Add(custIdParam);
SqlDataAdapter custHistAdapter = new SqlDataAdapter(custHistCmd);
custHistAdapter.Fill(ds,"CustomerHistory");
return ds;
}
In this case, you create the command object by specifying the name of the stored procedure you want to call instead of supplying a SQL query string, and you set the CommandType property on the command to StoredProcedure (the default is CommandType.Text). You then need to create a SqlParameter object (discussed later in this appendix) to encapsulate each of the parameters needed by the stored procedure and add them to the command’s Parameters collection. Once everything is set up, you call the Fill method on the data adapter as before, and it will call the specified stored procedure, placing the result set returned into a table in the data set.
So far we have only been looking at retrieving data from a database. What happens when you have modified the data and want to save those changes back to the database? Well, another powerful capability of the DataSet class is the ability to keep track of changes made to the data contained in it. Changes come in three forms. You can
• Add new rows to a table
• Delete rows from a table
• Change the values in one or several of the columns of existing rows in a table
The data set handles this through a combination of holding two copies of each row, and through a RowState property on each row, which takes a value from the DataRowState enumeration. When you first retrieve a set of rows from the database, they are added to their table with a RowState value of Unchanged. When you modify a column value within an existing row, the state is set to Modified. When you add a new row or delete an existing one, that row’s state will be marked Added or Deleted, respectively. And if you create a new row from a table, but have not yet added it to the Rows collection, the row’s state will be Detached.
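These state transitions are easy to observe with a standalone, in-memory table, no database required. The following sketch walks a single row through the states just described:

```csharp
DataTable table = new DataTable("Customers");
table.Columns.Add("ID", typeof(string));

DataRow row = table.NewRow();   // row.RowState == DataRowState.Detached
row["ID"] = "ALFKI";
table.Rows.Add(row);            // row.RowState == DataRowState.Added
table.AcceptChanges();          // row.RowState == DataRowState.Unchanged
row["ID"] = "ANATR";            // row.RowState == DataRowState.Modified
row.Delete();                   // row.RowState == DataRowState.Deleted
```

Calling AcceptChanges marks everything Unchanged again, which is also what a data adapter does for you automatically after a successful Update.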
The row state lets a data adapter detect which rows should be used to perform inserts, updates, or deletes in the database table. When you change the values of an existing row, the current values for the columns are kept in one copy of the row, but the original values of that row from when the data was retrieved are also kept in a second copy of the row. This allows the original values to be used for optimistic concurrency detection (discussed in the next section). The RowState values and their effects are described in Table D.2.
You update the database with a data set by using a data adapter. Keep in mind that when I say “update” the database, I mean execute any update, insert, or delete queries needed. Try not to confuse this with the specific SQL UPDATE queries, which only affect modified rows and are only one of three forms of “updates.”
So far, when constructing and using a data adapter to retrieve data, we have been setting its SelectCommand property to an instance of a SqlCommand that wraps a SQL text query or a stored procedure that returns the rows used to initially populate the table. To perform updates using a data adapter, you also need to provide command objects for the adapter’s InsertCommand, DeleteCommand, and UpdateCommand properties, depending on which of those you expect to perform. As you might expect, the InsertCommand will be used for any rows marked as Added, the DeleteCommand for any rows marked as Deleted, and the UpdateCommand for any rows marked as Modified. If a data set doesn’t contain any rows with a particular row state, the corresponding command doesn’t have to be supplied, since it won’t be used. Generally speaking, you will want to provide all three, because you cannot be sure that no rows will be in the modified, added, or deleted state when an update is performed.
Just like the select command used to fill a data set, the commands used for updating the database with a data set can use either SQL text queries or stored procedures, passing the columns of the updated rows as parameters. For simple applications that need to work directly against a single table in the database at a time, you can use parameterized SQL text queries. For larger scale applications, you may want to use stored procedures.
Either way, the commands you associate with the data adapter will be executed once for each row in the data set that needs to perform an update. This can result in a lot of network round-trips to the database for a large data set that has a lot of changes in it. To address this problem, in ADO.NET 2.0 the SqlDataAdapter class includes a batch updating feature that will group all of the queries and send them to the database in one or several chunks. To use this feature, you set the UpdateBatchSize property on the SqlDataAdapter to some value other than 1 (the default). If you set the value to 0 (zero), all of the updates will be batched into a single bunch. If you set some positive value greater than 1, the data adapter will batch sets of that many update queries and send them in chunks until all of the updates have completed. You do need to keep in mind, though, that when you send a batch of update queries using this approach, they will all execute as a single transaction in SQL Server. So if any one row in a batch fails to perform its update, the entire batch will fail.
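Enabling batching is mostly a matter of property settings. One detail worth noting (an ADO.NET requirement, not shown in the listings in this appendix) is that each command participating in a batch must have its UpdatedRowSource set to None or OutputParameters. A sketch, assuming an adapter and data set wired up as in the earlier examples:

```csharp
// Batched commands cannot refresh rows from returned result sets
adapter.InsertCommand.UpdatedRowSource = UpdateRowSource.None;
adapter.UpdateCommand.UpdatedRowSource = UpdateRowSource.None;
adapter.DeleteCommand.UpdatedRowSource = UpdateRowSource.None;

adapter.UpdateBatchSize = 50; // send update queries in chunks of 50
adapter.Update(myDataSet, "Customers");
```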
To automatically formulate the update queries for working directly against the tables, the SqlCommandBuilder class can examine the select command that is in use by a data adapter and dynamically build insert, update, and delete commands for you. It will then populate the command properties of the data adapter with the generated queries. The command builder works well for simple data sets that contain tables generated from SELECT statements against a single table with a primary key constraint. This all happens by constructing a command builder object and passing in the data adapter that it is to create commands for. You won’t actually call any of its methods or properties directly. The constructor for the command builder reaches into the data adapter and sets its InsertCommand, UpdateCommand, and DeleteCommand properties to suitable queries based on the columns that are populated with the SelectCommand. The following code demonstrates the use of a command builder to generate the update queries and use them to push the changes from a data set to the database. In this code, the data set and the data adapter are members on the enclosing class because they are used in a different method to populate the data set with a SELECT query.
private void btnSaveChanges_Click(object sender, System.EventArgs e)
{
// Get the connection off the existing select command of the adapter
SqlConnection conn = m_Adapter.SelectCommand.Connection;
// Create the insert, update, and delete commands
// Simply constructing the command builder generates
// and populates those commands on the adapter
SqlCommandBuilder cmdBuilder = new SqlCommandBuilder(m_Adapter);
// Call the update method on the adapter to save all changes
// in the data set to the database
m_Adapter.Update(m_CustomersDataSet,"Customers");
// Refill the dataset to make sure any triggered changes
// in the DB are reflected
m_Adapter.Fill(m_CustomersDataSet,"Customers");
}
Using the SqlCommandBuilder is very handy for simple cases. However, it has a number of downsides, and in general you should avoid it for production code in favor of explicitly writing your own SQL statements or using stored procedures. The queries generated by the command builder always compare the values of all columns that were retrieved by the SELECT query to detect optimistic concurrency violations, as described in the next section. However, this is usually not the best approach. It also means the queries carry a lot more data along with them for each update than may be necessary. For larger scale systems, working with stored procedures is a better approach for reasons of security, maintainability, and performance. See the later section “Updating with Data Sets and Stored Procedures” for more details.
One fundamental problem that almost always needs to be addressed when performing updates to a database is how to handle concurrent updates to the same data. Figure D.4 depicts a simple example of how concurrent updates can cause problems. In the figure, client 1 retrieves a row from the table. Client 2 comes along next and retrieves the same row. Client 2 is a little quicker in processing the row and commits its updates back to the database before client 1. The question that has to be addressed for concurrency is: What should happen when client 1 subsequently tries to write its changes to that same row?
You could allow a “last-in-wins” strategy, and always blindly write the values of an update to the row without considering its current contents. This is effectively no concurrency control. If you did this with the scenario in Figure D.4, the changes made by client 2 would be lost. If you are working with a connected data model, there are a number of ways to handle this situation with server-side locks on tables or rows that prevent concurrent access or updates depending on the level of isolation you want to provide. That is a very complex topic of its own that isn’t really relevant in the disconnected data model employed by ADO.NET.
To handle this in ADO.NET, the usual approach is to employ something called optimistic concurrency, which basically means that you are going to assume that between the time you retrieve a row of data and the time you go back to the data source to update it, you don’t expect any other query or client to come along and modify that row. If that assumption is violated, which could happen quite often in a highly concurrent system, your code needs to be able to deal with it in some way.
To deal with a concurrency violation, you first have to detect that it has occurred. This is done in ADO.NET by comparing the current contents of database columns (at the time an update is attempted) to those that were originally retrieved from the database. If the current contents of the database row are different from those that the ADO.NET client code has cached, then the optimistic concurrency assumption has been violated, and something needs to be done about it. As discussed earlier, the data set maintains the original values for any row that gets modified. This makes it very easy to use those original values to make optimistic concurrency decisions once a conflict has been detected.
Depending on the situation, you might want to compare the contents of every column in the row to the original values retrieved by ADO.NET. This is the approach used by the queries generated by the SqlCommandBuilder shown in the last section, as well as by the queries generated by Visual Studio .NET in a table adapter when you use the data set designer discussed in Chapter 2. For example, a parameterized update query for the Region table in Northwind might look like this:
UPDATE Region
SET RegionID = @RegionID, RegionDescription = @RegionDescription
WHERE (RegionID = @Original_RegionID) AND (
RegionDescription = @Original_RegionDescription)
The @RegionID and @RegionDescription parameters would be populated with the current values in the row being updated, and the @Original_RegionID and @Original_RegionDescription parameters would be populated with the original row values maintained by the data set.
However, for a large table, comparing the values of every column means carrying a lot of data back and forth across the connection to the database. A good compromise if you have control of the database schema is to include a timestamp, rowversion, or last-modified datetime column that gets updated every time any of the columns in the row gets updated. You can then just compare the value of this single column to detect concurrency violations (see Chapter 10 for an example of this approach). Another option is to use some subset of the columns in the row to determine whether a violation has occurred. For example, if the table storing the large images also contained their file name, size, and a hash of the image contents, you could compare those columns to decide whether the image had been updated concurrently or not.
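As a sketch of the single-column approach, suppose the Region table were given an additional rowversion column named RowVer (a hypothetical schema change, not part of the standard Northwind database). The update query then needs only one extra comparison, no matter how many columns the table has:

```sql
UPDATE Region
SET RegionDescription = @RegionDescription
WHERE (RegionID = @RegionID) AND (RowVer = @Original_RowVer)
```

SQL Server updates a rowversion column automatically on every modification, so a mismatch on @Original_RowVer means some other client changed the row after it was retrieved.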
The SqlDataAdapter class has built-in logic for detecting when optimistic concurrency has been violated, based on the @@ROWCOUNT SQL variable that indicates the number of affected rows when a query is executed. Each update, insert, or delete is expected to affect exactly one row; if the query indicates that the number of affected rows was zero, the WHERE clause found no matching row, and the data adapter will raise a DBConcurrencyException. You can tap into this detection logic both when you are using SQL text queries like the one in this code example and when you are doing your updates through a stored procedure. As long as whatever query gets executed by the command affects zero rows, the data adapter will throw the exception, which you can catch and handle appropriately.
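Catching the exception is straightforward. A minimal sketch, assuming the member adapter and data set from the earlier examples (and a Customers table keyed by an ID column, as in the mapping example earlier in this appendix):

```csharp
try
{
    m_Adapter.Update(m_CustomersDataSet, "Customers");
}
catch (DBConcurrencyException ex)
{
    // ex.Row is the DataRow whose update affected zero rows
    MessageBox.Show("Concurrency violation updating customer " +
        ex.Row["ID"]);
}
```

The Row property on the exception tells you exactly which row hit the conflict, which is useful for prompting the user or logging.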
However, throwing an exception may not always be the right thing to do. What if the data set contains 100 rows to update, and the violation happens on the 42nd row? The first 41 updates will have already been persisted to the database unless the update commands are associated with a single transaction (discussed later in this appendix), and the remaining 58 updates won’t happen, because the execution of the data adapter Update method is interrupted when the DBConcurrencyException is thrown. Instead, you may want to allow updating to continue even if an error occurs, and then decide what to do about any errors after all the rows have been processed.
The data set and data adapter support this scenario as well. Each DataRow in the tables of the data set contains a collection of errors exposed through the Errors property. If a problem occurs performing an update on a row, the errors collection will get populated with information about what happened by the exception that was thrown. That could include not only concurrency exceptions, but other built-in or custom SQL Server errors raised by the query or stored procedure execution. You can set the ContinueUpdateOnError property of the data adapter to true, and then if any exceptions are thrown in the processing of a row, the errors collection will still be populated for that row, but processing will continue with the remaining rows. You are then expected to iterate through all the rows in the table being updated after completion of the Update method, and inspect the Errors property on each row and react accordingly. If you are doing batch updating in ADO.NET 2.0, the entire batch will be committed or aborted by the database within the scope of a transaction. If an exception is raised in the processing of a batch, all the rows of that batch will fail.
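A sketch of that pattern, continuing past errors and inspecting each row afterward (again assuming the member adapter and data set from the earlier examples; the HasErrors and RowError members shown here are the standard ADO.NET way to surface per-row error information):

```csharp
m_Adapter.ContinueUpdateOnError = true;
m_Adapter.Update(m_CustomersDataSet, "Customers");

if (m_CustomersDataSet.HasErrors)
{
    foreach (DataRow row in m_CustomersDataSet.Tables["Customers"].Rows)
    {
        if (row.HasErrors)
        {
            // RowError holds the description of what went wrong
            Console.WriteLine(row.RowError);
            // Application-specific: log it, prompt the user,
            // or present a merge UI to resolve the conflict
        }
    }
}
```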
How you handle the presence of errors, or a thrown exception if you don’t set ContinueUpdateOnError to true, is going to be application dependent. You may want to just log the problem, or prompt users with a list of failed updates and the current database values, and provide them with some kind of data merging user interface to resolve the conflict. If you are binding the data to a Windows Forms DataGridView or DataGrid, the grid itself will detect the presence of errors and will depict the problem with an icon and tooltip help.
Not surprisingly, in order to talk about updates with stored procedures, we need some stored procedures that perform updates. If you run the script in Listing D.7 against the Northwind database, you will get simple SELECT, UPDATE, INSERT, and DELETE stored procedures for working with the Region table that are appropriate for working with a data set that will contain the Region table data.
LISTING D.7: Update Stored Procedures for Regions
CREATE PROCEDURE GetRegions
AS
SELECT RegionID, RegionDescription FROM Region
RETURN
GO
CREATE PROCEDURE UpdateRegion
@RegionID int,
@RegionDescription nchar(50)
AS
UPDATE Region
SET RegionDescription = @RegionDescription
WHERE RegionID = @RegionID
RETURN
GO
CREATE PROCEDURE InsertRegion
@RegionID int,
@RegionDescription nchar(50)
AS
INSERT INTO Region
(RegionID, RegionDescription)
VALUES (@RegionID, @RegionDescription)
RETURN
GO
CREATE PROCEDURE DeleteRegion
@RegionID int
AS
DELETE FROM Region WHERE RegionID = @RegionID
RETURN
GO
To keep things simple here and focused on the calling of stored procedures for updates, these stored procedures have no concurrency protection. As mentioned earlier when discussing command builders, queries required to do optimistic concurrency checking based on every column in the row get messy and are very inefficient. A better way is to use a timestamp or rowversion column in your table, or have a column that gets updated with the current date and time every time a row is changed. You can then use that column for concurrency violation detection; you only need to check a single column and carry around one extra parameter to see if anyone has updated the row since it was retrieved.
Note that in this case, the RegionID column in the Regions table isn’t an identity autonumbered column, so you have to pass in the ID as a parameter even for inserts. If you have an identity column that is going to be generated on the server side when the insert is performed, you will want to make the identity column an output parameter on the stored procedure, and then set that parameter within the stored procedure to the @@IDENTITY value. This way the new identity value gets passed back out of the stored procedure and will be placed in the row that originated the insert in the data table on the client side.
CREATE PROCEDURE InsertSimpleTable
(
@Val nvarchar(50),
@ID int OUTPUT
)
AS
INSERT INTO SimpleTable (Val) Values (@Val)
SET @ID = @@IDENTITY
Once you have the stored procedures to call, you need to write a little code to wire up all the commands needed to retrieve and update the data set using those stored procedures. The following code shows a simple method that calls the GetRegions stored procedure to populate a data set that is a member of the containing class.
private void GetRegions( )
{
// Clear the current contents
m_RegionsDataSet.Clear( );
SqlConnection conn = new SqlConnection(
"server=localhost;database=Northwind;trusted_connection=true");
SqlCommand selectCmd = new SqlCommand("GetRegions",conn);
selectCmd.CommandType = CommandType.StoredProcedure;
// Set the select command on the adapter
m_Adapter.MissingSchemaAction = MissingSchemaAction.AddWithKey;
m_Adapter.SelectCommand = selectCmd;
// Fill the member data set
m_Adapter.Fill(m_RegionsDataSet,"Regions");
}
The GetRegions method first clears the current contents of a data set that is held as a member variable of the class containing this method. It then sets up a connection object and passes that to a command object’s constructor along with the name of the stored procedure that will be called. Any time you are going to call a stored procedure, you have to set the CommandType property on the command object to the StoredProcedure enumerated value. The adapter is set up to retrieve the primary key information with the AddWithKey enumerated value on the MissingSchemaAction property, and its SelectCommand property is set to the command object just created. Finally, Fill is called, which will execute the stored procedure, placing the returned result set in a table named Regions within the data set.
The code in Listing D.8 shows an update method that uses a data set containing the Regions table of the Northwind database to make updates if any inserts, deletes, or modifications have been made to the table. The data set and the data adapter in this case are members of the containing class; this way, they can be accessed both by the method that populates the data set and by the method that performs updates and returns the number of rows updated.
LISTING D.8: Calling Update Stored Procedures
private int UpdateRegions( )
{
// Get the connection from the adapter's select command
SqlConnection conn = m_Adapter.SelectCommand.Connection;
// Create the insert command
SqlCommand insertCmd = new SqlCommand("InsertRegion",conn);
insertCmd.CommandType = CommandType.StoredProcedure;
insertCmd.Parameters.Add(
new SqlParameter("@RegionID",SqlDbType.Int,4,"RegionID"));
insertCmd.Parameters.Add(
new SqlParameter("@RegionDescription", SqlDbType.NChar, 50,
"RegionDescription"));
// Create the update command
SqlCommand updateCmd = new SqlCommand("UpdateRegion", conn);
updateCmd.CommandType = CommandType.StoredProcedure;
updateCmd.Parameters.Add(
new SqlParameter("@RegionID",SqlDbType.Int,4,"RegionID"));
updateCmd.Parameters.Add(
new SqlParameter("@RegionDescription", SqlDbType.NChar, 50,
"RegionDescription"));
// Create the delete command
SqlCommand deleteCmd = new SqlCommand("DeleteRegion", conn);
deleteCmd.CommandType = CommandType.StoredProcedure;
deleteCmd.Parameters.Add(
new SqlParameter("@RegionID",SqlDbType.Int,4,"RegionID"));
// Associate each command with the adapter
m_Adapter.InsertCommand = insertCmd;
m_Adapter.UpdateCommand = updateCmd;
m_Adapter.DeleteCommand = deleteCmd;
// Perform the update
int count = m_Adapter.Update(m_RegionsDataSet,"Regions");
// Refill the dataset to reflect any concurrent changes
m_Adapter.Fill(m_RegionsDataSet,"Regions");
return count;
}
This code is a little more verbose than the other examples in this appendix, because the update, insert, and delete stored procedures each take some parameters to decide what updates to perform. Let’s step through it line by line.
The method first retrieves the connection to be used from the select command associated with the data adapter that is a member of the class. It then creates each of the stored procedure commands for the update, insert, and delete procedures. It creates SqlParameter objects to wrap each parameter to each of the commands and adds them to the Parameters collection on the command. The method then associates each of the commands with the data adapter and calls the Update method on the data adapter to execute the commands for each row in the data set that contains changes. The Update method on a data adapter returns the number of rows that were inserted, deleted, or updated using the data set. The method also calls Fill again on the data adapter before returning, so that any concurrent changes made in the database, or any columns that are computed by triggers, are refreshed in the data cached in the data set.
The SqlParameter class encapsulates the passing of parameters to stored procedures and parameterized queries. It is responsible for managing the translation of parameters between two different type systems: the type system of .NET and the type system of the database. The .NET type system contains primitives such as Int32, Single, Double, DateTime, and String, as well as every other class in the .NET Framework and every custom class and structure that you write. This type system stores values in a data set, because a data set is just another type in the .NET world that is capable of containing other .NET types through object references. SQL Server (or any other data source) has its own notion of a type system, and that system is used to store the data in the database.
So the SqlParameter class takes care of wrapping the values in the type system of .NET and passing them to the database in a form that is compatible with the type system in play there. You construct each SqlParameter object by specifying the name of the query parameter that it will provide, the database type of the query parameter, and the size of that parameter if it is variable length in the database (such as a varchar in SQL Server). There are different overloads of the SqlParameter constructor that let you skip specifying a size, and you can always set this information through properties as well. The name parameter passed to the constructor needs to match the name of the query parameter exactly, including the @ symbol for SQL parameters. This is specific to the SQL Server managed provider. If you use the OLE DB managed provider, you will use the OleDbParameter class. However, OLE DB parameters are always passed by position (order) in the statement, so the name is ignored.
When you are using a data set to perform updates through a data adapter, you also need to provide a column mapping between the column in the table and the parameter that it will be passed to. As mentioned before, these SqlParameter
objects could be initialized using the properties exposed by the class, but this example instead uses one of the overloaded constructors that takes all of the values needed in one line of code. The column name that you specify for that last parameter allows the data adapter to automatically extract the current column value for each row and set it as the parameter value at the time the update occurs.
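To make the column mapping concrete, here is a sketch of wiring up an update command with mapped parameters. The stored procedure name UpdateCustomer, the adapter and connection variables, and the parameter sizes are assumptions for illustration; the constructor arguments are the parameter name, database type, size, and the source column in the data table:

```csharp
// Assumes an open SqlConnection named conn and a SqlDataAdapter named adapter
SqlCommand updateCmd = new SqlCommand("UpdateCustomer", conn);
updateCmd.CommandType = CommandType.StoredProcedure;
// Parameter name, database type, size, and source column in the data table
updateCmd.Parameters.Add(
    new SqlParameter("@CustomerID", SqlDbType.NChar, 5, "CustomerID"));
updateCmd.Parameters.Add(
    new SqlParameter("@CompanyName", SqlDbType.NVarChar, 40, "CompanyName"));
// Associate the command with the adapter so Update can use it for modified rows
adapter.UpdateCommand = updateCmd;
```

With the source column named in each parameter, the adapter pulls the current column value out of each changed row and assigns it to the parameter at update time, so you never set the parameter values yourself.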
If there is any type mismatch between the value in the table and the type of the parameter, you won’t know until runtime. Just as the query can throw exceptions, the setup code that precedes the query can throw exceptions as well. You will want to make sure that you have proper exception handling code wrapping any data access methods or the code that calls them to make sure your application doesn’t blow up on you.
If you are working with stored procedures that have output parameters, such as the INSERT stored procedure shown earlier that returned the value of an identity column, you would need to specify the parameter’s direction when you add it to the command object:
SqlParameter idParam = insertCmd.Parameters.Add(
new SqlParameter("@ID",SqlDbType.Int,4,"ID"));
idParam.Direction = ParameterDirection.Output;
This would map the ID column in the data table to the output @ID parameter on the stored procedure. After the data adapter called the InsertCommand
to perform the insert, it would take the value that was in the idParam
after the command was complete and would put it into the respective column in the data table.
Once you get data into a data set, you often need to be able to search and select data from within it, either to perform updates or processing on some subset of the data, or for display purposes. Keep in mind that the DataSet
class and its related classes aren’t designed to be high-efficiency query engines. Although you can select and filter the data in a data table on the client, it may be faster in many cases to simply execute a new, more specialized query against the database to get the specific results you are looking for. This is especially true when working with a local database instance, such as a SQL Server Express database.
However, if you already have the data in memory in a DataSet
and want to select items without incurring a round-trip to the database, there are several approaches you can use. The simplest is to use the Select
method of the DataTable
class to perform SQL-like queries against the data contained within a table. You can also use data views to filter the data based on selection criteria, or you can use the XmlDataDocument
class to load a data set into an XML document container and perform XPath queries against it (data views and the XmlDataDocument
are discussed in subsequent sections).
The Select
method has a number of overloads, which let you search based on a string criterion that resembles a WHERE clause in SQL, and optionally provide a sort order or a DataViewRowState
enumeration to search among records of a particular state or states. The Select
method returns an array of DataRow
objects containing references to the rows that matched the selection criteria. Once you have that array of rows, you can iterate over them and perform any processing you need on them. You are working against live references to the actual rows in the data set, so any changes made to them immediately affect the data in the data set. The following code shows a Select
operation on a Customers data set using two of the overloads of the method.
private DataRow[ ] SelectRows(string selString)
{
DataRow[ ] rows = null;
// Perform the selection
if (selString == string.Empty || selString == "*")
{
// Select all
rows = m_Customers.Tables["Customers"].Select( );
}
else
{
// Select based on the criteria
rows = m_Customers.Tables["Customers"].Select(selString);
}
return rows;
}
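The overloads that take a sort order and a row-state value can be combined as well. The following is a sketch reusing the m_Customers data set from the example above; the filter, sort, and state values are arbitrary choices for illustration:

```csharp
// Select German customers, sorted by company name, but only
// consider rows whose current state is Modified
DataRow[] rows = m_Customers.Tables["Customers"].Select(
    "Country = 'Germany'",              // filter expression
    "CompanyName ASC",                  // sort order
    DataViewRowState.ModifiedCurrent);  // row-state filter
foreach (DataRow row in rows)
{
    // These are live references, so this edits the data set directly
    row["Region"] = "DE";
}
```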
The DataSet
class also supports a couple of ways of moving data between different data sets. Your first instinct might be to simply get a DataRow
reference from one data set, perhaps from a Select
operation or from iterating through, and then add that to the table in another data set. Unfortunately, this isn’t allowed and will throw an exception. When a data row is added to a table, it becomes a part of that table, and it cannot belong to more than one table at a time. Only a row that has been created with the DataTable.NewRow
method can be added to that table with the Add
method on the table’s Rows
collection.
So how do you get a row or set of rows of data that belong to one data set into another? One way is to use the Merge
method. The behavior of this method can get fairly complex because it has a number of overloads, and it depends on the state of the rows (Added, Modified, Deleted) in the data set they are coming from (the source), and on whether there are corresponding rows in the data set they are being merged into (the target). For simple cases, though, things are pretty straightforward.
Consider a situation where you have the Customers table in one data set and the Orders table in another data set. You want to get them both into a single data set so that you can create a parent-child relationship between them based on the CustomerID foreign key that exists in the Orders table. To do this, you would simply call Merge
on the data set containing the Customers table, passing in the Orders data set. The Merge
method would create an Orders table in the Customers data set and copy over all the rows from the Orders data set. You could then construct the relation between the two tables in code as follows:
private DataSet MergeCustomersAndOrders( )
{
// Code to create and fill the two data sets...
// customersDS contains just Customers,
// ordersDS contains just Orders
// Merge the orders into the customers
customersDS.Merge(ordersDS);
// Create the relation between the two tables
DataColumn custIDParentCol = customersDS.Tables["Customers"].Columns["CustomerID"];
DataColumn custIDChildCol =
customersDS.Tables["Orders"].Columns["CustomerID"];
DataRelation custOrders =
new DataRelation("CustomersOrders",custIDParentCol,custIDChildCol);
customersDS.Relations.Add(custOrders);
return customersDS; // Contains both tables and relation
}
If you want to add a row from a table in one data set into the same table in another data set with the row state of Added
, you can pass the ItemArray
property of the source row to an overload of the Add
method on the Rows
collection of the target table. This transfers the values of the cells as an array of objects into the target table, creating a new row to contain them. If you want to add a row from a source table into a target and preserve its state, you can use the ImportRow
method on the DataTable
class. The following shows both of these approaches.
private void AddAndImport(DataSet dsTarget, DataSet dsSource)
{
// Add the values from a single row from the source
// to the target with the row state set to Added
dsTarget.Tables["Customers"].Rows.Add(
dsSource.Tables["Customers"].Rows[0].ItemArray);
// Import a single row preserving row state
dsTarget.Tables["Customers"].ImportRow(
dsSource.Tables["Customers"].Rows[0]);
}
A common use of Merge
is in conjunction with the Select
method to extract a set of rows and place them in another data set, either for display or transfer to another method or client.
private void OnSelect(object sender, EventArgs e)
{
// Select the rows using helper method
DataRow[ ] rows = SelectRows(m_SelectTextBox.Text);
// Show the results
if (rows.Length > 0)
{
// Create a new data set
DataSet localDS = new DataSet( );
// Merge the selected rows into it
localDS.Merge(rows);
// Bind the grid to this
m_ResultsGrid.DataSource = localDS.Tables["Customers"];
}
}
Another way to get a row into a table as a new row is to use the LoadDataRow
method on the table, passing in an array of objects that contain the column values. Typically you will get this object array from the ItemArray
on the source row.
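Assuming the same source and target data sets as the AddAndImport example, a LoadDataRow call might look like the following sketch. Note that if the target table has a primary key and already contains a row with the same key, LoadDataRow updates that existing row rather than adding a new one:

```csharp
// Grab the column values from the source row as an object array
object[] values = dsSource.Tables["Customers"].Rows[0].ItemArray;
// Second argument controls whether AcceptChanges is called on the new row:
// true leaves it in the Unchanged state, false leaves it Added (or Modified,
// if a row with a matching primary key already existed)
dsTarget.Tables["Customers"].LoadDataRow(values, false);
```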
Because there isn’t any way to directly set the row state of rows in a data table, if you want to get a row into a particular state, such as modified or deleted, there are some tricks you can play. For either of those states, you can first set the row state to Unchanged by calling the AcceptChanges
method on the row. To get the state set to modified, set the value of one of the columns to its current value by setting the column value against itself as shown next. To set the deleted state, just call Delete
on the row:
DataSet data = new DataSet( );
// Code to fill data set with Customers ...
DataRow modRow = data.Tables["Customers"].Rows[0];
modRow.AcceptChanges( ); // Set state to unchanged
// Set state to modified:
modRow["CompanyName"] = modRow["CompanyName"];
DataRow delRow = data.Tables["Customers"].Rows[1];
delRow.AcceptChanges( ); // Set state to unchanged
delRow.Delete( ); // Set state to deleted
Another way to manipulate the data contained in a data set for processing or viewing is to use a data view. A data view doesn’t contain any data itself; as its name implies, it is just a view into the data in the table underneath it. You can think of it as a lens placed over a table that makes the data look different to the consumer of the view, even though the underlying data is still whatever is stored in the table the view represents. You can modify the data exposed through a data view, and when you do, you are actually modifying the data in the underlying table directly.
Every instance of a DataTable
already has a default DataView
instance associated with it, and it is actually this view that is used when you bind a table to a grid. You can make modifications to the default data view, which will affect any controls that are bound to the underlying data table, or you can construct new instances of a DataView
object to wrap a table for the purposes of sorting or filtering the data.
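As a small sketch of the default view in action (reusing the m_Customers and m_ResultsGrid members from the earlier examples), sorting the default view changes what any control bound to the table displays, without constructing a new DataView instance:

```csharp
// Every table exposes a default view; data binding goes through it
DataView defaultView = m_Customers.Tables["Customers"].DefaultView;
defaultView.Sort = "CompanyName ASC";
// Binding to the table implicitly binds through its default view,
// so the grid now shows the rows sorted by company name
m_ResultsGrid.DataSource = m_Customers.Tables["Customers"];
```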
The DataView
class implements Sort
, RowFilter
, and RowStateFilter
properties that let you modify what data the view exposes. You set the Sort
property to an expression that specifies what column(s) to sort on and in what order. You can set the RowFilter
expression using the same syntax supported by the DataTable.Select
method to restrict the rows presented by the view based on some criteria that is like a WHERE clause in a SQL statement. Finally, you can use the RowStateFilter
to only look at rows in a particular state, such as all Deleted
or Added
rows. You can use all of these in combination with one another to specify fairly complex filtering and sorting criteria to expose data contained within a table in just about any way you need to:
private void OnView(object sender, EventArgs e)
{
// Create the view wrapping the customers table
DataView view = new DataView(m_Customers.Tables["Customers"]);
// Sort by CompanyName ascending, ContactName descending
view.Sort = "CompanyName ASC, ContactName DESC";
// Only show German customers
view.RowFilter = "Country = 'Germany'";
// Only show inserted or edited rows
view.RowStateFilter = DataViewRowState.Added |
DataViewRowState.Modified;
// Bind the view to a grid for viewing
m_ResultsGrid.DataSource = view;
}
Sometimes you need to be able to execute multiple commands within the scope of a single database transaction, so that if any one of the commands fails, they all fail, and no changes are made to the database. You can do this inside of SQL Server and other databases by beginning a transaction within a stored procedure, and committing it or rolling it back based on the outcome of all the queries encapsulated within the stored procedure. However, sometimes you will want to control the transaction from within your ADO.NET code, wrapping the execution of multiple commands within a single transaction.
It is quite easy to do this. You can use the SqlTransaction
class (or corresponding classes for other managed providers) in conjunction with the connection to manage a transaction from your data access code. For example, say you wanted to write code that first checked whether a row existed, and if it did, then make some modification to that row. Let’s also say that you needed to do this with two explicit SQL text queries from within your managed code. You want to write bulletproof code, so you want to ensure that the row that will be affected by the update cannot be modified or deleted by another client or query between the time you check for its existence and when you update it. Using a transaction with an appropriate isolation level will work just fine for this scenario. Take a look at the code in Listing D.9.
LISTING D.9: Executing Multiple Queries Within a Transaction
private void OnTransactionalUpdate(object sender, System.EventArgs e)
{
// Create the connection
SqlConnection conn = new SqlConnection(m_ConnString);
// Create the two commands you want to wrap in a transaction
SqlCommand selectCmd = new SqlCommand(
"SELECT CustomerID FROM Customers WHERE CustomerID = 'ALFKI'",
conn);
SqlCommand updateCmd = new SqlCommand(
"UPDATE Customers SET CompanyName = 'FooBros' " +
"WHERE CustomerID = 'ALFKI'", conn);
// Declare the transaction
SqlTransaction trans = null;
try
{
// Open the connection
conn.Open( );
// Start a transaction with the repeatable read isolation level
trans = conn.BeginTransaction(IsolationLevel.RepeatableRead);
// Associate the transaction with the commands
selectCmd.Transaction = trans;
updateCmd.Transaction = trans;
// Check for the existence of the customer
string custID = (string)selectCmd.ExecuteScalar( );
if (custID == null)
{
throw new ApplicationException("Customer not found");
}
// Update the customer
updateCmd.ExecuteNonQuery( );
// Commit the transaction if you got to here
trans.Commit( );
}
catch (SqlException ex)
{
if (trans != null)
{
trans.Rollback( );
}
MessageBox.Show(ex.Message);
}
finally
{
conn.Close( );
}
}
There are a number of things to discuss here. The first is how to create and employ the transaction. The code creates the connection and command objects the way you have seen before. The transaction is created by calling BeginTransaction
on the connection, optionally specifying an isolation level (see Table D.3). The default isolation level is ReadCommitted
, but to ensure that no one can modify the record once it has been read with a SELECT statement, you need RepeatableRead
.
Once the transaction is created, you need to associate it with any commands you want to enlist within the transaction scope. After that, you can execute the commands. If everything turns out the way you want, you should call Commit
on the transaction object to make all the changes done within the transaction permanent. If anything goes wrong or an exception gets thrown, you should Rollback
the transaction to prevent any changes made by the commands enlisted within it from becoming permanent. The best way to make sure this pattern is followed is with a try-catch
block as shown in Listing D.9.
In addition to the try-catch
, you can see that because the connection was explicitly opened in Listing D.9, I made sure to close it in the finally
block, so that no matter what, the connection gets closed before I leave the method. A couple of other new things you see here but that haven’t been discussed yet are the ExecuteScalar
and ExecuteNonQuery
methods on the SqlCommand
objects. ExecuteScalar
is a convenience method for queries that are expected to return a single row with a single column as their result set. The method will extract the value in that column for you and pass it back as a return value from the method. It is returned as an object reference, so you will have to cast to the expected type, as shown in Listing D.9. The ExecuteNonQuery
method is for executing commands for which you don’t expect any returned rows, such as update, insert, and delete queries.
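As a standalone sketch of the two methods (assuming an open SqlConnection named conn; the CustomerID value is made up for illustration):

```csharp
// ExecuteScalar: returns the first column of the first row as an object
SqlCommand countCmd = new SqlCommand(
    "SELECT COUNT(*) FROM Customers", conn);
int customerCount = (int)countCmd.ExecuteScalar();

// ExecuteNonQuery: no result rows; returns the number of rows affected
SqlCommand deleteCmd = new SqlCommand(
    "DELETE FROM Customers WHERE CustomerID = 'TEMP1'", conn);
int rowsAffected = deleteCmd.ExecuteNonQuery();
```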
A whole new approach to writing transactional code was designed into .NET 2.0 in the System.Transactions
namespace. Through the classes defined in that namespace, you can now easily start or join transactions in your code without being tied directly to the transactional capabilities of the database or other specific transactional resource managers. The transactions that are created through the System.Transactions
classes can be either lightweight transactions against a single resource manager, such as a single SQL Server 2005 database, or they can be distributed transactions involving multiple resource managers, such as a SQL 2000 database, an Oracle database, and an MSMQ message queue. Another great feature of the new transaction capabilities is that a transaction will automatically promote itself from a lightweight transaction to a distributed transaction whenever it sees that a new resource manager is accessed within a transaction scope that requires a distributed transaction, so you don’t have to worry about keeping straight the different models in your code.
To use System.Transactions
, you first have to add a reference to your project to the System.Transactions.dll assembly from the .NET Framework. You then need to include the System.Transactions
namespace in the code files where you will be using it. When you want to execute some code within a transaction, you set up a transactional scope by creating an instance of the TransactionScope
class. You then execute the code that you want to be transactional, and if everything succeeds, you call Complete
on the transaction scope object. The transaction won’t actually be committed until you dispose of the scope object, and if Dispose
is called without first calling Complete
on the scope object, the transaction will be aborted.
The easiest way to do this is with a using
block in C#. When you use the using
statement in C#, you pass an object reference to the using
clause that is a disposable object. When the using
block is exited, Dispose
will be called on that object automatically, even if an exception is propagating. This is because the code that is generated by the compiler for a using
block includes a try-finally
block, and Dispose
is called on the using
clause argument in the finally
block.
So if you create a TransactionScope
object and pass it to a using
block, when you exit that using
block, the transaction that the scope represents will either commit or rollback, depending on if the Complete
method was called.
Here is an example of this in action:
using System.Transactions;
public partial class Form1 : Form
{
private void OnExecute(object sender, EventArgs e)
{
NorthwindDataAccess dac = new NorthwindDataAccess( );
using (TransactionScope scope = new TransactionScope( ))
{
DataSet data = dac.GetCustomerData( );
data.Tables["Customers"].Rows[0]["Phone"] = "030-0074321";
dac.UpdateCustomers(data);
scope.Complete( );
}
}
}
In this code, a data access component named NorthwindDataAccess
is used to retrieve some records from the database, make a modification to one of them, and then push the updates back to the database. If you want both the retrieval and the update to be part of a single transaction without having to worry about connection and transaction management yourself, you simply create a TransactionScope
instance to bracket the data access component calls. You pass that instance to the using
statement, and call Complete
on the scope object at the very end of the using
block after all of the query methods have been called. If you reach the call to Complete
, the data access calls must have succeeded, because otherwise an exception would be propagating. When the point of execution hits the end of the using
block, the scope object will be disposed of. If the Complete
method was called, the transaction will be committed, and if not (such as when an error occurs in the update query), then the transaction will be rolled back.
You can use this approach to wrap multiple queries to one or more databases in a single transaction, ensuring that all of the queries either succeed or fail as one. The big benefit for using a TransactionScope
to create and manage a transaction is that it requires very little code, and it is simple to declare and use. This code can also be several layers up the call stack from the actual data access code, so you don’t need access to the database connection to set up the transaction. The code looks the same if the code uses one or many connections under the covers. There are a lot of other capabilities and ways to use a transaction using the classes in the System.Transactions
namespace that are beyond the scope of this book. (“Introducing System.Transactions in the Microsoft .NET Framework, Version 2.0” by Juval Löwy, available on MSDN Downloads, provides comprehensive coverage of the capabilities of the System.Transactions
namespace classes. See www.microsoft.com/downloads/details.aspx?familyid=AAC3D722-444C-4E27-8B2E-C6157ED16B15&displaylang=en.)
If the code that sets up a transaction scope as shown in this example is called from other code that already has a transaction in progress, the new transaction scope will become a nested transaction to the calling code’s transaction and will only commit if the containing transaction commits. There are a number of complex combinations you can achieve using nested transaction scopes, cloning of transactions, and directly managing the underlying Transaction
object yourself if you need to address more advanced scenarios.
The one downside to using a transaction scope as shown here is that unless you are working against a SQL Server 2005 database, the transaction that is created will be a distributed transaction even if you are only accessing a single database, such as SQL Server 2000. This can have some negative performance impact on your application, compared to managing a transaction yourself as shown earlier in Listing D.9. You will have to weigh that performance impact against the cleaner, easier-to-write code resulting from the System.Transactions
approach for your own applications. I would recommend starting with the System.Transactions
approach and only reverting to low-level transaction management if you need to address a performance problem in a particular part of your application that is using transactions.
Sometimes you may need to work with data on the client side in a data set’s form and have similar logic to that just presented for server transactions, so you can modify the data and back out changes if something goes wrong. You can simulate a transaction while working with client-side data by using the AcceptChanges
and RejectChanges
methods. As discussed earlier, any changes made to a data set are maintained through a combination of row state and the current and original values for each row.
If you have made any changes to the rows in your data set and call AcceptChanges
, the original values for all modified rows will be replaced with the current values, and the state of all rows will be changed to Unchanged
. If you call RejectChanges
, the current values for any modified rows will be replaced with the original values, inserted rows will be discarded, and the state of all rows will also be set to Unchanged
. The AcceptChanges
and RejectChanges
methods are defined on the DataSet
, DataTable
, and DataRow
classes, so you can perform these transaction-like operations at any level of granularity within the data set that you need.
Using a combination of AcceptChanges
and RejectChanges
, you can code logic that accepts or rejects all changes to a table or data set based on some criteria. You should be aware when using these methods, however, that you generally want to avoid using them if the data in the data set will be used to update a database through a data adapter. As mentioned earlier, the data adapter figures out which commands to execute (update, insert, or delete) for which rows based on their row state. Rows with a row state of Unchanged
don’t get any command executed against them when performing an update. Because both AcceptChanges
and RejectChanges
set the state of rows to Unchanged
, if you still intend to update the database from a data set after calling them, you will need to use the workaround discussed earlier to get a row back into the Modified
or Deleted
state.
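As a sketch of that client-side “transaction” pattern, assume a hypothetical ValidateChanges helper that throws an ApplicationException when the pending edits fail some business rule:

```csharp
private void ApplyEdits(DataSet data)
{
    try
    {
        // Hypothetical validation of the pending changes;
        // throws ApplicationException if any edit is invalid
        ValidateChanges(data);
        // All edits passed: make the current values the new originals
        // and reset all row states to Unchanged
        data.AcceptChanges();
    }
    catch (ApplicationException)
    {
        // Something failed: discard all pending edits, restoring
        // the original values throughout the data set
        data.RejectChanges();
    }
}
```

Remember the caveat above: after either call, the rows are Unchanged, so a subsequent data adapter update would send nothing to the database.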
So far I have just discussed programming against the methods and properties of data sets and data adapters, and handling exceptions when something goes wrong. However, sometimes you need to be more in the loop as the processing is going on. There are a number of events that you can tap into from the DataSet
, DataTable
, and SqlDataAdapter
classes that are useful for letting your code be notified when changes are occurring.
The DataSet
class has two events:
• MergeFailed
, which will be fired if you are performing a Merge operation and there is a primary key conflict between the target and source tables.
• Initialized
, which will be fired when the DataSet
has completed initialization of its contained objects and state.
The DataTable
has the most useful events, which are described in Table D.4.
The events on the DataTable
follow a common pattern for events that relate to modifications of some entity: there is a pair of events for each form of modification. One event fires before the change has occurred, and another event fires after the change is complete. This lets you code any pre- and post-processing of the change that you need to, including preventing the change from occurring.
The ColumnChanging/ColumnChanged
events pass a parameter of type DataColumnChangeEventArgs
to your event handler. This parameter contains several useful values for controlling the update of a column. A read/write property named ProposedValue
contains the value that the column will be changed to when the event handler completes. You can inspect this value in your ColumnChanging
handler, and in combination with the Row
and Column
properties on the event arguments parameter, make validation decisions about changing that value.
If the value being proposed doesn’t meet your validation constraints, you have two choices. First, you can change it to some acceptable value by assigning a different value of the appropriate type to the ProposedValue
property on the event arguments parameter. If you do this, whatever value you assign to that parameter will be the one that is actually set for the column when the change is made. So you could also change the Proposed
value to the original value by extracting the original value through the Column
and Row
property, as shown here, to effectively prevent the change:
private void OnCustomersColumnChanging(object sender,
DataColumnChangeEventArgs e)
{
if (m_ConfirmChangesCheckBox.Checked)
{
DialogResult res = MessageBox.Show(
"Column " + e.Column.ColumnName + " changing to value: " +
e.ProposedValue.ToString( ) + ". Allow change?",
"ColumnChanging Event",MessageBoxButtons.YesNo);
if (res == DialogResult.No)
{
e.ProposedValue = e.Row[e.Column.ColumnName];
}
}
}
The one thing to be aware of with this code is that even though setting the ProposedValue
back to the original value prevents the value from changing, the row will still be marked as modified, and the row will be updated in the database if you use it for an update with a data adapter. You could work around this by calling AcceptChanges
on the row, but then if any other columns have been changed to new values in that row, their changes won’t get sent to the database with a data adapter update. This is usually only a problem in a data-bound scenario, and in that case there are events on the data-bound control that you can usually handle to get into the loop sooner to cancel the change. However, this is something to keep in mind if you are trying to cancel a change through the DataTable
events.
You could also throw an exception from within the ColumnChanging
event handler. This would effectively abort the event processing, and the value would get changed back to its original value. However, throwing an exception has a lot of overhead with it, and the code that is causing the change better have an exception handler to deal with that scenario. In the case of a DataGrid
control, the ColumnChanging
event handler will be called again as the value changes back to the original, so the original value better pass your validation logic or you could end up with your program blowing up from unhandled exceptions. In general, you should try to avoid the approach of throwing an exception.
There are three events on the SqlDataAdapter
class to be aware of as well. The FillError
event will be fired if an error occurs while filling a data set with the adapter, which gives you insight into what went wrong, and you can make programmatic decisions based on the error and the values that caused it. The RowUpdating
and RowUpdated
events work like those described for the DataTable
, except these fire as each row is updated (UPDATE, INSERT, or DELETE query) in the database. The event arguments let you determine the cause of the error and tell the adapter whether to continue updating other rows or to stop update processing.
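A sketch of hooking these adapter events might look like the following. The handler bodies show one plausible policy, skip the bad row and keep going, which is an illustrative choice rather than the only option:

```csharp
// Hook up the adapter events before calling Fill or Update
adapter.FillError += new FillErrorEventHandler(OnFillError);
adapter.RowUpdated += new SqlRowUpdatedEventHandler(OnRowUpdated);

private void OnFillError(object sender, FillErrorEventArgs e)
{
    // e.Errors describes what went wrong for this row;
    // setting Continue lets the fill proceed with the remaining rows
    e.Continue = true;
}

private void OnRowUpdated(object sender, SqlRowUpdatedEventArgs e)
{
    if (e.Status == UpdateStatus.ErrorsOccurred)
    {
        // Skip this row but keep updating the rest
        e.Status = UpdateStatus.SkipCurrentRow;
    }
}
```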
I have covered the use of data sets pretty extensively in this appendix because they are the richest and most often used relational data container in both Windows and Web client and middle-tier applications. There is one other important data access class that was briefly described earlier: the data reader. A data reader is a fast, forward-only, read-only cursor into a returned set of data from a database. Each managed provider is responsible for including a provider-specific data reader class. For SQL Server, it is named, not surprisingly, SqlDataReader
.
A SqlDataReader
lets you execute a query and quickly iterate through each row of the returned results to perform processing on those rows. Note that I didn’t say the data reader is like a cursor into the database itself. You do need to maintain an open connection associated with the data reader as long as you are iterating through the results. But the data reader will buffer the results of the query on the client side, and you cannot modify the underlying data. All you can do is read the rows returned one at a time and act upon them.
When you are done using the reader, you need to make sure the connection gets closed as you do for any other query. It is not uncommon for a data access layer to pass a data reader back as the return value from some data access method.
public static SqlDataReader GetCustomers( )
{
SqlConnection conn = new SqlConnection(
"server=localhost;database=Northwind;trusted_connection=true");
SqlCommand cmd = new SqlCommand(
"SELECT CustomerID, CompanyName, ContactName FROM Customers",
conn);
conn.Open( );
return cmd.ExecuteReader(CommandBehavior.CloseConnection);
}
This code allows a client to take that reader, iterate through the results quickly, and do with them what it will. But if you take a look at the code, you will notice that in order to do this, you need to open the connection to the database and leave it open so the client can iterate through the contents of the data reader. However, the connection is declared and opened down in the data access layer, and is not directly accessible to the client code to close the connection when it is done with the reader. So how can the client make sure the connection gets closed when it is done using the reader? You will see in this code that a CommandBehavior
enumerated value of CloseConnection
was passed to the ExecuteReader
method on the command. Thus, if the client calls the Close
method on the data reader itself, the underlying connection will be closed as well. Additionally, if the client simply iterates through all of the rows in the reader, the connection will be closed when the last row is read.
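Because the client owns the lifetime of the reader, the safest consumption pattern is to wrap it in a using statement, so that the reader (and therefore the connection, thanks to CommandBehavior.CloseConnection) is closed even if an exception is thrown partway through the iteration. A minimal sketch of client code, assuming the GetCustomers method shown above:

```csharp
// Sketch: assumes the CustomersDataAccess.GetCustomers method shown above.
using (SqlDataReader reader = CustomersDataAccess.GetCustomers( ))
{
    while (reader.Read( ))
    {
        Console.WriteLine(reader["CompanyName"]);
    }
} // Dispose calls Close on the reader, which also closes the
  // underlying connection because of CommandBehavior.CloseConnection
```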
Returning a data reader is significantly faster than returning a data set, because you can’t directly operate on the contents of a data reader; all you can do is use it to quickly pull data out of the reader and do something with it. Often the thing you want to do with the data is use it to construct and populate some object model, which is what the data set and data adapter have to do at the time you fill the data set. That is why the data set takes a lot longer to fill than a data reader does to return. With the data set, you have a full object model in memory that you have read/write and random access into. To achieve the same thing with a data reader, you will have to pull the data out of the rows from the data reader and stuff them into your own object model before you can do something meaningful with them.
In larger-scale, object-oriented systems, this may be exactly what you want to do in the first place. Typed data sets (covered in Chapter 2) make a great choice for fairly lightweight, easy to construct and maintain data transfer objects that you can pass around the middle tier for processing or pass to the client for presentation. However, often your middle-tier objects need to not only encapsulate data—they need to encapsulate behavior as well. In that case, you will probably be designing your own object model, where many of the objects might encapsulate state that is stored in persistent form in the database. The data reader works great in this case for quickly reading data into an object or set of objects resulting from a retrieval query against the database.
For example, let’s say you have a Customer
class defined in your business object model that encapsulates the state and behavior associated with customers. When you want to act on a collection of customers, you retrieve the data from the database like the previous code, getting back a data reader that lets you get the data associated with the customers. Now you need to pull that data into your customer objects.
The data reader exposes an interface similar to other reader classes in the .NET Framework. A reader acts like a cursor into a collection of items. The cursor is initially positioned just before the first item. To move the cursor to the next item, you call the Read
method on the reader. The Read
method will return false
when there are no more items to be read. Once the cursor is positioned on an item, you can access the contents of that item through whatever access methods and properties the specific type of reader exposes. In the case of a SqlDataReader
, the items are rows, and you can access the row contents through a combination of an indexer, the Item
property, or GetXXX methods, where XXX is the type of the column you are requesting. The following code shows both the use of indexers and the GetString method.
private void OnGetCustomers(object sender, EventArgs e)
{
    // Create an empty collection of customer objects
    BindingList<Customer> custs = new BindingList<Customer>( );
    // Call the data access method that returns a reader
    SqlDataReader reader = CustomersDataAccess.GetCustomers( );
    // Loop through each row of data
    while (reader.Read( ))
    {
        // Create a new customer object
        Customer c = new Customer( );
        // Extract the values from the columns
        c.Id = (string)reader["CustomerID"];
        c.Name = (string)reader["CompanyName"];
        c.Contact = reader.GetString(reader.GetOrdinal("ContactName"));
        // Add the object to the collection
        custs.Add(c);
    }
    // Data bind the results to the grid
    m_CustomersGrid.DataSource = custs;
}
The indexer on the data reader returns an object reference, so you will need to cast the value to the expected type of the column you are retrieving. The indexer is overloaded so that you can pass it an integer index into the row representing the position of the column in the returned rows, or you can pass it the name of the column as shown here. Using column names is far easier to read and maintain, even though it is a tiny bit less efficient for retrieval.
The GetXXX methods are defined to extract the column value for each of the built-in .NET types. The GetXXX methods take a column index, or ordinal, for the position of the column to retrieve, and return the column's value as the specified type. Note that these accessors perform no conversions: if you call the GetString method on a column that actually contains an integer, an InvalidCastException will be thrown, as it will any time you use a GetXXX method that requests a type incompatible with the actual contents of the column. To determine the position of a column based on the column name, this code uses the data reader's GetOrdinal method to look up the position of the column based on its name.
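Because GetOrdinal resolves the column name on every call, a common optimization when iterating over many rows is to look up the ordinals once, before the loop. A sketch, assuming the same reader and columns as the example above:

```csharp
// Sketch: resolve the ordinal once rather than per row.
int contactOrdinal = reader.GetOrdinal("ContactName");
while (reader.Read( ))
{
    // Use the cached ordinal instead of re-resolving the name each time
    string contact = reader.GetString(contactOrdinal);
    // ... use the value ...
}
```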
You can see that what the code does is to create an instance of your Customer
business class for each row of data in the data reader. It stuffs the values from the reader that it cares about (I simplified the Customer
class for brevity here) into the customer object, and then adds the object to a collection. In this case it just data binds the collection to a data grid, but if you were just data binding, you probably wouldn’t go to this trouble. Presumably you would do this in a middle-tier business service where those Customer
objects were going to be used in the processing of the system and might include behavior, validation, and other capabilities that go beyond what a data set could do for you.
XML data access in .NET is a huge topic unto itself and is mostly separable from relational data access. There are a number of things that make it a difficult topic to cover quickly. For starters, there is all the underlying standards knowledge that is a prerequisite for working with XML, such as the XML standard itself, XPath, XML schema, XSLT, and XQuery. For another thing, there are multiple models for dealing with XML in .NET. Finally, the XML capabilities in the .NET Framework have been significantly enhanced in version 2.0.
There is no way to briefly cover all the ways you might program with XML in .NET, so I’ll just focus on those things that will let you understand some of the ways that you might get XML data into your Windows application so you can display the data through Windows Forms controls. There is no way to directly data bind XML data to any of the Windows Forms controls that ship with the .NET Framework. To use XML data for data binding, you will need to either get the data into a data set, or you will need to read the data into custom business objects that are suitable for data binding.
The first thing to get your arms around with XML in .NET is all the different flavors of XML object models that exist. There is the XmlDocument
class, which provides an implementation of the W3C XML Document Object Model (DOM) standard. This class lets you deal with XML in the form of documents that are read into memory in their entirety into a heavyweight object model. There is also the XmlDataDocument
class, which is really just a derived class from XmlDocument
. In addition to providing storage for XML using the DOM, this class encapsulates a data set that will let you deal with a document’s contents as relational data. The XmlDataDocument
will also let you place relational data into the object through a data set, and then you can manipulate the data as XML.
The XmlReader
and XmlWriter
classes provide a very lightweight stream-based approach to reading XML. These classes, and the classes derived from them, can be used with a number of other capabilities in .NET, such as XML serialization, XML data access from relational stores, and raw access to XML streams from disk or over a network. Finally, there is a new object model that was introduced in .NET 1.0 based on the XPathDocument
and XPathNavigator
classes. These classes form the basis for the preferred method of working with XML in .NET 2.0.
The following sections describe how to read XML data into an XmlDataDocument
and access it in relational form through its encapsulated data set, and then discuss loading data into an XPathDocument
and how to query and navigate data within that document. Then you’ll have enough data access tools in your belt to understand all the data access that is done in support of the samples in this book, as well as enough background material to get data in many forms into your applications for prototyping and developing your Windows Forms applications.
The XmlDataDocument
class bridges the world of relational and hierarchical data. The class is derived from XmlDocument
, so it “is an” XmlDocument
in the purest sense of object-oriented inheritance (where “is an” is used to describe objects related through inheritance), and thus exposes all the capabilities of the XmlDocument
class for storing and manipulating data as XML. It also encapsulates a DataSet
as a property, which lets you access all or part of the contents of the XmlDataDocument
as relational data.
There are two common ways to use an XmlDataDocument
. The first is to load XML into the document, and then access its DataSet
property for data binding, iterating through the contents with the DataSet
object model, or synchronizing the contents with a database. The second is that you can take a DataSet
that already contains data, construct an XmlDataDocument
from it, and use XML processing or constructs to program against the data (that is, perform an XPath query against the data).
In order for loading XML into an XmlDataDocument
and accessing it through its DataSet
property to work, you have to set up the data set schema within the XmlDataDocument
before you load the XML. The XmlDataDocument
won’t infer a data set schema when reading in XML from a file or string the way that the DataSet.ReadXml
method does. However, if you first supply its contained data set with a schema and then read in XML, the XmlDataDocument
will then be able to associate XML elements and attributes that it reads in with the data set schema, and then that data can be accessed through the DataSet
property on the XmlDataDocument
.
Take a look at the XML in Listing D.10.
LISTING D.10: Customer XML Data
<?xml version="1.0" encoding="utf-8"?>
<Customers xmlns="urn:AW-Windows-Forms-Data-SimpleCustomers.xsd">
<Customer>
<Name>Fred Smith</Name>
<Email>[email protected]</Email>
<Address>
<Street>123 Nowhere St.</Street>
<City>Middletown</City>
<State>OH</State>
<Zip>54321</Zip>
</Address>
</Customer>
<Customer>
<Name>Edith Jones</Name>
<Email>[email protected]</Email>
<Address>
<Street>939 TakeMeBack Road</Street>
<City>Fern</City>
<State>TX</State>
<Zip>86950</Zip>
</Address>
</Customer>
</Customers>
You can see that this data can be viewed as containing two kinds of data, Customers and Addresses, and that there is a parent-child relationship between the two. In a data set, this would be represented by two data tables and a data relation between the two. To load this data into an XmlDataDocument
and access it through the DataSet
property, you first have to tell the XmlDataDocument
what data set schema to use when reading in the XML. There are a few ways you could accomplish this. One would be to access the DataSet
property and add tables and relations to it programmatically, as shown earlier in the “Creating a Data Set Programmatically” section. Alternatively, you could construct an empty XmlDataDocument
, and then read the XML data in through the DataSet
property using the ReadXml
method. Another approach would be to construct an empty data set, initialize its schema, and provide that to the XmlDataDocument
constructor to set up the schema of the encapsulated data set, as shown here:
private void OnLoadXmlDataDoc(object sender, EventArgs e)
{
    DataSet custData = new DataSet( );
    custData.ReadXmlSchema(@"..\..\SimpleCustomers.xsd");
    XmlDataDocument dataDoc = new XmlDataDocument(custData);
    dataDoc.Load(@"..\..\SimpleCustomers.xml");
    DataRow row = custData.Tables["Customer"].NewRow( );
    row["Name"] = "FooFooFoo";
    custData.Tables["Customer"].Rows.Add(row);
    m_ResultsGrid.DataSource = dataDoc.DataSet;
}
In this code, custData
is a data set that is loaded with an XML schema that matches the XML shown earlier. The custData
data set is passed into the constructor for the XmlDataDocument
, and it will encapsulate the reference to this data set instead of creating its own. Once it has a data set with the appropriate schema, you can load the XML into the XmlDataDocument
, and the data set will be populated with the XML from the document loaded that matches the schema. The SimpleCustomers.xsd schema was created by loading the XML from Listing D.10 into the Visual Studio editor (minus the xmlns namespace declaration) and selecting Generate Schema from the XML menu.
An even simpler approach is to just read the XML in through the DataSet
property:
private void OnLoadXmlDataDoc(object sender, System.EventArgs e)
{
    XmlDataDocument dataDoc = new XmlDataDocument( );
    dataDoc.DataSet.ReadXml(@"..\..\SimpleCustomers.xml");
    m_dgResults.DataSource = dataDoc.DataSet;
}
This uses the default XmlDataDocument
constructor, which doesn’t take a data set as a parameter. It will create an empty data set internally, and then calls ReadXml
on the contained data set, passing in the path to the XML data file. The contained data set will then infer the schema from the XML as it is read in as before. Once it is read into the XmlDataDocument
, it can be treated as an XML node set and accessed through the XmlDocument
base class methods as well as those of the contained data set.
Now let’s tackle the angle of loading an existing data set containing data into an XmlDataDocument
for the purposes of navigating the data as XML. You might want to do this if you have a complex data set and want to perform selections or queries across multiple tables in the data set, or if the data is hierarchical and navigating with the XML object model makes more sense for the situation. In this case, you again use the fact that the XmlDataDocument
constructor can take a data set as a parameter, and it will hold a reference to that data set internally instead of constructing its own. The following code demonstrates creating a data set from a relational source, then constructing an XmlDataDocument
and using the XML object model to perform operations on the data.
private void OnGetData(object sender, EventArgs e)
{
    string connStr =
        "server=localhost;database=Northwind;trusted_connection=true";
    SqlDataAdapter adapter = new SqlDataAdapter(
        "SELECT TOP 10 * FROM Customers",
        new SqlConnection(connStr));
    DataSet custData = new DataSet( );
    adapter.Fill(custData, "Customers");
    XmlDataDocument dataDoc = new XmlDataDocument(custData);
    XmlNodeList custNames =
        dataDoc.SelectNodes("//Customers/CompanyName");
    string names = string.Empty;
    foreach (XmlNode node in custNames)
    {
        names += node.InnerText + "\n";
    }
    MessageBox.Show(names);
}
This code first retrieves the customer data into a data set, as shown earlier in the appendix. If you were really retrieving data solely for the purpose of manipulating it as XML, you would be better off using the XML query capability of SQL Server, returning the results as XML, and stuffing the results into an XPathDocument
.
Once that data set is populated with the appropriate schema and the data, it is passed to the constructor of the XmlDataDocument. An XPath query is then performed against the XML document using the SelectNodes method, which returns a list of XML nodes that the code iterates through to construct a simple display of the customer names.
An important feature to realize about an XmlDataDocument
is that it can load and hold a lot of XML content that isn’t exposed through the DataSet
property, and that additional content can be accessed through the normal document object model of the base XmlDocument
class. So you could load a document containing a set of data that matches a data set schema that you want to expose for data binding, and the document could also contain a bunch of other nodes that don’t match the schema. The parts of the XML document that match the data set schema would be added to the data set, but the parts of the XML document that don’t match would simply be contained within the XmlDocument
base class’ object model and could be accessed through normal XML navigation of the document. So if you need to jump back and forth between dealing with a set of data as relational and dealing with it as an XML object hierarchy, the XmlDataDocument
is one of the first places to look.
As the name implies, an XPathDocument
is an object model that is based on the hierarchical model exposed by the XPath specification. This is basically a new object model for storing XML content that was introduced in .NET 1.0 as a read-only store that is lighter weight and has higher performance than the XmlDocument
DOM implementation. This class and its related classes all reside in the System.Xml.XPath
namespace.
You generally don’t work directly against the XPathDocument
except for a few common operations. The first thing you will want to do is load data into the document. The data can come from a number of sources, including a file on disk, a network stream, a Web services call, or a database query. Once you have loaded the data into the document, the next step is to get an XPathNavigator
for the document, which is the primary API for accessing the data within an XML document. You can also obtain XPathNavigator
objects for XmlDocuments
and XmlDataDocuments
so that you can use a consistent programming model for XML across all three document types. Additionally, with XmlDocument
and XmlDataDocument
objects, you can edit the contained data directly through the XPathNavigator
.
Other than those operations, most of what you will care about when working with an XPathDocument
will involve programming against the XPathNavigator
that you use to perform queries against the document and to navigate through sets of nodes.
There are two common ways of getting XML data into an XPathDocument
. The first is from a file, which is a simple matter of passing the file path or URL to the constructor of the XPathDocument
class:
XPathDocument doc = new XPathDocument("doc.xml");
The constructor can also take a stream, an XmlReader
, or a TextReader
. Whichever you use, the contents of the document will be loaded into memory and parsed into the underlying object model of the XPathDocument
.
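For example, to parse XML that you already have in memory as a string (say, the payload of a Web services call), you can wrap the string in a StringReader and hand it to the constructor. A small sketch, with hypothetical XML content:

```csharp
// Requires the System.IO and System.Xml.XPath namespaces.
// The XML string here is a made-up example.
string xml =
    "<Customers><Customer><Name>Fred Smith</Name></Customer></Customers>";
using (StringReader sr = new StringReader(xml))
{
    // The document is parsed into the read-only XPath object model
    XPathDocument doc = new XPathDocument(sr);
}
```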
To get the data out of SQL Server, you can issue a FOR XML query, and load the XPathDocument
with the XmlReader
returned from a call to ExecuteXmlReader
on the SqlCommand
object.
private void OnLoadFromDB(object sender, System.EventArgs e)
{
    SqlConnection conn = new SqlConnection(
        "server=localhost;database=Northwind;trusted_connection=true");
    SqlCommand cmd = new SqlCommand(
        "SELECT * FROM Customers FOR XML AUTO, Elements", conn);
    try
    {
        conn.Open( );
        XmlReader reader = cmd.ExecuteXmlReader( );
        m_XPathDoc = new XPathDocument(reader);
    }
    finally
    {
        conn.Close( );
    }
}
Once you have data in memory in an XML document of some form (XmlDocument, XmlDataDocument
, or XPathDocument
), you are likely to want to query it to select some subset of the nodes contained in the document based on some criteria you want to match against. For example, you may want to select all Customers from the state of California, get all sales data for the last six months, or look up the airspeed velocity of a laden swallow. Whatever you need to look up, there are easy and powerful ways to perform queries against XML content in .NET.
To perform a query against XML data, you have to:
1. Load the data into a document object.
2. Select a set of nodes in that document by specifying the query (in XPath).
3. Iterate through the results.
Let’s just focus on the one query method that all of the document types support: XPathNavigator queries. The XPathNavigator
class is the front end to the underlying query engine for any of the XML document types in .NET. In the last section you saw an example where the SelectNodes
method of the XmlDocument
was called. That is a streamlined method for node selection that really just uses an XPathNavigator
under the covers. The XPathNavigator
uses a cursor-style navigation through a node set that you can use to iterate through the document. It also exposes a number of selection methods that let you execute queries against the contents of the document or node it is pointing to. This section just focuses on querying; the next goes into a little more about navigation.
To get an XPathNavigator
for any of the XML document types, you call CreateNavigator
against the document or one of its nodes. CreateNavigator
passes back an instance of an XPathNavigator
with the cursor positioned on the node from which it was created. Once you have the navigator, you call one of the query methods shown in Table D.5 to obtain an XPathNodeIterator
containing the matching nodes.
The XPathNodeIterator
class follows the pattern of other iterators in the .NET Framework. It starts out positioned just before the first item in the collection it contains. You repeatedly call the MoveNext
method, usually in a while
loop, and inspect the Boolean return value. If the value is true
, then there was another node to move to and the iterator will be positioned on that node. If it returns false
, there are no more nodes to iterate through. The following code demonstrates the process of loading a document, performing a query, and iterating through the results.
private void NodeIteration( )
{
    // Load the document
    XPathDocument doc = new XPathDocument("Customers.xml");
    // Get the navigator
    XPathNavigator nav = doc.CreateNavigator( );
    // Perform the query
    XPathNodeIterator iter = nav.Select("//CompanyName");
    // Iterate through the results
    while (iter.MoveNext( ))
    {
        Console.WriteLine(iter.Current.Value);
    }
}
The Current
property on the iterator returns a reference to an XPathNavigator
positioned on the current node, which you can then use to access the node contents or perform navigation or further queries based on that node. This example accesses the Value
property of the current node, which in the case of an element or attribute is just the contained text content. There is also a Name
property, which returns the name of the node.
The complexity of the results returned depends on the complexity of the query that you issue. For something simple like this query that just returns a set of elements that contain text nodes with the values you are after, the code is pretty straightforward. However, an XPath query can return any type of node, so you will often need some conditional logic that checks the node type of each node as you iterate through them and adjusts what it does based on that type. The node type is exposed through the NodeType property on the XPathNavigator, and it returns an enumeration of type XPathNodeType.
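A sketch of that kind of conditional logic, switching on the node type as the iterator advances (it assumes a navigator like the one from the previous example, and the "//node()" query is just a placeholder that matches nodes of mixed types):

```csharp
// Sketch: nav is assumed to be an XPathNavigator obtained from a document.
XPathNodeIterator iter = nav.Select("//node()");
while (iter.MoveNext( ))
{
    switch (iter.Current.NodeType)
    {
        case XPathNodeType.Element:
            Console.WriteLine("Element: " + iter.Current.Name);
            break;
        case XPathNodeType.Text:
            Console.WriteLine("Text: " + iter.Current.Value);
            break;
        default:
            // Ignore comments, processing instructions, and so on
            break;
    }
}
```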
Each of the different XML document types has a variety of specialized navigation methods that depend on the specific object model that they expose. However, all of the XML document types let you obtain an XPathNavigator
for the document, which allows you to query and navigate those documents in a consistent way using the preferred object model for working with XML in .NET. In the previous section, you saw how to perform queries against a document, returning an iterator that you could use to step through each of the nodes that matched the query and perform processing on it. To work with the node, you obtained an XPathNavigator
reference to the node from the iterator’s Current
property.
Once you have an XPathNavigator
to a particular node, you will often need to perform some navigation through the object model based on your current position within it. For example, if you have selected a set of Order elements, you may need to navigate through the attributes on that element to extract their values. Or perhaps you will perform a query to obtain an element representing a Customer, and then want to navigate all the child elements to extract their values without performing individual queries for each element.
The XPathNavigator exposes a set of MoveToXXX methods that let you navigate the object model it exposes, where the XXX portion of the method name indicates the node to which it moves the navigator's cursor (see Table D.6). Each of the MoveToXXX methods returns a Boolean indicating whether the move was successful, meaning there was a node in the intended location to move to. Note that attributes and namespaces are treated as special kinds of nodes because they can only be contained within the opening tag of an element node. To navigate attributes or namespaces, you use the separate set of MoveToXXX methods shown in Table D.7.
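For example, once a query has positioned the navigator on a Customer element like the ones in Listing D.10, you could walk its child elements without issuing any further queries. A sketch (namespace handling omitted for brevity):

```csharp
// Sketch: nav is assumed to be positioned on a <Customer> element.
if (nav.MoveToFirstChild())
{
    do
    {
        // Prints pairs like "Name = Fred Smith" for each child element
        Console.WriteLine(nav.Name + " = " + nav.Value);
    }
    while (nav.MoveToNext());
    nav.MoveToParent(); // return the cursor to the Customer element
}
```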
In this appendix, I have given you a high-speed introduction to data access with ADO.NET, both for relational data and for XML data. You learned about the various capabilities of the DataSet
class for containing relational data, which lets you retrieve data into the data set, make modifications to the data, and save that data back to a data source. The data set is the richest relational object for containing and manipulating data in the client, and it supports the widest range of data-binding scenarios. You saw how to get data into a custom object collection to which you could bind Windows Forms controls, and some of the different ways of interacting with both the client-side data and with the data sources from which you obtain data. The appendix stepped through some basics of dealing with data in XML form, and it showed how to load data into a document, work with that data as both hierarchical and relational data with an XmlDataDocument
, and navigate and query the data using the XPathNavigator
class.
Data access is a deep and important topic that is covered in much more detail by a number of other books. The intent of this appendix was simply to provide you enough information to be able to get data into your client applications for prototyping and to understand where the data was coming from for the chapters in this book dealing with manipulating and presenting that data in Windows Forms applications. By now your head is probably spinning if you haven’t been exposed to data access in .NET before. If you have, then hopefully this provided a good refresher and reference for the basics while you are working with the rest of the book.