The DataSet class within the System.Data namespace is typically used for the in-memory storage of data within one or more DataTable class members. This data usually represents information retrieved from a database, the focus of this section, or stored within an XML document, the focus of the next section. A DataTable is represented by a collection of DataRow entries divided into a set of one or more DataColumn fields.
The sample database we'll use in this section is the USDA Nutrient Database maintained and freely distributed by the U.S. Department of Agriculture. This version, Release 13 (SR13), contains data on over 6,000 food items, such as cheese, potato chips, soft drinks, and so on. We'll be working with three tables of the relational database.
The food description database table (FOOD_DES) has 6,210 rows of food entries with 11 columns of data. The primary key of the table is the NDB_NO column, which holds a unique number associated with each food entry. A second column of interest to us is DESC, which provides a brief description of the food item. For example, here are the NDB_NO and DESC entries for the first 11 rows of the FOOD_DES database table:
NDB_NO DESC 01001 :: Butter, with salt 01002 :: Butter, whipped, with salt 01003 :: Butter oil, anhydrous 01004 :: Cheese, blue 01005 :: Cheese, brick 01006 :: Cheese, brie 01007 :: Cheese, camembert 01008 :: Cheese, caraway 01009 :: Cheese, cheddar 01010 :: Cheese, cheshire 01011 :: Cheese, colby
A second table in the database—the nutritional data table (NUT_DATA)-contains the nutritional values associated with each food. It has three columns of interest to us: (1) NDB_NO, which identifies the food item; (2) NUTR_NO, which uniquely identifies each nutritional element, such as protein, fat, carbohydrate, and so on; and (3) NUTR_VAL, which holds the nutritional value. Here's an example from the NUT_DATA table:
NDB_NO NUTR_NO NUTR_VAL 01001 :: 203 :: 0.85 01001 :: 204 :: 81.11 01001 :: 205 :: 0.06 01001 :: 207 :: 2.11 01001 :: 208 :: 717 01001 :: 255 :: 15.87 01001 :: 268 :: 3000 01001 :: 291 :: 0 01001 :: 301 :: 24 01001 :: 303 :: 0.16 01001 :: 304 :: 2
Each table by itself is incomplete. The food description table provides the name of each food but does not contain the nutritional values associated with the food. Those values are contained in the nutritional data table. The link between the two is the NDB_NO column. To navigate between the two tables, we'll need to define a data relationship between them. Within the .NET framework we do this by adding a DataRelation property to the DataSet.DataRelation defines a relationship between two tables based on a shared column.
Even taken together, these two tables are incomplete. We still have no way to identify the categories of the nutritional values represented by each entry. This information is available in a nutritional definition table (NUTR_DEF), which has two columns that interest us: (1) NUTR_NO, which provides us with the table's primary key; and (2) NUTRDESC, which holds a description of the nutritional element. The following example comes from the NUTR_DEF table:
NUTR_NO NUTR_NO 203 :: Protein 204 :: Total lipid (fat) 205 :: Carbohydrate, by difference 207 :: Ash 208 :: Energy 221 :: Alcohol 255 :: Water 262 :: Caffeine 263 :: Theobromine 268 :: Energy 269 :: Sugars, total
The link between the nutritional data table and the nutritional definition table is the NUTR_NO column. This link is also represented by a DataRelation object.
The complete in-memory representation of the nutritional database—in terms of the values we require—is made up of the three DataTable objects and the two DataRelation objects contained within a DataSet object. These classes and others found within the System.Data namespace make up the ADO.NET architecture.
All interactions with a database are carried out through a data provider. A data provider consists of four primary services, each represented by a class:
Connection, which handles the connection to a specific data source.
Command, which executes a command at the data source. There are select, update, insert, and delete categories of commands.
DataReader, which provides a forward-only, read-only stream of data from the data source.
Currently, the .NET framework provides two data providers: (1) a SQL Server .NET data provider (for Microsoft SQL Server 7.0 or later), defined in the System.Data.SqlClient namespace; and (2) the OLE DB .NET data provider, defined in the System.Data.OleDb namespace, for all other databases, such as Microsoft Access.
In this section we'll work with Microsoft Access and the OLE DB data provider. In Section 7.7 we'll look at using the SQL Server data provider in support of ASP.NET.
Here are the steps necessary, using the OLE DB data provider, first to connect to and select data from a database, and then to fill a DataSet object with the retrieved data:
To add a second or subsequent table, we simply repeat the sequence of steps outlined here. Once within the DataSet object, all data is manipulated through that object.
The Tables property of the DataSet class returns a DataTableCollection object that holds the set of DataTable objects associated with the DataSet. We can index the collection either positionally:
DataTable food_des_tbl = ds.Tables[0];
or with a string representation of the table's name:
displayTableHeader( ds.Tables[ "NUT_DATA" ] );
Several different properties are associated with the DataTable, including TableName, Rows, Columns, and PrimaryKey:
public static void displayTableHeader( DataTable dt ) { Console.WriteLine("The DataTable is named {0}",dt.TableName); DataRowCollection drows = dt.Rows; DataColumnCollection ccols = dt.Columns; Console.WriteLine( "It has {0} rows of {1} columns", drows.Count, ccols.Count ); Console.WriteLine( "The columns are as follows: " ); foreach ( DataColumn dc in ccols ) displayColumn( dc ); DataColumn [] keys = dt.PrimaryKey; Console.WriteLine("It has {0} primary keys",keys.Length); }
A primary key is a column that is guaranteed to contain a unique value for each record entry and that serves as the identity of the record within the table. (A primary key can also be a set of columns. This is why the PrimaryKey property returns an array of DataColumn objects rather than just a single DataColumn object.)
The Rows property of the DataTable returns a DataRowCollection object, which holds the set of rows associated with the table. A row is represented by a DataRow class object. The rows are indexed beginning at 0. To retrieve the first row, we write
DataRow dr = drows[ 0 ]; // first row
To iterate across the entire DataRowCollection, we can use either a foreach statement:
foreach( DataRow row in dt.Rows )
or a for loop, incrementing the index with each iteration.
Each DataRow object contains data for one record of the table divided into one or more columns. The ItemArray property retrieves the column values as an array of type object with the first column at index 0, the second at 1, and so on:
object [] fieldValues = dr.ItemArray;
To access the value of an individual column, we can index by position, beginning with 0 for the first column:
int nutritionID = dr[ 0 ];
by a string representing the column name:
string nutritionDesc = dr["NUTRDESC" ];
or by the DataColumn object representing the column. That is, the DataRow class has defined a DataColumn indexer. Given a DataColumn object, the DataColumn indexer returns the associated value of the column as a generic value of type object.
public object this[ DataColumn index ] { get; set; }
For example, here is how we might use this indexer:
public static void displayTable( DataTable dt ) { displayTableHeader( dt ); foreach( DataRow row in dt.Rows) { foreach ( DataColumn col in dt.Columns) Console.Write( "{0} ", row[ col ] ); Console.WriteLine(); } }
When this function is past the nutrition definition table NUTR_DEF, for example, it prints the contents of its four columns:
203 g PROCNT Protein 204 g FAT Total lipid (fat) 205 g CHOCDF Carbohydrate, by difference 207 g ASH Ash 208 kcal ENERC_KCAL Energy 221 g ALC Alcohol 255 g WATER Water 262 mg CAFFN Caffeine 263 mg THEBRN Theobromine 268 kj ENERC_KJ Energy 269 g SUGAR Sugars, total 291 g FIBTG Fiber, total dietary 301 mg CA Calcium, Ca
The DataColumn class represents a column in the DataTable. It does not itself hold data; rather it holds information describing the data that each row contains within that column. We retrieve a column through either a string or a numeric index, depending on whether we wish to retrieve the column by its name or ordinal position:
DataColumn primaryKey = ds.Tables[0].Columns[ "NDB_NO" ]; primaryKey.AllowDBNull = false; primaryKey.Unique = true;
AllowDBNull specifies whether the user can enter a null value for that column. For a primary key, of course, a null value would be terrible. Unique stipulates that each row must contain a unique value for this column. This is essential for a primary key.
Other properties associated with a DataColumn are DataType, which returns the Type object associated with the stored value; Table, which returns the DataTable object to which the column belongs, and ColumnName.
The information we need is stored in the three separate tables that were introduced in Section 5.10.1: (1) the food description table (FOOD_DESC), (2) the nutritional data table (NUT_DATA), and (3) the nutritional definition table (NUTR_DEF).
Given a food identity value from the FOOD_DES table, we want to retrieve all the entries associated with that value. We do this by defining a DataRelation class object linking the columns of the two tables that hold the food identity value. We'll call the relationship FoodID:
DataRelation dr = new DataRelation("FoodID", ds.Tables["FOOD_DES"].Columns["NDB_NO"], ds.Tables["NUT_DATA"].Columns["NDB_NO"]);
The first column is referred to as the parent. Its value is used to retrieve the matching entries in the second column, which is referred to as the child. We add the relationship to the DataSet:
ds.Relations.Add(dr);
DataRelation provides a collection of links between columns of the various tables of the DataSet. These links allows for retrieval based on a column entry. For example, in the following code sequence we step through the food items in turn. So, for example, we retrieve the entry for Butter, with salt. Now we need all the nutritional values associated with this entry. Rather than explicitly searching each record of the NUT_DATA table for these values, we use the DataRelation object to retrieve an array of all the matching DataRow entries:
foreach ( DataRow dRow in ds.Tables["FOOD_DES"].Rows ) { // we retrieve all the matching entries DataRow [] food_desc = dRow.GetChildRows( dr ); Console.WriteLine( " There are {0} food desc rows", food_desc.Length );
Now that we have the array of DataRow objects, we can process it as we wish. For example, the following code sequence extracts the row values and prints out the string representation for each:
foreach ( DataRow ddr in food_desc ) { object [] row_values = ddr.ItemArray; foreach ( object o in row_values ) Console.Write( "{0} | ", o.ToString() ); }
The following is part of the output generated by this code:
There are 71 food desc rows 01001 | 203 | 0.85 | 16 | 0.074 | 1 | 01001 | 204 | 81.11 | 580 | 0.065 | 1 | 01001 | 205 | 0.06 | 0 | | 4 |
The DataTable supports an overloaded Select() method to retrieve all the DataRow entries that match an expression. Select() returns an array of DataRow objects. If the selection fails, the array is empty. For example, the following is a simple expression that searches the NUTR_NO column for the value 203:
DataRow [] datselect = ds.Tables["NUTR_DEF"].Select( "NUTR_NO = 203" );
Alternatively, we can build the string from an object:
DataRow [] datselect = ds.Tables["NUTR_DEF"].Select( "NUTR_NO = " + key );
where key is an integer representing a primary key, or
DataRow [] datselect = ds.Tables["NUTR_DEF"].Select("NUTR_VAL > " + someValue);
which returns all rows in which the NUTR_VAL column holds a value greater than someValue.
Expressions can also be associated with columns. In the following example found in the NET documentation, we create a calculated column:
ds.Tables["Products"].Columns["tax"].Expression = "UnitPrice * 0.086";
where 0.086 represents a tax rate.
Unlike a calculated column, an aggregate column evaluates an expression based on an entire set of rows. The example in the .NET documentation shows an expression that counts the number of rows contained in the table:
ds.Tables["Orders"].Columns["OrderCount"].Expression = "Count(OrderID)";
where OrderID represents the primary key of the table and is therefore guaranteed to be unique.
18.117.231.15