Many real-world applications need to interact with a database. The .NET Framework provides a rich set of objects to manage database interaction; these classes are collectively referred to as ADO.NET.
ADO.NET looks very similar to ADO, its predecessor. The key difference is that ADO.NET is a disconnected data architecture. In a disconnected architecture, data is retrieved from a database and cached on your local machine. You manipulate the data on your local computer and connect to the database only when you wish to alter records or acquire new data.
There are significant advantages to disconnecting your data architecture from your database. The biggest advantage is that you avoid many of the problems associated with connected data objects which do not scale very well. Database connections are resource-intensive, and it is difficult to have thousands (or hundreds of thousands) of simultaneous continuous connections. A disconnected architecture is resource-frugal.
ADO.NET connects to the database to retrieve data, and connects again to update data when you’ve made changes. Most applications spend most of their time simply reading through data and displaying it; ADO.NET provides a disconnected subset of the data for your use while reading and displaying.
Disconnected data objects work in a mode similar to that of the Web. All web sessions are disconnected, and state is not preserved between web page requests. A disconnected data architecture makes for a cleaner marriage with the Web.
Although one can certainly write an entire book on relational databases, and another on SQL, the essentials of these technologies are not hard to understand. A database is a repository of data. A relational database organizes your data into tables. Consider the Northwind database provided with Microsoft SQL Server 7, SQL Server 2000, and all versions of Microsoft Access.
The Northwind database describes a fictional company buying and selling food products. The data for Northwind is divided into 13 tables, including Customers, Employees, Orders, Order Details, Products, and so forth.
Every table in a relational database is organized into rows, where
each row represents a single record. The rows are organized into
columns. All the rows in a table have the same column structure. For
example, the Orders table has these columns:
OrderID
, CustomerID
,
EmployeeID
, OrderDate
, etc.
For any given order, you need to know the customer’s name,
address, contact name, and so forth. You could store that information
with each order, but that would be very inefficient. Instead, we use
a second table called Customers, in which each row represents a
single customer. In the Customers table is a column for the
CustomerID
. Each customer has a unique ID, and
that field is marked as the
primary key for that
table. A primary key is the column or combination of columns that
uniquely identifies a record in a given table.
The Orders table uses the CustomerID
as
a
foreign key. A foreign key is a column (or
combination of columns) that is a primary (or otherwise unique) key
from a different table. The Orders table uses the
CustomerID
, which is the primary key used in the
Customers table, to identify which customer has placed the order. To
determine the address for the order, you can use the
CustomerID
to look up the customer record in the
Customers table.
This use of foreign keys is particularly helpful in representing one-to-many or many-to-one relationships between tables. By separating information into tables that are linked by foreign keys, you avoid having to repeat information in records. A single customer, for example, can have multiple orders, but it is inefficient to place the same customer information (name, phone number, credit limit, and so on) in every order record. The process of removing redundant information from your records and shifting it to separate tables is called normalization .
Normalization not only makes your use of the database more efficient,
but also it reduces the likelihood of data corruption. If you kept
the customer’s name both in the Customers table and also in the
Orders table, you would run the risk that a change in one table might
not be reflected in the other. Thus, if you changed the
customer’s address in the Customers table, that change might
not be reflected in every row in the Orders table (and a lot of work
would be necessary to make sure that it was reflected). By keeping
only the CustomerID
in Orders, you are free to
change the address in Customers, and the change is automatically
reflected for each order.
Just as C# programmers want the compiler to catch bugs at compile
time rather than at runtime, database programmers want the database
to help them avoid data corruption. The
compiler helps avoid bugs in C# by
enforcing the rules of the language; for example, you can’t use
a variable you’ve not defined. SQL Server and other modern
relational databases avoid bugs by enforcing constraints that you
request. For example, the Customers database marks the
CustomerID
as a primary key. This creates a
primary key constraint in the database, which ensures that each
CustomerID
is unique. If you were to enter a
customer named Liberty Associates, Inc. with the
CustomerID
of LIBE, and then tried to add Liberty
Mutual Funds with a CustomerID
of LIBE, the
database would reject the second record because of the primary key
constraint.
Relational databases use Declarative Referential
Integrity
(DRI) to
establish constraints on the relationships among the various tables.
For example, you might declare a constraint on the Orders table that
dictates that no order can have a CustomerID
unless that CustomerID
represents a valid record
in Customers. This helps you avoid two types of mistakes. First, you
cannot enter a record with an invalid CustomerID
.
Second, you cannot delete a Customer record if that
CustomerID
is used in any order. The integrity of
your data and their relationships are thus protected.
The most popular language for querying and manipulating databases is SQL, usually pronounced “sequel.” SQL is a declarative language, as opposed to a procedural language, and it can take a while to get used to working with a declarative language when you are used to languages such as C#.
The heart of SQL is the query. A query is a statement that returns a set of records from the database.
For example, you might like to see all the
CompanyNames
and CustomerIDs
of
every record in the Customers table where the customer’s
address is in London. To do so you would write:
Select CustomerID, CompanyName from Customers where city = 'London'
This returns the following six records as output:
CustomerID CompanyName ---------- ---------------------------------------- AROUT Around the Horn BSBEV B's Beverages CONSH Consolidated Holdings EASTC Eastern Connection NORTS North/South SEVES Seven Seas Imports
SQL is capable of much more powerful queries. For example, suppose
the Northwinds manager would like to know what products were
purchased in July of 1996 by the customer “Vins et alcools
Chevalier.” This turns out to be somewhat complicated. The
Order Details table knows the ProductID
for all
the products in any given order. The Orders table knows which
CustomerID
s are associated with an order. The
Customers table knows the CustomerID
for a
customer, and the Products table knows the Product name for the
ProductID
. How do you tie all this together?
Here’s the query:
select o.OrderID, productName from [Order Details] od join orders o on o.OrderID = od.OrderID join products p on p.ProductID = od.ProductID join customers c on o.CustomerID = c.CustomerID where c.CompanyName = 'Vins et alcools Chevalier' and orderDate >= '7/1/1996' and orderDate <= '7/31/1996'
This query asks the database to get the OrderID
and the product name from the relevant tables: first look at Order
Details (which we’ve called od
for short),
then join that with the Orders table for every record where the
OrderID
in the Order Details table is the same as
the OrderID
in the Orders table.
When you join two tables you can say either “Get every record
that exists in either table” (this is called an outer
join), or you can say, as I’ve done here, “Get
only those records that exist in both tables” (called an
inner join). That is, an inner join says: get
only the records in Orders that match the records in Order Details
by having the same value in the
OrderID
field (on o.Orderid = od.Orderid
).
SQL joins are inner joins by default. Writing join orders is the same as writing inner join orders.
The SQL statement goes on to ask the database to create an inner join
with Products
, getting every row where the
ProductID
in the Products table is the same as the
ProductID
in the Order Details table.
You then create an inner join with customers for those rows where the
CustomerID
is the same in both the Orders table
and the Customer table.
Finally, you tell the database to constrain the results to only those
rows where the CompanyName
is the one you want,
and the dates are in July.
The collection of constraints finds only three records that match:
OrderID ProductName ----------- ---------------------------------------- 10248 Queso Cabrales 10248 Singaporean Hokkien Fried Mee 10248 Mozzarella di Giovanni
This output shows that there was only one order (10248) where the
customer had the right ID and where the date of the order was July
1996. That order produced three records in the Order Details
table, and using the product IDs in these three
records, we got the product names from the Products table.
You can use SQL not only for searching for and retrieving data, but also for creating, updating, and deleting tables and generally managing and manipulating both the content and the structure of the database.
For a full explanation of SQL and tips on how to put it to best use, I recommend Transact SQL Programming, by Kline, Gould, and Zanevsky (O’Reilly & Associates, 1999).
3.145.17.140