The Importance of ADO.NET

The world of applications has been evolving for some time. Originally, applications were centralized, and they were accessed via dumb terminals. All the code ran on the mainframe, and the information technology (IT) professionals controlled everything.

Along came the PC and everything changed. The computing power moved to the desktop, and people ran spreadsheets, databases, word processors, and the like. However, no PC can be an island for long. Soon, people were tying their PCs together with umbilical cords back to servers and, of course, those mainframes.

Given that PCs were often connected to servers, you saw the rise of client/server computing. Unlike the centralized approach of mainframe/dumb terminal applications, you had a distributed approach in which the data was on the server, and much of the functionality ran on the client.

Next came n-tier development, in which the business rules were placed into centralized objects running on a server. This meant that the client application was used for such tasks as accepting input, viewing data, printing, and so forth. Business rules and data access were encapsulated in shared, reusable components, which accessed the data in a mid-range or mainframe database.

At the next stage of evolution, Web applications became all the rage, and soon enough, development had returned to the centralized model of the mainframe days. The browser was acting as nothing more than a dumb terminal, accepting user input and displaying screens. All the processing occurred under the control of the IT department, although it was often scattered among Web servers, application servers, and database servers.

Although Web applications are still common, many of them are being developed using the n-tier model. More importantly, the rise of a new class of components, called Web Services, is driving changes in how applications are developed. With Web Services, you can use the Internet as a distributed operating system, with components located anywhere in the world, and accessed via HTTP. You will see more about Web services in Chapter 9, “Building Web Services with VB .NET.”

It is into this world that ADO.NET arrives, ready to address the challenge of having components that, although spread across the Internet, still need to pass data to each other. ADO.NET also addresses some scalability concerns. Some parts of ADO.NET are much simpler than their counterparts in previous data access technologies. Some facets of ADO.NET might seem more complex, but this is usually due to a level of indirection aimed at making data access more logical in today's environment.

To better understand ADO.NET, it is useful to understand how it evolved. Prepare for a brief history lesson, and you'll better understand why ADO.NET came to be.

The ADO.NET Family Tree

ADO.NET is the latest in a line of data access technologies from Microsoft. Although it is an evolutionary step, it is a large one; it is arguably the biggest change in Microsoft data access since the company's first data access object model was released ten years ago. ADO.NET represents the first time Microsoft has attempted to separate the concept of accessing data from the concept of working with data, as you will see in the following sections.

In the Beginning: Embedded SQL and APIs

At first, applications accessed data by simply embedding SQL statements into the program, along with all the information to get to the database. Most mainframe applications built before the 1990s had direct access to the database built right in. They contained connection information, SQL statements to retrieve and manipulate data, and the code to display that data. These were very tightly coupled applications, relying on the name of the server and database never to change. In addition, any change in the table structure usually required that the client applications be recompiled.

To open up databases to some form of alternative access, database-specific APIs were created, such as DB-Lib and OCI. These APIs were designed to allow developers, almost always using C, to access the data from a Unix or Windows machine. These libraries were typically fast because they were optimized for a specific back-end database. They were typically challenging to use, however; they were low-level APIs, and not object-based as later technologies would be. In addition, because they were database-specific, it became very hard to move from one database to another. To make the move, a large portion of your application had to be rewritten to use a completely different library.

ODBC, DAO, and RDO

When Microsoft released Access 1.0 in the early 1990s, it was the company's first Windows-based database program. Access contained the JET database engine, Microsoft's desktop database engine (just as SQL Server is its server database engine). Microsoft realized that although JET was good for building applications based on data stored on the PC, not all data was stored on a PC. Instead, most companies already had data stored in server-based databases such as SQL Server, Oracle, and DB2. Microsoft, therefore, wanted people not only to have access to this back-end data, but also to be able to use it as easily as they used data stored in JET. To accomplish this, in 1993, Microsoft released Open Database Connectivity, or ODBC.

ODBC, unlike the database-specific APIs, allowed the developer to program in such a way that the physical database became less important than the logical structure of the schema. You could write your SQL statements and, if they were just standard SQL, you could change the back-end database engine without having to rewrite any of your code. This was because ODBC was a layer of indirection between your application and the database. You pointed to something called a data source name, or DSN. The DSN determined what database you were accessing (by loading the appropriate ODBC driver) and the location of the database.

ODBC was designed to let you not only access back-end data, but to make that back-end data look like local data. The remote tables looked just like local JET tables to the point that you could perform SQL joins between local JET tables and remote tables. Or, you could perform joins between tables in different back-end databases, such as joining SQL Server tables to Oracle tables, using ODBC and JET as the intermediate layer.

Along with JET, Microsoft also included an object model called Data Access Objects, or DAO. DAO contained a set of classes that allowed you to easily access databases, tables, records, and other schema objects. For any table, you could examine the columns, including the column names, types, and so on. You could use DAO to programmatically access data and perform data inserts, updates, and deletions. You could issue data definition language (DDL) commands to create or drop tables, create or drop indexes, add constraints, and so forth.

The JET/DAO combination was a boon to developers because it presented an easy-to-use, rich set of objects with which to access and manipulate data. Developers using Access or Visual Basic could use DAO to work with any data in a JET database, including tables linked into the JET database via ODBC. This meant that the developer had easy access to the data regardless of the actual database in which it was stored.

The problem with DAO, however, was that it was built on top of JET. What if all your data was in SQL Server, or Oracle, or some other back-end database? You could link all the tables into a JET database via ODBC and use DAO, but then you had JET sitting between you and your data. JET was relatively fat for its time, taking up about 1MB of memory in an era when machines typically had 4–16MB RAM. In addition, JET included a query optimizer, which was already available in the back-end databases. All queries to the back-end database had to pass through this fat JET layer, and each query got optimized by JET and again by the back-end. The result was that going through DAO and JET could be slow.

An alternative to using DAO was to program directly against ODBC. This was typically a much faster way of accessing the data, but it was also much more challenging. ODBC was not object-based the way DAO was; programming against the ODBC API therefore required significantly more code, and the developer had to manage such items as database cursors, and even memory, by hand.

For a long time, developers accessing back-end data had these two options: use DAO, which was easy but (relatively) slow, or program against the ODBC API, which was fast but difficult.

Microsoft realized that there must be a better way, so when Visual Basic 4.0 was released in 1995, Microsoft also released a new object model, built on top of ODBC, called Remote Data Objects, or RDO. RDO was very similar in structure to DAO, but most of the object names were just different enough to require DAO code to be rewritten. Still, the basic concepts were the same: You connected to a database, gained access to a collection of tables, then to a collection of columns, and so on. The actual accessing of the data was similar in both models, but RDO didn't go through the JET database engine, so you got nearly ODBC speed with the ease of programming that had previously existed only in DAO.

Microsoft now had two object models that developers were using with great success: DAO, which reached server data through JET and ODBC, and RDO, which was built directly on ODBC. However, Microsoft saw a shift coming in the market, and decided to address it with a new underlying technology: OLE DB.

OLE DB and ADO

Microsoft had built ODBC with a particular bias: It assumed a relational database. Understand that although there were ODBC drivers available for Excel files, text files, and other nonrelational sources, these drivers actually made the sources look relational. DAO and RDO were designed with a relational structure in mind; when you connected to a database, it automatically populated a collection of available tables. Each table had a collection of available columns that was automatically populated. These collections and objects were available because the back-end was assumed to be a relational database.

Microsoft knew that not all corporate data was stored in relational databases. For example, corporate data could be stored in Exchange or the Active Directory, neither of which uses a relational engine. Data could be stored in Excel, or text files, or a variety of other formats. Microsoft wanted to design a way to read data, both relational and nonrelational, using one consistent methodology. OLE DB was born.

OLE DB is a low-level API, like ODBC. However, OLE DB does not assume a relational back-end. Instead, OLE DB uses its own drivers, called providers, to access the data. Native OLE DB providers allow you to access relational or nonrelational data using the same API.

Although OLE DB is a low-level API, Microsoft had learned from its ODBC and RDO experience, so OLE DB shipped with ActiveX Data Objects, or ADO. ADO is the easy-to-use object model built on top of OLE DB. Using ADO, you can make a connection to the database and access the records. ADO allows you to use a single set of objects to access the back-end data, regardless of the format of the data.

Unlike DAO and RDO, however, the ADO object model is quite small. DAO and RDO both contain dozens of objects, whereas ADO contains fewer than ten major objects. DAO and RDO can automatically fill out collections of tables, columns, and relationships. ADO has no such objects because ADO, being based on OLE DB, cannot automatically assume a relational database as its source. Therefore, with ADO, you have to know how to query the system tables in order to examine the structure.

Note

Microsoft actually released a companion object model to ADO, called ADOX. ADOX does assume a relational back-end, and has the necessary objects to work with tables, columns, relationships, and the like. Despite this, few people ever use ADOX.


Because OLE DB does not assume a relational back-end, ADO has a small number of objects, and therefore a small memory footprint. It is also quite fast; OLE DB providers tend to outperform ODBC drivers for a given database.

ADO, in some of its more recent incarnations, also adds some XML features. For example, you can use ADO to create disconnected recordsets. These recordsets are stored on disk, and you can open them with ADO without first connecting to a database. These recordsets can be persisted either in the binary Advanced Data TableGram (ADTG) format or in XML format. Disconnected recordsets can be used to boost application performance because, with them, you do not have to make an expensive database connection, retrieve the records from the database, and stream them across the wire. It is even possible to make inserts, updates, and deletes to these disconnected recordsets and later synchronize the changes with the underlying database.

Despite this capability to create disconnected recordsets, ADO typically assumes a connected environment, in which you stream records from the database to the client using one of several cursor types. As different as ADO is from DAO and RDO, all three are built on a connected model, and all three combine accessing data with working with data. The distinction is important, and will be examined in the next section.

In addition, data is typically retrieved and stored in memory in such a way that it is easy to access and manipulate when it is in the same process as the client application. However, if the data is retrieved into a middle-tier COM component and needs to be transferred to a client to be displayed, it is a constant struggle to determine the best way to pass the data from one tier to another; cross-process data passing is always a challenge in n-tier applications. ADO.NET attempts to address both how to separate data access from data viewing and manipulation, and how to pass data across tiers. You can see the evolutionary timeline of Microsoft's data access technologies in Figure 6.1.

Figure 6.1. The timeline of Microsoft's data access technologies.


Why ADO.NET Exists

Just as Microsoft saw a need to move from ODBC to OLE DB, it saw the need to move beyond OLE DB. Although there is nothing inherently wrong with either ODBC or OLE DB, Microsoft saw that much of the industry's development effort was being spent building n-tier systems. These n-tier applications were not only built inside the company; increasingly, they also wanted to reuse a specific type of component, the Web service, exposed on the Web via HTTP.

Thanks to an initiative called the Simple Object Access Protocol, or SOAP, it is possible to call components anywhere on the Web via HTTP. SOAP is an XML-based protocol built for sending and receiving information in a distributed environment, enabling remote procedure calls over HTTP. With SOAP and Web services, you gain the added advantage of platform independence: The Web service could be written in C# and running on a Windows XP server, whereas the client that calls it could be a C++ application running on Linux, or a Web-enabled cell phone or PDA. The point is, the call is simply an HTTP request, and the results are sent back in an HTTP response, with the actual request and response containing XML.

Passing data from a Web service to some form of client shows one reason ADO.NET exists. Passing an ADO Recordset object from a Web service to a client is possible, but there are certain caveats. First, you have to create a static cursor. Then, you have to save the Recordset in XML format and send that back to the client. The default ADTG format of ADO Recordsets is typically not transferable across firewalls or through proxy servers; even if it is, the binary ADTG format will have little meaning to a non-Windows client application. Therefore, ADO.NET, as you will see, speaks XML natively.

A further reason for ADO.NET's existence is the fact that Microsoft wanted to separate data access from working with data. This provides yet another layer of indirection, and allows you to separate the objects that let you work with data from the objects that allow you to access the data. This new layer also makes it quite easy to work with disconnected data as the default, which can lead to huge scalability gains.

Further, ADO.NET exists because of the .NET Framework. In the past, ODBC and OLE DB, along with their associated object models, were independent of any specific tool. With the .NET Framework, Microsoft put ADO.NET into the standard set of services available to any language using the Framework. Now, data access becomes one of the base services provided by the underlying application services infrastructure, ensuring a consistent database access technology across all the .NET languages.

Because it is built into the .NET Framework, ADO.NET takes advantage of all the services provided by the Framework, such as automatic memory management and garbage collection. This makes ADO.NET easy to work with, and means that those of you writing managed C++ code should not have to worry about handling the memory used by ADO.NET.

ADO.NET is different from DAO, RDO, and ADO in that it does not require a single underlying API such as OLE DB or ODBC. Instead, ADO.NET can use native drivers, called managed providers, written for each database, or it can use a generic OLE DB managed provider that works with existing OLE DB providers. Because there is an OLE DB provider for ODBC, ADO.NET can reach data through its managed providers, through OLE DB, or through ODBC. As you might imagine, the highest performance comes from the managed providers. The only managed providers available with the first release of the .NET Framework are for SQL Server 7.0/2000 and for OLE DB; earlier versions of SQL Server, and any other database, must use the ADO.NET managed provider for OLE DB.
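To make the distinction concrete, here is a minimal sketch of both routes in VB .NET. It assumes a local SQL Server holding the Northwind sample database and a hypothetical Access file at C:\Northwind.mdb; the server name, file path, and connection strings are illustrative only.

Imports System.Data.SqlClient
Imports System.Data.OleDb

Module ProviderDemo
    Sub Main()
        ' SQL Server managed provider: talks to SQL Server 7.0/2000
        ' directly, with no OLE DB layer in between.
        ' Connection string is illustrative; adjust for your server.
        Dim sqlConn As New SqlConnection( _
            "Server=(local);Database=Northwind;Integrated Security=SSPI")
        sqlConn.Open()
        Console.WriteLine("SQL Server state: " & sqlConn.State.ToString())
        sqlConn.Close()

        ' OLE DB managed provider: reaches any database that has an
        ' OLE DB provider, here the JET provider for an Access file.
        Dim oleConn As New OleDbConnection( _
            "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=C:\Northwind.mdb")
        oleConn.Open()
        Console.WriteLine("JET state: " & oleConn.State.ToString())
        oleConn.Close()
    End Sub
End Module

Notice that the two connection classes expose nearly identical members, so switching providers is mostly a matter of changing the class names and the connection string.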

ADO.NET's Disconnected Architecture

Much ado has been made about ADO.NET's disconnected architecture, and it is important. Before diving into it, however, understand that ADO.NET can operate in a connected mode as well. ADO.NET uses different objects depending on whether you are operating in connected or disconnected mode, which makes it easy to tell which mode you are using. In Microsoft's previous data access technologies, you made a connection to the server and kept it open while you retrieved and manipulated the data. This worked fine as long as the number of clients was small, but it did not tend to scale very well. In addition, you opened a connection to the data and were sometimes able to scroll through the data, depending on the cursor you chose. With some of the cursor options in ADO, you actually read the values for each record from the underlying tables as you accessed that record. That meant a lot of communication back and forth to the database, and it also meant that the only way to achieve good performance was to maintain an open connection.
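For reference, here is a minimal sketch of the connected mode, assuming the same illustrative Northwind connection string as before. The DataReader is the connected-mode object: a forward-only, read-only stream that holds the connection open for as long as you are reading.

Imports System.Data.SqlClient

Module ConnectedDemo
    Sub Main()
        ' Connection string is illustrative; adjust for your server.
        Dim conn As New SqlConnection( _
            "Server=(local);Database=Northwind;Integrated Security=SSPI")
        Dim cmd As New SqlCommand("SELECT CompanyName FROM Customers", conn)

        conn.Open()
        Dim reader As SqlDataReader = cmd.ExecuteReader()
        While reader.Read()
            ' Each Read() pulls the next record across the open connection.
            Console.WriteLine(reader.GetString(0))
        End While
        reader.Close()
        conn.Close()   ' The connection was held open for the whole loop.
    End Sub
End Module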

One trick that some people employed to improve performance was to create a disconnected recordset in ADO. This could be accomplished by reading all the records at one time, and storing them in memory at the client, in what was called a static cursor. You could then write the entire recordset to disk to persist it over time. You then used this local copy of the data instead of going back to the database for the data. This approach worked very well for data that did not change often.

ADO.NET stores data in memory in an inherently disconnected fashion. Recall that ADO.NET separates data access from data usage, which means that the concepts of moving through data in a fully scrollable fashion, and of performing inserts, updates, and deletes, have nothing to do with the physical act of connecting to the database itself. Consider the act of buying and driving an automobile. Every morning, when you need to drive to the office, you get in your car and drive. You do not have to go to the dealer first to get your car. Instead, you got the car one time, took it home, and now you drive it from there. Occasionally, you have to return to the dealer for maintenance, but by and large, you use the car disconnected from the dealer that sold it to you.

In the same vein, ADO.NET allows you to connect to the database and retrieve records, and then disconnect from the database. You continue to work with the records, performing such actions as scrolling through the records, sorting and filtering records, and performing inserts, updates, and deletions.
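In code, the pattern looks something like this minimal sketch, again assuming the illustrative Northwind connection string. The DataAdapter makes the brief trip to the "dealer"; everything after the Fill call happens offline, in memory.

Imports System.Data
Imports System.Data.SqlClient

Module DisconnectedDemo
    Sub Main()
        ' Connection string is illustrative; adjust for your server.
        Dim conn As New SqlConnection( _
            "Server=(local);Database=Northwind;Integrated Security=SSPI")
        Dim da As New SqlDataAdapter("SELECT * FROM Customers", conn)
        Dim ds As New DataSet()

        ' Fill opens the connection, retrieves the rows, and closes
        ' the connection again; from here on, you are disconnected.
        da.Fill(ds, "Customers")

        ' Scroll, sort, and filter entirely in memory.
        Dim view As New DataView(ds.Tables("Customers"))
        view.Sort = "CompanyName DESC"
        view.RowFilter = "Country = 'Germany'"
        Console.WriteLine("German customers: " & view.Count.ToString())
    End Sub
End Module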

You might be saying to yourself, “Big deal. I could do all that with ADO's disconnected recordsets.” You are partially correct, but as you will see later, ADO.NET provides a new object, called the DataSet, that allows an unprecedented level of disconnected support. For example, if you were using a disconnected ADO recordset and you inserted a record that violated some foreign key constraint, you would not know about this foreign key violation until you attempted to resynchronize your disconnected recordset with the underlying data source. ADO.NET's DataSet, on the other hand, can enforce referential integrity constraints in the disconnected mode, ensuring that you are following the rules and making valid data changes.
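Here is a minimal sketch of that offline enforcement; the two-table schema is invented for illustration. The DataSet raises the violation the moment the bad row is added, with no database connection anywhere in sight.

Imports System.Data

Module ConstraintDemo
    Sub Main()
        ' A tiny invented schema: orders and their detail rows.
        Dim ds As New DataSet()
        Dim orders As DataTable = ds.Tables.Add("Orders")
        Dim details As DataTable = ds.Tables.Add("OrderDetails")
        orders.Columns.Add("OrderID", GetType(Integer))
        details.Columns.Add("OrderID", GetType(Integer))

        ' The relation creates a ForeignKeyConstraint that the DataSet
        ' enforces entirely in memory.
        ds.Relations.Add("OrderToDetails", _
            orders.Columns("OrderID"), details.Columns("OrderID"))

        orders.Rows.Add(New Object() {1})
        Try
            ' OrderID 99 has no parent row, so this fails immediately,
            ' not at synchronization time.
            details.Rows.Add(New Object() {99})
        Catch ex As InvalidConstraintException
            Console.WriteLine("Caught while disconnected: " & ex.Message)
        End Try
    End Sub
End Module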

ADO and other connected models can also face challenges when moving data from tier to tier. If a middle-tier component opened a connection and began retrieving data, it needed to feed the records to the client as a string, as a collection of objects, as a variant array, by passing a Recordset object, or by some other method. Each method tended to have its own benefits and pitfalls, and no one method was right for every situation. Disconnected records in ADO.NET, on the other hand, can be much easier to pass from one tier to another (or across process boundaries in the same tier).

By not maintaining an open connection, and making it easier to pass data efficiently across process boundaries, ADO.NET achieves a new level of scalability.

Performance and Scalability Considerations

One of the primary questions with ADO.NET is, “Yes, but how does it perform compared to ADO?” The answer so far is, “Quite well.” ADO.NET is being tested by the ADO.NET product team, and even in its beta form, it is meeting or exceeding ADO performance when compared on an apples-to-apples basis. The reasons for this vary, but one reason is that the managed provider for SQL Server is faster and more efficient than the OLE DB provider for SQL Server. Another reason is that the objects are more finely tuned for specific tasks than those in ADO.

Performance and scalability are tightly coupled. As performance increases, scalability increases: Because tasks complete faster, the same resources can handle more requests in the same amount of time. Therefore, by tuning the performance, you make the application more scalable.

In addition to just trying to speed up the actual access time, the fact that ADO.NET can work in a disconnected mode has huge scalability implications. If you create an application that connects to a server for some of its data, you have a connection to the database. If you have five clients, this might not be an issue. However, if you have five thousand clients, this connection load could overwhelm your server, either from a resource standpoint or from a licensing (in other words, legal) standpoint. By building ADO.NET to connect, retrieve the data, and then disconnect, you achieve much greater scalability.

Consider what most people do with an application: They retrieve data and then examine it. This examination, or "think time," is often significantly longer than the time it takes to retrieve the data. Assume that retrieving data from the database takes one second. You then display the data to the user, who ponders it for the next 29 seconds. Of the total 30 seconds the data is in use, the connection is needed only one-thirtieth of the time, or for one second. Maintaining the connection for the other 29 seconds would be a waste of server resources.

You might be thinking, “No big deal. I can do this with ADO. I only build Web applications, and by the time the data has rendered in the browser, my page has fallen out of scope and the connection is closed.” That is true, but what if you needed to pass that data from one component to another? How do you do this in the most efficient manner? What if the same component is to be used by a Windows client that does not have the object fall out of scope? What if part of the processing requires a fully scrollable cursor, which requires the connection to stay open when using ADO?

ADO.NET addresses many of these issues by storing the data in a disconnected mode, and by storing it in such a way that it is easy to pass from one component to a client, regardless of what that client happens to be. The passing mechanism is based on XML, which means that you do not have to manipulate the data types as you did when using COM marshaling with ADO disconnected recordsets.

The XML Basis of ADO.NET

XML, if you believe the hype, is the answer to everything. From the problems of world hunger to world peace to the meaning of life, XML is the answer. People get upset when you say it, because they want to think XML is magical, but XML is text. Before you have a fit, realize that HTML is text. Active Server Pages are text. There's no binary information in XML, HTML, or ASPs. They are all text files. However, it's the tags and formatting of XML and HTML that make them special. XML is an incredibly powerful technology because it combines the data with the definition of the data, all in one neat package that is accessible to any XML-speaking application on any platform. Think of electronic data interchange (EDI) without having to look up in a book what the 42nd position in a string represents. In a way, XML is the Esperanto of the computer world.

The good news about ADO.NET is that it natively speaks XML. When you retrieve data from a data source, or when you transmit data from one component to another component or a client, the format of the data is XML. If you happen to write data to a file in order to persist it, the format is XML. You could persist data in ADO using XML, but it was not the default; in ADO.NET, XML is the native tongue.

Not only can ADO.NET speak XML, it can also read XML. If you have an XML file, ADO.NET can use it to create an in-memory data store that appears to you just like any other DataSet. The data inside the DataSet object is not stored internally as XML text, but any time you transmit the data, it is automatically sent as XML. The receiving application sees this incoming data as XML and can process it as it chooses.
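Here is a minimal sketch of that round trip, once more assuming the illustrative Northwind connection string; the Customers.xml file name is invented. Writing the schema along with the data lets a second DataSet rebuild both the structure and the rows from nothing but text.

Imports System.Data
Imports System.Data.SqlClient

Module XmlDemo
    Sub Main()
        ' Connection string is illustrative; adjust for your server.
        Dim da As New SqlDataAdapter("SELECT * FROM Customers", _
            "Server=(local);Database=Northwind;Integrated Security=SSPI")
        Dim ds As New DataSet("CustomerData")
        da.Fill(ds, "Customers")

        ' Persist the data, and its schema, as plain XML text.
        ds.WriteXml("Customers.xml", XmlWriteMode.WriteSchema)

        ' A different DataSet rebuilds itself from the file with no
        ' database connection at all.
        Dim offline As New DataSet()
        offline.ReadXml("Customers.xml")
        Console.WriteLine(offline.Tables("Customers").Rows.Count)
    End Sub
End Module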

Being able to natively read and write XML is important for several reasons. First, XML is an open standard that is supported on a variety of platforms. There are standards for how XML is structured, and for how the schema information is stored. This means that XML can be read by any number of clients, and that the schema of the data can be transmitted as well, obviating the need to predefine the schema and have the data in a fixed format, such as is the case with EDI. By basing data on XML, you allow your data to be transmitted to applications other than just .NET or COM programs. Instead, you can now interoperate with clients on any platform, written in any language, as long as they understand XML.

Because it is all text, XML is very easy to transfer via HTTP. As mentioned earlier, the binary format of an ADO recordset has no meaning on a Unix box, and the binary information often cannot be transferred through a proxy server or a firewall. Text, on the other hand, flows freely through most corporate systems, using HTTP as the protocol.

Having now read a bit about what ADO.NET is and what its goals are, it is time to jump into a quick example and build some simple data access applications.
