XML in the .NET Framework

The .NET Framework XML core classes can be categorized according to their functions: reading and writing documents, validating documents, navigating and selecting nodes, managing schema information, and performing document transformations. The assembly that implements this core XML support is System.Xml.dll.

The most commonly used namespaces are listed here:

  • System.Xml

  • System.Xml.Schema

  • System.Xml.XPath

  • System.Xml.Xsl

The .NET Framework also provides for XML object serialization. The classes involved with this functionality are grouped in the System.Xml.Serialization namespace. XML serialization writes objects to, and reads them from, XML documents. This kind of serialization is particularly useful over the Web in combination with the Simple Object Access Protocol (SOAP) and within the boundaries of .NET Framework XML Web services.

Related XML Standards

Table 1-1 lists the XML-related standards that have been implemented in the .NET Framework. The table also provides the official URL for each standard for further reference.

Table 1-1. W3C Standards Supported in the .NET Framework
Standard                        Reference
XML 1.0                         http://www.w3.org/TR/1998/REC-xml-19980210
XML namespaces                  http://www.w3.org/TR/REC-xml-names
XML Schema                      http://www.w3.org/TR/xmlschema-2
DOM Level 1 and Level 2 Core    http://www.w3.org/TR/DOM-Level-2
XPath                           http://www.w3.org/TR/xpath
XSLT                            http://www.w3.org/TR/xslt
SOAP 1.1                        http://www.w3.org/TR/SOAP

As a data exchange technology, XML is fully and tightly integrated into the .NET Framework. Table 1-2 provides a quick schematic view of the main areas of the .NET Framework in which significant traces of XML are clearly visible. Each area includes numerous classes and provides a set of application-level functions.

Table 1-2. Areas of the .NET Framework in Which XML Is Key
Category            Description
ADO.NET             Data container objects (for example, the DataSet object) are always transferred and remoted via XML. The .NET Framework also provides for two-way synchronized binding between data exposed in tabular format and XML format.
Configuration       Application settings are stored in XML files, making use of predefined and user-defined section readers. (More on readers later.)
Remoting            Remote .NET Framework objects can be accessed by using SOAP packets to prepare and perform the call.
Web services        SOAP is a lightweight XML protocol that Web services use for the exchange of information in a decentralized, distributed environment. Typically, you use SOAP to invoke methods on a Web service in a platform-independent fashion.
XML parsing         The core classes provide XML parsing and manipulation through both the stream-based API and the XML Document Object Model (XMLDOM).
XML serialization   Supplies the ability to save and restore living instances of objects to and from XML documents.

Although not strictly part of the .NET Framework, another group of classes deserves mention: the managed classes defined in the SQL Server 2000 XML Extensions (SQLXML). SQLXML 3.0 extends the XML capabilities of SQL Server 2000 by introducing Web services support. SQLXML 3.0 makes it possible for you to export stored procedures as SOAP-based Web services and also extends ADO.NET capabilities with server-side XPath queries and XML views. SQLXML 3.0 is available as a separate download, but it seamlessly integrates with the existing installation of the .NET Framework. We’ll look at SQLXML 3.0 in more detail in Chapter 8.

In general, the entire set of XML classes provided with the .NET Framework offers a standards-compliant, interoperable, extensible solution to today’s software development challenges. This support is not a tacked-on API but a true part of the .NET Framework.

Note

Almost all of today’s XML parsers support the latest W3C specification for the DOM Level 2 Core. The current specification does not, however, define a standard interface to persist and restore contents. The most popular XML parsers, such as Microsoft’s XML Core Services (MSXML), formerly known as the Microsoft XML Parser, and some Java-based parsers, already have their own ways to persist objects to streams and to restore objects from them. For now, these mechanisms must be considered custom, platform-specific extensions. An official API for serializing documents to and from XML format will not be available until DOM Level 3 Core achieves the status of a W3C recommendation. As of summer 2002, DOM Level 3 Core is a work in progress. The publicly available draft specifies a pair of Load and Save methods for loading XML documents into a DOM representation and saving a DOM representation as an XML document. For more information, refer to http://www.w3.org/TR/2002/WD-DOM-Level-3-Core-20020409.

A known parser that already provides an experimental implementation of DOM Level 3 Core is IBM’s XML Parser for Java (Xml4J). See http://www.alphaworks.ibm.com/tech/xml4j for more information.


Core Classes for Parsing

Regardless of the underlying platform, the available XML parsers fall into one of two main categories: tree-based parsers and event-based parsers. Each parser category is designed according to a different philosophical approach and, subsequently, has its own pros and cons. The two categories are commonly identified with their two most popular implementations: XMLDOM and Simple API for XML (SAX). The XMLDOM parser is a generic tree-based API that renders an XML document as an in-memory structure. The SAX parser provides an event-based API for processing each significant element in a stream of XML data.

Conceptually speaking, a SAX parser is diametrically opposed to an XMLDOM parser, and the gap between the two models is indeed fairly large. The functionality of XMLDOM is clearly defined, and there is not much more one can reasonably expect from the evolution of this model. Regardless of whether you like the XMLDOM model or find it suitable for your needs, you can’t really expect to radically improve or change its way of working. In a certain sense, the downsides of the XMLDOM model (the memory footprint and bandwidth required to process large documents) are structural and stem directly from design choices.

SAX parsers work by letting client applications pass living instances of platform-specific objects to handle parser events. The parser controls the whole process and pushes data to the application, which is in turn free to accept or simply ignore the data. The SAX model is extremely lean and needs little memory, because it never holds the whole document at once.

The .NET Framework provides full support for the XMLDOM parsing model but not for the SAX model. Instead, the set of .NET Framework XML core classes supports two parser models: XMLDOM and a new model called an XML reader. The lack of support for SAX parsers does not mean that you have to give up the functionality that a SAX parser offers, however. All the functions of a SAX parser can be easily and even more effectively implemented using an XML reader. Unlike a SAX parser, a .NET Framework XML reader works under the total control of the client application, enabling the application to pull out only the data it really needs and skip over the remainder of the XML stream.
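As a rough illustration of the pull model, the sketch below uses XmlTextReader (a concrete XmlReader in System.Xml) to walk a small document and keep only the title elements, skipping everything else. The sample XML, the class name ReaderSketch, and the helper TitlesOf are made up for this example.

```csharp
using System;
using System.Collections;
using System.IO;
using System.Xml;

public class ReaderSketch
{
    // Hypothetical helper: returns the text of every <title> element.
    public static ArrayList TitlesOf(string xml)
    {
        ArrayList titles = new ArrayList();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            // The application drives the loop: each Read call pulls the next node.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == "title")
                {
                    // ReadString returns the element's text content.
                    titles.Add(reader.ReadString());
                }
                // Any other node (for example, <price>) is simply skipped.
            }
        }
        finally
        {
            reader.Close();
        }
        return titles;
    }

    public static void Main()
    {
        string xml =
            "<books><book><title>First</title><price>10</price></book>" +
            "<book><title>Second</title><price>12</price></book></books>";
        foreach (string title in TitlesOf(xml))
        {
            Console.WriteLine(title);
        }
    }
}
```

With a SAX parser, the same filtering would live in a handler object that the parser calls back; here the decision about what to read next stays in the application's own loop.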

Readers are based on .NET Framework streams and work in much the same way as a database cursor. Interestingly, the classes that implement this cursor-like parsing model also provide the substrate for the .NET Framework implementation of the XMLDOM parser. Two abstract classes, XmlReader and XmlWriter, are at the very foundation of all .NET Framework XML classes, including XMLDOM classes, ADO.NET-related classes, and configuration classes. So in the .NET Framework you have two possible approaches when it comes to processing XML data. You can use either classes built directly on XmlReader and XmlWriter or classes that expose information through the well-known XMLDOM.
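The XMLDOM approach can be sketched in a similar way. The example below loads a document into an XmlDocument through a reader and saves it through a writer, which hints at how the DOM classes sit on top of XmlReader and XmlWriter. The class name DomSketch, the helper LoadFromReader, and the sample XML are illustrative assumptions.

```csharp
using System;
using System.IO;
using System.Xml;

public class DomSketch
{
    // Hypothetical helper: builds the in-memory tree by draining a reader.
    public static XmlDocument LoadFromReader(string xml)
    {
        XmlDocument doc = new XmlDocument();
        XmlTextReader reader = new XmlTextReader(new StringReader(xml));
        try
        {
            doc.Load(reader);
        }
        finally
        {
            reader.Close();
        }
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = LoadFromReader(
            "<books><book><title>First</title></book>" +
            "<book><title>Second</title></book></books>");

        // The whole tree is in memory, so nodes can be selected in any order.
        foreach (XmlNode title in doc.SelectNodes("/books/book/title"))
        {
            Console.WriteLine(title.InnerText);
        }

        // Saving goes through a writer.
        XmlTextWriter writer = new XmlTextWriter(Console.Out);
        writer.Formatting = Formatting.Indented;
        doc.Save(writer);
        writer.Flush();
    }
}
```

The trade-off is the one described earlier: the DOM version allows random access and XPath queries, at the cost of holding the entire document in memory.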

The set of XML core classes also includes tailor-made class hierarchies to support other related XML technologies such as XSLT, XPath expressions, and the Schema Object Model (SOM).

We’ll look at XML core classes and related standards in the following chapters. In particular, Chapter 2, Chapter 3, Chapter 4, and Chapter 5 describe the core classes and parsing models. Chapter 6 and Chapter 7 examine the related standards, such as XPath and XSL.

XML and ADO.NET

The interaction between ADO.NET classes and XML documents takes one of two forms:

  • Serialization of ADO.NET objects (in particular, the DataSet object) to XML documents and corresponding deserialization. Data can be saved to XML in a variety of formats, with or without schema information, as a full snapshot of the in-memory data including pending changes and errors, or with just the current instance of the data.

  • A dual-access model that lets you access and update the same piece of data either through a hierarchical programming interface or using the ADO.NET relational API. Basically, you can transform a DataSet object into an XMLDOM object and view the XMLDOM’s subtrees as tables merged with the DataSet object’s tables.
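The dual-access model in the second item can be sketched with the XmlDataDocument class, which wraps a DataSet in an XMLDOM view. This is a minimal sketch that assumes the .NET Framework, where XmlDataDocument ships in System.Data.dll. The Library and Book names, the sample rows, and the class DualAccessSketch are invented for the example.

```csharp
using System;
using System.Data;
using System.Xml;

public class DualAccessSketch
{
    // Hypothetical sample data: one table with two rows.
    public static DataSet BuildSample()
    {
        DataSet data = new DataSet("Library");
        DataTable books = data.Tables.Add("Book");
        books.Columns.Add("Title", typeof(string));
        books.Columns.Add("Price", typeof(decimal));
        books.Rows.Add(new object[] { "First", 10m });
        books.Rows.Add(new object[] { "Second", 12m });
        data.AcceptChanges();
        return data;
    }

    public static void Main()
    {
        DataSet data = BuildSample();

        // Hierarchical view over the same data.
        XmlDataDocument doc = new XmlDataDocument(data);
        foreach (XmlNode title in doc.SelectNodes("/Library/Book/Title"))
        {
            Console.WriteLine(title.InnerText);
        }

        // An update made through the relational API...
        data.Tables["Book"].Rows[0]["Title"] = "Renamed";

        // ...is visible through the XMLDOM view as well.
        Console.WriteLine(doc.SelectSingleNode("/Library/Book/Title").InnerText);
    }
}
```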

The ADO.NET DataSet class represents the only .NET Framework object that can be natively saved to XML. The XML representation of a DataSet object can have two different layouts: the ADO.NET normal form and the DiffGram format. In particular, the DiffGram format describes the history of the data and all recent changes. Each changed row in each table is represented by two nodes: one node holds the current values, and a second node, in the diffgr:before section, holds the snapshot of the row as it was originally read. The DiffGram represents a snapshot of the DataSet state and contents at a given moment. To write DiffGrams, ADO.NET uses an XmlWriter object.
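To make the two layouts more concrete, the following sketch writes the same DataSet in three ways through DataSet.WriteXml: current data only, current data with an inline schema, and the DiffGram. The table, the column, and the names DiffGramSketch, Write, and BuildChangedSample are assumptions made for the example.

```csharp
using System;
using System.Data;
using System.IO;

public class DiffGramSketch
{
    // Hypothetical helper: serializes the DataSet in the requested mode.
    public static string Write(DataSet data, XmlWriteMode mode)
    {
        StringWriter output = new StringWriter();
        data.WriteXml(output, mode);
        return output.ToString();
    }

    // Hypothetical sample: one row with a pending change.
    public static DataSet BuildChangedSample()
    {
        DataSet data = new DataSet("Library");
        DataTable books = data.Tables.Add("Book");
        books.Columns.Add("Title", typeof(string));
        books.Rows.Add(new object[] { "First" });
        data.AcceptChanges();                 // "First" becomes the original value
        books.Rows[0]["Title"] = "Renamed";   // pending, uncommitted change
        return data;
    }

    public static void Main()
    {
        DataSet data = BuildChangedSample();
        Console.WriteLine(Write(data, XmlWriteMode.IgnoreSchema)); // current data only
        Console.WriteLine(Write(data, XmlWriteMode.WriteSchema));  // data plus inline schema
        Console.WriteLine(Write(data, XmlWriteMode.DiffGram));     // current and original values
    }
}
```

In the DiffGram output, the modified row appears with its current value, and a diffgr:before section carries the original value; the plain form contains only the current value.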

The integration of and interaction between XML and ADO.NET classes is discussed in Chapter 8.

Application Configuration

Before Microsoft Windows 95, applications stored configuration settings in a text file with a .ini extension. INI files store information using name/value pairs grouped under sections. Ultimately, an INI file is a collection of sections, with each section consisting of any number of name/value pairs.

Windows 95 revamped the role of the system registry—a centralized data repository originally introduced with Windows NT. The registry is a collection of binary files that the operating system manages in exclusive mode. Client applications can read and write the contents of the registry only by using a tailor-made API. The registry works as a kind of hierarchical database consisting of root nodes (also known as hives), nodes, and entries. Each entry is a name/value pair.

All system, component, and application settings are supposed to be stored in the registry. The registry continues to increase in size, contributing to the creation of a configuration subsystem with a single (and critical) point of failure. More recently, applications have been encouraged to store custom settings and preferences in a local file stored in the application’s root folder. For .NET Framework applications, this configuration file is an XML file written according to a specific schema.

In addition, the .NET Framework provides a specialized set of classes to read and write settings. The key class is named AppSettingsReader and works as a kind of parser for a small fragment of XML code—mostly a node or two with a few attributes.
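A minimal sketch of AppSettingsReader in use follows. It assumes that the application's XML configuration file contains an appSettings section with an entry such as <add key="Greeting" value="Hello" />. The key name, the class SettingsSketch, and the helper ReadSetting are hypothetical.

```csharp
using System;
using System.Configuration;

public class SettingsSketch
{
    // Hypothetical helper: returns the named setting, or a fallback
    // when the key is not present in the <appSettings> section.
    public static string ReadSetting(string key, string fallback)
    {
        AppSettingsReader reader = new AppSettingsReader();
        try
        {
            // GetValue converts the stored string to the requested type.
            return (string) reader.GetValue(key, typeof(string));
        }
        catch (InvalidOperationException)
        {
            // Thrown when the key is missing or the value cannot be converted.
            return fallback;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ReadSetting("Greeting", "(not configured)"));
    }
}
```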

ASP.NET applications store configuration settings in a file named web.config that is located in the root of the application’s virtual folder. Windows Forms applications, on the other hand, store their preferences in a file with the same name as the executable plus a .config extension—for example, myprogram.exe.config. The CONFIG file must be available in the same folder as the main executable. The schema of the CONFIG file is the same regardless of the application model.

The contents of a CONFIG file are logically organized into sections. The .NET Framework provides a number of predefined sections to accommodate Web and Windows Forms settings, remoting parameters, and ASP.NET run-time characteristics such as the authentication scheme and registered HTTP handlers and modules.

Applications can extend the XML schema of the CONFIG file by defining custom sections with custom elements. By default, however, the AppSettingsReader class supports only settings expressed in a few formats, such as name/value pairs and a single tag with as many attributes as needed. This schema fits the bill in most cases, but it soon becomes insufficient for complex structured information. Information is read from a section using special objects called section handlers. If no predefined section structure fits your needs, you can provide a tailor-made configuration section handler to read your own XML data, as shown here:

<configuration>
  <configSections>
    <section name="MySection"
      type="MySectionHandlerClass, assembly" />
  </configSections>
  <MySection>
    ⋮
  </MySection>
</configuration>

A configuration section handler is simply a .NET Framework class that parses a particular XML fragment extracted from the CONFIG file. We’ll look at custom section handlers in more detail in Chapter 15.
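As a sketch of what such a class might look like, the handler below implements the IConfigurationSectionHandler interface and turns the child elements of the section into a Hashtable. The class name matches the MySectionHandlerClass placeholder used in the fragment above; the item element and its name and value attributes are an assumed layout, not one defined by the .NET Framework.

```csharp
using System;
using System.Collections;
using System.Configuration;
using System.Xml;

// Hypothetical handler for a section laid out as:
//   <MySection><item name="color" value="blue" /></MySection>
public class MySectionHandlerClass : IConfigurationSectionHandler
{
    // Receives the section's XML fragment and returns the parsed settings.
    public object Create(object parent, object configContext, XmlNode section)
    {
        Hashtable items = new Hashtable();
        foreach (XmlNode child in section.ChildNodes)
        {
            // Ignore comments, whitespace, and unknown elements.
            if (child.NodeType != XmlNodeType.Element || child.Name != "item")
            {
                continue;
            }
            XmlAttribute name = child.Attributes["name"];
            XmlAttribute value = child.Attributes["value"];
            if (name != null && value != null)
            {
                items[name.Value] = value.Value;
            }
        }
        return items;
    }

    public static void Main()
    {
        // Exercise the handler directly, without a CONFIG file.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<MySection><item name='color' value='blue' /></MySection>");
        Hashtable items = (Hashtable)
            new MySectionHandlerClass().Create(null, null, doc.DocumentElement);
        Console.WriteLine(items["color"]);
    }
}
```

In a running application you would not normally call Create yourself. The configuration system instantiates the handler named in the section element and passes it the matching XML node when the section is requested.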

Interoperability

XML is key to making .NET Framework applications interoperate with each other and with external applications running on other software and hardware platforms. XML interoperability is a sort of blanket term that covers three .NET-specific technologies: XML Web services, remoting, and XML object serialization.

By rolling functionality into an XML Web service, you can expose the functionality to any application on the Web that, irrespective of platform, speaks HTTP and understands XML. Based on open standards (HTTP and XML, but also SOAP), XML Web services are an emerging technology for system interoperation and are supported by the major players in the IT industry. The .NET Framework provides a special infrastructure to build both remote services and proxy-based clients.

Actually, in the .NET Framework, an XML Web service is treated as a special case of an ASP.NET application—one that is saved with a different file extension (.asmx) and accessible through the SOAP protocol as well as through HTTP GET and POST commands. Incoming calls for both .aspx files (ASP.NET pages) and .asmx files are processed by the same Internet Information Services (IIS) extension module, which then dispatches the request to distinct downstream factory components.

In an XML Web service, XML plays its role entirely behind the scenes. It is first used as the glue for the SOAP payloads that the communicating sides exchange. In addition, XML is used to express the results of a remote, cross-platform call. But what if you write a .NET XML Web service with one method returning, say, an ADO.NET DataSet object? How can a Java application handle the results? The answer is that the DataSet object is serialized to XML and then sent back to the client.

The .NET Framework provides two types of object serialization: serialization through formatters and XML serialization. The two live side by side but have different characteristics. XML serialization is the process that converts the public interface of an object to a particular XML schema. The goal is simplifying the process of data exchange between components rather than truly serializing objects that will then be deserialized to living and effective instances.
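The following sketch shows XML serialization applied to a small class through the XmlSerializer class in System.Xml.Serialization. Only the public fields of the object end up in the XML. The Book class and the helpers ToXml and FromXml are invented for illustration.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

// Hypothetical data class; XmlSerializer needs a public type
// with a public parameterless constructor.
public class Book
{
    public string Title;
    public decimal Price;
}

public class SerializationSketch
{
    // Writes the public fields of the object as XML elements.
    public static string ToXml(Book book)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Book));
        StringWriter output = new StringWriter();
        serializer.Serialize(output, book);
        return output.ToString();
    }

    // Rebuilds a Book instance from its XML form.
    public static Book FromXml(string xml)
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Book));
        return (Book) serializer.Deserialize(new StringReader(xml));
    }

    public static void Main()
    {
        Book book = new Book();
        book.Title = "First";
        book.Price = 10m;

        string xml = ToXml(book);
        Console.WriteLine(xml);

        Book copy = FromXml(xml);
        Console.WriteLine(copy.Title);
    }
}
```

Because the XML carries only public data, a consumer on another platform can read it without knowing anything about the .NET type that produced it.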

Remoting is the .NET Framework counterpart of the Distributed Component Object Model (DCOM) and uses XML to configure both the client and the remote components. In addition, XML is used through SOAP to serialize outbound parameters and inbound return values. Remoting is the official .NET Framework API for communicating applications, but it works only between .NET peers.

XML serialization, remoting, and XML Web services are covered in Part IV—specifically in Chapter 11, Chapter 12, and Chapter 13.
