Investigating XML

The Quiz game stores and manages quiz data using a format called eXtensible Markup Language (XML). XML has become an important technology in the past few years, and it is often used to share information between programs. You will likely come across XML data somewhere in your programming travels, so you should know how to work with it. Fortunately, C# and .NET provide powerful tools for working with XML data.

Defining XML

XML can be a bewildering topic because it is a wide-reaching standard and implementation tools abound. However, the basic concepts behind XML are not as complex as they might seem.

XML is a technology for describing data. An XML document is a file that contains data. In addition to the data itself, an XML document contains information about the data (often called metadata). An XML document is written in plain text that can easily be distributed over the Internet. It uses a series of tags to describe the various data elements, much as HTML uses tags to describe the parts of a Web page. The interesting thing about XML is that it lets you define new tags to describe any kind of information.

A tag is simply a name inside angle brackets (<>). Most elements in XML have both a beginning tag and an ending tag. Between those two tags, an element can contain text (or other types of data) or other elements.

XML is closely related to HTML, the language of Web pages. XML and HTML share a common ancestor, SGML (Standard Generalized Markup Language), and they demonstrate a strong family resemblance. If you write any HTML code or look at the code behind a Web page, you will find XML to be very similar in its general design. HTML is designed for only one specific job: formatting Web pages. XML is designed to let you describe any kind of data you want.

An example will help make sense of all this theory. The quiz program you saw at the beginning of the chapter is based on a form of XML I invented just for this chapter. (That’s the fun part of XML: you get to invent new languages.)

NOTE

IN THE REAL WORLD

In this chapter I am focusing on how you can use and define your own custom version of XML, but interesting and powerful predefined XML languages are available. Most of these languages are designed to simplify working with a particular type of information or data. Some interesting XML languages are

  • SMIL (Synchronized Multimedia Integration Language). This language defines multimedia presentations. It is used to generate slide shows, synchronize captions with streaming audio and video content, and manage other kinds of multimedia information.

  • SVG (Scalable Vector Graphics). This language allows authors to create vector-based graphics, which are often much more compact and flexible than traditional graphics schemes. It is supported by an international standards body, the W3C (World Wide Web Consortium), but not by Microsoft.

  • VML (Vector Markup Language). This language is Microsoft’s answer to SVG. It is another powerful, vector-based system for creating graphics in documents and Web pages.

  • CML (Chemical Markup Language). This language is dedicated to simplifying the drawing of chemical figures, which has long been a challenge for those who frequently write about chemistry.

  • SOAP (Simple Object Access Protocol). This language enables objects to communicate across the Internet.

My new language will describe a quiz. Take a look at the XML code that describes the sample test:
<?xml version="1.0" encoding="utf-8"?>

<test>
  <problem>
    <question>What is the primary language of this book?</question>
    <answerA>Cobol</answerA>
    <answerB>C#</answerB>
    <answerC>C--</answerC>
    <answerD>French</answerD>
    <correct>B</correct>
  </problem>

  <problem>
    <question>What does XML stand for?</question>
    <answerA>eXtremely Muddy Language</answerA>
    <answerB>Xerxes, the Magnificent Chameleon</answerB>
    <answerC>eXtensible Markup Language</answerC>
    <answerD>eXecutes with Multiple Limitations</answerD>
    <correct>C</correct>
  </problem>

  <problem>
    <question>Which command sends a message to the user?</question>
    <answerA>Messagebox.Show()</answerA>
    <answerB>sendMessage()</answerB>
    <answerC>Alert()</answerC>
    <answerD>HeyUser()</answerD>
    <correct>A</correct>
  </problem>
</test>

By examining the quiz code, you can see that it defines a structure for the quiz. The first line of the code describes the file as XML code and defines the encoding type:

<?xml version="1.0" encoding="utf-8"?>

Almost every XML file you see begins with a similar line. UTF-8 is an encoding of Unicode, the character set that C# and most other modern languages use to support international text. The line begins with <? and ends with ?> to indicate that this is a special header line, called the XML declaration. XML 1.0 does not strictly require the declaration, but including it is recommended, and most XML documents have one.

The terms version and encoding are attributes of the XML tag. Each tag can have attributes that modify the data in some way. For example, the <img> tag in HTML has attributes to modify the width and height of the image. If you design an XML document, your tags can have attributes. Working with attributes is easy, but to keep things simple in this introductory chapter, I decided to use them only where they are mandated, as in the XML tag in the preceding code.

If you examine the quiz’s structure, you will quickly see some patterns emerge. The document is composed of nested structures. The bulk of the document is encased inside a <test> </test> pair. Inside that pair is a series of <problem> </problem> sets. Each problem consists of a <question> element, four answer elements (<answerA> through <answerD>), and a <correct> element. You might also see the structure as a hierarchy, as illustrated in Figure 10.4.

Figure 10.4. The test contains problems, which can contain a question, answers, and the correct answer.


It’s important to note that the tags describe what kind of data is represented, rather than how data is to be presented on the screen. An important aspect of XML is how it focuses on the meaning of the data alone. XML data is intended to be used by many programs written in different languages on different platforms, so it is up to the program that uses the data to determine how exactly the data will be displayed.

NOTE

IN THE REAL WORLD

The original intent of HTML was to describe only the meanings of various text elements in the context of a Web page rather than to determine how a page is to be depicted on the monitor. (In fact, a tag such as <i> is considered rude by HTML purists, who prefer the <em> tag because it describes what kind of text is being encased rather than how to display it.) HTML documents are meant to be displayed on many kinds of devices, including cell phones and PDAs, which cannot always handle all the complex formatting commonly indicated on modern Web pages. For this reason, it is still smarter to use HTML to determine what the text means rather than how it is displayed. It is considered more appropriate to use style sheets to determine exactly how the data is to be displayed. XML documents can be displayed using style sheets too, but programmers can do even more. When you learn how to read and manipulate an XML document in this chapter, you will be able to display XML data however you want.

Learning XML Syntax Rules

You must follow a set of rules when creating an XML language. In this sense, XML is stricter than HTML. However, these rules are very easy to learn and allow tremendous flexibility.

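All Tags Are Case Sensitive

In HTML, you can use <center> or <CENTER> interchangeably. XML is case sensitive, so <note> and <Note> are two different tags, and an ending tag must match the case of its start tag exactly. XML itself does not force you to use lowercase, but a common convention (and the one I follow in this chapter) is to begin each tag name with a lowercase letter and use mixed case to separate words, such as <myTag> or <questionNumber>.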

All Tags Have an Ending Tag

If you define a tag, it must have a corresponding ending tag. The ending tag is just like the start tag, except for a slash (/) inside the angle brackets. For example, you can define a <note> tag, but you must also have a </note> tag. Tags that do not usually have an ending tag (such as the <img> tag in HTML, which defines an image) can have a slash at the end of the tag. For example,

<img src = "myImage.gif"/>

is legal in XML, but

<img src = "myImage.gif">

is not legal.
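If you want to see the ending-tag rule enforced, you can hand a snippet of text to the XmlDocument class and watch what happens. The following is only a small sketch: the WellFormedCheck class and its IsWellFormed() method are names I made up for illustration, and the sketch relies on the fact that XmlDocument.LoadXml() throws an XmlException when the text is not well-formed XML.

```csharp
using System;
using System.Xml;

public static class WellFormedCheck
{
    // Hypothetical helper: returns true if the text parses as well-formed XML.
    public static bool IsWellFormed(string xmlText)
    {
        try
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xmlText);   // throws XmlException on malformed XML
            return true;
        }
        catch (XmlException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // A tag with a matching ending tag is fine.
        Console.WriteLine(IsWellFormed("<note>remember the milk</note>")); // True

        // A tag that closes itself with a slash is fine, too.
        Console.WriteLine(IsWellFormed("<img src = \"myImage.gif\"/>"));   // True

        // A tag that is never closed is rejected.
        Console.WriteLine(IsWellFormed("<img src = \"myImage.gif\">"));    // False
    }
}
```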

All Attribute Values Are Enclosed in Quotes

A tag can include an attribute to modify the tag’s meaning. For example, in the test data, you can have multiple-choice questions, short-answer questions, and true-or-false questions. You can extend the problem tag to indicate the type of question. For example, you could write

<problem type = "mc">

to indicate a multiple-choice question or

<problem type = "tf">

to indicate a true-or-false question.

In this case, type is an attribute, and the value of the type attribute could be mc or tf. The value must always appear inside quotes; <problem type = mc> is not legal XML.
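Here is a rough sketch of how a program might read such an attribute. The AttributeDemo class and GetProblemType() method are hypothetical names used only for this illustration; the calls they rely on (XmlDocument.LoadXml() and XmlElement.GetAttribute()) are part of the System.Xml namespace. The last line also shows that the parser refuses an attribute value that is not wrapped in quotes.

```csharp
using System;
using System.Xml;

public static class AttributeDemo
{
    // Hypothetical helper: returns the value of the type attribute on the
    // root element, or an empty string if the attribute is missing.
    public static string GetProblemType(string xmlText)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlText);
        return doc.DocumentElement.GetAttribute("type");
    }

    public static void Main()
    {
        Console.WriteLine(GetProblemType("<problem type = \"mc\"></problem>")); // mc
        Console.WriteLine(GetProblemType("<problem type = \"tf\"></problem>")); // tf

        try
        {
            // No quotes around the value, so this is not legal XML.
            GetProblemType("<problem type = mc></problem>");
        }
        catch (XmlException)
        {
            Console.WriteLine("Attribute values must be quoted.");
        }
    }
}
```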

Creating an XML Document in .NET

You can create any kind of XML document you want in order to describe any kind of data you want to work with. However, XML works best with hierarchical types of data structures that can be organized in an outline form. The first step of defining an XML language is to look at the data you are working with and try to organize it into an outline form. For the quiz program, I realized that a quiz is made up of several problems. (I avoided the use of the term question to describe the major element of a quiz because I wanted to use question to describe the question being asked.)

Each problem consists of a question, four possible answers, and the correct answer. To simplify the example, I decided to work only with multiple-choice questions with four answers. When I had decided what kinds of tags I would have, I started to build a sample document. You can use any text editor you want to build an XML document, but Visual Studio comes with a very nice XML document editor that makes the task simple. To use the XML editor, open up a project in the editor, and choose Add New Item from the Project menu. Select XML Document from the resulting dialog to open the XML editor. If you already have an XML document that you’d like to open in the editor, choose Open File from the File menu, and select the XML file from the drive system. (Alternatively, you can choose Add Existing Item from the Project menu. Opening a file doesn’t necessarily add the file to your project, but adding an existing item does.) Figure 10.5 shows an XML file being written in the Visual Studio XML editor.

Figure 10.5. The Visual Studio XML editor automatically indents your code and creates a closing tag for each opening tag.


The XML editor included with Visual Studio makes writing XML easy because it automatically indents your XML code and, each time you create a tag, it creates an ending tag. The editor also color-codes the XML much like normal C# code, which helps you to separate the XML code from the actual data. Visual Studio also includes a very handy editor for writing your own XML schema, but you will not need it for this brief introduction to XML. After you create the basic framework of your data structure, you can click the Data tab at the bottom of the XML code window to switch to a table view (see Figure 10.6).

Figure 10.6. After you define your data set, you can enter it in, just like a database.


Visual Studio can automatically convert your XML structure into a table for easy data entry. It allows you to enter your data quickly and accurately without having to repeat all the XML tags. This gives you the flexibility of XML with the easy data access of a more formal database. In Chapter 11, “Databases and ADO.NET: The Spymaster Database,” you will learn more about the relationship between XML and databases in .NET.

Creating an XML Schema for Your Language

If you’re going to reuse your language or you expect other people to use it, you should have a formal definition of the rules of your language. There has been some debate among the XML community about how this should be done, but .NET provides a solution that is extremely simple and elegant. When you load or create XML code in the IDE XML editor, a new XML menu appears on the menu bar. After you define your first set of data, you can choose Schema from the XML menu visible in the XML editor. This creates a new XML document that describes how your data works. The following XML code illustrates the schema automatically generated for the quiz document:

<?xml version="1.0" ?>
<xs:schema id="test"
  targetNamespace="http://tempuri.org/sampleTest.xsd"
  xmlns:mstns="http://tempuri.org/sampleTest.xsd"
  xmlns="http://tempuri.org/sampleTest.xsd"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  xmlns:msdata="urn:schemas-microsoft-com:xml-msdata"
  attributeFormDefault="qualified"
  elementFormDefault="qualified">
  <xs:element name="test"
    msdata:IsDataSet="true"
    msdata:EnforceConstraints="False">
    <xs:complexType>
      <xs:choice maxOccurs="unbounded">
        <xs:element name="problem">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="question" type="xs:string" minOccurs="0" />
              <xs:element name="answerA" type="xs:string" minOccurs="0" />
              <xs:element name="answerB" type="xs:string" minOccurs="0" />
              <xs:element name="answerC" type="xs:string" minOccurs="0" />
              <xs:element name="answerD" type="xs:string" minOccurs="0" />
              <xs:element name="correct" type="xs:string" minOccurs="0" />
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>

The meaning of this code is beyond the scope of an introduction to XML, but the code is automatically generated, so you can use it even if you don’t know exactly what it’s doing.

You can also choose Validate from the XML menu to ensure that your data follows the guidelines generated by the schema. For a beginner, creating a schema and validating your documents are not necessary because your first attempts at XML code will probably be simple and nobody but you will use your particular XML dialect. If you write an XML language that others will use, you will want to explore data validation because it can prevent many kinds of data errors. In this chapter most of the XML will be generated automatically by the programs, so there is no need to validate it.
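If you ever do want to validate a document from your own code rather than from the menu, the idea can be sketched as follows. This is not what the IDE does behind the scenes, and it makes several assumptions: the QuizValidator class is a made-up name, the schema is a hand-written, stripped-down stand-in for the generated one (no namespace, and only the question and correct elements), and the code uses the XmlReaderSettings approach found in later versions of .NET.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class QuizValidator
{
    // Hypothetical, simplified schema written by hand for this sketch.
    // It is NOT the schema generated by Visual Studio.
    const string SimpleSchema =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">" +
        "<xs:element name=\"test\"><xs:complexType><xs:sequence>" +
        "<xs:element name=\"problem\" maxOccurs=\"unbounded\">" +
        "<xs:complexType><xs:sequence>" +
        "<xs:element name=\"question\" type=\"xs:string\" />" +
        "<xs:element name=\"correct\" type=\"xs:string\" />" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:sequence></xs:complexType></xs:element>" +
        "</xs:schema>";

    // Reads the document with validation turned on and counts the errors.
    public static int CountValidationErrors(string xmlText)
    {
        int errors = 0;

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(SimpleSchema)));
        settings.ValidationEventHandler +=
            delegate(object sender, ValidationEventArgs e) { errors++; };

        using (XmlReader reader = XmlReader.Create(new StringReader(xmlText), settings))
        {
            while (reader.Read()) { }   // validation happens as the reader advances
        }
        return errors;
    }

    public static void Main()
    {
        string good = "<test><problem><question>Q?</question><correct>A</correct></problem></test>";
        string bad = "<test><problem><question>Q?</question></problem></test>"; // no <correct>

        Console.WriteLine(CountValidationErrors(good)); // 0
        Console.WriteLine(CountValidationErrors(bad) > 0 ? "invalid" : "valid");
    }
}
```") != 0) throw new System.Exception("valid document should have no errors");
if (QuizValidator.CountValidationErrors("<test><problem><question>Q?</question></problem></test>") <= 0) throw new System.Exception("missing correct element should be reported");
</test>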

Investigating the .NET View of XML

The .NET framework defines a set of classes that map to an XML document and its constituent parts. An XML document is essentially seen as a tree. The document itself is the base of the tree. The document contains a series of nodes. .NET has a class to define the Node element. Each pair of tags (such as <question> and </question>) is considered an element. The information between the tags is the data (usually text). The .NET environment provides three classes that are critical for using XML:

  • The XmlNode class defines the basic characteristics of any node in an XML document.

  • The XmlElement class extends XmlNode and adds a few methods for dealing with attributes.

  • The XmlDocument class also extends XmlNode, but it adds several methods, mainly for storing and loading documents and creating new nodes.

Essentially, you work with an XML document in .NET by defining an XmlDocument, which is mapped to a specific document on the drive system. All the tags in the document are instances of the XmlElement class. Because both XmlDocument and XmlElement are derived from XmlNode, they also share the characteristics of the XmlNode class.
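To make the relationship among these classes a little more concrete, here is a minimal sketch that loads a shortened copy of the quiz and looks at it through each class. The QuizReader class, the LoadQuiz() method, and the SampleQuiz constant are names invented for this example. To keep the sketch self-contained, it reads the XML from a string with LoadXml() rather than from a file on the drive.

```csharp
using System;
using System.Xml;

public static class QuizReader
{
    // A shortened copy of the chapter's quiz data (two problems only).
    public const string SampleQuiz =
        "<test>" +
        "<problem><question>What is the primary language of this book?</question>" +
        "<answerA>Cobol</answerA><answerB>C#</answerB><correct>B</correct></problem>" +
        "<problem><question>What does XML stand for?</question>" +
        "<answerC>eXtensible Markup Language</answerC><correct>C</correct></problem>" +
        "</test>";

    // Hypothetical helper: turns XML text into an XmlDocument.
    public static XmlDocument LoadQuiz(string xmlText)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlText);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = LoadQuiz(SampleQuiz);

        // The document is a node, and so is every element inside it.
        XmlNode docAsNode = doc;
        XmlElement root = doc.DocumentElement;
        XmlNode rootAsNode = root;
        Console.WriteLine(docAsNode.NodeType);   // Document
        Console.WriteLine(rootAsNode.NodeType);  // Element
        Console.WriteLine(root.Name);            // test

        // Each <problem> is a child node of the root element.
        foreach (XmlNode problem in root.ChildNodes)
        {
            Console.WriteLine(problem["question"].InnerText +
                " -> " + problem["correct"].InnerText);
        }
    }
}
```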

Exploring the XmlNode Class

The XmlNode class describes each node in a document. A node can be an entire XML document, an element (a pair of tags), or data (the text inside a pair of tags). The XmlNode class has properties and methods that enable you to figure out what type of node you are working with. It also features the capability to extract and modify the data in a node. Table 10.1 presents important properties and methods of the XmlNode class.

Table 10.1. SELECTED MEMBERS OF THE XMLNODE CLASS

Member | Type | Description | Example
--- | --- | --- | ---
ChildNodes | Property | A collection of all the children of the node | MessageBox.Show(theNode.ChildNodes[2].InnerText); //shows the text of child node 3 of the current node
FirstChild | Property | The first child of a node | MessageBox.Show(theNode.FirstChild.InnerText); //shows the text of the first child of the current node
InnerText | Property | Gets or sets the text of this node and all its children | MessageBox.Show(theNode.InnerText); //shows the text value of the current node
Name | Property | Returns the name of the node (read-only) | MessageBox.Show(theNode.Name); //displays the name of the current node
NextSibling | Property | Returns the next node at the current level of the hierarchy (or null) | theNode = theNode.NextSibling; //moves the node to its next sibling or sets the node to null
ParentNode | Property | Returns the parent of the current node | theNode = theNode.ParentNode; //sets the node to its parent
AppendChild(node) | Method | Adds a child node to the end of this node’s children | theNode.AppendChild(newNode); //adds newNode to theNode's children
Clone() | Method | Creates a copy of this node | theNode.ParentNode.AppendChild(theNode.Clone()); //adds a copy of this node at the same level as the node
RemoveChild(node) | Method | Removes a child node from the current node | theNode.RemoveChild(theNode.FirstChild); //removes the first child from the node
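
The members in Table 10.1 are easier to appreciate when you see a few of them working together. The sketch below is only an illustration: the NodeTour class and its ListChildNames() method are invented names, and the one-problem document is made up for the occasion. It walks a node's children with FirstChild and NextSibling, and then rearranges them with Clone(), AppendChild(), and RemoveChild().

```csharp
using System;
using System.Xml;

public static class NodeTour
{
    // Hypothetical helper: returns the names of a node's children,
    // separated by spaces, by hopping from sibling to sibling.
    public static string ListChildNames(XmlNode parent)
    {
        string names = "";
        for (XmlNode child = parent.FirstChild; child != null; child = child.NextSibling)
        {
            names += child.Name + " ";
        }
        return names.Trim();
    }

    public static void Main()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml("<problem><question>Q?</question>" +
                    "<answerA>yes</answerA><correct>A</correct></problem>");
        XmlNode theNode = doc.DocumentElement;

        Console.WriteLine(ListChildNames(theNode));          // question answerA correct
        Console.WriteLine(theNode.ChildNodes[1].InnerText);  // yes

        // Copy the first child and add the copy to the end of the children.
        theNode.AppendChild(theNode.FirstChild.Clone());
        Console.WriteLine(theNode.ChildNodes.Count);         // 4

        // Remove the original first child; the copy stays at the end.
        theNode.RemoveChild(theNode.FirstChild);
        Console.WriteLine(ListChildNames(theNode));          // answerA correct question

        // Every child can find its way back to its parent.
        Console.WriteLine(theNode.FirstChild.ParentNode.Name); // problem
    }
}
```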

The XmlNode class is rarely used directly, but it has two descendants that form the foundation of all XML documents. The XmlDocument class describes an entire document, and the XmlElement class describes an element, which is the part of an XML document surrounded by a pair of tags. Much of the functionality of the XmlDocument and XmlElement classes is inherited from the XmlNode class, but the XmlDocument class has a few important properties and methods of its own, which are particular to a document.

Exploring the XmlDocument Class

The XmlDocument class describes an XML document. It features important properties and methods for working with the entire document. It has Save() and Load() methods, which allow you to save and load a document directly, without needing to use streams. The XmlDocument class has methods for creating various nodes inside the document, including the CreateElement() and CreateAttribute() methods. It also inherits the XmlNode methods for adding a node to a document: AppendChild(), InsertBefore(), and InsertAfter().

Table 10.2 demonstrates a few key properties and methods of the XmlDocument class.

Table 10.2. SELECTED MEMBERS OF THE XMLDOCUMENT CLASS

Member | Type | Description | Example | Example Description
--- | --- | --- | --- | ---
DocumentElement | Property | Gets the root XML element for the document (read-only) | MessageBox.Show(doc.DocumentElement.Name); | Displays the name of the root element for an XmlDocument named doc
CreateAttribute(name) | Method | Creates a new attribute for an element with the specified name | doc.CreateAttribute("type"); | Creates a new attribute but does not add it
CreateElement(name) | Method | Creates a new element | doc.CreateElement("question"); | Creates a new element but does not add it
Load(filename) | Method | Loads an XML document from the given file name | doc.Load("myStuff.xml"); | Loads the specified file into the document
LoadXml(xmlString) | Method | Interprets a string value as XML | doc.LoadXml("<simpleXml>this is simple</simpleXml>"); | Loads the text value as a simple XML document
Save(filename) | Method | Saves the document to the specified file | doc.Save("myStuff.xml"); | Saves the current document to a file named myStuff.xml

Note that the CreateElement() and CreateAttribute() methods create the specified structures but do not specifically add them to the document.

Use the XmlNode.AppendChild(), XmlNode.InsertBefore(), or XmlNode.InsertAfter() method to add a node and the XmlElement.SetAttributeNode() method to attach an attribute to an element.
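
The two-step pattern of creating a node and then adding it can be sketched like this. The QuizBuilder class and BuildQuiz() method are hypothetical names chosen for the example, and the tiny one-question quiz is made up. The sketch prints the result with the OuterXml property instead of saving it, so it does not touch the drive.

```csharp
using System;
using System.Xml;

public static class QuizBuilder
{
    // Hypothetical helper: builds a one-problem quiz entirely in code.
    public static XmlDocument BuildQuiz()
    {
        XmlDocument doc = new XmlDocument();

        // Create the root element, then add it to the document.
        XmlElement test = doc.CreateElement("test");
        doc.AppendChild(test);

        // Create a problem, give it a type attribute, and add it to the root.
        XmlElement problem = doc.CreateElement("problem");
        XmlAttribute type = doc.CreateAttribute("type");
        type.Value = "mc";
        problem.SetAttributeNode(type);
        test.AppendChild(problem);

        // Add the correct answer first, then slip the question in before it.
        XmlElement correct = doc.CreateElement("correct");
        correct.InnerText = "C";
        problem.AppendChild(correct);

        XmlElement question = doc.CreateElement("question");
        question.InnerText = "What does XML stand for?";
        problem.InsertBefore(question, correct);

        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = BuildQuiz();
        Console.WriteLine(doc.OuterXml);
    }
}
```

If you would rather keep the result, a call such as doc.Save("myQuiz.xml") would write it to a file (myQuiz.xml is just a placeholder name).";
if (d.OuterXml != expected) throw new System.Exception("unexpected XML: " + d.OuterXml);
if (d.DocumentElement.FirstChild.FirstChild.Name != "question") throw new System.Exception("question should come first");
</test>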
