Many Technologies Contribute to the Power of XML

As we mentioned earlier, XML is not simply one Recommendation and one technology. This can be a bit confusing to people who are new to XML and are looking for a single, simple language that they can use, such as Visual Basic. For example, it is possible to use the core XML 1.0 Recommendation on its own to create well-formed XML documents for use in a variety of computing applications. These XML documents could be used as a standard file format for storing application documents, or they could be used as configuration files for a server, and so on.

However, if you wanted to use XML as a file format for storing information, and then publishing that information in print, on CD-ROM, and on the World Wide Web, you would need to make use of some other technologies that are not specifically XML, but might be based on XML, or be supplementary to XML. For example, you might have an XML document that you want to display on the Web; however, XML documents do not contain any information about display formatting. To transform the XML data into HTML or XHTML for displaying it on the Web, you might need to use a style sheet, such as the Extensible Stylesheet Language (XSL).
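To make this concrete, here is a minimal sketch of such a style sheet, written in XSLT, the transformation language that is part of XSL. It assumes a hypothetical source document whose root element, document, contains title, byline, and body children; those element names are assumptions made for illustration only.

```xml
<?xml version="1.0"?>
<!-- Hypothetical style sheet: the element names document, title, byline,
     and body are assumed for illustration. -->
<xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Ask the processor to serialize the result as HTML -->
  <xsl:output method="html"/>

  <!-- Match the root element and build an HTML page around its content -->
  <xsl:template match="/document">
    <html>
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <p>By <xsl:value-of select="byline"/></p>
        <p><xsl:value-of select="body"/></p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

The XML data itself stays free of formatting; a different style sheet could turn the same source into a print or CD-ROM layout.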

You might also need to specify exactly how XML files are to be structured, using a set of rules such as a Document Type Definition (DTD). DTDs are an integral part of creating valid XML, but they do not have a formal Recommendation of their own.

Note

XML can come in two varieties: well formed and valid. Well-formed XML means that the XML is written in the proper format, and that it complies with all the rules for XML as set forth in the XML 1.0 Recommendation. Valid XML means that the XML document has been validated against a rule set, or schema, such as a Document Type Definition or an XML Schema. XML cannot be considered “valid” unless it has a DTD or schema, and the document meets the constraints set out in that schema.


DTDs are a holdover from SGML, maintained for compatibility reasons. The syntax used for the declarations in DTDs is defined as a part of the XML 1.0 Recommendation.

DTDs are useful: without them or another type of schema, it is impossible to verify that an XML file is structured according to the rules the author had in mind. But DTDs are not required in order to use XML. In fact, that is how many of the following technologies could be described: they extend XML and are supplementary, but none of them is required. It's a good idea to familiarize yourself with these technologies and their uses anyway, because they can save you development time and effort in the long run.

XML 1.0

When people talk about XML, they are generally referring to the Extensible Markup Language as defined in the XML 1.0 Recommendation published by the W3C. This Recommendation defines the basic structures of XML, such as

  • Elements

  • Attributes

  • Entities

  • Notations

  • CDATA sections

  • Parsed character data (PCDATA)

  • Comments

This includes defining the conventions for names, case sensitivity, start tags, end tags, and so on. Everything you need to work with well-formed XML is contained within this one Recommendation.
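As a rough illustration, several of the structures listed above can appear together in one small document. The memo vocabulary, the company entity, and the text content below are all hypothetical, invented only for this sketch.

```xml
<?xml version="1.0"?>
<!DOCTYPE memo [
  <!-- Entity declaration: &company; stands for the quoted replacement text -->
  <!ENTITY company "Example Corp.">
]>
<!-- A comment: it is not part of the document's character data -->
<memo priority="high">
  <!-- memo is an element; priority is an attribute on it -->
  <to>All staff at &company;</to>
  <!-- Inside a CDATA section, < and & are treated as literal text -->
  <note><![CDATA[Type <b> and & exactly as shown.]]></note>
</memo>
```

This document is well formed, but it is not valid, because its document type declaration declares an entity and nothing else: there are no rules for the memo, to, or note elements.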

With well-formed XML, you could develop a document format to be used with a specific application. You could develop a document format for exchanging data between two applications, between vendors and suppliers, customers and vendors, and so on. Virtually any type of document that contains data can be constructed using simple, straightforward, well-formed XML. Here's a short example of a well-formed XML document:

<document> 
<title>Introducing XML</title>
<byline>John Doe</byline>
<body>Learning about XML is not complicated...</body>
</document>

There is a single root element, all the elements have both start and end tags, and all the tags are properly nested; therefore, the document meets the basic requirements of well-formed XML. However, if you want to go one step further, and actually make sure that the structure of the document adheres to some rules, you can author a Document Type Definition (DTD) and use that with your XML documents to create validated XML. Validation allows you to check each XML document against the rule set for violations, which helps you make sure that all of your documents are structured correctly and helps you ensure data integrity.

All the information you need to author DTDs, from Element Type Declarations to Attribute List Declarations, is outlined in the XML 1.0 Recommendation. However, DTDs do have a different syntax from XML, which confuses many XML newcomers.
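To show what that different syntax looks like, here is a minimal sketch of a DTD for the short document shown earlier, placed in the document itself as an internal subset. The content models are an assumption about what the author of that format might want: exactly one title, one byline, and one body, in that order.

```xml
<?xml version="1.0"?>
<!DOCTYPE document [
  <!-- Element type declarations: document must contain one title,
       one byline, and one body, in that order -->
  <!ELEMENT document (title, byline, body)>
  <!-- Each child holds only text (parsed character data) -->
  <!ELEMENT title  (#PCDATA)>
  <!ELEMENT byline (#PCDATA)>
  <!ELEMENT body   (#PCDATA)>
]>
<document>
  <title>Introducing XML</title>
  <byline>John Doe</byline>
  <body>Learning about XML is not complicated...</body>
</document>
```

A validating parser would accept this document, but would report an error if, say, the byline element were missing or appeared before the title.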

So, even though DTDs might seem as though they are a supplementary technology (because, in a way, they are), they are still part of the XML 1.0 Recommendation. We will take a closer look at DTDs and authoring them in Chapter 4, “Structuring XML Documents with DTDs.”

XML-Related Recommendations

Building on the foundation established with the XML 1.0 Recommendation, there are also a number of W3C Recommendations that are very closely related to the core XML technology. In this category, the Recommendations define some technologies that are designed specifically to add functionality to XML 1.0. In fact, these could be part of the 1.0 Recommendation, although that would complicate it needlessly. These technologies all have some common features:

  • They are all W3C Recommendations.

  • They are all related to well-formed or valid XML.

  • They are structural Recommendations.

These technologies include XML Namespaces and XML Schemas, both of which are designed to address shortcomings within the XML 1.0 Recommendation. Let's take a closer look at these Recommendations now.

Namespaces

XML allows developers to create their own markup languages, for use in a variety of applications. However, there is nothing to stop two developers from developing markup languages that have similar tags, but with different structure or meaning. If both of these developers were using their markup languages internally only, this might not be a problem. But what if these developers start sharing their vocabularies with their clients, vendors, and the general public? The result could be confusion about what tag means what, and in what context.

For example, let's say both developers want to keep track of customer names, a very common piece of information to store. Developer One designs a <name> element that looks like this:

<name> 
<first>John</first>
<last>Doe</last>
</name>

Developer Two, however, prefers to use a <name> element with no children:

<name>John Doe</name> 

Both are perfectly legitimate uses of XML, and each might have advantages over the other. That doesn't really matter in this example. What does matter, however, is that we now have two <name> elements that store similar data, but in different ways.

For example, what happens if a vendor is working with both organizations? How does the vendor know which name format to use, and when? That is where namespaces come in handy. They allow you to declare that an element belongs to a specific namespace. When that element is used, the parser knows which namespace it belongs to, so an element with the same name from a different namespace does not conflict with it.

Namespaces make use of a special attribute called xmlns that allows you to bind a prefix to a namespace URI. For example, here's a simple document that declares two namespaces, bound to the prefixes vendor and supplier:

<?xml version="1.0"?> 
<customers
        xmlns:vendor="http://www.vendor.com"
        xmlns:supplier="http://www.supplier.com">
<vendor:name>John Dough</vendor:name>
<supplier:name>
<first>Jane</first>
<last>Doe</last>
</supplier:name>
</customers>

As you can see, we have two name elements; however, one is in the http://www.vendor.com namespace, and the other is in the http://www.supplier.com namespace. We will discuss namespaces and using them with XML in greater detail in Chapter 6, “Avoiding XML Confusion with XML Namespaces.”
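One detail of the example is easy to miss: the first and last children carry no prefix, and the document declares no default namespace, so those two elements are in no namespace at all. If the intent is for them to belong to the supplier namespace too, one option is a default namespace declaration, sketched below. This is a variation on the example, not a required form.

```xml
<?xml version="1.0"?>
<customers
        xmlns:vendor="http://www.vendor.com"
        xmlns:supplier="http://www.supplier.com">
  <vendor:name>John Dough</vendor:name>
  <!-- xmlns="..." sets the default namespace for this element and for
       its unprefixed descendants, so name, first, and last are all in
       the http://www.supplier.com namespace -->
  <name xmlns="http://www.supplier.com">
    <first>Jane</first>
    <last>Doe</last>
  </name>
</customers>
```

Writing supplier:first and supplier:last inside supplier:name would have the same effect; the default namespace simply saves repeating the prefix.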

XML Schemas

We've talked briefly about the concept of well-formed versus valid XML (and we will discuss this in more detail in later chapters as well). So, you should understand that in order to be considered valid, the XML document needs to have either a DTD or an XML Schema, and it needs to conform to the rules set out there.

As we mentioned earlier, there is no separate Recommendation that deals exclusively with Document Type Definitions. Instead, the mechanics of DTDs are addressed directly in the XML 1.0 Recommendation as they are needed. Because of this, if you wanted to learn the mechanics of DTDs (which we will do in Chapter 4), you would turn to the XML 1.0 Recommendation.

However, there is another mechanism for providing the rule set, or schema, for an XML document: XML Schemas. XML Schemas represent a formal schema language for defining the structure of XML documents.

The XML Schema specification deals with some of the shortcomings of DTDs, such as the lack of robust datatypes, and also abandons the cryptic syntax of DTDs for an easier-to-use XML-based syntax. That doesn't mean that XML Schemas are necessarily simple; in fact, they can be very complex and powerful. We'll take a closer look at XML Schemas in Chapter 5, “Defining XML Document Structures with XML Schemas.”

The Recommendation for XML Schemas is actually published in three separate parts:

  • XML Schema Part 0: Primer

  • XML Schema Part 1: Structures

  • XML Schema Part 2: Datatypes

The first part, Part 0, is a tutorial designed to familiarize developers with the XML Schema syntax and use. The second part, Part 1, deals with the logical structures of XML Schema, and the syntax used for authoring schemas. Finally, Part 2 deals with datatypes. Datatypes are an important step for XML, because DTDs offer no mechanism for stating that the content of an element must be, for example, an integer or a date. XML Schema brings the power and complexity of datatypes to XML.
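As a small sketch of how structures and datatypes work together, here is one way a schema for the earlier document example might look. The published element and the revision attribute are hypothetical additions, included only to show datatypes that a DTD could not express.

```xml
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Structures (Part 1): document contains a fixed sequence of children -->
  <xs:element name="document">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title"  type="xs:string"/>
        <xs:element name="byline" type="xs:string"/>
        <!-- Datatypes (Part 2): hypothetical element whose content
             must be a date such as 2001-05-02 -->
        <xs:element name="published" type="xs:date"/>
        <xs:element name="body"   type="xs:string"/>
      </xs:sequence>
      <!-- Hypothetical optional attribute restricted to integers of 1 or more -->
      <xs:attribute name="revision" type="xs:positiveInteger"/>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

Notice that the schema is itself an XML document, which is the easier-to-use syntax referred to above, and that types such as xs:date let a validator reject content that merely looks like text to a DTD.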
