HTML

XML is concerned with the definition and structure of the information in the document. XML has no tags to specify presentation.

FIGURE G.3 An example of an XML document

Figure G.3 shows an example of an XML document that provides details of the Finance department and its staff. Each item of data is known as an element and is enclosed within tags that describe the meaning of the data. For example, the element that is the name of a qualification is enclosed within <qualification-name> and </qualification-name> tags. Elements can be grouped. The name and award date of a qualification are both within a larger ‘qualification’ element, enclosed within <qualification> and </qualification> tags, and, in turn, a number of qualifications are listed within an even larger ‘qualifications’ element, enclosed within <qualifications> and </qualifications> tags.
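
The nesting of elements described above can be explored with a short script. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the document is a hypothetical stand-in for the figure, the <award-date> tag name and all data values are assumptions made for illustration, and only the element names quoted in the text are taken from the appendix.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: element names follow the text where it gives them;
# <award-date> and every data value are invented for illustration.
DOC = """
<department>
  <employee>
    <last-name>Smith</last-name>
    <qualifications>
      <qualification>
        <qualification-name>BSc</qualification-name>
        <award-date>2001-07-01</award-date>
      </qualification>
      <qualification>
        <qualification-name>MBA</qualification-name>
        <award-date>2008-06-15</award-date>
      </qualification>
    </qualifications>
  </employee>
</department>
"""


def qualification_names(xml_text):
    """Return the name held inside every <qualification> element."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree and matches the tag exactly, so the
    # enclosing <qualifications> element is not returned.
    return [q.findtext("qualification-name") for q in root.iter("qualification")]


def child_tags(xml_text):
    """Return the tags of the elements grouped directly under the root."""
    return [child.tag for child in ET.fromstring(xml_text)]


print(qualification_names(DOC))
print(child_tags(DOC))
```

Each grouping element is simply a parent whose children are other elements, which is why the parser can hand back the qualifications without any knowledge of what the tags mean.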

XML documents have an inverted tree structure. At the top is a single root element, the ‘department’ element in this case. The tree structure for our document is shown in Figure G.4.

FIGURE G.4 The tree structure of the XML document

A well-formed XML document meets a number of technical criteria, the most important ones being:

  • There is a single root element.

  • Start tags and end tags match exactly.

  • There are no overlapping elements; each node in the tree has only one parent.
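
A parser enforces the criteria listed above and rejects any document that breaks them. The following sketch, again assuming Python's standard xml.etree.ElementTree module, tries one acceptable document and one violation of each criterion; the helper name parses_cleanly and the sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(xml_text):
    """Return True if the parser accepts the text, False if it raises an error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# One acceptable sample followed by one violation of each listed criterion.
SAMPLES = {
    "single root, matching tags": "<department><employee/></department>",
    "two root elements": "<department/><department/>",
    "start and end tags differ in case": "<department></Department>",
    "overlapping elements": "<a><b></a></b>",
}

for label, text in SAMPLES.items():
    print(f"{label}: {parses_cleanly(text)}")
```

Tag names are case-sensitive, which is why <department> and </Department> do not count as a matching pair.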

An XML document can easily be read by a human being, especially if it is well laid out. The XML document shown in this appendix has each element on a new line, and indentation is used to illustrate the nesting of elements. However, a well-formed XML document can be laid out in any way. At the extreme, there may be no new lines or spaces between elements at all, although such documents are harder for a human to read. An XML document is also machine-readable, provided that the software reading the document understands the tags.
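
To illustrate that layout does not change the content, the sketch below parses an indented and a compact version of the same hypothetical fragment and compares them once layout-only whitespace is ignored. The helper name shape and the data values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# The same hypothetical fragment laid out in two ways.
INDENTED = """
<employee>
  <last-name>Smith</last-name>
  <qualifications>
    <qualification>
      <qualification-name>MBA</qualification-name>
    </qualification>
  </qualifications>
</employee>
"""

COMPACT = (
    "<employee><last-name>Smith</last-name><qualifications><qualification>"
    "<qualification-name>MBA</qualification-name>"
    "</qualification></qualifications></employee>"
)


def shape(element):
    """Return (tag, text, children), stripping whitespace used only for layout."""
    text = (element.text or "").strip()
    return (element.tag, text, [shape(child) for child in element])


same = shape(ET.fromstring(INDENTED)) == shape(ET.fromstring(COMPACT))
print(same)
```

The comparison strips leading and trailing whitespace from element text, which is a simplification: an application that treats such whitespace as significant would need a stricter comparison.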

In HTML, the tags are all standardised and included in a specification published by the World Wide Web Consortium (W3C); there is no equivalent standard set of tags for XML. The data enclosed by the <last-name> and </last-name> tags in our example could just as easily have been enclosed by <family-name> and </family-name> tags, or <surname> and </surname> tags (and no doubt you can think of other possible tags that could be used here). The number of possible tags that can be used in an XML document is, therefore, effectively unlimited; the definition of tags is uncontrolled.

The lack of standard tags for XML is a problem. XML provides a very effective way to transfer data but the XML structure to be used for that transfer of data has to be defined in the same way that a structure, or schema, has to be defined for a relational database. Both sending and receiving parties, be they machines or humans, have to use the same elements, with those elements specified with the same enclosing tags, and the meaning of the content of those elements being unambiguously defined. Without common element definitions data cannot be transferred. If XML is to be used within an enterprise or between enterprises there has to be the same commitment to data definition as would be needed for common database designs for the sharing of data between databases. The set of allowed elements needs to be defined and the structure, the way that elements can be nested within each other, also needs to be specified. For example, ‘qualifications’ can be within ‘employee’, ‘qualification’ can be within ‘qualifications’, and ‘qualification-name’ can be within ‘qualification’. There have been a number of initiatives within particular industries to develop standard XML formats for the exchange of data between companies within that industry, but these initiatives are not coordinated. You can, therefore, end up with different formats for the same concept in different industries.
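
The idea of an agreed set of elements and nesting rules can be sketched in a few lines. The table below is a hypothetical agreement between sender and receiver, built from the nesting examples in the text plus an assumed <award-date> element; it is a hand-rolled illustration of the principle, not a substitute for a real schema language.

```python
import xml.etree.ElementTree as ET

# Hypothetical agreement: each allowed element mapped to the elements
# that may appear directly inside it.
ALLOWED_CHILDREN = {
    "department": {"employee"},
    "employee": {"last-name", "qualifications"},
    "qualifications": {"qualification"},
    "qualification": {"qualification-name", "award-date"},
    "last-name": set(),
    "qualification-name": set(),
    "award-date": set(),
}


def nesting_errors(element):
    """Return a list of messages for unknown or wrongly nested elements."""
    if element.tag not in ALLOWED_CHILDREN:
        return [f"unknown element <{element.tag}>"]
    errors = []
    for child in element:
        known = child.tag in ALLOWED_CHILDREN
        if known and child.tag not in ALLOWED_CHILDREN[element.tag]:
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        errors.extend(nesting_errors(child))
    return errors


GOOD = (
    "<department><employee><last-name>Smith</last-name><qualifications>"
    "<qualification><qualification-name>MBA</qualification-name></qualification>"
    "</qualifications></employee></department>"
)
MISPLACED = "<department><qualification/></department>"
UNAGREED = "<department><employee><surname>Smith</surname></employee></department>"

for text in (GOOD, MISPLACED, UNAGREED):
    print(nesting_errors(ET.fromstring(text)))
```

The third sample is perfectly readable XML, but a receiver working from this agreement cannot interpret <surname>, which is the practical consequence of parties not sharing element definitions.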

There are other problems with XML. First, it can generate very verbose documents to convey quite simple data; this verbosity can inflate transmission and storage costs. Secondly, XML itself has no datatypes; every value is a character string unless a schema language is used to assign it a type. Thirdly, all XML structures are hierarchical in nature. This is a step backwards: hierarchical databases were replaced by network databases and then by relational databases because it is extremely difficult to represent the full complexity of data relationships using a hierarchical model alone. See Appendix B for an overview of hierarchical and network databases.
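
The first two problems can be seen in a small experiment. The sketch below, which assumes a hypothetical <award-date> element holding a date in ISO format, measures how much of a fragment is markup rather than data, and shows that the receiver has to convert the character string into a date itself.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical fragment; the tag <award-date> and the values are assumptions.
FRAGMENT = (
    "<qualification>"
    "<qualification-name>MBA</qualification-name>"
    "<award-date>2008-06-15</award-date>"
    "</qualification>"
)


def markup_overhead(xml_text):
    """Return (total characters, characters that are actual data)."""
    element = ET.fromstring(xml_text)
    payload = "".join(element.itertext())
    return len(xml_text), len(payload)


total, payload = markup_overhead(FRAGMENT)
print(f"{payload} of {total} characters are data")

# The parser returns the date as a plain string; any typing is up to the reader.
raw = ET.fromstring(FRAGMENT).findtext("award-date")
awarded = date.fromisoformat(raw)
print(type(raw).__name__, type(awarded).__name__)
```

The conversion only works because this sketch assumes both parties have agreed on the date format, which is one more thing that the element definitions have to pin down.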

To document XML definitions and in an attempt to overcome some of these problems, the overall XML architecture includes a number of other components:

  • Document Type Definition (DTD) is a specification of the rules a group of XML documents must follow to be valid; for example, the elements that are allowed within a document are specified. One problem with DTDs is that they are expressed in a language that is not XML.

  • XML Schema Definition (XSD) is another way to specify the rules for a group of XML documents. An XSD is specified using an XML schema language and allows for more detailed constraints on a document’s logical structure than can be achieved with a DTD.

  • eXtensible Stylesheet Language (XSL) is the standard for describing presentation rules that apply to XML documents. For example, XML data can be converted into HTML so that it can be displayed using a web browser.

  • XSL Transformations (XSLT) is a specification that describes how to transform XML documents from one format to another.

  • XLink is a specification that describes how to define links between XML documents.

  • XPointer is a specification that describes how to specify a particular element within a document as the target of a link.

  • XPath makes it possible to refer to individual parts of an XML document to provide access to XML data from elsewhere.

  • XQuery is a query language for XML. It is analogous to SQL in relational databases, but the core language can only be used to read data, not to modify it; updates are covered by a separate extension, the XQuery Update Facility. XQuery provides the ability to navigate, select, combine, transform, sort and aggregate XML data.
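
As an illustration of the XPath component listed above, the following sketch selects parts of a hypothetical document by path. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath; the document content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two employees; all values are invented.
DOC = """
<department>
  <employee>
    <last-name>Smith</last-name>
    <qualifications>
      <qualification><qualification-name>BSc</qualification-name></qualification>
      <qualification><qualification-name>MBA</qualification-name></qualification>
    </qualifications>
  </employee>
  <employee>
    <last-name>Jones</last-name>
    <qualifications>
      <qualification><qualification-name>ACCA</qualification-name></qualification>
    </qualifications>
  </employee>
</department>
"""

root = ET.fromstring(DOC)

# A path that names each step from the root down to the wanted elements.
NAME_PATH = "./employee/qualifications/qualification/qualification-name"
all_names = [e.text for e in root.findall(NAME_PATH)]

# A predicate in square brackets restricts the step to matching employees.
SMITH_PATH = "./employee[last-name='Smith']//qualification-name"
smith_names = [e.text for e in root.findall(SMITH_PATH)]

print(all_names)
print(smith_names)
```

A full XPath or XQuery processor offers far more than this subset, such as functions, sorting and aggregation, but the path-plus-predicate pattern shown here is the common core.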
