16.4 XML

HTML is fixed; that is, HTML has a predefined set of tags and each tag has its own semantics (meaning). HTML specifies how the information in a web page should be formatted, but it doesn’t really indicate what the information represents. For example, HTML may indicate that a piece of text should be formatted as a heading, but it doesn’t specify what that heading describes. In fact, nothing about HTML tags describes the true content of a document. The Extensible Markup Language (XML) allows the creator of a document to describe its contents by defining his or her own set of tags.

XML is a metalanguage. Metalanguage is the word language plus the prefix meta-, which means “beyond” or “more comprehensive.” A metalanguage goes beyond a normal language by allowing us to speak precisely about that language. It is a language for talking about, or defining, other languages. It is like an English grammar book describing the rules of English.

A metalanguage called the Standard Generalized Markup Language (SGML) was used by Tim Berners-Lee to define HTML. XML is a simplified version of SGML and is used to define other markup languages. XML has taken the Web in a new direction. It does not replace HTML—it enriches it.

Like HTML, an XML document is made up of tagged data. But when you write an XML document, you are not restricted to a predefined set of tags, because there are none. You can create any set of tags necessary to describe the data in your document. The focus is not on how the data should be formatted, but rather on what the data is.

For example, the XML document in FIGURE 16.6 describes a set of books. The tags in the document annotate data that represents a book’s title, author(s), number of pages, publisher, ISBN, and price.

FIGURE 16.6 An XML document containing data about books

The first line of the document indicates the version of XML that is used. The second line indicates the file that contains the Document Type Definition (DTD) for the document. The DTD is a specification of the organization of the document. The rest of the document contains the data about two particular books.
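Because the figure itself is not reproduced here, the following Python sketch suggests what such a document might look like and how a program could read it. The tag names follow the description of the DTD in this section. The book data, the file name books.dtd, and the names BOOKS_XML and summarize are placeholders invented for illustration, not the contents of the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names follow this section's description,
# but the book data and the DTD file name are made-up placeholders.
BOOKS_XML = """<?xml version="1.0"?>
<!DOCTYPE books SYSTEM "books.dtd">
<books>
  <book>
    <title>Example Title One</title>
    <authors>
      <author>First Author</author>
      <author>Second Author</author>
    </authors>
    <publisher>Example Press</publisher>
    <pages>320</pages>
    <isbn>0-0000-0000-0</isbn>
    <price currency="USD">29.95</price>
  </book>
  <book>
    <title>Example Title Two</title>
    <authors>
      <author>Third Author</author>
    </authors>
    <publisher>Sample House</publisher>
    <pages>150</pages>
    <isbn>0-0000-0000-1</isbn>
    <price currency="USD">12.50</price>
  </book>
</books>
"""

# The standard-library parser reads the tags but does not fetch or
# enforce the DTD named in the DOCTYPE line.
root = ET.fromstring(BOOKS_XML)


def summarize(book):
    """Return (title, list of authors, price, currency) for one <book>."""
    title = book.findtext("title")
    authors = [a.text for a in book.find("authors").findall("author")]
    price = book.find("price")
    return title, authors, float(price.text), price.get("currency")


summaries = [summarize(b) for b in root.findall("book")]
for title, authors, price, currency in summaries:
    print(f"{title} by {', '.join(authors)}: {price:.2f} {currency}")
```

Notice that the program finds each piece of data by asking for a tag by name. The tags say what the data is, so the same document can serve programs with very different purposes.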

The structure of a particular XML document is described by its corresponding DTD document. The contents of a DTD document not only define the tags, but also show how they can be nested. FIGURE 16.7 shows the DTD document that corresponds to the XML books example.

FIGURE 16.7 The DTD document corresponding to the XML books document

The ELEMENT tags in the DTD document describe the tags that make up the corresponding XML document. The first line of this DTD file indicates that the books tag is made up of zero or more book tags. The asterisk (*) beside the word book in parentheses stands for zero or more. The next line specifies that the book tag is made up of several other tags in a particular order: title, authors, publisher, pages, isbn, and price. The next line indicates that the authors tag is made up of one or more author tags. The plus sign (+) beside the word author indicates one or more authors are permitted. The other tags are specified to contain PCDATA (Parsed Character Data), which indicates that the tags are not further broken down into other tags.

The only tag in this set that has an attribute is the price tag. The last line of the DTD document indicates that the price tag has an attribute called currency and that it is required.
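The rules just described can be sketched in Python. The DTD text below is reconstructed from the prose and may differ from the figure in detail; in particular, the attribute type CDATA and the order of the PCDATA lines are assumptions. Python's standard library does not validate a document against a DTD, so the hypothetical function check_books re-implements the main rules by hand. It is an illustration of what the rules mean, not a real DTD validator, and it does not check that the PCDATA tags contain only text.

```python
import xml.etree.ElementTree as ET

# DTD text reconstructed from the description above, kept here for reference.
# The figure may differ, and the CDATA attribute type is an assumption.
BOOKS_DTD = """\
<!ELEMENT books (book*)>
<!ELEMENT book (title, authors, publisher, pages, isbn, price)>
<!ELEMENT authors (author+)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT pages (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ATTLIST price currency CDATA #REQUIRED>
"""

# Required children of <book>, in the required order.
BOOK_CHILDREN = ["title", "authors", "publisher", "pages", "isbn", "price"]


def check_books(root):
    """Return a list of rule violations; an empty list means none were found."""
    if root.tag != "books":
        return [f"root element is <{root.tag}>, expected <books>"]
    problems = []
    for i, book in enumerate(root, start=1):  # zero or more children is fine
        if book.tag != "book":
            problems.append(f"child {i} of <books> is <{book.tag}>, expected <book>")
            continue
        tags = [child.tag for child in book]
        if tags != BOOK_CHILDREN:
            problems.append(f"book {i}: children {tags} do not match the required order")
            continue
        names = [child.tag for child in book.find("authors")]
        if not names or any(tag != "author" for tag in names):
            problems.append(f"book {i}: <authors> needs one or more <author> tags")
        if "currency" not in book.find("price").attrib:
            problems.append(f"book {i}: <price> is missing the currency attribute")
    return problems


good = ET.fromstring(
    "<books><book><title>T</title><authors><author>A</author></authors>"
    "<publisher>P</publisher><pages>1</pages><isbn>0</isbn>"
    '<price currency="USD">1.00</price></book></books>'
)
bad = ET.fromstring(
    "<books><book><title>T</title><authors></authors>"
    "<publisher>P</publisher><pages>1</pages><isbn>0</isbn>"
    "<price>1.00</price></book></books>"
)

print(check_books(good))  # no violations expected
print(check_books(bad))   # reports the empty <authors> and the missing currency
```

An empty books tag passes the check because the asterisk allows zero book tags, whereas an empty authors tag fails because the plus sign requires at least one author.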

XML provides a standard format for organizing data without tying it to any particular type of output. A related technology called Extensible Stylesheet Language (XSL) can be used to transform an XML document into another format suitable for a particular user. For example, an XSL document can be defined that specifies the transformation of an XML document into an HTML document so that it can be viewed on the Web. Another XSL document might be defined to transform the same XML document into a Microsoft Word document, into a format suitable for a mobile phone, or even into a format that can be used by a voice synthesizer. This process is depicted in FIGURE 16.8. We do not explore the details of XSL transformations in this book.

The figure shows an XML document passing through XSL transformations that produce several output formats: an HTML document, a phone document, an MS-Word document, and voice synthesizer output.

FIGURE 16.8 An XML document can be transformed into many output formats
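The idea of one document feeding several output formats can be imitated in a few lines of Python. The sketch below is not XSL and does not use an XSL processor; it substitutes two ordinary Python functions, with the hypothetical names books_to_html and books_to_speech, to show the same data rendered in two ways. The sample book is a placeholder.

```python
import xml.etree.ElementTree as ET
from html import escape


def books_to_html(root):
    """Render a <books> tree as an HTML list, one item per book."""
    items = []
    for book in root.findall("book"):
        title = escape(book.findtext("title", default=""))
        authors = ", ".join(escape(a.text or "") for a in book.iter("author"))
        items.append(f"  <li><em>{title}</em> by {authors}</li>")
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def books_to_speech(root):
    """Render the same tree as plain sentences, e.g. for a voice synthesizer."""
    lines = []
    for book in root.findall("book"):
        authors = " and ".join(a.text or "" for a in book.iter("author"))
        lines.append(f"{book.findtext('title', default='')}, written by {authors}.")
    return "\n".join(lines)


# Placeholder data using the tag names from the books example.
root = ET.fromstring(
    "<books><book><title>Sample &amp; Example</title>"
    "<authors><author>A. Writer</author><author>B. Writer</author></authors>"
    "</book></books>"
)
html_output = books_to_html(root)
speech_output = books_to_speech(root)
print(html_output)
print(speech_output)
```

Neither function changes the XML document. Each one reads the same tagged data and decides for itself how to present it, which is the separation of content from formatting that XSL is designed to support.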

Another convenient characteristic of languages specified using XML is that documents in the language can be generated automatically with relative ease. A software system, usually with an underlying database, can be used to generate huge amounts of specific data formatted in a way that is easily conveyed and analyzed online. Once generated, the data can be transformed and viewed in whatever manner best serves individual users.
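As a minimal sketch of automatic generation, the Python code below builds a books document from a list of records. The records stand in for rows that a real system would fetch from its database; the record fields, the sample values, and the function name build_books are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical records, standing in for rows fetched from a database.
records = [
    {
        "title": "Example Title One",
        "authors": ["First Author", "Second Author"],
        "publisher": "Example Press",
        "pages": 320,
        "isbn": "0-0000-0000-0",
        "price": "29.95",
        "currency": "USD",
    },
]


def build_books(rows):
    """Build a <books> element tree from a list of record dictionaries."""
    books = ET.Element("books")
    for row in rows:
        book = ET.SubElement(books, "book")
        ET.SubElement(book, "title").text = row["title"]
        authors = ET.SubElement(book, "authors")
        for name in row["authors"]:
            ET.SubElement(authors, "author").text = name
        ET.SubElement(book, "publisher").text = row["publisher"]
        ET.SubElement(book, "pages").text = str(row["pages"])
        ET.SubElement(book, "isbn").text = row["isbn"]
        # The currency is stored as an attribute of the price tag.
        price = ET.SubElement(book, "price", currency=row["currency"])
        price.text = row["price"]
    return books


xml_text = ET.tostring(build_books(records), encoding="unicode")
print(xml_text)
```

The same function works unchanged whether the list holds one record or a million, which is why generating large XML documents from a database is straightforward.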

Several organizations have already developed XML languages for their particular topic areas. For example, chemists and chemical engineers have defined the Chemistry Markup Language (CML) to standardize the format of molecular data. CML includes a huge number of tags covering specific aspects of chemistry. It provides a common format by which chemistry professionals can share and analyze data.

Keep in mind that XML is a markup specification language, whereas XML files are data. The files just sit there until you run a program that displays them (like a browser), does some work with them (like a converter that writes the data in another format or a database that reads the data), or modifies them (like an editor). XML and its related technologies provide a powerful mechanism for information management and for communicating that information over the Web in a versatile and efficient manner. As these technologies evolve, new opportunities to capitalize on them will surely emerge.
