What Is XML?

XML arose from the need for a portable data format.

Essentially, XML is a standard for representing data in a text document. XML provides a framework for representing almost any kind of data, which is one of the reasons why it has attracted so much interest.

An XML document consists of text-based tags used to provide the document structure (similar to those used in HTML) together with the data itself. All XML documents consist of elements and optional declarations and comments.

Elements

An element has the following form:

<start_tag attributes>body<end_tag>

For example,

<book title="J2EE in 21 Days">A very useful book</book>

In XML, unlike HTML, the tags are not predefined. As the author of an XML document, you are free to invent whatever tags are appropriate for the data you are describing.

When defining an XML tag, you may include attributes that further describe the tag. In the previous example, the title of the book is supplied as an attribute to the book tag.

The body of an element is all the text, including any nested tags, enclosed by the start and end tags.

An element need not have any attributes or even any body.
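To see the parts of an element from a program's point of view, the following minimal sketch parses the book example. It assumes Python and its standard xml.etree.ElementTree module, which this appendix does not otherwise use; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Parse the single-element example from the text.
book = ET.fromstring('<book title="J2EE in 21 Days">A very useful book</book>')

tag_name = book.tag        # the element name taken from the start tag
attributes = book.attrib   # a dict mapping attribute names to values
body_text = book.text      # the text that forms the body of the element

# An element with no attributes and no body is still an element.
empty = ET.fromstring("<book/>")

print(tag_name, attributes, body_text)
print(empty.attrib, empty.text)
```

The parser hands back the tag name, the attributes, and the body as three separate pieces, which mirrors the form shown above.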

Tag names must start with a letter or underscore and can contain any number of letters, digits, hyphens, periods, or underscores, but they cannot include spaces. Colons are also permitted but are reserved for namespace prefixes, and names beginning with xml (in any mixture of case) are reserved.

XML is case sensitive, and attribute values must be quoted (both single and double quotes are accepted). The following are alternative forms for an element:

<tag>text</tag>
<tag attribute="text">text</tag>
<tag attribute="text"></tag>
<tag></tag>
<tag attribute="text"/>
<tag/>

The last two in this list show examples where the start and end tags have been combined into a single empty-element tag. This is done simply to reduce clutter in the document.
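As a rough check that the combined form and the separate start and end tags describe the same element, the sketch below parses both and compares the results. It assumes Python's standard xml.etree.ElementTree module; the variable names are invented for the illustration.

```python
import xml.etree.ElementTree as ET

forms = [
    '<tag attribute="text"></tag>',   # separate start and end tags
    "<tag attribute='text'/>",        # combined tag, single-quoted value
]
parsed = [ET.fromstring(form) for form in forms]

# Both forms yield the same name, the same attributes, and no child elements.
same = all(
    (e.tag, e.attrib, len(e)) == ("tag", {"attribute": "text"}, 0)
    for e in parsed
)

# Names are case sensitive, so <Tag> cannot be closed by </tag>.
try:
    ET.fromstring("<Tag>text</tag>")
    case_mismatch_rejected = False
except ET.ParseError:
    case_mismatch_rejected = True

print(same, case_mismatch_rejected)
```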

Tags must nest. That is, an end tag must close the most recently opened start tag that has not yet been closed. For example,

<B><I>bold and italic</I></B>

The following is not well-formed XML:

<B><I>bold and italic</B></I>

To be well-formed XML, the </I> end tag must precede the </B> end tag so that the tags nest correctly.
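A parser enforces the nesting rule for you. The following sketch, which assumes Python's standard xml.etree.ElementTree module and a hypothetical helper named is_well_formed, feeds both examples to the parser and records whether each is accepted.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Correctly nested: </I> closes before </B>.
nested_ok = is_well_formed("<B><I>bold and italic</I></B>")

# Incorrectly nested: </B> appears while <I> is still open.
nested_bad = is_well_formed("<B><I>bold and italic</B></I>")

print(nested_ok, nested_bad)
```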

The tags provide

  • Information about the meaning of the data

  • The relationships between different parts of the data

There must be exactly one top level element in an XML document, called the root element, which must enclose all the other elements in the document.

The following is a well-formed XML document:

<jobSummary>
  <job customer="winston" reference="Cigar Trimmer">
    <location>London</location>
    <description>Must like to talk and smoke</description>
    <skill>Cigar maker</skill>
    <skill>Critic</skill>
  </job>
</jobSummary>

The root element is <jobSummary>...</jobSummary>. The <job> element has two attributes and four enclosed elements.
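The structure of this document can be explored with a few lines of code. The sketch below assumes Python's standard xml.etree.ElementTree module; it reads the root element, the attributes of the job element, and the skill elements, and then confirms that a document with two top level elements is rejected.

```python
import xml.etree.ElementTree as ET

document = """
<jobSummary>
  <job customer="winston" reference="Cigar Trimmer">
    <location>London</location>
    <description>Must like to talk and smoke</description>
    <skill>Cigar maker</skill>
    <skill>Critic</skill>
  </job>
</jobSummary>
"""

root = ET.fromstring(document)                 # the single root element
job = root.find("job")                         # first <job> child of the root
child_names = [child.tag for child in job]     # elements enclosed by <job>
skills = [skill.text for skill in job.findall("skill")]

# A document may have only one top level element.
try:
    ET.fromstring("<jobSummary/><jobSummary/>")
    two_roots_rejected = False
except ET.ParseError:
    two_roots_rejected = True

print(root.tag, job.attrib, child_names, skills, two_roots_rejected)
```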

Declarations

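Declarations are used to provide information to the XML parser. They are of two forms. The first is a Processing Instruction, which is enclosed in <? … ?>. The most common example is the XML declaration, which strictly speaking is not a processing instruction but uses the same syntax.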

The following example tells the parser that the document has been written using XML version 1.0 and the UTF-8 character encoding:

<?xml version="1.0" encoding="UTF-8"?>

The second form of declaration is an XML Document Type Declaration and is enclosed in <!DOCTYPE ... >.

Caution

Do not confuse a Document Type Declaration, which is the construct in an XML document that contains or refers to the declarations making up the grammar used to validate the document, with the grammar itself, which is called a Document Type Definition (DTD). DTDs are explained later in this appendix.


<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">

A Document Type Declaration informs a validating parser of the structure the document is expected to have, so that the parser can validate it. There is more information on the different types of document type declarations in the section “Document Type Definition” later in this appendix.

If declarations appear in an XML document, they must precede the root element. This part of the document is usually referred to as the prolog. An XML declaration, if present, must be the very first thing in the document.
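The following sketch shows how a parser reports the contents of the prolog. It assumes Python's standard xml.dom.minidom module, which is not part of this appendix, and it does not rely on the file jobSummary.dtd being present, because this parser only records the reference and does not validate.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

document = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">
<jobSummary/>
"""

dom = minidom.parseString(document)

# Information taken from the XML declaration.
declared = (dom.version, dom.encoding)

# Information taken from the Document Type Declaration.
doctype = dom.doctype
dtd_reference = (doctype.name, doctype.systemId)

# A declaration that follows the root element is rejected.
try:
    minidom.parseString(
        b'<jobSummary/><!DOCTYPE jobSummary SYSTEM "jobSummary.dtd">'
    )
    misplaced_rejected = False
except ExpatError:
    misplaced_rejected = True

print(declared, dtd_reference, misplaced_rejected)
```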

Comments

As well as elements and declarations, an XML document can contain comments that help to clarify the document content for human readers. Comments can be used anywhere within an XML document that a tag could appear. An example is as follows:

<!-- This is a really good book -->
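Comments are intended for human readers, and many parsers discard them. As a small illustration, assuming Python's standard xml.etree.ElementTree module with its default settings and an invented book document, the comment below does not appear among the parsed elements.

```python
import xml.etree.ElementTree as ET

document = """<book>
  <!-- This is a really good book -->
  <title>J2EE in 21 Days</title>
</book>"""

book = ET.fromstring(document)

# With the default settings the comment is dropped; only <title> remains.
children = [child.tag for child in book]
title = book.find("title").text

print(children, title)
```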

Special Characters

The characters in Table C.1 have a special meaning in XML and, if required in the contents of an element, they must be replaced with the symbolic form.

Table C.1. Special XML Characters

Character   Name                  Symbolic Form
&           ampersand             &amp;
<           open angle bracket    &lt;
>           close angle bracket   &gt;
'           single quote          &apos;
"           double quote          &quot;

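Other special characters, such as non-printing characters, that may cause problems during processing should be replaced by character references that give their decimal value. For example, a tab character becomes &#9;. Be aware that XML 1.0 does not allow most control characters at all, even as character references, so the reference &#1; for ^A is rejected by an XML 1.0 parser.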
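Replacing the characters in Table C.1 by hand is error prone, so it is usually left to a library. The sketch below assumes Python's standard xml.sax.saxutils.escape function, which handles &, <, and > itself and accepts a dictionary of additional replacements; the sample text and the name quote_entities are invented for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = 'Tom & Jerry <"cat" & \'mouse\'>'

# escape() replaces &, < and > by default; the quotes are added explicitly.
quote_entities = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quote_entities)

# Parsing the escaped text gives back the original characters.
round_trip = ET.fromstring("<note>" + escaped + "</note>").text

print(escaped)
print(round_trip == raw)
```

The ampersand is replaced first, so the ampersands that belong to the other symbolic forms are not escaped a second time.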

If you are familiar with HTML, you will recognize the technique of replacing certain characters, or including characters not found in standard character sets (such as ©), with a character entity of the form &name; or &#nnn; (where nnn is a number representing the character). As an HTML user, you are probably also aware that browsers can interpret character entities differently, so the character entities you are familiar with may not conform to the standard. Refer to the W3C Web site for a list of the character entities defined for the ISO-8859-1 (Latin-1) character set. Note that XML itself predefines only the five named entities in Table C.1; any other named entity, such as &copy;, must be declared in a DTD before it can be used, whereas a numeric form such as &#169; can always be used.
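The difference between numeric and named forms is easy to demonstrate. The following sketch assumes Python's standard xml.etree.ElementTree module and a hypothetical helper named parses; it shows a numeric reference being accepted, an HTML-style named entity being rejected because no DTD declares it, and a reference to a control character being rejected under XML 1.0.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# A numeric character reference is resolved to the character itself.
copyright_text = ET.fromstring("<c>&#169; Example</c>").text

# &copy; is an HTML entity; XML does not define it unless a DTD declares it.
named_ok = parses("<c>&copy; Example</c>")

# Code 1 (^A) is not a legal XML 1.0 character, even as a reference.
control_ok = parses("<c>&#1;</c>")

print(copyright_text, named_ok, control_ok)
```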

For data containing a large number of special characters, you can use a CDATA section. This begins with the string <![CDATA[ and ends with ]]>. Characters between the start and end of a CDATA section are not interpreted as markup by the parser; they are treated as literal text. The only sequence a CDATA section cannot contain is ]]>, which ends it.
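As a brief illustration, assuming Python's standard xml.etree.ElementTree module and an invented script element, the text inside a CDATA section reaches the program unchanged even though it contains < and & characters.

```python
import xml.etree.ElementTree as ET

document = "<script><![CDATA[if (a < b && b > c) { run(); }]]></script>"

script = ET.fromstring(document)

# The CDATA markers are removed; the enclosed characters are left as they are.
body = script.text

print(body)
```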

Namespaces

Namespaces are used to scope tags within a document. The use of multiple namespaces allows different tags to have the same name but different meanings in a single XML document.

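An attribute called xmlns (XML Name Space) is added to an element's start tag to define a namespace for that element and its body. Written as xmlns:prefix, the attribute binds a prefix to the namespace, and only tags that carry the prefix belong to it.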

The following is a document with two namespaces:

<?xml version="1.0"?>
<jobSummary xmlns:ad="ADAgency" xmlns:be="BEAgency">
  <ad:job customer="winston" reference="Cigar Trimmer">
    <ad:location>London</ad:location>
    <ad:description>Must like to talk and smoke</ad:description>
    <ad:skill>Cigar maker</ad:skill>
    <ad:skill>Critic</ad:skill>
  </ad:job>
  <be:job>
    a completely different form of the job element
  </be:job>
</jobSummary>
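A namespace-aware parser keeps the two job elements apart. The sketch below assumes Python's standard xml.etree.ElementTree module, which reports each tag as the namespace name in braces followed by the local name; the document is a shortened copy of the example, and the dictionary ns simply repeats the prefix bindings so they can be used in searches.

```python
import xml.etree.ElementTree as ET

document = """<jobSummary xmlns:ad="ADAgency" xmlns:be="BEAgency">
  <ad:job customer="winston" reference="Cigar Trimmer">
    <ad:location>London</ad:location>
  </ad:job>
  <be:job>a completely different form of the job element</be:job>
</jobSummary>"""

root = ET.fromstring(document)

# Each tag is qualified by its namespace, so the two jobs have different names.
tags = [child.tag for child in root]

# Prefixes used in searches are mapped to namespace names by this dictionary.
ns = {"ad": "ADAgency", "be": "BEAgency"}
ad_location = root.find("ad:job/ad:location", ns).text
be_jobs = root.findall("be:job", ns)

print(tags, ad_location, len(be_jobs))
```

The prefix itself carries no meaning; it is the namespace name bound to the prefix that distinguishes one job element from the other.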
						
