Discussion of XML 1.0 Recommendation

Now that you have an idea of how to read some of the information used in the XML Recommendation to define the components of XML, let's take a look at some actual definitions that are found in the specification.

First, we will look at how the specification defines whitespace.

You are probably familiar with whitespace in many shapes and forms. In fact, there is whitespace between each of the words in this sentence. Whitespace in XML is denoted in the specification by an S, and it can consist of several different types of input. For example, a space (created with your spacebar), a tab, a carriage return, and a line feed all produce whitespace in XML.

So, here's a look at the formal declaration for whitespace:

S ::= (#x20 | #x9 | #xd | #xa)+ 

This defines our symbol, S, to be equal to the expression contained in the parentheses. Inside the parentheses, we have a series of characters that are identified by their ISO/IEC 10646 code values: #x20 is a space, #x9 is a tab, #xd is a carriage return, and #xa is a line feed. The | between them means "or," so the expression simply says that we can have a space, a tab, a carriage return, or a line feed.

Finally, the + suffix operator applies to the whole parenthesized expression, which means that whitespace consists of "one or more occurrences" of any of the legal whitespace characters, in any combination. A single space is whitespace, and so is a tab followed by two line feeds.
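To make the rule concrete, here is a minimal sketch that transcribes it into a Python regular expression. The names S and is_whitespace are our own, chosen for illustration; they are not part of the specification.

```python
import re

# S ::= (#x20 | #x9 | #xd | #xa)+
# The character class lists the four legal characters; the trailing +
# plays the same role as the + suffix operator in the rule.
S = re.compile(r"[\x20\x09\x0D\x0A]+")

def is_whitespace(text: str) -> bool:
    """Return True if the entire string matches the S rule."""
    return S.fullmatch(text) is not None

print(is_whitespace(" "))         # True: a single space
print(is_whitespace("\t\n\n"))    # True: a tab and two line feeds
print(is_whitespace(""))          # False: + requires at least one character
print(is_whitespace(" a "))       # False: 'a' is not a whitespace character
```

Notice that the empty string fails the test. That is the difference between the + operator here and the * operator that appears in later rules.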

This is a pretty simple rule, but it illustrates many of the grammatical constructs that we have been looking at, which will come in handy when reviewing the XML specification.

Let's take a look at a few more rules from the specification, to get a feel for reading how they are constructed.

One of the most commonly used constructs in XML is parsed character data (PCData). PCData is the text that is contained between the start and end tags of an element, and most of your document will likely take the form of PCData. Here is how PCData is defined according to the specification:

PCData ::= [^<&]* 

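This is a very simple rule. The square brackets with a leading ^ mean "any character except the ones listed," and the * suffix operator means "zero or more occurrences." So the PCData symbol is defined as any run of characters, including an empty one, that contains neither a less than (<) nor an ampersand (&). (In the published Recommendation, this production appears under the name CharData and carries one additional restriction: the text may not contain the sequence ]]>.)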

These two characters cannot be used in character data because the less than sign is used to denote the start of a new tag and the ampersand is used to denote the start of an entity reference. If these two symbols were used in your PCData, a parser would assume they were the start of markup. Remember that PCData is Parsed Character Data, so the parser does read through it looking for markup.
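The PCData rule as printed above can be sketched the same way. Again, PCDATA and is_pcdata are illustrative names of our own, and the sketch covers only the simple rule shown here.

```python
import re

# PCData ::= [^<&]*
# [^<&] matches any single character other than < or &;
# the * allows zero or more of them.
PCDATA = re.compile(r"[^<&]*")

def is_pcdata(text: str) -> bool:
    """Return True if the entire string matches the PCData rule."""
    return PCDATA.fullmatch(text) is not None

print(is_pcdata("Hello, world"))   # True
print(is_pcdata(""))               # True: * allows an empty run
print(is_pcdata("a < b"))          # False: contains <
print(is_pcdata("AT&T"))           # False: contains &
```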

Now, here is how start and end tags are defined:

STag ::= '<' Name (S Attribute)* S? '>' 
ETag ::= "</" Name S? '>'

By now, you should have a pretty good idea how to read these two rules. So give it a try before you continue on with the explanations. You might be surprised at how something that once looked like gibberish is beginning to become clearer.

Okay, we will begin with the rule for the start tag:

STag ::= '<' Name (S Attribute)* S? '>' 

Here we begin with the symbol for the start tag, STag. The first thing that we have in a start tag is the less than sign, which is represented as a literal character, contained in single quotes. Next comes the Name of the tag (which is defined elsewhere in the spec).

Now, the next section,

(S Attribute)* 

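is treated as a single expression because of the parentheses. The S represents whitespace (one or more whitespace characters, according to the rule we looked at earlier), which must be followed by an Attribute. The * suffix operator on the expression indicates that this whitespace-and-attribute pair can occur zero or more times, so a tag might have no attributes, one attribute, or several. After that, the S? allows for optional whitespace before the end of the tag. Finally, the tag is closed with a greater than symbol.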

So our final construct for a start tag consists of a '<', a tag name, zero or more attributes (each preceded by whitespace), optional trailing whitespace, and then a final '>'. Thus, any of the following are valid start tags:

<START> 
<START What="Attribute">
<Title Author="me">

However, none of the following would be valid tags:

>START< 
<Start
Start>
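If you would like to check these examples mechanically, the following sketch builds a regular expression from the STag rule. Because Name and Attribute are defined elsewhere in the spec, the NAME, ATT_VALUE, and ATTRIBUTE patterns below are simplified assumptions of our own (ASCII names only, quoted values with no entity references); they are stand-ins, not the specification's full definitions.

```python
import re

S = r"[\x20\x09\x0D\x0A]+"                     # the whitespace rule
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"          # assumption: ASCII-only names
ATT_VALUE = r"""(?:"[^<&"]*"|'[^<&']*')"""     # assumption: no references
ATTRIBUTE = rf"{NAME}(?:{S})?=(?:{S})?{ATT_VALUE}"

# STag ::= '<' Name (S Attribute)* S? '>'
STAG = re.compile(rf"<{NAME}(?:{S}{ATTRIBUTE})*(?:{S})?>")

def is_start_tag(text: str) -> bool:
    """Return True if the entire string matches the simplified STag rule."""
    return STAG.fullmatch(text) is not None

for tag in ['<START>', '<START What="Attribute">', '<Title Author="me">']:
    print(tag, is_start_tag(tag))    # each prints True

for tag in ['>START<', '<Start', 'Start>']:
    print(tag, is_start_tag(tag))    # each prints False
```

Each piece of the pattern lines up with one piece of the rule: the parenthesized group followed by * is the (S Attribute)* expression, and the optional group before the final > is the S? term.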

The rule for constructing end tags is very similar:

ETag ::= "</" Name S? '>' 

The major differences are that end tags require the slash / character following the less than sign ('<'), and they do not allow for attributes.
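The end tag rule can be sketched in the same fashion. As before, NAME is a simplified, ASCII-only assumption standing in for the spec's full Name definition, and is_end_tag is an illustrative name of our own.

```python
import re

S = r"[\x20\x09\x0D\x0A]+"                 # the whitespace rule
NAME = r"[A-Za-z_:][A-Za-z0-9_:.\-]*"      # assumption: ASCII-only names

# ETag ::= "</" Name S? '>'
ETAG = re.compile(rf"</{NAME}(?:{S})?>")

def is_end_tag(text: str) -> bool:
    """Return True if the entire string matches the simplified ETag rule."""
    return ETAG.fullmatch(text) is not None

print(is_end_tag('</START>'))                    # True
print(is_end_tag('</START >'))                   # True: S? allows trailing whitespace
print(is_end_tag('<START>'))                     # False: the slash is missing
print(is_end_tag('</START What="Attribute">'))   # False: no attributes allowed
```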

Armed with the knowledge from this appendix, you should be able to make out much of the full XML 1.0 Recommendation. In addition, some other resources are available to help you understand the information contained in the XML Specification, including one of the best XML resources available on the Net: the Annotated XML Recommendation.

The Annotated XML Recommendation (http://www.xml.com/axml/testaxml.htm) is written by Tim Bray, one of the XML 1.0 Recommendation authors, and it is an excellent resource for XML beginners. If you have questions about the recommendation, consult this resource first. Chances are good that it will contain an answer for you.
