Reading the Specification

For all the effort that went into developing the draft of the 1.0 Recommendation, not much effort went into explaining the Recommendation to the lay person. That really is not the fault of the working group. They created a recommendation that is very concise, clear, and well-defined, leaving very little room for interpretation.

This is the sign of a well-constructed recommendation. It is clearly and formally defined so that anyone implementing XML does so with the same understanding of how XML documents are created and defined. This is the very core of what makes a successful standard.

However, that is not very comforting when you sit down with the XML 1.0 Recommendation and can't make any sense of it. If you don't understand the specification, you cannot implement it correctly, nor can you come to understand XML itself very easily.

In the next section, we will take a look at how the recommendation is written so that you can gain a better understanding of the vocabulary and, therefore, have an easier time following the spec.

Terms

Many terms are used in the XML Recommendation that have very specific meanings, as outlined by the recommendation's authors. The meaning of some of these terms will be obvious, but some might be interpreted slightly differently from what you would expect. So take a few minutes to familiarize yourself with the terminology used in the specification.

may— We warned you that some of these terms would be obvious. However, as used in the recommendation, this term means that XML data and applications "are permitted to but need not behave as described."

must— This means that the data or applications are required to exhibit the specified behavior.

error— An error occurs when the rules of the recommendation are violated, and the results are undefined. A conforming processing application may detect and report the error, and it may recover from it.

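fatal error— A fatal error is an error that a conforming XML processor must detect and report to the application. After a fatal error, the processor may keep reading the data to find and report further errors, but it must stop normal processing: it can no longer pass character data and information about the document's structure to the application in the normal way. Fatal errors are reserved for the most serious violations of the recommendation, such as breaking a well-formedness constraint.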

validity constraint— A validity constraint is a rule in the recommendation that applies to any XML document that is to be considered valid. A violation of a validity constraint is an error, and validating XML processors must, at user option, report it.

well-formedness constraint— A well-formedness constraint is a rule that applies to every XML document, because all XML must be well-formed. Therefore, an XML document that violates a well-formedness constraint must cause the XML processor to generate a fatal error.
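The following is a minimal sketch that shows what a fatal error looks like in practice. It uses Python's standard-library ElementTree parser, which is built on expat and is non-validating. That means it enforces well-formedness constraints but does not check validity constraints against a DTD. The helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Expat stops at the first well-formedness violation, and
        # ElementTree surfaces that fatal error as a ParseError.
        return False
    return True


print(is_well_formed("<STORY><TITLE>Hi</TITLE></STORY>"))  # True
print(is_well_formed("<STORY><TITLE>Hi</STORY>"))          # False: mismatched tag
```

Because ElementTree never reads the DTD's content models, a document that breaks a validity constraint but is still well-formed would parse here without complaint.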

at user option— This phrase means that conforming software may or must (depending on the verb used in the sentence) behave as described. If it does, it must give users a way to enable or disable that behavior.

case-folding— Case-folding is a process in which every character that the character set specifies as "non-uppercase" is replaced by its uppercase equivalent. So, for example,

ThIs Is A sENtenCE witH MiXEd case. 

would become

THIS IS A SENTENCE WITH MIXED CASE. 

Case-folding makes it possible to match strings without regard to case.
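The example above can be reproduced with a one-line sketch. Here str.upper() stands in for case-folding to uppercase, and the function name case_fold is only illustrative.

```python
def case_fold(text: str) -> str:
    """Fold text to uppercase, following the definition of case-folding above."""
    # str.upper() applies the Unicode uppercase mapping to each character.
    return text.upper()


print(case_fold("ThIs Is A sENtenCE witH MiXEd case."))
# THIS IS A SENTENCE WITH MIXED CASE.
```

Python also offers str.casefold(), which folds toward lowercase for caseless comparison. upper() is used here only because the definition above folds to uppercase.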

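match— A match in the recommendation can mean one of several things. It can refer to a match of strings or names, which is a case-insensitive match: the items being compared are case-folded first and then compared. For example,

<Element> matches <ELEMENT>.

The final XML 1.0 Recommendation, however, states that no case folding is performed when strings or names are matched. XML names are case-sensitive, so in a conforming document <Element> and <ELEMENT> are different names.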

A match can also refer to content and content models. In that case, a match occurs "if the content is Mixed and consists of character data and elements whose names match names in the content model, or if the content model matches the rule for elements, and the sequence of child elements belongs to the language generated by the regular expression in the content model." Got it?

This is not quite as easy to follow as may and must, but let's break it down to see what it really means. Say that our DTD defines an element called <STORY>, and that <STORY> must contain <TITLE> and <AUTHOR>.

If we have some XML that looks like this,

<STORY> 
<TITLE>The Little Coder Who Could</TITLE>
<AUTHOR>Jim Causey</AUTHOR>
</STORY>

we have a match because our <STORY> element matches the requirements of our content model.
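The "language generated by the regular expression in the content model" can be sketched by treating the sequence of child element names as a string and testing it against a regular expression. This is a simplified illustration and does not describe how any particular validating parser works. The STORY_MODEL pattern assumes a declaration along the lines of <!ELEMENT STORY (TITLE, AUTHOR)>, and both it and the matches_model helper are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical regex for a content model like (TITLE, AUTHOR):
# each child element name is followed by a single space.
STORY_MODEL = re.compile(r"TITLE AUTHOR ")


def matches_model(element: ET.Element, model: re.Pattern) -> bool:
    """Check whether an element's child-element names fit the content model."""
    # Iterating over an Element yields its direct children in document order.
    names = "".join(child.tag + " " for child in element)
    return model.fullmatch(names) is not None


story = ET.fromstring(
    "<STORY>"
    "<TITLE>The Little Coder Who Could</TITLE>"
    "<AUTHOR>Jim Causey</AUTHOR>"
    "</STORY>"
)
print(matches_model(story, STORY_MODEL))  # True
```

A <STORY> with no <AUTHOR>, or with <AUTHOR> before <TITLE>, produces a name sequence outside the pattern's language, so it does not match.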

exact match— An exact match is a case-sensitive match: two strings match exactly only when they are identical, character for character, including case. Thus,

<Element> does not match <ELEMENT>.

<ElEmEnT> does match <ElEmEnT>.
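The contrast between the two kinds of match fits in two small functions. Both names are hypothetical, and the case-insensitive version folds to uppercase as described under case-folding.

```python
def exact_match(a: str, b: str) -> bool:
    """Exact match: the strings must be identical, case included."""
    return a == b


def folded_match(a: str, b: str) -> bool:
    """Case-insensitive match: compare after folding both strings to uppercase."""
    return a.upper() == b.upper()


print(exact_match("<Element>", "<ELEMENT>"))   # False
print(folded_match("<Element>", "<ELEMENT>"))  # True
print(exact_match("<ElEmEnT>", "<ElEmEnT>"))   # True
```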

for compatibility— This term applies to XML features that are included solely for the purpose of remaining compatible with SGML.

for interoperability— This term applies to features that are recommended to help XML remain compatible with existing SGML processors. However, implementation of these features is not required to conform to the recommendation.

Notation

In addition to these terms, a number of special notations are also used within the XML specification. You might be familiar with some of these notations from other specifications or programming languages. However, understanding what type of information these notations describe is critical to correctly following the XML Recommendation.

At the heart of the grammar used for XML is a simple Extended Backus-Naur Form (EBNF) notation. It is really pretty straightforward:

symbol ::= expression 

This just means that a symbol is defined by an expression. For example, let's say that we were using this notation to define Pi as a literal value of 3.14. Our rule would look like this:

Pi ::= "3.14" 

Of course, this is an oversimplification, but it gets the point across.

The symbol on the left side of a rule begins with an uppercase letter, as in our example, when the symbol is defined by a simple regular expression. It begins with a lowercase letter when a recursive grammar is used to define the symbol. We'll take a look at some examples a little later.
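Because an uppercase symbol is defined by a regular expression, it can usually be carried straight over into a regex library. The following sketch translates the XML 1.0 whitespace rule, S ::= (#x20 | #x9 | #xD | #xA)+, into a Python pattern. The variable name S simply mirrors the rule.

```python
import re

# S ::= (#x20 | #x9 | #xD | #xA)+
# Each #xN becomes a \xNN escape inside one character class, and the
# trailing + carries over unchanged.
S = re.compile(r"[\x20\x09\x0D\x0A]+")

print(bool(S.fullmatch(" \t\r\n")))  # True
print(bool(S.fullmatch("")))         # False: + requires at least one character
print(bool(S.fullmatch(" x ")))      # False: x is not a whitespace character
```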

The expression on the right side of a rule can consist of several different notations. These notations include

#xN 

In this notation, N is a hexadecimal integer, and the expression represents the corresponding character in the ISO/IEC 10646 specification.
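As a sketch, a #xN expression can be turned into its character by parsing N as hexadecimal. Python's chr() takes a Unicode code point, and Unicode code points coincide with ISO/IEC 10646 code points. The function name char_from_hex is hypothetical.

```python
def char_from_hex(ref: str) -> str:
    """Return the character that an EBNF #xN expression stands for."""
    if not ref.startswith("#x"):
        raise ValueError(f"not a #xN expression: {ref!r}")
    # N is a hexadecimal integer; leading zeros do not change its value.
    return chr(int(ref[2:], 16))


print(char_from_hex("#x3C"))    # <
print(char_from_hex("#x003C"))  # <
print(char_from_hex("#x41"))    # A
```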

[a-zA-Z], [#xN-#xN] 

These expressions represent any character with a value in the range specified, inclusive. So [l-pL-P] would include any letters from l to p, including the l and the p in both lower and uppercase.
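Both forms of range carry over to Python character classes. The sketch below checks the [l-pL-P] example. It also encodes the XML 1.0 Char production, which is written entirely with #xN values and ranges: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The pattern names are only illustrative.

```python
import re

# [l-pL-P]: any single letter from l to p, inclusive, in either case.
L_TO_P = re.compile(r"[l-pL-P]")

# Char: each #xN becomes a \x, \u, or \U escape, and each [#xN-#xN]
# becomes a range inside a single character class.
CHAR = re.compile(r"[\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]")

print([c for c in "klmnopq" if L_TO_P.fullmatch(c)])  # ['l', 'm', 'n', 'o', 'p']
print(bool(CHAR.fullmatch("A")))       # True
print(bool(CHAR.fullmatch("\x00")))    # False: NUL falls outside every range
print(bool(CHAR.fullmatch("\uFFFE")))  # False: excluded by the rule
```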

[^a-z], [^#xN-#xN] 

These expressions represent any character with a value outside the range given. The caret symbol (^) means not, so [^a-q] matches any character except the lowercase letters a through q, inclusive. Among the lowercase letters, that leaves only r through z, but digits, uppercase letters, and every other character outside the range match as well.
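A quick sketch with Python's re module makes the scope of the negation visible. The pattern name is only illustrative.

```python
import re
import string

NOT_A_TO_Q = re.compile(r"[^a-q]")

# Among the lowercase letters, only r through z survive the negation...
print("".join(c for c in string.ascii_lowercase if NOT_A_TO_Q.fullmatch(c)))  # rstuvwxyz

# ...but the class also accepts every other character, such as digits and capitals.
print(bool(NOT_A_TO_Q.fullmatch("7")), bool(NOT_A_TO_Q.fullmatch("B")))  # True True
```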

[^abc], [^#xN#xN#xN] 

This expression represents any character except those given. So [^abcdefghijknopqrstuvwyz] would be a very long way of saying that, of all the lowercase letters, only x, m, and l are allowed.
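You can confirm which lowercase letters slip through that long exclusion list with a short sketch. The pattern name is only illustrative.

```python
import re
import string

NOT_LISTED = re.compile(r"[^abcdefghijknopqrstuvwyz]")

# Each test is against a single character; the class never matches a multi-character string.
print([c for c in string.ascii_lowercase if NOT_LISTED.fullmatch(c)])  # ['l', 'm', 'x']
```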

"string" or 'string' 

Any string within quotation marks represents that string, literally, and single and double quotation marks work the same way. So "XML" and 'XML' both represent the string XML.

a b 

This means that the symbol is defined as a followed by b.

a | b 

This means that the symbol can be defined by a or by b, but not both.
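Sequence and alternation map directly onto regex concatenation and the | operator. In the XML 1.0 notation, sequence binds more tightly than alternation, and Python's re behaves the same way. The sketch below shows the difference that grouping makes. The pattern names are only illustrative.

```python
import re

# 'a' 'b' | 'c' 'd' means ('a' 'b') | ('c' 'd'): sequence binds tighter.
RULE = re.compile(r"ab|cd")

# Parentheses are needed to put the alternation inside the sequence: 'a' ('b' | 'c') 'd'.
GROUPED = re.compile(r"a(?:b|c)d")

for text in ["ab", "cd", "acd"]:
    print(text, bool(RULE.fullmatch(text)), bool(GROUPED.fullmatch(text)))
# ab True False
# cd True False
# acd False True
```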

a - b 

This matches any string that matches a but does not match b.
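Python's re has no direct subtraction operator, so a sketch of a - b checks the two halves separately. The example is the XML 1.0 rule for processing-instruction targets, PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l')). Note the assumption here: the NAME pattern is a simplified, ASCII-only stand-in for the real Name production, which allows many more Unicode characters. The helper name is_pi_target is hypothetical.

```python
import re

# Simplified ASCII-only approximation of Name (an assumption, not the real rule).
NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")
# ('X' | 'x') ('M' | 'm') ('L' | 'l'): "xml" in any mix of case.
XML_ANY_CASE = re.compile(r"[Xx][Mm][Ll]")


def is_pi_target(text: str) -> bool:
    """a - b: text must match a (NAME) and must not match b (XML_ANY_CASE)."""
    return bool(NAME.fullmatch(text)) and not XML_ANY_CASE.fullmatch(text)


print(is_pi_target("xml-stylesheet"))  # True
print(is_pi_target("XmL"))             # False: excluded by the "- b" part
print(is_pi_target("9lives"))          # False: not a Name
```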

a? 

This means that the symbol might contain a, but it is not required to contain a.

a+ 

This means that the symbol must contain at least one a, but it might contain more.

a* 

This means that the symbol might contain zero or more occurrences of a.
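The ?, +, and * suffixes combine in practice. For example, the XML 1.0 rule Eq ::= S? '=' S? allows optional whitespace around an equals sign, and S is itself defined with +. The sketch below translates Eq into a regex. It then shows that S?, where S already means "one or more", accepts exactly the same strings as a starred character class. The pattern names are only illustrative.

```python
import re

# S ::= (#x20 | #x9 | #xD | #xA)+   (one or more whitespace characters)
S = r"[\x20\x09\x0D\x0A]+"

# Eq ::= S? '=' S?   (whitespace on either side of '=' is optional)
EQ = re.compile(rf"(?:{S})?=(?:{S})?")

# S? means "nothing, or one or more", which is the same as "zero or more".
EQ_STAR = re.compile(r"[\x20\x09\x0D\x0A]*=[\x20\x09\x0D\x0A]*")

for text in ["=", " = ", "\t=\n", "=="]:
    print(repr(text), bool(EQ.fullmatch(text)), bool(EQ_STAR.fullmatch(text)))
```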

%a 

The % sign means that, within the external DTD subset, a parameter entity reference may appear wherever a may appear.

(expression) 

This simply means that the entire expression contained within the parentheses is to be treated as one unit. Any of the notations that apply to a single unit can then be applied to the expression, such as the ?, +, or * suffix operators.
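The effect of grouping shows up clearly when a suffix operator is applied with and without parentheses. In the sketch below, Python's non-capturing group (?:...) plays the role of the EBNF parentheses. The pattern names are only illustrative.

```python
import re

# Without parentheses, + applies only to b:  'a' 'b'+  matches ab, abb, abbb, ...
NO_GROUP = re.compile(r"ab+")
# With parentheses, + applies to the whole unit:  ('a' 'b')+  matches ab, abab, ...
GROUPED = re.compile(r"(?:ab)+")

for text in ["ab", "abb", "abab"]:
    print(text, bool(NO_GROUP.fullmatch(text)), bool(GROUPED.fullmatch(text)))
# ab True True
# abb True False
# abab False True
```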

/* ... */ 

These are used to denote comments.

[WFC: ... ] 

WFC stands for well-formedness constraint. This notation names a well-formedness constraint associated with a rule.

[VC: ... ] 

VC stands for validity constraint. This notation names a validity constraint associated with a rule.
