In plain English, the document shown in Example 3-1 can be described as having:
One library element composed of:
  One or more book elements having:
    An id attribute and an available attribute
    An isbn element composed of text
    A title element with an xml:lang attribute and a text node
    One or more author elements with:
      An id attribute
      A name element
      An optional born element
      An optional died element
    Zero or more character elements with:
      An id attribute
      A name element
      An optional born element
      A qualification element
The good news—and what makes RELAX NG so easy to learn—is that in its simplest form, RELAX NG is pretty much a way to formalize the previous statements with simple matching rules. Terms described in the plain English description have matching terms in the RELAX NG Schema document that look a lot like XML:
A “library element” matches
<element name="library">...</element>
An “id attribute” matches
<attribute name="id"/>
“One or more” matches
<oneOrMore>...</oneOrMore>
“Zero or more” matches
<zeroOrMore>...</zeroOrMore>
“Text” matches
<text/>
“Optional” matches
<optional>...</optional>
You saw in Chapter 2 that almost every XML structure is a natural pattern for RELAX NG. Further, each RELAX NG element is a pattern; therefore, each RELAX NG pattern matches a structure from the XML document. Let’s now spend some time examining each basic pattern.
The text pattern is the simplest; it matches a text node. More precisely, it matches zero or more text nodes. As you’ll see in Chapter 6, the text pattern may also be used in the definition of mixed content models, in which elements may have both child elements and text nodes. For now, though, think of text as matching a text node.
Because attribute values contain text, the text pattern can also match any attribute value. (The W3C XML Infoset doesn’t consider attribute values to be nodes, but RELAX NG does.)
The RELAX NG XML expression for text patterns is just:
<text/>
Not surprisingly, the attribute pattern matches attributes from an XML instance document. The name of the attribute is defined in the name attribute of the attribute pattern. The content of an attribute is defined as a child element of the attribute pattern.
To define the id attribute, you can write:
<attribute name="id">
  <text/>
</attribute>
This brief example shows how the definitions given earlier apply: the attribute’s name, id, is given in the name attribute, and its content, text, is given by a child element.
This example reads as: “an attribute named id with a text value.” Since any attribute can have a value, the text pattern is assumed when the attribute pattern has no child element, so writing out <text/> is not required. Thus, the previous definition is strictly equivalent to this shorter one:
<attribute name="id"/>
The last thing to know about the attribute pattern is that while attribute names are defined by the name attribute of the attribute pattern, it is also possible to define sets of possible names for an attribute. This feature is explained in detail in Chapter 12.
Just as the attribute pattern matches attributes, the element pattern matches elements. To define the name element, write:
<element name="name">
  <text/>
</element>
As with the attribute pattern, it is possible to replace the name attribute of the element pattern with a set of names. This practice is explained in detail in Chapter 12.
Unlike attributes, not all elements accept text nodes. For that reason, the text pattern isn’t implicitly assumed for elements. In fact, there is no implicit content for elements. The content of each element must be explicitly described, even if the description shows that the element is always empty.
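As a small illustration of that last point, an element that must always be empty still needs its content spelled out. The sketch below assumes a hypothetical separator element, which is not part of the library example, and uses RELAX NG's empty pattern, which this section has not introduced. The xmlns declaration is the standard RELAX NG namespace.

```xml
<!-- Hypothetical element name, used only for illustration.
     The empty pattern states explicitly that the element has no content. -->
<element name="separator" xmlns="http://relaxng.org/ns/structure/1.0">
  <empty/>
</element>
```

With this definition, <separator/> is accepted, while a separator element containing text such as "abc" is not.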
Because a text pattern matches zero or more text nodes, the definition of the name element given earlier also matches empty elements such as:
<name/>
as well as elements such as:
<name>Charles M Schulz</name>
There are ways to restrict text nodes further. Chapter 7 shows how to add restrictions that rule out empty elements where necessary, and Chapter 8 shows how to use the datatypes from W3C XML Schema to add more specific restrictions, such as date or number requirements.
Attributes can be added within elements. To define the title element, write:
<element name="title">
  <attribute name="xml:lang"/>
  <text/>
</element>
You can see that an xml:lang attribute has been defined from the XML namespace. I will describe the support of namespaces in Chapter 11, but here you can begin to see how straightforward it is. The attribute is described simply by giving xml:lang as its name. The xml prefix is predeclared to refer to the XML namespace, http://www.w3.org/XML/1998/namespace, so this namespace URI doesn’t need to be declared in the schema. For other namespaces, however, you need to declare the namespace using mechanisms described in Chapter 11.
Note that RELAX NG is clever enough to know that attributes are always located in the start tag of XML elements and that the order in which they are written isn’t considered significant. This means that the attribute pattern can be located anywhere in the definition of an element. It doesn’t make a difference if you write:
<element name="title">
  <attribute name="xml:lang"/>
  <text/>
</element>
as before or if you move the attribute pattern after the text pattern, like this:
<element name="title">
  <text/>
  <attribute name="xml:lang"/>
</element>
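To make this concrete, here is a title element as it might appear in an instance document. The title text and language code are placeholders rather than values from Example 3-1. Whichever of the two schema fragments above is used, the instance looks the same, since an attribute can only ever be written in the start tag:

```xml
<!-- Placeholder content; matched by both orderings of the title definition. -->
<title xml:lang="en">An Example Title</title>
```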
In addition to text nodes and attributes, elements can also include child elements. You can define the author element this way:
<element name="author">
  <attribute name="id"/>
  <element name="name">
    <text/>
  </element>
  <element name="born">
    <text/>
  </element>
  <element name="died">
    <text/>
  </element>
</element>
That’s not exactly the right definition, since we want the born and died elements to be optional. To make this happen, I need to introduce a new pattern: the optional pattern.
The optional pattern makes its content just that, optional; the element doesn’t have to be there. To specify that the born and died elements are optional, write:
<optional>
  <element name="born">
    <text/>
  </element>
</optional>
<optional>
  <element name="died">
    <text/>
  </element>
</optional>
Note that the markup and meaning are different from:
<optional>
  <element name="born">
    <text/>
  </element>
  <element name="died">
    <text/>
  </element>
</optional>
And also different from:
<optional>
  <element name="born">
    <text/>
  </element>
  <optional>
    <element name="died">
      <text/>
    </element>
  </optional>
</optional>
In the first case, each element is embedded in its own optional pattern. The two elements are thus independently optional. I can include one, both, or neither of them in valid instance documents.
In the second case, both elements are embedded in the same optional pattern. Thus I can include either both or neither in instance documents.
In the third case, the first optional pattern includes the born element and an optional died element. As in the second case, both or neither can appear in an instance document, but the born element can now also appear alone. The died element, however, can’t appear without the born element, because of the way the patterns are nested.
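The three cases can be compared side by side with a few instance fragments. This is only a sketch: the examples wrapper element and the id values are made up for illustration, the name and dates are sample data, and each author is assumed to be defined as an id attribute, a name element, and then one of the three optional forms.

```xml
<!-- Illustrative instances only; the wrapper element and ids are hypothetical. -->
<examples>
  <!-- Neither born nor died: accepted by all three forms -->
  <author id="a1">
    <name>Charles M Schulz</name>
  </author>

  <!-- Both born and died: accepted by all three forms -->
  <author id="a2">
    <name>Charles M Schulz</name>
    <born>1922-11-26</born>
    <died>2000-02-12</died>
  </author>

  <!-- born alone: accepted by the first and third forms, rejected by the second -->
  <author id="a3">
    <name>Charles M Schulz</name>
    <born>1922-11-26</born>
  </author>

  <!-- died alone: accepted by the first form only -->
  <author id="a4">
    <name>Charles M Schulz</name>
    <died>2000-02-12</died>
  </author>
</examples>
```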
None of these combinations is “right” or “wrong”; they are just different pattern combinations that allow different element combinations in the instance document. What’s nice about RELAX NG is that it imposes so few restrictions that almost any combination of patterns is allowed. The few restrictions that do exist are covered in Chapter 15, and you don’t need to think about them until then.
The oneOrMore pattern specifies, as you might have guessed, that its content appears one or more times. Here, oneOrMore specifies that a book must have one or more authors:
<oneOrMore>
  <element name="author">
    <attribute name="id"/>
    <element name="name">
      <text/>
    </element>
    <optional>
      <element name="born">
        <text/>
      </element>
    </optional>
    <optional>
      <element name="died">
        <text/>
      </element>
    </optional>
  </element>
</oneOrMore>
The last pattern needed in our example is zeroOrMore. As you will have guessed, it specifies that its content appears zero or more times. This example defines the character elements:
<zeroOrMore>
  <element name="character">
    <attribute name="id"/>
    <element name="name">
      <text/>
    </element>
    <optional>
      <element name="born">
        <text/>
      </element>
    </optional>
    <element name="qualification">
      <text/>
    </element>
  </element>
</zeroOrMore>
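Putting the pieces together, the plain-English description at the start of this section can be sketched as a single schema. This is an assembly of the fragments shown so far rather than a definitive listing: it assumes that child elements appear in the order in which the description lists them, and it adds the RELAX NG namespace declaration on the root element, which the fragments omit.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch assembled from the fragments in this section.
     Assumption: children occur in the order given by the plain-English description. -->
<element name="library" xmlns="http://relaxng.org/ns/structure/1.0">
  <oneOrMore>
    <element name="book">
      <!-- Both attributes take the implicit text content -->
      <attribute name="id"/>
      <attribute name="available"/>
      <element name="isbn">
        <text/>
      </element>
      <element name="title">
        <attribute name="xml:lang"/>
        <text/>
      </element>
      <!-- One or more authors, with independently optional born and died -->
      <oneOrMore>
        <element name="author">
          <attribute name="id"/>
          <element name="name">
            <text/>
          </element>
          <optional>
            <element name="born">
              <text/>
            </element>
          </optional>
          <optional>
            <element name="died">
              <text/>
            </element>
          </optional>
        </element>
      </oneOrMore>
      <!-- Zero or more characters, with an optional born element -->
      <zeroOrMore>
        <element name="character">
          <attribute name="id"/>
          <element name="name">
            <text/>
          </element>
          <optional>
            <element name="born">
              <text/>
            </element>
          </optional>
          <element name="qualification">
            <text/>
          </element>
        </element>
      </zeroOrMore>
    </element>
  </oneOrMore>
</element>
```

Reading the schema from the outside in reproduces the description almost word for word, which is the correspondence this section has been building toward.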