The lack of whitespace normalization in RELAX NG's string datatype may lead to some surprises. When reading attribute values, XML parsers must replace the literal linefeeds, carriage returns, and tabs they find there with spaces, which can produce unexpected results in processing.
Attribute whitespace normalization can be confusing in several ways. In our previous schema, the attribute value that must match "on hold" also matches an attribute in which the space between "on" and "hold" is replaced by a literal linefeed, as in:
<book id="b0836217462" available="on
hold">
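The effect is easy to observe with any conforming parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name parsed_available is hypothetical, and the book element is written as an empty element only so that each sample is a well-formed document on its own.

```python
import xml.etree.ElementTree as ET


def parsed_available(xml_text: str) -> str:
    """Return the 'available' attribute as the parser reports it."""
    return ET.fromstring(xml_text).get("available")


# Literal whitespace characters typed directly inside the attribute value.
literal_linefeed = '<book id="b0836217462" available="on\nhold"/>'
literal_tab = '<book id="b0836217462" available="on\thold"/>'
literal_crlf = '<book id="b0836217462" available="on\r\nhold"/>'

# Each one reaches the application as a single space.
print(repr(parsed_available(literal_linefeed)))  # 'on hold'
print(repr(parsed_available(literal_tab)))       # 'on hold'
print(repr(parsed_available(literal_crlf)))      # 'on hold'
```

Because the schema processor only ever sees the normalized value, it cannot distinguish these three documents from one that contains a plain space.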
Attribute whitespace normalization is normal behavior in XML 1.0. All XML parsers must normalize an attribute's value before reporting it to other applications, producing "on hold" in this case. No schema language can change this. These issues can also make it difficult to create schemas that include strings that incorporate whitespace. This RELAX NG XML syntax schema requires new features in order to be translated to the compact syntax:
<attribute name="available">
 <choice>
  <value type="string">available</value>
  <value type="string">checked out</value>
  <value type="string">on
hold</value>
 </choice>
</attribute>
The compact syntax doesn’t permit new lines within quotes. To translate this into the compact syntax, we need to introduce a couple of new features to permit the inclusion of linefeeds in values.
The first way to include them is borrowed from Python. If instead of using single (') or double (") quotes, you use three single (''') or three double (""") quotes, you can include nearly everything in your values, including new lines:
attribute available {string "available"|string "checked out"|string """on
hold"""}
or:
attribute available {string "available"|string "checked out"|string '''on
hold'''}
The second way to allow new lines is to escape the newline character using the syntax \x{A} (where A is the Unicode value of newline in hexadecimal):
attribute available {string "available"|string "on hold"|string "on\x{A}hold"}
This pattern specifies that the attribute can contain a value with a linefeed, something that can happen in XML only if the newline in the attribute is explicitly specified through its numeric value, such as:
<book id="b0836217462" available="who&#xA;knows?">
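To see that a numeric character reference really does survive normalization, here is a small sketch, again using Python's standard xml.etree.ElementTree module. The names ALLOWED and available_value are hypothetical, and the set membership test is only a stand-in for the exact comparison performed by the string datatype, not a RELAX NG validator.

```python
import xml.etree.ElementTree as ET

# Stand-in for the three values in the pattern that uses "on\x{A}hold".
ALLOWED = {"available", "on hold", "on\nhold"}


def available_value(xml_text: str) -> str:
    """Return the 'available' attribute as the parser reports it."""
    return ET.fromstring(xml_text).get("available")


# The linefeed is written as a character reference, so it is not
# turned into a space during attribute value normalization.
with_reference = '<book id="b0836217462" available="on&#xA;hold"/>'
value = available_value(with_reference)

print(repr(value))       # 'on\nhold'
print(value in ALLOWED)  # True
```

The reported value contains a real linefeed, which is the only situation in which the third alternative of the pattern can match.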
These are unlikely cases, but now you know what to do if you encounter them.