Chapter 4. XML Schemas

We have studied the concept of a Document Type Definition (DTD) in detail. We know that a DTD is used for validating the contents of an XML document. A DTD is undoubtedly an important feature of the XML technology. However, there are a number of areas in which DTDs are weak. The main argument against DTDs is that their syntax is not like that of XML documents. Therefore, people working with DTDs have to learn a new syntax. Furthermore, because a DTD is not itself an XML document, we cannot treat it like one: for example, we cannot search for information inside a DTD, or display its contents in the form of HTML, the way we can with XML documents.

A schema is an alternative to DTD.

It is expected that schemas would eventually replace most (but not all) of the features of the DTDs. DTDs are easier to write and provide support for some features (e.g., entities) better. However, schemas are far richer in terms of their capabilities and extensibility. A schema document is a separate document, just like a DTD. However, the syntax of a schema is like the syntax of an XML document. Therefore, we can state:

The main difference between a DTD and a schema is that the syntax of a DTD is different from that of an XML document, whereas the syntax of a schema is the same as that of an XML document.

In other words, a schema document is an XML document.

For example, we declare an element in a DTD by using the syntax <!ELEMENT>. This is clearly not legal in XML. We cannot begin an element declaration with an exclamation mark, as happens in the case of a DTD.

We can use a simple, yet powerful, example to illustrate the difference between using a DTD and using a schema. Suppose that we want to represent the marks of a student in an XML document. For this purpose, we want to add an element called Marks to our root element Student. We will declare this element as of type #PCDATA in our DTD file. This will ensure that the parser checks for the existence of the Marks element in the XML document. However, can it ensure that the marks are numeric? Clearly, no! We cannot control the contents the element Marks can have. These contents can be alphabetic or alphanumeric! This is shown in Figure 4.1.

Figure 4.1 Use of PCDATA does not control data type

As we can see, the usage of PCDATA in the declaration of an element does not stop us from entering alphabetic data in a Marks element. In other words, we cannot specify exactly what our elements should contain. This is clearly not desirable.
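
Since the figure is not reproduced here, the following is a minimal sketch of the situation it describes. The DTD is written as an internal subset so that the example is self-contained, and the value Ninety is our own illustrative choice. A validating parser would accept this document, even though the marks are not numeric.

```xml
<?xml version="1.0"?>
<!DOCTYPE Student [
  <!ELEMENT Student (Marks)>
  <!ELEMENT Marks (#PCDATA)>
]>
<Student>
  <!-- Valid against the DTD, although the content is alphabetic -->
  <Marks>Ninety</Marks>
</Student>
```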

In the case of a schema, we can specify that our element should only contain numeric data. Moreover, we can control many other aspects of the contents of the elements, which is not possible in the case of DTDs. We use similar terminology for checking the correctness of the XML documents in the case of a schema (as in the case of DTDs). An XML document that conforms to the rules of a schema is called a valid XML document. Otherwise, it is called invalid.
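
As a preview, the following sketch shows how a schema could insist on numeric marks. It uses constructs (complexType and sequence) that are explained later in this chapter, and the element names Student and Marks are simply carried over from the DTD discussion above. With such a schema, a Marks element containing Ninety would be reported as invalid, whereas 90 would be accepted.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Student">
    <xsd:complexType>
      <xsd:sequence>
        <!-- xsd:integer rejects alphabetic or alphanumeric content -->
        <xsd:element name="Marks" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```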

It is interesting to note that we can associate a DTD as well as a schema with an XML document.

Let us now take a look at a simple schema. Consider an XML document that contains a greeting message. Let us write a corresponding schema for it. Figure 4.2 shows the details.

Figure 4.2 Example of an XML document and corresponding schema

We will notice several new syntactical details in the XML document and the schema file. Let us, therefore, understand this, step-by-step.

First and foremost, an XML schema is defined in a separate file. This file has the extension xsd. In our example, the schema file is named message.xsd.

The following declaration in our XML document indicates that we want to associate this schema with our XML document:

<MESSAGE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="message.xsd">

Let us dissect this statement.

The word MESSAGE indicates the root element of our XML document. There is nothing unusual about it.
The declaration xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" is an attribute. It declares a namespace by associating a namespace prefix with a namespace URI. Here, xmlns is the reserved keyword that introduces a namespace declaration, the namespace prefix is xsi, and the namespace URI is http://www.w3.org/2001/XMLSchema-instance. The namespace prefix can change (xsi is merely the conventional choice). The namespace URI must be written exactly as shown. This URI identifies the XML Schema instance namespace, which defines the schema-related attributes (such as noNamespaceSchemaLocation) that we can use inside an XML document.
The declaration xsi:noNamespaceSchemaLocation="message.xsd" specifies a particular schema file which we want to associate with our XML document. In this case, we are stating that our XML document wants to refer to a schema file whose name is message.xsd.

This is followed by the actual contents of our XML document. In this case, the contents are nothing but the contents of our root element.
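
Putting these pieces together, the complete XML document might look like the following sketch. The file name message.xml and the greeting text Hello World! are our own illustrative choices; the root element and its two attributes are the ones dissected above.

```xml
<?xml version="1.0"?>
<!-- message.xml: an illustrative greeting validated against message.xsd -->
<MESSAGE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="message.xsd">
  Hello World!
</MESSAGE>
```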

These explanations are depicted in Figure 4.3.

Figure 4.3 Understanding our XML document

It is now time to understand our schema (i.e., message.xsd).

Note that the schema file is an XML file with an extension of xsd. That is, like any XML document, it begins with an <?xml …?> declaration.

The following lines specify that this is a schema file, and not an ordinary XML document. They also contain the actual contents of the schema. Let us first reproduce them:

<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
    <xsd:element name = "MESSAGE" type = "xsd:string"/>
</xsd:schema>

Let us understand this, step-by-step.

The declaration <xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema"> indicates that this is a schema, because its root element is named schema. It has a namespace prefix of xsd. The namespace URI is http://www.w3.org/2001/XMLSchema. This means that our schema declarations use the vocabulary of the XML Schema standard, which this URI identifies, and that we can use a namespace prefix of xsd to refer to that vocabulary in our schema file. Note that the URI serves only as a unique name; it is not an address from which anything needs to be downloaded.
The declaration <xsd:element name = "MESSAGE" type = "xsd:string"/> specifies that we want to use an element called MESSAGE in our XML document. The type of this element is string. Also, we are using the namespace prefix xsd. Recall that this namespace prefix was associated with the namespace URI http://www.w3.org/2001/XMLSchema in our earlier statement.
The line </xsd:schema> specifies the end of the schema.

These explanations are depicted in Figure 4.4.

Figure 4.4 Understanding our XML schema

Based on this discussion, let us have a small exercise.

Exercise 1

Write an XML document that contains a single element to specify the name of the student. Provide a corresponding XML schema.

Solution

XML document (student.xml)

<?xml version = "1.0" ?>
<STUDENT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="student.xsd">
    S Ramachandran
</STUDENT>

XML schema (student.xsd)

<?xml version = "1.0" ?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
    <xsd:element name = "STUDENT" type = "xsd:string"/>
</xsd:schema>

Exercise 2

Write the same XML document, but this time use a DTD.

Solution

XML document (student.xml)

<?xml version="1.0"?>
<!DOCTYPE STUDENT SYSTEM "student.dtd">
<STUDENT>
    S Ramachandran
</STUDENT>

DTD (student.dtd)

<!ELEMENT STUDENT (#PCDATA)>

4.2 COMPLEX TYPES

4.2.1 Basics of Simple and Complex Types

Elements in schema can be divided into two categories: simple and complex. This is shown in Figure 4.5.

Figure 4.5 Classification of elements in XML schemas

Let us understand the difference between the two types.

Simple elements: Simple elements can contain only text. They cannot have sub-elements or attributes. The text that they can contain, however, can be of various data types such as strings, numbers, dates, etc.
Complex elements: Complex elements, on the other hand, can contain sub-elements, attributes, etc. Many times, they are made up of one or more simple elements. This is shown in Figure 4.6.

Figure 4.6 Complex element is made up of simple elements

Let us now consider an example. Suppose we want to capture student information in the form of the student's roll number, name, marks, and result. We can have all these individual blocks of information as simple elements. Then, we will have a complex element in the form of the root element. This complex element will encapsulate these individual simple elements. Figure 4.7 shows the resulting XML document, first.

Figure 4.7 XML document for Student example

Let us now immediately take a look at the corresponding schema file. Figure 4.8 shows this.

Figure 4.8 Schema for Student example

Let us understand our schema.

<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
We know that the root element of the schema is a reserved keyword called as schema. Here too, it is the same. The namespace prefix xsd maps to the namespace URI http://www.w3.org/2001/XMLSchema, as earlier. In general, this will be true for any schema that we write.
<xsd:element name = "STUDENT" type = "StudentType"/>
This declares STUDENT as the root element of our XML document. In the schema, it is called as the top-level element. Remember that in the case of a schema, the root element is always the keyword schema. Therefore, the root element in an XML document is not the root of the corresponding schema. Instead, it appears in the schema after the root element schema.

The STUDENT element is declared of type StudentType. This is a user-defined type.

Conceptually, a user-defined type is similar to a structure in C/C++ or a class in Java (without the methods). It allows us to create our own custom types.

In other words, the schema specification allows us to create our own custom data types. For example, we can create our own types for storing information about employees, departments, songs, friends, sports games, and so on. We recognise this as a user-defined type because it does not have our namespace prefix xsd. Remember that all the standard data types provided by the XML schema specifications reside at the namespace http://www.w3.org/2001/XMLSchema, which we have prefixed as xsd in the earlier statement.
<xsd:complexType name = "StudentType">
Now that we have declared our own type, we must explain what it represents and contains. That is exactly what we are doing here. This statement indicates that we have used StudentType as a type earlier, and now we want to explain what it means. Also, note that we use a keyword complexType to designate that StudentType is a complex element. This is similar to stating struct StudentType or class StudentType in C++/Java.
<xsd:sequence>
Schemas allow us to force a sequence of simple elements within a complex element. We can specify that a particular complex element must contain one or more simple elements in a strict sequence. Thus, if the complex element is A, containing two simple elements B and C, we can mandate that C must follow B inside A. In other words, the XML document must have:

    <A>

         <B> … </B>

         <C>… </C>

    </A>

This is accomplished by the sequence keyword.
<xsd:element name = "ROLL_NUMBER" type = "xsd:string"/>

This declaration specifies that the first simple element inside our complex element is ROLL_NUMBER, of type string. After this, we have NAME, MARKS, and RESULT as three more simple elements following ROLL_NUMBER. We will not discuss them. We will simply observe for now that MARKS has a different data type: an integer. We will discuss this in detail subsequently.

We will also not discuss the closure of the sequence, complexType, and schema tags.
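
Since Figures 4.7 and 4.8 are not reproduced here, the following sketch assembles the declarations discussed above into one schema. The types of NAME and RESULT (string) and of MARKS (integer) are assumptions on our part, consistent with the walkthrough but not confirmed by the figure.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Top-level element; its type is the user-defined StudentType -->
  <xsd:element name="STUDENT" type="StudentType"/>

  <xsd:complexType name="StudentType">
    <!-- The four simple elements must appear in exactly this order -->
    <xsd:sequence>
      <xsd:element name="ROLL_NUMBER" type="xsd:string"/>
      <xsd:element name="NAME" type="xsd:string"/>
      <xsd:element name="MARKS" type="xsd:integer"/>
      <xsd:element name="RESULT" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

An XML document conforming to this sketch could look as follows. The file name student.xsd and all the values are purely illustrative.

```xml
<?xml version="1.0"?>
<STUDENT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:noNamespaceSchemaLocation="student.xsd">
  <ROLL_NUMBER>A-101</ROLL_NUMBER>
  <NAME>S Ramachandran</NAME>
  <MARKS>78</MARKS>
  <RESULT>Pass</RESULT>
</STUDENT>
```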

Let us have a small exercise to build on these concepts.

Exercise 1

Write an XML document and a corresponding XML schema for maintaining the employee number, name, designation, and salary.

Solution

XML document (employee.xml)

<?xml version = "1.0" ?>
<EMPLOYEE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="employee.xsd">
    <EMP_NUMBER> 9662 </EMP_NUMBER>
    <EMP_NAME> Atul Kahate </EMP_NAME>
    <DESIGNATION> Consultant </DESIGNATION>
    <!-- Illustrative salary value -->
    <SALARY> 50000 </SALARY>
</EMPLOYEE>

XML schema (employee.xsd)

<?xml version = "1.0"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
    <xsd:element name = "EMPLOYEE" type = "EmpType"/>
    <xsd:complexType name = "EmpType">
        <xsd:sequence>
            <xsd:element name = "EMP_NUMBER" type = "xsd:integer"/>
            <xsd:element name = "EMP_NAME" type = "xsd:string"/>
            <xsd:element name = "DESIGNATION" type = "xsd:string"/>
            <xsd:element name = "SALARY" type = "xsd:integer"/>
        </xsd:sequence>
    </xsd:complexType>
</xsd:schema>

4.2.2 Specifying the Frequency: minOccurs and maxOccurs

Let us consider that we want to represent information about a book. The XML document depicting this information along with its corresponding schema is shown in Figure 4.9.

Figure 4.9 Book XML and schema

There is no problem with this example. However, now imagine a situation where we want to provide support for a book XML document that can have multiple authors. Would the same schema serve the purpose? Figure 4.10 shows this situation. Please note that the schema declaration in this figure is incorrect. We have shown it merely to explain the problem.

Figure 4.10 Book XML and incorrect schema

We now have two authors in the XML document. However, the corresponding schema talks about the author element only once, so it allows exactly one author. The XML document is still well-formed, but it is not valid against this schema. We must use either or both of the minOccurs and maxOccurs attributes in such situations.

The minOccurs attribute specifies the minimum number of occurrences that an element can have. On the other hand, the maxOccurs attribute specifies the maximum number of occurrences.

Our requirement is to have two authors in this case. Therefore, we only require maxOccurs with a value of 2, and the declaration of the AUTHOR element in our schema would change to the following:

<xsd:element name = "AUTHOR" type = "xsd:string"
             maxOccurs = "2"/>

Nothing else needs to change. The above declaration specifies that we can at the most have two authors.

The default value of both minOccurs and maxOccurs is 1. Therefore, if we do not specify either of them, the element is deemed to occur exactly once.

The above declaration is equivalent to the following:

<xsd:element name = "AUTHOR" type = "xsd:string"
             minOccurs = "1" maxOccurs = "2"/>

There is a specific value called unbounded, which means infinite occurrences. Whenever we wish to specify that the upper limit for an element occurrence is infinite (that is, there is no upper limit), we can specify it as unbounded. For example, if our book must have at least one author but can have any number of authors, our declaration would change to:

<xsd:element name = "AUTHOR" type = "xsd:string"
             minOccurs = "1" maxOccurs = "unbounded"/>

Based on our requirements, we can set minOccurs and maxOccurs attributes to various values. These are summarised in Table 4.1.

Requirement	Set minOccurs to	Set maxOccurs to
An element should occur exactly once	1	1
An element should occur at least once and possibly many more times	1	unbounded
An element is optional (may not occur at all), or may occur for any number of times	0	unbounded
An element may not occur at all, or may occur only once	0	1

Table 4.1 Usage of minOccurs and maxOccurs

We would realise that minOccurs and maxOccurs are similar to, but far more expressive than, the ?, *, and + symbols of the DTDs. The minOccurs and maxOccurs attributes are not only easier to read and understand, but they also offer much finer control. For example, suppose that one manager can manage any number of employees, from eight to 20. Then we can specify minOccurs as 8 and maxOccurs as 20. A DTD has no concise way to express such a range: the ?, *, and + symbols can only say zero or one, zero or more, and one or more, respectively.
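
The manager example can be sketched as follows. The element and type names (MANAGER, ManagerType, NAME, EMPLOYEE) are hypothetical; only the limits of 8 and 20 come from the discussion above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="MANAGER" type="ManagerType"/>

  <xsd:complexType name="ManagerType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
      <!-- Between 8 and 20 EMPLOYEE elements must follow the NAME -->
      <xsd:element name="EMPLOYEE" type="xsd:string"
                   minOccurs="8" maxOccurs="20"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```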

4.2.3 Specifying Element Content

Based on the concepts learnt so far, let us start making use of some of the key features. We have seen examples of simple types being a part of a complex type. However, we have not yet seen an example where the simple types themselves can become complex types. That is, we have not gone beyond one level of depth in the hierarchy of child elements. Let us do that now.

Consider that we have an employee working in an organisation. At any given time, the employee works in one or more projects, and has zero or more subordinates who work for her. The employee also has her own characteristics, such as name, designation, and salary. There are various ways in which this information can be represented in XML. We would use the version shown in Figure 4.11.

Figure 4.11 Sample XML (emp.xml)

Let us now write a schema for this XML document. We can see that we have an employee possibly managing multiple projects and multiple subordinates. Also, the employee has her own personal characteristics in terms of designation and salary.

Based on this description, one possible schema for this XML document is shown in Figure 4.12.

Figure 4.12 Corresponding schema (emp.xsd)

We will notice that the elements PROJECT and SUBORDINATE contain a sub-element each (named NAME). Therefore, they are not simple elements. Instead, we must declare them as complex elements in our schema. Because we declare them as complex, they automatically have to be a user-defined type. That is exactly what has happened in this case as well. We have defined these two as complex types, which we have later used in other elements.
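
Because Figure 4.12 is not reproduced here, the following is only a sketch of what such a schema might look like. The exact element names and order for the employee's own characteristics, and the type of SALARY, are assumptions; the PROJECT and SUBORDINATE elements with their NAME sub-elements follow the description above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="EMPLOYEE" type="EmployeeType"/>

  <xsd:complexType name="EmployeeType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
      <xsd:element name="DESIGNATION" type="xsd:string"/>
      <xsd:element name="SALARY" type="xsd:integer"/>
      <!-- One or more projects -->
      <xsd:element name="PROJECT" type="ProjectType"
                   minOccurs="1" maxOccurs="unbounded"/>
      <!-- Zero or more subordinates -->
      <xsd:element name="SUBORDINATE" type="SubordinateType"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- PROJECT and SUBORDINATE are complex: each has a NAME sub-element -->
  <xsd:complexType name="ProjectType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="SubordinateType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```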

To reinforce our understanding, let us discuss another example. Imagine that we have an Internet shopping site. People can browse our site and decide to buy goods online. We want to capture the result of placing one such order inside an XML document. Therefore, our XML document needs to store information about who is placing this order, what is the address at which the goods need to be delivered (i.e., shipping information), and the actual contents of the order (i.e., the goods ordered).

Therefore, the information we want to capture is something like the one shown in Figure 4.13.

Figure 4.13 Example of information we want to capture for processing

We represent this in another format as shown in Figure 4.14.

Figure 4.14 Visual representation of our intended XML contents

Let us now actually create an order placed by a customer, to see what this looks like. Figure 4.15 shows the resulting XML document.

Figure 4.15 XML document containing order details

Let us note some salient points about our XML document.

The root element is called <shiporder>. It has an attribute named orderid.
One order can have multiple <item>s.
The <note> sub-element of the <item> element seems to be optional.

Based on this understanding, we can note a few points about the design of our schema, as follows.

The root element <shiporder> should be a complex element, containing sub-elements named <orderperson>, <shipto>, and <item>. It also has an attribute named orderid.
The <shipto> element itself is a complex element, consisting of <name>, <address>, <city>, and <country>.
The <item> element itself is a complex element, consisting of <title>, <note>, <quantity>, and <price>. The <note> element is optional (i.e., it may or may not occur).

Accordingly, our XML schema looks as shown in Figure 4.16.

Figure 4.16 XML schema describing order details

There are a couple of things that we have used here, but have not described them earlier. For instance, we have used a decimal data type. We will cover such things at the appropriate time.
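
As Figure 4.16 is not reproduced here, the design points listed above can be sketched as the following schema. The type names (ShipOrderType, ShipToType, ItemType), the string type and required use of the orderid attribute, and the positiveInteger type of quantity are assumptions; the decimal type is the one mentioned above, applied here to price. The xsd:attribute declaration is one of the things not yet described.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="shiporder" type="ShipOrderType"/>

  <xsd:complexType name="ShipOrderType">
    <xsd:sequence>
      <xsd:element name="orderperson" type="xsd:string"/>
      <xsd:element name="shipto" type="ShipToType"/>
      <!-- One order can have multiple items -->
      <xsd:element name="item" type="ItemType" maxOccurs="unbounded"/>
    </xsd:sequence>
    <!-- Attributes are declared after the content model -->
    <xsd:attribute name="orderid" type="xsd:string" use="required"/>
  </xsd:complexType>

  <xsd:complexType name="ShipToType">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="address" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="country" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="ItemType">
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <!-- The note is optional -->
      <xsd:element name="note" type="xsd:string" minOccurs="0"/>
      <xsd:element name="quantity" type="xsd:positiveInteger"/>
      <xsd:element name="price" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```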

4.2.4 Content Model Reuse

Observant readers would have realised that there is some unwanted duplication in our earlier schema code for employees. The schema talks about projects and subordinates as two separate complex elements. However, these two are really similar. Both contain a sub-element name of type string. Rather than declaring the types for these two elements separately, can we not combine this declaration? The answer is yes, and that is what we mean by content model sharing or content model reuse.

The idea is simple. Rather than declaring ProjectType and SubordinateType, we would declare a single type, let us say NameType (for want of a better name!) and use it for both projects and subordinates. This is shown in Figure 4.17.

Figure 4.17 Content model reuse

Note that now both PROJECT and SUBORDINATE have the type specified as NameType. They do not have their own separate types.

Content model reuse helps us in abstracting the common features of data types, and in using them in an efficient manner to deal with unnecessary duplication.
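
A minimal sketch of this idea follows. For brevity, it shows only the PROJECT and SUBORDINATE parts of the employee schema; EmployeeType and the occurrence limits are assumptions carried over from the earlier description.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="EMPLOYEE" type="EmployeeType"/>

  <xsd:complexType name="EmployeeType">
    <xsd:sequence>
      <!-- Both elements now share the single type NameType -->
      <xsd:element name="PROJECT" type="NameType"
                   minOccurs="1" maxOccurs="unbounded"/>
      <xsd:element name="SUBORDINATE" type="NameType"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Declared once, reused twice -->
  <xsd:complexType name="NameType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```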

4.2.5 Anonymous Types

In some situations, we want to put restrictions on the usage of certain user types. We may wish to mandate that a user-defined type only be used in a particular context (say inside a particular element only).

For example, let us again consider NameType, discussed in the earlier section. We know that NameType is meant to contain an element called as NAME. However, let us imagine that we want to split this NAME into two sub-elements: FIRST_NAME and LAST_NAME, only for subordinates. We obviously do not have such a split in the case of projects. However, because of the content model reuse, we would be forced to do so! This is shown in Figure 4.18.

Figure 4.18 Problem in sharing content models

As we can see, a subordinate should have a first name and a last name but a project will not have first and last name. But because NameType is defined that way, we must use it as defined! We cannot stop the misuse of sharing content models in such situations.

In other words, content model sharing can also bring its own set of problems. This is because our intention of abstracting common features and reusing them can get misused in such situations.

The solution to such problems is the usage of anonymous types. In our particular example, we can do the following changes:

Define a new complex type called as PersonType. This will contain our NameType. Split NameType into two sub-elements, called as FIRST_NAME and LAST_NAME as earlier.
Later, use PersonType in the case of the SUBORDINATE element, and other elements that may require the name to be split into the first and the last names.
Use a simple string (or a completely different user type) in the case of the PROJECT element and other elements that may require no split of the name into the first and the last names.

Therefore, our diagram would now change as shown in Figure 4.19.

Figure 4.19 Anonymous types

Note that NameType is now an anonymous type, because it is declared inside the PersonType. We cannot use NameType as a type anywhere (e.g., in PROJECT or in SUBORDINATE). It has no existence outside of PersonType. Therefore, it is anonymous.
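
In schema syntax, an anonymous type is a complexType written without a name attribute, directly inside the element declaration that uses it. The following sketch is one possible rendering of the arrangement described above; declaring SUBORDINATE and PROJECT as top-level elements is a simplification on our part.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="SUBORDINATE" type="PersonType"/>
  <!-- A project needs no split, so a simple string is enough -->
  <xsd:element name="PROJECT" type="xsd:string"/>

  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="NAME">
        <!-- Anonymous type: no name attribute, so it cannot be
             referred to from any other declaration -->
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="FIRST_NAME" type="xsd:string"/>
            <xsd:element name="LAST_NAME" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```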

4.2.6 Mixed Content

Sometimes, we want to allow text between elements. For example, suppose that we want to capture information about employee names. Then, we can think of the title, first name, middle name, and last name. Of these, we may mandate that the first name and the last name should be mandatory, whereas the other two are optional. As we know, we can define this in the form of an XML schema as shown in Figure 4.20.

Figure 4.20 Schema for capturing employee information

We can represent the same information by using mixed content. To do this, we need to add an attribute mixed with value true. When we do so, we can just keep the mandatory information as elements, and remove other elements that may have a corresponding value in the XML document. For example, here, we can remove the TITLE and MIDDLE_NAME elements from our schema, and instead declare the EMPLOYEE element to allow for mixed content. When we do so, the EMPLOYEE element must contain values for the first name and the last name, and can optionally have values for the title and the middle name.

The modified schema definition is shown in Figure 4.21.

Figure 4.21 Schema for capturing employee information (emp1.xsd)

The corresponding XML document is shown in Figure 4.22.

Figure 4.22 XML document for capturing employee information (emp1.xml)

We can now alter this definition by using mixed content. The modified schema definition is shown in Figure 4.23.

Figure 4.23 Schema with mixed content (emp2.xsd)

Note that we have dropped the TITLE and the MIDDLE_NAME elements. Also, we have added an attribute mixed with a value of true for the EmpType complex type. As a result, we cannot use the TITLE or the MIDDLE_NAME elements in our XML document. However, we can still specify the values for the title and the middle name elements as placeholders as shown in Figure 4.24. This is what mixed content allows us to do.

Figure 4.24 XML document with mixed content (emp2.xml)
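
Since the figures are not reproduced here, the following sketch shows what a mixed-content schema along these lines might look like. The element names FIRST_NAME and LAST_NAME are assumptions based on the description above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="EMPLOYEE" type="EmpType"/>

  <!-- mixed="true" permits text before, between, and after the
       child elements -->
  <xsd:complexType name="EmpType" mixed="true">
    <xsd:sequence>
      <xsd:element name="FIRST_NAME" type="xsd:string"/>
      <xsd:element name="LAST_NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

A corresponding XML document could then carry the title and the middle initial as plain text around the two mandatory elements. The name used here is purely illustrative.

```xml
<?xml version="1.0"?>
<EMPLOYEE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="emp2.xsd">
  Mr. <FIRST_NAME>Ravi</FIRST_NAME> K. <LAST_NAME>Sharma</LAST_NAME>
</EMPLOYEE>
```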

4.3 GROUPING OF DATA

Thus far, we have been dealing with schemas that mandate a specific order of elements inside an XML document. That is, we have seen cases where element A must follow element B inside a document. In reality, this is not always the case. At times, we just want to make sure that an element exists inside an XML document; where it appears is not so important. This is where grouping of elements comes into the picture.

The schema syntax provides support for three grouping constructs that also govern the sequence of elements inside an XML document. These three constructs are:

The xsd:all grouping specifies that all the elements in a group must occur at the most once, but their ordering is not significant.

The xsd:choice grouping allows us to specify that only one element from the group can appear. Alternatively, we can also specify that out of n elements in a group, m should appear in any order.

The xsd:sequence grouping mandates that every element in a group must appear exactly once, and also in the same order in which the elements are listed.

Let us discuss these now.

4.3.1 Mandating All Elements

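When we use xsd:all, we mean that each element in the group can occur at most once, and that the order of elements is not significant. By default, every element in the group must occur exactly once, because minOccurs defaults to 1; as we shall see shortly, an element can be made optional by setting its minOccurs to 0.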

Consider the example shown in Figure 4.25.

Figure 4.25 Usage of all

In our example, the complex type NAME contains elements FIRST_NAME and LAST_NAME. Both FIRST_NAME and LAST_NAME must occur exactly once. Their order is not significant. This is because they are contained inside the <xsd:all> tag.
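
As the figure is not reproduced here, a minimal sketch of such a declaration is given below. Writing NAME as an element with an inline type is our own choice of layout.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="NAME">
    <xsd:complexType>
      <!-- Both children are required, but may appear in either order -->
      <xsd:all>
        <xsd:element name="FIRST_NAME" type="xsd:string"/>
        <xsd:element name="LAST_NAME" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

With this sketch, a NAME element that lists LAST_NAME before FIRST_NAME is just as valid as one that lists them the other way round.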

We must mention that we can also specify a value of zero for minOccurs or maxOccurs. That is, we can allow an element to not occur at all. In that sense, all is a misnomer. For instance, we can change the declaration of FIRST_NAME to the following:

<xsd:element name = "FIRST_NAME" type = "xsd:string"
             minOccurs = "0" maxOccurs = "1"/>

We now allow FIRST_NAME to be missing from the XML document. This is perfectly all right.

We also need to note that we cannot specify an arbitrary number of occurrences in the case of all. That is, both minOccurs and maxOccurs can have only a value of zero or one. We cannot, for instance, say minOccurs = 2 and maxOccurs = 4. This is illegal.

Exercise

Write an XML schema and show the corresponding XML document for the following: It should contain information about a credit card so that the credit card can be validated.

Solution

XML schema (card.xsd)

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="CreditCard">
        <xsd:complexType>
            <xsd:all>
                <xsd:element name="CardType" type="xsd:string"/>
                <xsd:element name="CardNumber" type="xsd:string"/>
                <xsd:element name="CardHolder" type="xsd:string"/>
                <xsd:element name="CardValidTill" type="xsd:string"/>
            </xsd:all>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

XML document (card.xml)

<?xml version = "1.0"?>
<CreditCard xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
            xsi:noNamespaceSchemaLocation = "card.xsd">
    <CardHolder>Sonia Kapoor </CardHolder>
    <CardType> … </CardType>
    <CardNumber> … </CardNumber>
    <CardValidTill> … </CardValidTill>
</CreditCard>

Note: The order of elements in the XML document is different from the one specified in the schema. As we have mentioned earlier, this is allowed in the case of all.

4.3.2 Making Choices

We know that in the case of a DTD, we can use the pipe (|) symbol to signify selection. In schema, the corresponding functionality is achieved by using the xsd:choice syntax. When we embed more than one element inside a choice boundary, exactly one of them must occur in the XML document.

For example, suppose that we want to store the information about the result of examination as Pass or Fail along with the percentage of marks obtained. Clearly, only one of these should be allowed. We can make use of the choice element, as shown in Figure 4.26.

Figure 4.26 Using the choice syntax

Of course, usually, we will not store just the result alone. It would be in the context of (i.e., a part of) some element, such as Student. For now, we have ignored that possibility. But, we can easily modify our XML schema and document to reflect this. Figure 4.27 illustrates the modified schema and the XML document.

Figure 4.27 A complete example using choice
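
Since the figures are not reproduced here, the following sketch shows one possible reading of this example. The element names and the decision to store the percentage as the content of PASS or FAIL are assumptions on our part.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="STUDENT" type="StudentType"/>

  <xsd:complexType name="StudentType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
      <!-- Exactly one of PASS or FAIL must follow the NAME;
           its content is the percentage of marks obtained -->
      <xsd:choice>
        <xsd:element name="PASS" type="xsd:decimal"/>
        <xsd:element name="FAIL" type="xsd:decimal"/>
      </xsd:choice>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

A document that contains a NAME followed by either a PASS or a FAIL element is valid against this sketch. One that contains both, or neither, is not.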

Here is an exercise to understand this further.

Exercise

Write an XML schema and show the corresponding XML document for storing information about lunch. It should consist of a starter, a main course, and a dessert. There should be options in each of the categories.

Solution

XML schema (lunch.xsd)

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <xsd:element name="Lunch">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="Starter">
                    <xsd:complexType>
                        <xsd:choice>
                            <xsd:element name="Soup" type="xsd:string"/>
                            <xsd:element name="Juice" type="xsd:string"/>
                        </xsd:choice>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="MainCourse">
                    <xsd:complexType>
                        <xsd:choice>
                            <xsd:element name="VegLunch" type="xsd:string"/>
                            <xsd:element name="NonVegLunch" type="xsd:string"/>
                        </xsd:choice>
                    </xsd:complexType>
                </xsd:element>
                <xsd:element name="Dessert">
                    <xsd:complexType>
                        <xsd:choice>
                            <xsd:element name="IceCream" type="xsd:string"/>
                            <xsd:element name="FruitSalad" type="xsd:string"/>
                        </xsd:choice>
                    </xsd:complexType>
                </xsd:element>
            </xsd:sequence>
        </xsd:complexType>
    </xsd:element>
</xsd:schema>

XML document (lunch.xml)

<?xml version="1.0"?>
<Lunch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="lunch.xsd">
    <Starter>
        <Juice>Apple</Juice>
    </Starter>
    <MainCourse>
        <VegLunch>Thali</VegLunch>
    </MainCourse>
    <Dessert>
        <IceCream>Vanilla</IceCream>
    </Dessert>
</Lunch>

4.3.3 Sequences

An xsd:sequence element allows us to group a set of sub-elements. These sub-elements must appear in the same sequence in the XML document, as declared in the schema. We have seen many examples of this earlier, while declaring complex elements. We can add the minOccurs and maxOccurs attributes either to the individual sub-elements or to the main grouping element to control the number of occurrences of the individual sub-elements, or that of the group.

Let us consider an example. Consider that we want to maintain information about the batting of a team in a cricket match. We know that there can be at the most 11 batsmen. For every batsman, we will maintain details such as his name, details of dismissal (how out, fielder, bowler), and the number of runs scored. We will maintain this for two innings. A typical entry for one player would be as follows:

Sachin Tendulkar

Innings 1: c Inzamam-ul-Haq b Shoaib Akhtar 103

Innings 2: not out 201

Let us design a schema to maintain this sort of information. Remember that at the most 11 batsmen can bat. Every batsman can have up to two innings.

Figure 4.28 illustrates the resulting schema.

Figure 4.28 Schema for representing batting details for a team in a cricket match

An XML document corresponding to this schema (with data for one batsman) is shown in Figure 4.29.

Figure 4.29 XML document containing batting details for a team in a cricket match

We will not discuss sequences further, since we have covered them in detail.
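
As Figures 4.28 and 4.29 are not reproduced here, the following is only a sketch of how such a schema could be written. All element and type names are hypothetical, and treating the fielder and the bowler as optional (to cater for a not out entry) is our own design choice; only the limits of 11 batsmen and two innings come from the requirements above.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="TEAM" type="TeamType"/>

  <xsd:complexType name="TeamType">
    <xsd:sequence>
      <!-- At the most 11 batsmen can bat -->
      <xsd:element name="BATSMAN" type="BatsmanType"
                   minOccurs="1" maxOccurs="11"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="BatsmanType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
      <!-- Every batsman can have up to two innings -->
      <xsd:element name="INNINGS" type="InningsType"
                   minOccurs="0" maxOccurs="2"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="InningsType">
    <xsd:sequence>
      <xsd:element name="HOW_OUT" type="xsd:string"/>
      <xsd:element name="FIELDER" type="xsd:string" minOccurs="0"/>
      <xsd:element name="BOWLER" type="xsd:string" minOccurs="0"/>
      <xsd:element name="RUNS" type="xsd:nonNegativeInteger"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>
```

The sample entry shown earlier would then be represented as follows. The schema file name batting.xsd is hypothetical.

```xml
<?xml version="1.0"?>
<TEAM xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="batting.xsd">
  <BATSMAN>
    <NAME>Sachin Tendulkar</NAME>
    <INNINGS>
      <HOW_OUT>caught</HOW_OUT>
      <FIELDER>Inzamam-ul-Haq</FIELDER>
      <BOWLER>Shoaib Akhtar</BOWLER>
      <RUNS>103</RUNS>
    </INNINGS>
    <INNINGS>
      <HOW_OUT>not out</HOW_OUT>
      <RUNS>201</RUNS>
    </INNINGS>
  </BATSMAN>
</TEAM>
```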

4.4 SIMPLE TYPES

So far, we have focused on the structure of an XML document with reference to schemas. We have not discussed the possibilities that exist with individual elements. For example, when we speak about the marks of a student, we generally believe that this would be a positive integer of up to three digits with a maximum value of 100. When we talk about a year, it is a four digit positive integer with some sensible value. Many such examples can be given.

In the case of DTDs, there is no way to specify this fine-grained detail about elements and their data types. In contrast, schemas allow us to provide many more details about such things.

XML schemas offer 44 built-in simple types.

We can roughly classify the XML schema simple types into seven categories, as shown in Figure 4.30.

Figure 4.30 Simple types as per schema specifications

4.4.1 Numeric Data Types

We widely use numbers. The XML schema specifications provide support for a wide range of numeric data types. In practice, we make use of only a few of these. However, for the sake of completeness, Table 4.2 summarises the various numeric data types.

Data type name	Meaning	Examples
xsd:float	32-bit, same as Java's float data type	0, 12345.6789
xsd:double	64-bit, same as Java's double data type	0, 45.89E-2, 123456789.56789
xsd:decimal	Arbitrary precision, same as java.math.BigDecimal	87200.29, -3.1415292
xsd:integer	Arbitrarily large or small integer, same as java.math.BigInteger	-7890000000000000, 723712637236839210123
xsd:nonPositiveInteger	Integer less than or equal to 0	0, -1, -2, -3, …
xsd:negativeInteger	Integer less than 0	-1, -2, -3, …
xsd:nonNegativeInteger	Integer greater than or equal to 0	0, 1, 2, 3, …
xsd:positiveInteger	Integer greater than 0	1, 2, 3, …
xsd:long	8-byte, 2's complement integer, similar to Java's long data type	-5612367128398213, 0, -19, 15, 915402742
xsd:int	4-byte, 2's complement integer, similar to Java's int data type	-615251, 0, 15310012
xsd:short	2-byte, 2's complement integer, similar to Java's short data type	-32768 to +32767
xsd:byte	1-byte, 2's complement integer, similar to Java's byte data type	-128 to +127
xsd:unsignedLong	8-byte unsigned integer	0 to 18446744073709551615
xsd:unsignedInt	4-byte unsigned integer	0 to 4294967295
xsd:unsignedShort	2-byte unsigned integer	0 to 65535
xsd:unsignedByte	1-byte unsigned integer	0 to 255

Table 4.2 Simple numeric data types supported by XML schema

4.4.2 Time Data Types

In this section, we shall discuss the various date and time-related data types. These data types are quite common in database products. These data types are used to represent a variety of date and time formats. A generic rule is that wherever applicable, the formats contain year, followed by month, followed by day, followed by hours, and so on. Note that “g” means Gregorian.

Table 4.3 lists the various time data types.

Data type name	Meaning	Examples
xsd:dateTime	Date and time in the format YYYY-MM-DDTHH:MM:SS	2006-01-20T06:05:00
xsd:date	A specific date in the YYYY-MM-DD format	2006-01-20, 1973-04-07
xsd:time	A specific time of day in the HH:MM:SS format	06:05:00, 17:30:00
xsd:gDay	A day in a month	---01, ---02, …, ---31
xsd:gMonth	A month in a year	--01, --02, …, --12
xsd:gYear	A year	2006, 1973
xsd:gYearMonth	A specific month in a specific year	2006-01, 1973-04
xsd:gMonthDay	A date without year	--01-20, --04-07
xsd:duration	Length of time in the format PnYnMnDTnHnMnS	P2006Y01M20DT06H11M00S

Table 4.3 Simple time data types supported by XML schema

4.4.3 XML Data Types

We have discussed many XML data types in DTDs. XML data types in schemas are quite similar to those in the DTDs. However, there are also four additional data types in schemas under this category, as compared to DTDs.

Table 4.4 depicts the XML data types.

Data type name	Meaning	Examples
xsd:ID	A unique value for an element or an attribute	T1, M90, G101-100-Y6, Nine
xsd:IDREF	Value of another ID type defined elsewhere in the document	T1, M90, G101-100-Y6, Nine
xsd:ENTITY	An XML name, declared as an unparsed entity in a DTD	Bips, Graph10, PICTURE5
xsd:NOTATION	Usually indicates a file format	PDF, TIF, GIF, JPEG
xsd:IDREFS	Reference to a list of ID names, separated by spaces	T1 M90 G101-100-Y6 Nine
xsd:ENTITIES	List of ENTITY names, separated by spaces	Bips Graph10 PICTURE5
xsd:NMTOKEN	A single name token, which cannot contain spaces	67, how, are, you, 1910
xsd:NMTOKENS	A list of NMTOKEN values, separated by spaces	67 how are you 1910
xsd:language	Language name from a list of valid values	en, en-US, en-GB, fr, ara
xsd:Name	XML name, with or without colons	Student, employee, Team:Player
xsd:QName	Qualified name, with an optional namespace prefix	xsd:element, Team:Player
xsd:NCName	Local name without colons	Student, employee, player, salary

Table 4.4 XML data types supported by XML schema

4.4.4 String Data Types

We have extensively used the xsd:string data type. It allows any string value of any length. The internal representation of these strings is in Unicode. Apart from this, there are two more string data types, as shown in Table 4.5.

Data type name	Meaning	Examples
xsd:string	A Unicode character-based string of any length	Sachin Tendulkar is the best!, Amitabh Bachchan needs to be saluted, Mahatma Gandhi was the Father of the Nation, Hi there!, Your password is protected as ******, 2006
xsd:normalizedString	A string in which all the carriage returns, linefeeds, and tabs are replaced with a single blank (space) character	Hello XML, This is news to me!, red pepper
xsd:token	Same as above, but in addition, all leading and trailing spaces are trimmed and consecutive spaces are converted into a single space	Bips, Graph10, PICTURE5

Table 4.5 String data types supported by XML schema

4.4.5 Binary Data Types

Usually, XML is meant to carry text. However, at times, it must also support binary data. The problem with binary data is that it can have byte patterns that are illegal. This is because some characters (i.e., byte patterns) such as null have a different meaning, and they cannot be a part of the XML content. Therefore, mechanisms are needed to encode such illegal characters into a legal form before such characters can be considered as a part of the XML document. Two standards for doing this are prevalent: hexadecimal conversion, and base-64 encoding. XML schemas support both of these. A detailed explanation of these standards is outside of the scope of this text. Nevertheless, we will provide a crisp overview.

In hexadecimal conversion (the xsd:hexBinary type), every byte of the input is mapped to two hexadecimal digits. Therefore, the size of the file effectively doubles. Each pair of hexadecimal digits can have a value between 00 and FF. A sample portion of such content is as follows:

56AF679181201267123EEBA8923CD90909D90D

In base-64 encoding (the xsd:base64Binary type), the input is read as a series of 24-bit blocks, and each block is transformed into four printable characters (32 bits). A mapping table is used for this purpose. Therefore, the size of the file effectively increases by about 33 per cent.
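
As a small illustration, the sketch below declares one element of each binary type. The element names HEX_DATA and B64_DATA are hypothetical. The sample values in the comments both encode the same three ASCII bytes of the word Man.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical element names, for illustration only -->

  <!-- Instance: <HEX_DATA>4D616E</HEX_DATA>  (3 bytes become 6 hex digits) -->
  <xsd:element name="HEX_DATA" type="xsd:hexBinary"/>

  <!-- Instance: <B64_DATA>TWFu</B64_DATA>  (3 bytes become 4 characters) -->
  <xsd:element name="B64_DATA" type="xsd:base64Binary"/>
</xsd:schema>
```

The two instance values show the size trade-off directly: six characters for the hexadecimal form versus four for the base-64 form of the same data.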

4.4.6 Other Data Types

The other two data types are Boolean and URI.

The xsd:boolean data type is similar to the boolean data type in Java, or the bool data type in C++. It allows one of four possible values: 0, 1, true, and false. 0 and false mean the same thing; 1 and true mean the same thing.
The xsd:anyURI data type allows us to specify a URI. For example, we can have http://www.test.com/name.html.

4.5 DERIVING TYPES

Deriving the custom simple types from the basic simple types is a powerful feature of XML schemas. We can use this feature to come up with data types that are specific to our application, but are unlikely to be available as basic simple data types.

For example, let us imagine that we want to store the publication year of all the books in a library, and we do not wish to register books that were published before 1970. In that case, we can use a derived type that restricts the set of legal values to a minimum of 1970.

There are three techniques for deriving types, as shown in Figure 4.31.

Figure 4.31 Deriving new simple types in XML

Let us discuss these now.

4.5.1 Deriving by Restriction

Restriction allows us to select a subset of values allowed by the base type.

We can use an xsd:restriction element as a child element of an xsd:simpleType element, for creating a new type based on an existing simple type. The base attribute names the existing type that is being restricted.

Figure 4.32 shows an example. Here, we have simply specified that the publishing year of a book has a type of xsd:gYear.

Figure 4.32 Example of restriction
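
A restriction of this kind might be written as in the following sketch. It is not the original figure; the type name publishingYear follows the discussion in the text.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- A new simple type that, for now, accepts exactly what xsd:gYear accepts -->
  <xsd:simpleType name="publishingYear">
    <xsd:restriction base="xsd:gYear"/>
  </xsd:simpleType>
</xsd:schema>
```

At this stage the restriction adds no constraint of its own. It only gives us a named type to which further constraints can be attached.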

Thus, we have restricted the allowed values for the publishingYear element to that of a year data type. Of course, this is not the only thing we can do. We can enhance this restriction further, by also defining a range of allowed years. For this purpose, we need to make use of facets.

4.5.2 Facets

A facet allows us to specify more restrictions than what a basic type allows.

For example, to restrict the publishing year so that the books must be published in or after 1970, we can use a facet called minInclusive. The minInclusive facet specifies the minimum value that an element can have. The resulting restriction is shown in Figure 4.33.

Figure 4.33 Example of restriction

Now, 1970, 1971, …, 2006 are all examples of a legal book publishing year. Any earlier year is illegal. For example, 1969 is not an allowed value.

Now, publishingYear is itself a type. That is, it can be used as a type to define another element.
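
The following sketch shows how the restricted type and an element that uses it could be written. The element name PUB_YEAR is an assumption made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Years before 1970 are rejected -->
  <xsd:simpleType name="publishingYear">
    <xsd:restriction base="xsd:gYear">
      <xsd:minInclusive value="1970"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- PUB_YEAR is a hypothetical element that uses the derived type -->
  <xsd:element name="PUB_YEAR" type="publishingYear"/>
</xsd:schema>
```

With this sketch, an instance such as <PUB_YEAR>1970</PUB_YEAR> would be accepted, whereas <PUB_YEAR>1969</PUB_YEAR> would be rejected.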

Like minInclusive, there are a number of other facets. Table 4.6 summarises them.

Facet	Description
xsd:minInclusive	The minimum value that all the instances of this type must be greater than or equal to
xsd:maxInclusive	The maximum value that all the instances of this type must be less than or equal to
xsd:minExclusive	The minimum value that all the instances of this type must be greater than
xsd:maxExclusive	The maximum value that all the instances of this type must be less than
xsd:enumeration	A list of allowed values
xsd:whiteSpace	How white space is treated in this element
xsd:pattern	A pattern with which the contents of the element are compared
xsd:length	The length of a string, items in a list, or bytes in binary data
xsd:minLength	The minimum length
xsd:maxLength	The maximum length
xsd:totalDigits	The maximum number of digits allowed in the element
xsd:fractionDigits	The maximum number of digits allowed in the fractional part of the element

Table 4.6 List of facets

Of course, not all facets make sense for all data types. It is meaningless, for instance, to apply a fractionDigits facet to an integer data type, because integers simply cannot have a fractional part! Therefore, the above table needs to be used carefully in conjunction with the appropriate data types.

In order to better understand the use and applicability of facets, we will now discuss them in the suitable matching contexts.

4.5.2.1 String facets

The three main facets that can be applied to strings are with reference to the length of a string. These three string facets are xsd:length, xsd:minLength, and xsd:maxLength. By using these facets, we can control the length of a string.

For example, suppose that we want to create a type for the salutation of a person. Let us also imagine that we want to restrict this to one of Mr, Ms, or Mrs. Therefore, the minimum length of this element is two, and the maximum is three. Based on this, we can apply our facets as shown in Figure 4.34.

Figure 4.34 Example of minLength and maxLength facets (person1.xsd)

The XML document shown in Figure 4.35 is based on the schema person1.xsd. Since it obeys the rules of the facets, it is valid. Note that the salutation is Mr. This consists of two characters, which is acceptable per our facet rules.

Figure 4.35 Example of an XML document conforming to facet restrictions

However, the XML document shown in Figure 4.36 is not valid, because it violates the restrictions specified by the facets of the schema. We have changed Mr to Prof in the SALUTATION element now.

Figure 4.36 Example of an XML document violating the facet restrictions
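
A minimal sketch of the length restriction discussed above is given below. The type name salutationType is hypothetical; the SALUTATION element name comes from the text, but the rest of person1.xsd is not reproduced.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- salutationType is a hypothetical name -->
  <xsd:simpleType name="salutationType">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="2"/>
      <xsd:maxLength value="3"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Mr, Ms and Mrs pass the length check; Prof (4 characters) fails -->
  <xsd:element name="SALUTATION" type="salutationType"/>
</xsd:schema>
```

Note that the length facets check only the number of characters. A value such as Dr or abc would also pass, so an enumeration would be needed to allow exactly Mr, Ms, and Mrs.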

Similarly, we can use the xsd:length facet to specify the exact length of an element. We will not show an example of this, and leave it as an exercise to the reader.

4.5.2.2 The white space facet

The white space facet allows us to specify how we want to deal with white spaces. It does not specify restrictions, unlike the other 11 facets.

The xsd:whiteSpace facet allows three possible values, as follows.

preserve: This is the default. It means that the white space in the XML document should be kept as is.
replace: This facet value indicates that we want to replace every tab, line feed, and carriage return character in our XML document with a single space character.
collapse: This facet value is a superset of the replace value. After performing the job of replace, this facet value further condenses multiple consecutive spaces into a single space, and removes leading and trailing spaces.

Figure 4.37 shows an example of a schema containing this facet. Note that we have specified a value of collapse for this facet. This means that we wish to transform white spaces in our XML document (i.e., all contiguous spaces, tabs, line feeds, and carriage returns) into a single space.

Figure 4.37 Example of the xsd:whiteSpace facet in a schema (poem.xsd)

Figure 4.38 shows the corresponding XML document. We have deliberately introduced plenty of spaces, tabs, and blank lines to show how the xsd:whiteSpace facet value of collapse in the schema helps us clear white space and change it into a single space.

Figure 4.38 XML document containing many white spaces (poem.xml)

To see the impact of the xsd:whiteSpace = “collapse” facet, we need to open our XML document in a browser. When we do that, the facet gets applied and the resulting XML document looks as shown in Figure 4.39. As we can see, all the white spaces are crunched into a single space character.

Figure 4.39 Result of applying the whiteSpace facet (poem.xml)
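
The facet itself could be declared as in the following sketch. The names poemType and POEM are assumptions; the original poem.xsd is not reproduced here.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- poemType and POEM are hypothetical names -->
  <xsd:simpleType name="poemType">
    <xsd:restriction base="xsd:string">
      <!-- Tabs, line feeds and carriage returns become spaces; runs of
           spaces shrink to one; leading and trailing spaces are removed -->
      <xsd:whiteSpace value="collapse"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="POEM" type="poemType"/>
</xsd:schema>
```

Changing the value to replace or preserve in this sketch would give the other two behaviours described earlier.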

4.5.2.3 Facets for numbers

There are two main number facets. The xsd:totalDigits facet specifies the maximum number of digits in a number. The xsd:fractionDigits, on the other hand, specifies the maximum number of digits in the fractional part (i.e., in the part to the right of the decimal point).
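
As an illustration, the following sketch combines both facets on a decimal type. The type name amountType and the chosen limits (six digits in all, two after the decimal point) are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- amountType is a hypothetical name -->
  <xsd:simpleType name="amountType">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="6"/>     <!-- at most 6 digits overall -->
      <xsd:fractionDigits value="2"/>  <!-- at most 2 digits after the point -->
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="AMOUNT" type="amountType"/>
</xsd:schema>
```

Under these limits, 1234.56 is acceptable, whereas 12345.67 (seven digits in all) and 1.234 (three fractional digits) are not.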

4.5.2.4 Facet for enumeration

The enumeration facet is similar to the choice construct in the case of a DTD.

The enumeration facet in XML schemas allows us to specify a list of possible values for an element. The XML document corresponding to this schema must have one of these values.

Let us consider an example. Suppose that we want to maintain information about a book on computer science. One of the details that we want to maintain is the category of this book from a list of possible categories. Figure 4.40 shows the resulting XML schema.

Figure 4.40 Schema example using the enumeration facet

As we can see, our schema declaration specifies an enumeration facet for the book category. This allows us to specify a number of possible book category values, in this case, five. It should now be clear that our XML document based on this schema must have one of these values in the CATEGORY element. Otherwise, it would be considered an illegal XML document.

We can use the enumeration facet also with other data types, such as integer, NMTOKEN, date, etc.
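
An enumeration for the book category might be declared as in the following sketch. The five category values and the type name categoryType are invented for illustration; only the CATEGORY element name comes from the text.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The five values below are hypothetical examples -->
  <xsd:simpleType name="categoryType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Programming"/>
      <xsd:enumeration value="Databases"/>
      <xsd:enumeration value="Networking"/>
      <xsd:enumeration value="Operating Systems"/>
      <xsd:enumeration value="Web Technologies"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="CATEGORY" type="categoryType"/>
</xsd:schema>
```

The comparison is exact, so a value such as databases (with a lower case d) would not match any of the listed values.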

4.5.2.5 The pattern facet

We encounter situations commonly where we need to specify that the value of an XML element must start with something, or end with something, or should have some specific characters at a particular location, etc. Some of the common situations where this will apply are as follows:

The pin code should be of six characters, starting with 4 as the first digit
The employee ID must begin with E as the first letter
The roll number of a student needs to start with S

The pattern facet allows us to deal with such requirements.

We will first see an example and then go into the syntactical details of this facet. Suppose that we want to define a three-digit book code, to start with. The resulting schema declaration is shown in Figure 4.41.

Figure 4.41 Example of a pattern facet – 1

The full schema (bookcode1.xsd) and the corresponding XML document (bookcode1.xml) are shown in Figure 4.42.

Figure 4.42 Example of a pattern facet – 2

Let us first understand the facet declaration. Our facet declaration is:

<xsd:pattern value="\p{Nd}\p{Nd}\p{Nd}"/>

The part \p introduces a Unicode character class. Each \p{…} stands for exactly one character position. What that character may actually contain is determined by the class named inside the curly brackets.
Here, the class is {Nd}. This specifies one numeric (decimal) digit.

As a result, together, our facet declaration portion of \p{Nd} indicates one numeric digit. Now, our pattern facet consists of three such declarations, i.e., it looks as \p{Nd}\p{Nd}\p{Nd}. Clearly, this means three numeric digit positions.

Therefore, what if we now want to represent a pin code of six digits? Clearly, our pattern facet would be as follows:

<xsd:pattern value="\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Nd}"/>

Note that there are six numeric digit positions defined now.
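
Placed inside complete type definitions, the two patterns could look like the following sketch. Note that XML Schema regular expressions require a backslash before p in a class such as \p{Nd}. The type and element names are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical type names; three-digit book code -->
  <xsd:simpleType name="bookCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\p{Nd}\p{Nd}\p{Nd}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Six-digit pin code; {6} is shorthand for repeating \p{Nd} six times -->
  <xsd:simpleType name="pinCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\p{Nd}{6}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="BOOK_CODE" type="bookCodeType"/>
  <xsd:element name="PIN_CODE" type="pinCodeType"/>
</xsd:schema>
```

The class \p{Nd} accepts decimal digits from any script that Unicode defines. If only the digits 0 to 9 should be allowed, the character range [0-9] can be used in its place.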

There are many interesting things we can do with pattern facets. For example, consider the following pattern facet:

<xsd:pattern value="AB\p{Nd}"/>

Now, we are saying that the element corresponding to this declaration consists of three positions. The first two positions must contain the upper case letters A and B. The third position must contain a numeric digit.

Following are some of the valid XML elements corresponding to this pattern facet.

AB1

AB0

AB9

Following are some of the invalid XML elements. The reasons are specified in the brackets.

890 (Must start with AB)

ABT (Third position must contain a number)

Ab9 (Second position must contain an upper case B)

Based on this understanding, let us summarise the various pattern facet options, as shown in Table 4.7. Here, X and Y should be interpreted as generic symbols, which, in real life, could be replaced by any string. Similarly, m and n are integers, which would be replaced by actual number of occurrences.

Symbol	Purpose
X?	Zero or one occurrences of X
X*	Zero or more occurrences of X
X+	One or more occurrences of X
X{n,m}	Occurrences of X between n and m
X{n}	Exactly n occurrences of X
X{n,}	At least n occurrences of X
X|Y	Either of X or Y
XY	X immediately followed by Y
.	Any one single character
\p{A}	One character from a Unicode class (explained separately later)
[abcde]	A single occurrence of any of the characters specified inside brackets
[^abcde]	A single occurrence of any of the characters not specified inside brackets
[a-z]	A single occurrence of any of the characters between a and z, both inclusive
[^a-z]	A single occurrence of any of the characters other than the range a-z, both inclusive
\n	New line (or linefeed)
\r	Carriage return

Table 4.7 Main regular expression symbols for XML schema

We have omitted some less significant symbols from the list.

Earlier, we had stated the significance of the pattern facet \p{}. Let us dwell on it further. In the context of this pattern, we can specify a number of things inside the curly brackets. What we specify there determines what the corresponding element in the XML document can contain. For example, if we have Nd there, we can have a numeric digit, as we have seen previously. What are the other options? Figure 4.43 summarises this.

Figure 4.43 Various pattern facet classes

The pattern facet classes allow us to define a large number of patterns quite comprehensively. However, we do not require all of them in most practical situations. For example, we would encounter the L and the N quite commonly, but not the M and the Z. Regardless, we would list them for the sake of completeness, as shown in Table 4.8.

Pattern abbreviation (L)	Contains	Examples
L	All alphabets	a, b, c, …, A, B, C, …
Lu	Uppercase alphabets	A, B, C, …
Ll	Lowercase alphabets	a, b, c, …
Lt	Title case alphabets	<< Not English >>
Lm	Modifier letters (e.g., superscript)	m, k, u
Lo	Other letters	Japanese, etc.

Table 4.8 (A) Letters pattern facet classes

Pattern abbreviation (M)	Contains	Examples
M	All marks
Mn	Non-spacing marks	<<Not English >>
Mc	Spacing combining marks	<<Not English >>
Me	Enclosing marks	<< Million sign >>

Table 4.8 (B) Marks pattern facet classes

Pattern abbreviation (N)	Contains	Examples
N	All numbers	0, 1, 2, …, I, II, III, …, ½, ¾, …
Nd	Decimal digits	0, 1, 2, …
Nl	Numbers based on letters	I, II, III, …
No	Other numbers	¼, 2/3, …

Table 4.8 (C) Numbers pattern facet classes

Pattern abbreviation (P)	Contains	Examples
P	All punctuations	( ) [ ] { } @
Pc	Connectors	<< Not relevant >>
Pd	Dashes	Hyphens, etc
Ps	Starting punctuations	( [ {
Pe	Ending punctuations	) ] }
Pi	Initial quotation marks	‘ “
Pf	Final quotation marks	’ ”
Po	Other punctuations	? ! .

Table 4.8 (D) Punctuations pattern facet classes

Pattern abbreviation (Z)	Contains	Examples
Z	All Separators	–
Zs	Space	–
Zl	Line separators	–
Zp	Paragraph separators	–

Table 4.8 (E) Separators pattern facet classes

Pattern abbreviation (S)	Contains	Examples
S	All symbols	© ☺
Sm	Mathematical symbols	≤ ≥ ∞ ±
Sc	Currencies	€ £ ¥
So	Other symbols	® © °

Table 4.8 (F) Symbols pattern facet classes

Pattern abbreviation (C)	Contains	Examples
C	All others	–
Cc	Control characters	–
Cf	Format characters	–
Co	Private use characters	–
Cn	Unassigned	–

Table 4.8 (G) Other pattern facet classes

Let us consider an exercise.

Problem

Suppose that we want to create a pattern facet to represent an amount of the pattern $99.99. In other words, this is a currency amount of up to a maximum of $99.99. Explain which pattern facet classes we should use, and in what manner.

Solution

Step 1: We need to represent the currency. Therefore, we will use the pattern facet \p{Sc}.

Step 2: We now need to keep a provision for two decimal digit positions. Therefore, we will use the pattern \p{Nd}\p{Nd}.

Step 3: Now we have a decimal point. This is represented as \. (the backslash is needed because an unescaped . matches any single character).

Step 4: Now we have two more decimal digits for the fractional part. We can use \p{Nd}\p{Nd} again.

From the above steps, our resulting pattern is \p{Sc}\p{Nd}\p{Nd}\.\p{Nd}\p{Nd}.

Let us now put our exercise into action by providing the full schema and the corresponding XML document. Figure 4.44 shows the schema declaration for a book, including its price and the corresponding XML document.

Figure 4.44 Example of currency pattern facet

Note that the price of the book satisfies the pattern facet requirements. If you, instead, specify a value such as, say, $400.20 or $40.200, etc., it would not be acceptable, and an error will be flagged.
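
The pattern from this exercise could be used in a schema as sketched below. The names priceType and PRICE are assumptions, since the original figure is not reproduced.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- priceType and PRICE are hypothetical names -->
  <xsd:simpleType name="priceType">
    <xsd:restriction base="xsd:string">
      <!-- currency symbol, two digits, a literal point, two digits -->
      <xsd:pattern value="\p{Sc}\p{Nd}\p{Nd}\.\p{Nd}\p{Nd}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="PRICE" type="priceType"/>
</xsd:schema>
```

This pattern demands exactly two digits before the point, so $45.50 matches but $9.99 does not. Writing \p{Nd}{1,2} for the integer part would admit both, if that is what the application needs.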

Let us have one more exercise before we conclude pattern facets.

Problem

How can we represent a telephone number such as (9120)22907048 using pattern facets?

Solution

Step 1: We need to represent the opening bracket. Therefore, we will use the pattern facet \p{Ps}.

Step 2: We now need to keep a provision for four decimal digit positions. Therefore, we will use the pattern \p{Nd}\p{Nd}\p{Nd}\p{Nd}.

Step 3: Now we need to close the opened bracket. Therefore, we will use the pattern facet \p{Pe}.

Step 4: Now we have eight more decimal digits. We can use \p{Nd} 8 times. However, let us imagine that we do not know how many decimal digit positions we require; but we require at least one. Then we can use the pattern facet \p{Nd}+.

From the above steps, our resulting pattern is \p{Ps}\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Pe}\p{Nd}+.

The resulting schema and XML document are shown in Figure 4.45.

Figure 4.45 Example of brackets facet
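
A schema fragment using the telephone pattern might look like the following sketch. The names phoneType and PHONE are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- phoneType and PHONE are hypothetical names -->
  <xsd:simpleType name="phoneType">
    <xsd:restriction base="xsd:string">
      <!-- opening bracket, four digits, closing bracket, one or more digits -->
      <xsd:pattern value="\p{Ps}\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Pe}\p{Nd}+"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Instance: <PHONE>(9120)22907048</PHONE> -->
  <xsd:element name="PHONE" type="phoneType"/>
</xsd:schema>
```

The classes Ps and Pe cover every kind of opening and closing bracket, so a value such as [9120]22907048 would also match. If only round brackets are wanted, the escaped literals \( and \) can be used instead.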

4.5.2.6 Unions

We have discussed restrictions in detail. Restrictions are widely used for creating new simple types. We need to remember, however, that this is not the only mechanism for creating new simple types.

Unions allow us to combine simple types to create new simple types.

Suppose we want to represent information about a student. Let us imagine that a student can be identified uniquely either by roll number (a numeric type) or name (a string). Further, our XML document should support either of these identification mechanisms. In that case, we can create a new simple type that can have either the roll number or the student name. Thus, we are creating a union of the student's roll number, and the name. This can be depicted in a schema as shown in Figure 4.46.

Figure 4.46 Example of union in a schema

The corresponding XML document is shown in Figure 4.47.

Figure 4.47 Union in use in an XML document
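
A union of this kind might be declared as in the following sketch. The names studentIdType and STUDENT_ID are assumptions, as is the choice of xsd:positiveInteger for the roll number.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- studentIdType and STUDENT_ID are hypothetical names -->
  <xsd:simpleType name="studentIdType">
    <!-- A value is valid if it matches any one of the member types -->
    <xsd:union memberTypes="xsd:positiveInteger xsd:string"/>
  </xsd:simpleType>

  <!-- Instances: <STUDENT_ID>101</STUDENT_ID> or <STUDENT_ID>Atul</STUDENT_ID> -->
  <xsd:element name="STUDENT_ID" type="studentIdType"/>
</xsd:schema>
```

Because xsd:string accepts any text, this particular union rejects nothing. In practice, the name would usually be given a more restrictive type of its own, for instance with a pattern facet, so that the union remains a meaningful check.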

4.5.2.7 Lists

A list type allows the creation of a list of a particular simple type.

For example, suppose that we want a list of employee numbers to be created as follows.

<EMP_LIST> 9662 10000 10190 9939 </EMP_LIST>

Then we can use the xsd:list type in conjunction with an xsd:simpleType. The xsd:list element has an attribute called itemType, which allows us to specify the type of each of the items in the list. This is shown as follows for the above XML content.

<xsd:simpleType name="EmpList">

<xsd:list itemType="xsd:int"/>

</xsd:simpleType>

Figure 4.48 shows both the complete schema and the corresponding XML document. We have considered two cases: a valid XML document, and an invalid XML document. The invalid or illegal XML document contains a string in the list. This is not allowed, as our list definition in the schema clearly allows only integers. The addition of a string violates this rule.

Figure 4.48 List example

Although plain lists are of limited use on their own, we can create more interesting lists by restricting them. For example, we can alter the earlier list by mandating that the list must contain the employee numbers of exactly five employees. The modified schema and the corresponding XML document are shown in Figure 4.49.

Figure 4.49 Modified list example

If we remove an employee number from the list, or add one more to it, the XML document would become illegal. We must have exactly five employee numbers.
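
A restricted list of this kind could be sketched as follows. The name FiveEmpList is an assumption; EmpList and EMP_LIST follow the earlier example.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- The plain list of integers shown earlier -->
  <xsd:simpleType name="EmpList">
    <xsd:list itemType="xsd:int"/>
  </xsd:simpleType>

  <!-- FiveEmpList is a hypothetical name; for a list type, the length
       facet counts the items in the list, not the characters -->
  <xsd:simpleType name="FiveEmpList">
    <xsd:restriction base="EmpList">
      <xsd:length value="5"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Instance: <EMP_LIST>9662 10000 10190 9939 10250</EMP_LIST> -->
  <xsd:element name="EMP_LIST" type="FiveEmpList"/>
</xsd:schema>
```

Using minLength and maxLength in place of length would allow a range, such as between one and five employee numbers.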

4.5.3 Empty Elements

We know that an empty element in XML is one that cannot have child elements or data of its own. We have declared empty elements in a DTD by using the EMPTY content type. In the case of a schema, we declare an element as empty by giving it a complex type that does not specify a child of the form xsd:sequence, xsd:all, or xsd:choice.

For example, we can have the following declaration:

<xsd:complexType name="ThisIsEmptyExample">

</xsd:complexType>

4.6 ATTRIBUTES

So far, we have ignored attributes in the context of schemas. We will now take a look at them. The declaration of attributes is similar to that of elements. We use the xsd:attribute element to define an attribute. For example, suppose that we want to declare an attribute named designation. Then, its declaration would be as follows:

We can also add a data type to an attribute. For example, the above declaration can be modified as follows:
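
The two declarations referred to above are not reproduced in this text. A minimal sketch of both forms is given below. The enclosing complex type names and the choice of xsd:string as the data type are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Form 1: attribute declared by name only (hypothetical enclosing type) -->
  <xsd:complexType name="WithoutTypeExample">
    <xsd:attribute name="designation"/>
  </xsd:complexType>

  <!-- Form 2: the same attribute with a data type added -->
  <xsd:complexType name="WithTypeExample">
    <xsd:attribute name="designation" type="xsd:string"/>
  </xsd:complexType>
</xsd:schema>
```

When no type is given, as in the first form, the attribute accepts any simple value.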

An attribute occurs only once, so we cannot use minOccurs or maxOccurs. However, we can specify whether we (a) must have an attribute (required), (b) may have an attribute (optional), or (c) cannot have an attribute (prohibited). These three possibilities are illustrated in Figure 4.50.

Figure 4.50 Possibilities about the presence or absence of an attribute

We can also assign a fixed or default value to an attribute. Figure 4.51 shows examples of these two.

Figure 4.51 Specifying fixed or default values for an attribute

In the first case, the use of the attribute is required. It also has a fixed value. In other words, this is a mandatory attribute, which has a constant value. The corresponding XML document can neither drop this attribute, nor change the attribute value.

In the second case, the use of the attribute is optional. It has a default value. This means that the attribute can be absent from the XML document, in which case the default value specified in the schema applies. Alternatively, the attribute may be present, either with the same value as the default or with another value.

If we specify a default value for an attribute, then its use must be optional.
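
The possibilities discussed above can be sketched as follows. All the names and values here (AttrExamples, country, status, nickname, India, active) are invented for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Hypothetical names and values throughout -->
  <xsd:complexType name="AttrExamples">
    <!-- Mandatory, and its value must always be India -->
    <xsd:attribute name="country" type="xsd:string" use="required" fixed="India"/>

    <!-- May be omitted; if omitted, the value active applies -->
    <xsd:attribute name="status" type="xsd:string" use="optional" default="active"/>

    <!-- May be present or absent; no default value -->
    <xsd:attribute name="nickname" type="xsd:string" use="optional"/>
  </xsd:complexType>
</xsd:schema>
```

The third possibility for the use attribute, prohibited, is written in the same way and is mainly of interest when a derived type needs to remove an attribute that its base type allows.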

Recall that attributes have no independent existence. An attribute is relevant only in the context of an element. Let us now see how we can attach an attribute to an element. For this purpose, we need to declare an attribute inside an element's declaration. However, if we want to attach an attribute to an element, the element must have a complex type. A simple element can only have some content as the body of the element, and therefore, cannot have attributes.

Let us now consider an example. Suppose that we want to maintain the information about an employee in terms of the following: employee ID, name, designation and whether the employee is confirmed. We can model the first three as elements, and the last as the attribute. This is shown in Figure 4.52.

Figure 4.52 Defining an attribute in a schema

Note that the attribute is defined as a part of a complex type, named EmpType. This complex type (i.e., EmpType) is the data type of our root element EMPLOYEE. Thus, the attribute emp_confirmed is also associated with our EMPLOYEE root element.

Let us now create a sample XML document corresponding to our schema. This is shown in Figure 4.53.

Figure 4.53 Using an attribute in an XML document

Note how we have used the attribute emp_confirmed. We have associated it with the root element EMPLOYEE. This is in perfect agreement with our earlier schema definition.
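
A schema along these lines might look like the following sketch. The names EMPLOYEE, EmpType, and emp_confirmed come from the text; the child element names and all the data types are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="EMPLOYEE" type="EmpType"/>

  <xsd:complexType name="EmpType">
    <xsd:sequence>
      <!-- Hypothetical child element names and types -->
      <xsd:element name="EMP_ID" type="xsd:int"/>
      <xsd:element name="EMP_NAME" type="xsd:string"/>
      <xsd:element name="DESIGNATION" type="xsd:string"/>
    </xsd:sequence>
    <!-- Attribute declarations come after the content model -->
    <xsd:attribute name="emp_confirmed" type="xsd:boolean" use="required"/>
  </xsd:complexType>
</xsd:schema>
```

A matching document would then begin with a start tag such as <EMPLOYEE emp_confirmed="true">, followed by the three child elements in the declared order.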

As we had mentioned earlier, the decision regarding whether to model something as a child element or an attribute, depends on the kind of application and the designer's view. For example, suppose that we want to create a schema to store information about a product. To start with, we do not want to store anything but the name of the product. Then, we have two choices: (a) Define a sub element called as NAME, or (b) Use an attribute called as name. Accordingly, the XML documents would differ, as shown in Figure 4.54.

Figure 4.54 Defining content as a child element versus as an attribute in a schema

Let us examine these two cases.

In case (a), we define the product name as a child element of the root element (PRODUCT) in the XML document. Therefore, in the corresponding schema definition, we make up the root element of a complex type (productType). This complex type (productType), in turn, contains this child element (NAME).
In case (b), we do not have a child element in the XML document. Instead, we associate the product name attribute (name) with the root element (PRODUCT). In other words, our XML document does not have content or child elements. Instead, it only has an attribute associated with it. Therefore, in our schema definition, we declare the root element as a complex type containing one attribute (name). It does not specify child elements, unlike case (a). Another minor point is that we have made the attribute mandatory. Therefore, if we omit this from our XML document, an error will be flagged.
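
The two alternatives can be sketched in a single schema, as shown below. The type names productTypeA and productTypeB are assumptions introduced so that both cases can appear side by side; the text uses the single name productType for case (a).

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Case (a): the product name is a child element -->
  <xsd:complexType name="productTypeA">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Case (b): the product name is a mandatory attribute; no children -->
  <xsd:complexType name="productTypeB">
    <xsd:attribute name="name" type="xsd:string" use="required"/>
  </xsd:complexType>

  <!-- Switch the type to productTypeB to obtain case (b) -->
  <xsd:element name="PRODUCT" type="productTypeA"/>
</xsd:schema>
```

Under case (a), a document would read <PRODUCT><NAME>…</NAME></PRODUCT>, whereas under case (b) it would be an empty element of the form <PRODUCT name="…"/>.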

4.6.1 Grouping Attributes

Similar to how we can group elements, we can also group attributes. An attribute group is a set of attributes. It can be used on a set of elements. How is this achieved?

If an element has several attributes, then we can group them and provide a reference of this group to the concerned element.

This approach provides three benefits:

Reading the schema becomes easier.
The same set of attributes (in the form of the group) can be reused for multiple elements.
If an attribute changes, the changes are centralised at one place, and need not be performed everywhere.

An example of attribute grouping is shown in Figure 4.55. As shown, a group of attributes is created here for the attributes corresponding to an employee element.

Figure 4.55 Grouping attributes

As we can see, grouping attributes involves two declarations. The attributeGroup name declaration creates a group of attributes with the specified name (in this case, empDetails); the attribute declarations nested inside it specify which attributes the group contains. The attributeGroup ref declaration, placed inside the complex type of an element, then attaches all the attributes of the named group to that element.
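
The following sketch shows an attribute group being defined and then referenced. The group name empDetails comes from the text; the attribute names, their types, and the EMPLOYEE element are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- Definition: the group is created once, with a name -->
  <xsd:attributeGroup name="empDetails">
    <!-- Hypothetical attributes -->
    <xsd:attribute name="emp_id" type="xsd:int" use="required"/>
    <xsd:attribute name="department" type="xsd:string"/>
  </xsd:attributeGroup>

  <!-- Use: the element's complex type refers to the group by name -->
  <xsd:element name="EMPLOYEE">
    <xsd:complexType>
      <xsd:attributeGroup ref="empDetails"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
```

Any other element that needs the same attributes can include the same ref line in its own complex type, which is what makes the group reusable.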

KEY TERMS AND CONCEPTS

Anonymous type
Base-64 encoding
Complex element
Facet
Hexadecimal conversion
Mixed content
Restriction
Schema
Simple element
Valid document
XML data type

CHAPTER SUMMARY

A Document Type Definition (DTD) allows us to validate the contents of an XML document.
An XML schema is also used for validating the contents of an XML document. A schema document is itself an XML document, so its syntax is the same as that of XML. The syntax of a DTD is different.
An XML document that conforms to the rules of a schema is called a valid XML document. Otherwise, it is called invalid.
An XML document can use both DTD and schema for its validation at the same time.
An XML schema is defined in a separate file, which has the extension xsd.
The XML document specifies the schema file it refers to through the declaration xsi:noNamespaceSchemaLocation="schema file name".
The schema file begins with the XML declaration (<?xml … ?>), like any other XML document.
The root element of the schema file is always schema, declared as <xsd:schema> … </xsd:schema>.
The elements in XML are of two types: simple and complex. Simple elements can contain only text and complex elements can contain sub-elements and text, or we can say that they are made up of simple elements.
The root element of an XML document is known as the top-level element in the XML schema.
The elements of an XML document, i.e., their name, type and frequency, are declared in the schema by using the declaration: <xsd:element name="element name" type="data type" minOccurs="numeric value" maxOccurs="numeric value/unbounded"> … </xsd:element>. The minOccurs and maxOccurs attributes are used to specify the frequency of an element, and their default value is 1. The keyword unbounded, which is allowed only for maxOccurs, indicates that the element can occur any number of times.
A schema can have user-defined data types. A complex data type is declared as: <xsd:complexType name="name of data type"> … </xsd:complexType>. An element then uses this data type through its type attribute.
The schemas allow us to force a sequence of simple elements within a complex element by using: <xsd:sequence> ……</xsd:sequence>.
Content model reuse helps us in abstracting the common features of data types and in using them in an efficient manner, while avoiding unnecessary duplication.
The schema syntax provides support for three grouping constructs that also govern the sequence of elements inside an XML document. They are xsd:all, xsd:choice and xsd:sequence.
The xsd:all grouping specifies that each of the elements in a group must occur at most once, but their ordering is not significant.
The xsd:choice grouping allows us to specify that only one element from the group can appear. Alternatively, we can also specify that out of n elements in a group, m should appear in any order.
The xsd:sequence grouping mandates that every element in a group must appear exactly once (unless minOccurs or maxOccurs specifies otherwise), and in the same order in which the elements are listed.
XML schemas offer 44 built-in simple types. The broad categories are Numeric, String, Time, Binary, Boolean, and URI reference.
There are three ways in which data types can be derived from the base type: Restriction, Union and List.
Restriction allows us to select a subset of values allowed by the base type. We can use an element of type xsd:restriction as a child element of an xsd:simpleType element for creating a new type based on an existing simple type. The base attribute names the existing type to which the restriction applies.
The facets allow us to specify more restrictions than what the basic type allows. Some of the facets are xsd:minInclusive, xsd:maxInclusive, xsd:minExclusive, xsd:maxExclusive, xsd:enumeration, xsd:whiteSpace, xsd:pattern, xsd:length, xsd:minLength, xsd:maxLength, xsd:totalDigits, and xsd:fractionDigits.
The three main facets that can be applied to strings are with reference to the length of a string. These three string facets are xsd:length, xsd:minLength, and xsd:maxLength. By using these facets, we can control the length of a string.
The white space facet allows us to specify how we want to deal with white spaces. It does not specify restrictions. It can have three possible values: preserve, replace, and collapse.
Preserve is the default value. It means that the white space in the XML document should be kept as is. The facet value of replace indicates that we want to replace every tab, line feed, and carriage return character in our XML document with a single space character. The facet value of collapse is a superset of the replace value. After performing the job of replace, this facet value further condenses all multiple consecutive spaces into a single space.
The two main facets applicable to numbers are xsd:totalDigits and xsd:fractionDigits. The xsd:totalDigits facet specifies the maximum number of digits in a number. The xsd:fractionDigits, on the other hand, specifies the maximum number of digits in the fractional part (i.e., in the part to the right of the decimal point).
The enumeration facet in XML schemas allows us to specify a list of possible values for an element. The XML document corresponding to this schema must have one of these values for that element.
The schema uses pattern facets to specify the contents of those elements whose values follow a certain pattern. A pattern is specified as <xsd:pattern value="pattern of the value"/>. A number of regular expression constructs can be used to specify the patterns.
Unions allow us to combine simple types to create new simple types.
A list type allows the creation of a list of values of a particular simple type. The element xsd:list is used to declare the list, in conjunction with an xsd:simpleType. The attribute itemType is used with an xsd:list, to specify the type of each of the items in the list.
Empty elements are declared in a schema by not specifying a child of the form xsd:sequence, xsd:all, or xsd:choice.
Attributes are used in the context of an element to specify something about it. The name, type, presence and value of an attribute can be declared as <xsd:attribute name="attribute name" type="data type" use="required/prohibited/optional"/>. The value of the attribute can be defined as fixed or default.
If an element has several attributes, then we can group them and provide a reference of this group to the concerned element.
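
Several of the points above can be seen together in one small schema. The following is a minimal sketch under stated assumptions and is not taken from the chapter's figures: all element, attribute and type names in it (ORDER, CODE, SIZE, QUANTITY, NOTE, id, codeType, sizeType, quantityType) are hypothetical.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Restriction with numeric facets: an integer from 1 to 100 -->
  <xsd:simpleType name="quantityType">
    <xsd:restriction base="xsd:integer">
      <xsd:minInclusive value="1"/>
      <xsd:maxInclusive value="100"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Enumeration facet: only the listed values are allowed -->
  <xsd:simpleType name="sizeType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="small"/>
      <xsd:enumeration value="medium"/>
      <xsd:enumeration value="large"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Pattern facet: three uppercase letters followed by four digits -->
  <xsd:simpleType name="codeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="[A-Z]{3}[0-9]{4}"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- Top-level element: an anonymous complex type with a sequence -->
  <xsd:element name="ORDER">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="CODE" type="codeType"/>
        <xsd:element name="SIZE" type="sizeType"/>
        <xsd:element name="QUANTITY" type="quantityType"/>
        <!-- May be absent, or may occur any number of times -->
        <xsd:element name="NOTE" type="xsd:string"
                     minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="id" type="xsd:string" use="required"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

A document such as the one below would conform to this sketch, assuming the schema is saved under the hypothetical file name order.xsd:

```xml
<?xml version="1.0"?>
<ORDER id="A1"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:noNamespaceSchemaLocation="order.xsd">
  <CODE>ABC1234</CODE>
  <SIZE>medium</SIZE>
  <QUANTITY>25</QUANTITY>
</ORDER>
```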

PRACTICE SET

True or False Questions

1. The syntax of schema is similar to that of XML.
2. An XML document can be associated with both DTD and schema at the same time.
3. The root element of the schema file can be anything, except that it should follow the XML identifier naming conventions.
4. The xsd:all grouping specifies that all the elements in a group must occur at the most once, but their ordering is not significant.
5. In the case of a schema, we declare an element as empty by not specifying a child of the form xsd:sequence, xsd:all, or xsd:choice.
6. If we want to condense the multiple occurrences of white spaces into one in XML, we use the replace facet value.
7. The minOccurs and maxOccurs cannot be used with an attribute, as it occurs only once.
8. The xsd:minInclusive facet is used to specify the minimum value that all the instances of this type must be greater than.
9. The regular expressions X* and X? can both be used to represent zero occurrence of X in the pattern facet.
10. The pattern value \p{Nl} indicates that there will be one digit, based on letter.

Multiple Choice Questions

1. The elements in the XML are of ________ and ________ types.
   a. primary, secondary
   b. basic, complex
   c. simple, complex
   d. parent, child
2. The root element of an XML document is known as ________ element in schema.
   a. parent
   b. primary
   c. root
   d. top level
3. To specify that an element may not occur at all, or may occur for any number of times, the values of minOccurs and maxOccurs are ________.
   a. 1, 1
   b. 1, 0
   c. 0, unbounded
   d. 1, unbounded
4. The ________ grouping specifies that, out of multiple elements, a few should appear in any order.
   a. xsd:few
   b. xsd:all
   c. xsd:choice
   d. xsd:alternative
5. The xsd:whiteSpace facet can have three possible values: ________, ________ or ________.
   a. preserve, replace, collapse
   b. retain, replace, condense
   c. preserve, replace, condense
   d. retain, replace, condense
6. The list of possible values for an element can be specified using the ________ facet.
   a. xsd:list
   b. choice
   c. enumeration
   d. none
7. The ________ and ________ pattern facet options are used to specify exactly n and at least n occurrences of X, respectively.
   a. X{n}, X{n,}
   b. X{n,}, X{n}
   c. X|n, X{n}
   d. X{n|}, X{n,}
8. The \p{_} and \p{_} are used to represent letters and separators, respectively.
   a. L, S
   b. L, Z
   c. L, P
   d. M, L
9. The xsd:date data type represents the date in the format ________.
   a. DD-MM-YY
   b. DD-MM-YYYY
   c. YYYY-MM-DD
   d. DD-MON-YY
10. If we specify the default value for an attribute, then its use must be ________.
   a. fixed
   b. required
   c. prohibited
   d. optional

Detailed Questions

Why is schema better than DTD? Give a small example to explain the basic structure of a schema.
What are the two types of elements we can have in a schema?
Define and explain the term content model reuse.
What are anonymous type data types used for?
What grouping constructs are available in the XML Schema? Elaborate with an example.
How many types of simple types are there in a schema broadly?
How are derived types declared in schema? Explain Union and Lists.
How is restriction used along with facets to derive new data types?
What are attributes? How do we specify them in a schema?
What are the advantages of grouping the attributes?

Exercises

Think about the scenario where something equivalent to schemas is not available. What will you use to validate XML documents in that case?
Study a few examples in real life that use schemas.
Write a schema for a case where we want to keep information about the children of employees. One employee can have at the most five children. A child can be a boy or a girl.
Create an XML document that complies with the above schema specification.
Think of practical situations where empty elements would be useful.
Why are schemas becoming more popular than DTDs?
Does an XML parser always require an XML document to be associated with a schema? Investigate.
Find out a tool that simplifies the creation of schemas.
Learn more about facets.
In any given situation, is it simpler to write schemas as compared to DTDs? Investigate and justify your answer.

ANSWERS TO EXERCISES

True or False Questions

1. False	2. True	3. False	4. True
5. True	6. False	7. True	8. False
9. True	10. True

Multiple Choice Questions

1. c	2. d	3. c	4. c
5. a	6. c	7. a	8. b
9. c	10. d
