4.2.1 Basics of Simple and Complex Types
4.2.2 Specifying the Frequency: minOccurs and maxOccurs
We have studied the concept of a Document Type Definition (DTD) in detail. We know that a DTD is used for validating the contents of an XML document. A DTD is undoubtedly an important feature of the XML technology. However, there are a number of areas in which DTDs are weak. The main argument against DTDs is that their syntax is not like that of XML documents. Therefore, the people working with DTDs have to learn a new syntax. Furthermore, because a DTD is not an XML document, we cannot process it with ordinary XML tools: for example, we cannot search for information inside a DTD or display its contents in the form of HTML.
A schema is an alternative to a DTD.
It is expected that schemas will eventually replace most (but not all) of the features of DTDs. DTDs are easier to write and support some features (e.g., entities) better. However, schemas are far richer in terms of their capabilities and extensibility. A schema document is a separate document, just like a DTD. However, the syntax of a schema is like the syntax of an XML document. Therefore, we can state:
The main difference between a DTD and a schema is that the syntax of a DTD is different from that of XML, whereas the syntax of a schema is the same as that of XML.
In other words, a schema document is an XML document.
For example, we declare an element in a DTD by using the syntax <!ELEMENT>. This is clearly not legal in XML. We cannot begin an element declaration with an exclamation mark, as happens in the case of a DTD.
We can use a simple, yet powerful, example to illustrate the difference between using a DTD and using a schema. Suppose that we want to represent the marks of a student in an XML document. For this purpose, we add an element called Marks to our root element Student. We declare this element as being of type PCDATA in our DTD file. This ensures that the parser checks for the existence of the Marks element in the XML document. However, can it ensure that the marks are numeric? Clearly, no! We cannot control the contents that the Marks element can have. These contents can be alphabetic or alphanumeric! This is shown in Figure 4.1.
Figure 4.1 Use of PCDATA does not control data type
As we can see, the usage of PCDATA in the declaration of an element does not stop us from entering alphabetic data in a Marks element. In other words, we cannot specify exactly what our elements should contain. This is clearly not desirable.
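Since the figure is not reproduced here, the following minimal sketch illustrates the same point. The element names Student and Marks come from the text; embedding the DTD as an internal subset and the sample value Ninety are assumptions made purely for illustration.

```xml
<?xml version="1.0"?>
<!-- Illustrative sketch: the DTD is embedded as an internal subset for brevity -->
<!DOCTYPE Student [
  <!ELEMENT Student (Marks)>
  <!ELEMENT Marks (#PCDATA)>
]>
<Student>
  <!-- Alphabetic content is accepted, because #PCDATA allows any text -->
  <Marks>Ninety</Marks>
</Student>
```

A validating parser should accept this document, even though the marks are clearly not numeric.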
In the case of a schema, we can specify that our element should only contain numeric data. Moreover, we can control many other aspects of the contents of the elements, which is not possible in the case of DTDs. We use the same terminology for the correctness of XML documents in the case of a schema as in the case of DTDs. An XML document that conforms to the rules of a schema is called a valid XML document. Otherwise, it is called invalid.
It is interesting to note that we can associate a DTD as well as a schema with an XML document.
Let us now take a look at a simple schema. Consider an XML document that contains a greeting message. Let us write a corresponding schema for it. Figure 4.2 shows the details.
Figure 4.2 Example of an XML document and corresponding schema
We will notice several new syntactical details in the XML document and the schema file. Let us, therefore, understand this, step-by-step.
First and foremost, an XML schema is defined in a separate file. This file has the extension xsd. In our example, the schema file is named message.xsd.
The following declaration in our XML document indicates that we want to associate this schema with our XML document:
<MESSAGE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="message.xsd">
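Let us dissect this statement. MESSAGE is the root element of our XML document. The attribute xmlns:xsi binds the prefix xsi to the XML Schema instance namespace, http://www.w3.org/2001/XMLSchema-instance. The attribute xsi:noNamespaceSchemaLocation, which belongs to that namespace, tells the parser that the schema for this document is in the file message.xsd, and that the elements being validated do not belong to any namespace.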
This is followed by the actual contents of our XML document. In this case, the contents are nothing but the contents of our root element.
These explanations are depicted in Figure 4.3.
Figure 4.3 Understanding our XML document
It is now time to understand our schema (i.e., message.xsd).
Note that the schema file is an XML file with an extension of xsd. That is, like any XML document, it begins with an <?xml …?> declaration.
The following lines specify that this is a schema file, and not an ordinary XML document. They also contain the actual contents of the schema. Let us first reproduce them:
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
<xsd:element name = "MESSAGE" type = "xsd:string"/>
</xsd:schema>
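Let us understand this, step-by-step. The root element of every schema is schema. The attribute xmlns:xsd binds the prefix xsd to the XML Schema namespace, http://www.w3.org/2001/XMLSchema, so every schema keyword is written with this prefix. The xsd:element declaration states that the XML document has an element named MESSAGE, whose content is of the built-in type xsd:string.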
These explanations are depicted in Figure 4.4.
Figure 4.4 Understanding our XML schema
Based on this discussion, let us have a small exercise.
Exercise 1
Write an XML document that contains a single element to specify the name of the student. Provide a corresponding XML schema.
Solution
XML document (student.xml)
<?xml version = "1.0" ?>
<STUDENT xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="student.xsd">
S Ramachandran
</STUDENT>
XML schema (student.xsd)
<?xml version = "1.0" ?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
<xsd:element name = "STUDENT" type = "xsd:string"/>
</xsd:schema>
Exercise 2
Write the same XML document, but this time use a DTD.
Solution
XML document (student.xml)
<?xml version="1.0"?>
<!DOCTYPE STUDENT SYSTEM "student.dtd">
<STUDENT>
S Ramachandran
</STUDENT>
DTD (student.dtd)
<!ELEMENT STUDENT (#PCDATA)>
Elements in schema can be divided into two categories: simple and complex. This is shown in Figure 4.5.
Figure 4.5 Classification of elements in XML schemas
Let us understand the difference between the two types.
Figure 4.6 Complex element is made up of simple elements
Let us now consider an example. Suppose we want to capture student information in the form of the student's roll number, name, marks, and result. We can have all these individual blocks of information as simple elements. Then, we will have a complex element in the form of the root element. This complex element will encapsulate these individual simple elements. Figure 4.7 shows the resulting XML document, first.
Figure 4.7 XML document for Student example
Let us now immediately take a look at the corresponding schema file. Figure 4.8 shows this.
Figure 4.8 Schema for Student example
Let us understand our schema.
We know that the root element of the schema is a reserved keyword called schema. Here too, it is the same. The namespace prefix xsd maps to the namespace URI http://www.w3.org/2001/XMLSchema, as earlier. In general, this will be true for any schema that we write.
This declares STUDENT as the root element of our XML document. In the schema, it is called the top-level element. Remember that in the case of a schema, the root element is always the keyword schema. Therefore, the root element in an XML document is not the root of the corresponding schema. Instead, it appears in the schema after the root element schema.
The STUDENT element is declared of type StudentType. This is a user-defined type.
Conceptually, a user-defined type is similar to a structure in C/C++ or a class in Java (without the methods). It allows us to create our own custom types.
In other words, the schema specification allows us to create our own custom data types. For example, we can create our own types for storing information about employees, departments, songs, friends, sports games, and so on. We recognise StudentType as a user-defined type because it does not carry the namespace prefix xsd. Remember that all the standard data types provided by the XML schema specifications belong to the namespace http://www.w3.org/2001/XMLSchema, which we have prefixed as xsd in the earlier statement.
Now that we have declared our own type, we must explain what it represents and contains. That is exactly what we are doing here. This statement indicates that we have used StudentType as a type earlier, and now we want to explain what it means. Also, note that we use the keyword complexType to designate that StudentType is a complex type. This is similar to stating struct StudentType or class StudentType in C++/Java.
Schemas allow us to force a sequence of simple elements within a complex element. We can specify that a particular complex element must contain one or more simple elements in a strict sequence. Thus, if the complex element is A, containing two simple elements B and C, we can mandate that C must follow B inside A. In other words, the XML document must have:
<A>
<B> … </B>
<C>… </C>
</A>
This is accomplished by the sequence keyword.
This declaration specifies that the first simple element inside our complex element is ROLL_NUMBER, of type string. After this, we have NAME, MARKS, and RESULT as three more simple elements following ROLL_NUMBER. We will not discuss them one by one. We will simply observe for now that MARKS has a different data type: an integer. We will discuss this in detail subsequently.
We will also not discuss the closure of the sequence, complexType, and schema tags.
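Because the figures for this example are not reproduced here, the schema described above can be sketched roughly as follows. The names STUDENT, StudentType, ROLL_NUMBER, NAME, MARKS, and RESULT come from the text; the types chosen for NAME, MARKS, and RESULT are assumptions for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Top-level element: the root of the XML document -->
  <xsd:element name="STUDENT" type="StudentType"/>

  <!-- User-defined complex type: note that it carries no xsd prefix -->
  <xsd:complexType name="StudentType">
    <!-- The child elements must appear in exactly this order -->
    <xsd:sequence>
      <xsd:element name="ROLL_NUMBER" type="xsd:string"/>
      <xsd:element name="NAME" type="xsd:string"/>
      <xsd:element name="MARKS" type="xsd:integer"/>
      <xsd:element name="RESULT" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Under this sketch, a STUDENT element whose MARKS child contains alphabetic text would be rejected, which is exactly the control that a DTD could not give us.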
Let us have a small exercise to build on these concepts.
Exercise 1
Write an XML document and a corresponding XML schema for maintaining the employee number, name, designation, and salary.
Solution
XML document (employee.xml)
<?xml version = "1.0" ?>
<EMPLOYEE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="employee.xsd">
<EMP_NUMBER>9662</EMP_NUMBER>
<EMP_NAME>Atul Kahate</EMP_NAME>
<DESIGNATION>Consultant</DESIGNATION>
<SALARY>1000</SALARY>
</EMPLOYEE>
XML schema (employee.xsd)
<?xml version = "1.0"?>
<xsd:schema xmlns:xsd = "http://www.w3.org/2001/XMLSchema">
<xsd:element name = "EMPLOYEE" type = "EmpType"/>
<xsd:complexType name = "EmpType">
<xsd:sequence>
<xsd:element name = "EMP_NUMBER" type = "xsd:integer"/>
<xsd:element name = "EMP_NAME" type = "xsd:string"/>
<xsd:element name = "DESIGNATION" type = "xsd:string"/>
<xsd:element name = "SALARY" type = "xsd:integer"/>
</xsd:sequence>
</xsd:complexType>
</xsd:schema>
Let us consider that we want to represent information about a book. The XML document depicting this information along with its corresponding schema is shown in Figure 4.9.
Figure 4.9 Book XML and schema
There is no problem with this example. However, now imagine a situation where we want to provide support for a book XML document that can have multiple authors. Would the same schema serve the purpose? Figure 4.10 shows this situation. Please note that the schema declaration in this figure is incorrect. We have shown it merely to explain the problem.
Figure 4.10 Book XML and incorrect schema
We now have two authors in the XML document. However, the corresponding schema declares the AUTHOR element only once, which by default means that it may occur exactly once. The document is therefore not valid against this schema. We must use either or both of the minOccurs and maxOccurs attributes in such situations.
The minOccurs attribute specifies the minimum number of occurrences that an element can have. On the other hand, the maxOccurs attribute specifies the maximum number of occurrences.
Our requirement is to have two authors in this case. Therefore, we only require maxOccurs with a value of 2, and the declaration of the AUTHOR element in our schema changes to the following:
<xsd:element name = "AUTHOR" type = "xsd:string"
maxOccurs = "2"/>
Nothing else needs to change. The above declaration specifies that we can at the most have two authors.
The default value of both minOccurs and maxOccurs is 1. Therefore, if we do not specify either of them, the element is deemed to occur exactly once.
The above declaration is equivalent to the following:
<xsd:element name = "AUTHOR" type = "xsd:string"
minOccurs = "1" maxOccurs = "2"/>
There is a special value called unbounded, which means that there is no upper limit on the number of occurrences. Whenever we wish to specify that an element may occur any number of times, we can set maxOccurs to unbounded. For example, if our book must have at least one author but may have any number of authors, our declaration would change to:
<xsd:element name = "AUTHOR" type = "xsd:string"
minOccurs = "1" maxOccurs = "unbounded"/>
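To see such a declaration in context, here is a minimal sketch of a complete book schema. Only the AUTHOR element is taken from the text; the names BOOK, BookType, and TITLE are assumptions, since the book figures are not reproduced here.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="BOOK" type="BookType"/>

  <xsd:complexType name="BookType">
    <xsd:sequence>
      <!-- No minOccurs/maxOccurs: TITLE must occur exactly once -->
      <xsd:element name="TITLE" type="xsd:string"/>
      <!-- At least one author, with no upper limit -->
      <xsd:element name="AUTHOR" type="xsd:string"
                   minOccurs="1" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

With this sketch, a BOOK containing one TITLE followed by two (or twenty) AUTHOR elements is valid, whereas a BOOK with no AUTHOR element is not.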
Based on our requirements, we can set minOccurs and maxOccurs attributes to various values. These are summarised in Table 4.1.
Requirement | Set minOccurs to | Set maxOccurs to |
---|---|---|
An element should occur exactly once | 1 | 1 |
An element should occur at least once and possibly many more times | 1 | unbounded |
An element is optional (may not occur at all), or may occur any number of times | 0 | unbounded |
An element may not occur at all, or may occur only once | 0 | 1 |
Table 4.1 Usage of minOccurs and maxOccurs
We can see that minOccurs and maxOccurs are similar to, but far more expressive than, the ?, *, and + symbols of DTDs. They are not only easier to read and understand, but also far more precise. For example, suppose that one manager can manage any number of employees from 8 to 20. Then we can specify minOccurs as 8 and maxOccurs as 20. DTD declarations offer no comparably concise way to state this.
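The manager example can be sketched as follows. The element names MANAGER and EMPLOYEE, and the use of xsd:string for the employee, are assumptions chosen only to illustrate the occurrence limits.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="MANAGER">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Between 8 and 20 employees, both limits inclusive -->
        <xsd:element name="EMPLOYEE" type="xsd:string"
                     minOccurs="8" maxOccurs="20"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

In a DTD, the closest equivalent would be to write EMPLOYEE out eight times, followed by twelve optional occurrences (EMPLOYEE?), which is far harder to read and maintain.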
Based on the concepts learnt so far, let us start making use of some of the key features. We have seen examples of simple types being a part of a complex type. However, we have not yet seen an example where the simple types themselves can become complex types. That is, we have not gone beyond one level of depth in the hierarchy of child elements. Let us do that now.
Consider that we have an employee working in an organisation. At any given time, the employee works in one or more projects, and has zero or more subordinates who work for her. The employee also has her own characteristics, such as name, designation, and salary. There are various ways in which this information can be represented in XML. We would use the version shown in Figure 4.11.
Figure 4.11 Sample XML (emp.xml)
Let us now write a schema for this XML document. We can see that we have an employee possibly managing multiple projects and multiple subordinates. Also, the employee has her own personal characteristics in terms of designation and salary.
Based on this description, one possible schema for this XML document is shown in Figure 4.12.
Figure 4.12 Corresponding schema (emp.xsd)
We will notice that the elements PROJECT and SUBORDINATE contain a sub-element each (named NAME). Therefore, they are not simple elements. Instead, we must declare them as complex elements in our schema. Because we declare them as complex, they automatically have to be a user-defined type. That is exactly what has happened in this case as well. We have defined these two as complex types, which we have later used in other elements.
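As the figures for this example are not reproduced here, one possible shape of such a schema is sketched below. The names EMPLOYEE, PROJECT, SUBORDINATE, NAME, ProjectType, and SubordinateType appear in the text; EmployeeType, the order of the child elements, and the types of DESIGNATION and SALARY are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="EMPLOYEE" type="EmployeeType"/>

  <xsd:complexType name="EmployeeType">
    <xsd:sequence>
      <!-- One or more projects -->
      <xsd:element name="PROJECT" type="ProjectType"
                   minOccurs="1" maxOccurs="unbounded"/>
      <!-- Zero or more subordinates -->
      <xsd:element name="SUBORDINATE" type="SubordinateType"
                   minOccurs="0" maxOccurs="unbounded"/>
      <xsd:element name="DESIGNATION" type="xsd:string"/>
      <xsd:element name="SALARY" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- PROJECT and SUBORDINATE have a child element, so each needs a complex type -->
  <xsd:complexType name="ProjectType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="SubordinateType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

The hierarchy is now two levels deep: EMPLOYEE contains PROJECT and SUBORDINATE, and each of those in turn contains NAME.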
To reinforce our understanding, let us discuss another example. Imagine that we have an Internet shopping site. People can browse our site and decide to buy goods online. We want to capture the result of placing one such order inside an XML document. Therefore, our XML document needs to store information about who is placing this order, what is the address at which the goods need to be delivered (i.e., shipping information), and the actual contents of the order (i.e., the goods ordered).
Therefore, the information we want to capture is something like the one shown in Figure 4.13.
Figure 4.13 Example of information we want to capture for processing
We represent this in another format as shown in Figure 4.14.
Figure 4.14 Visual representation of our intended XML contents
Let us now actually create an order placed by a customer, to see what it looks like. Figure 4.15 shows the resulting XML document.
Figure 4.15 XML document containing order details
Let us note some salient points about our XML document.
Based on this understanding, we can note a few points about the design of our schema, as follows.
Accordingly, our XML schema looks as shown in Figure 4.16.
Figure 4.16 XML schema describing order details
There are a couple of things that we have used here but have not described earlier. For instance, we have used a decimal data type. We will cover such things at the appropriate time.
Observant readers would have realised that there is some unwanted duplication in our earlier schema code for employees. The schema talks about projects and subordinates as two separate complex elements. However, these two are really similar. Both contain a sub-element name of type string. Rather than declaring the types for these two elements separately, can we not combine this declaration? The answer is yes, and that is what we mean by content model sharing or content model reuse.
The idea is simple. Rather than declaring ProjectType and SubordinateType, we would declare a single type, let us say NameType (for want of a better name!) and use it for both projects and subordinates. This is shown in Figure 4.17.
Figure 4.17 Content model reuse
Note that now both PROJECT and SUBORDINATE have the type specified as NameType. They do not have their own separate types.
Content model reuse helps us in abstracting the common features of data types, and in using them in an efficient manner to deal with unnecessary duplication.
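A minimal sketch of content model reuse is given below, assuming the same hypothetical EmployeeType wrapper as before. NameType, PROJECT, SUBORDINATE, and NAME are the names used in the text; everything else is an assumption for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="EMPLOYEE" type="EmployeeType"/>

  <xsd:complexType name="EmployeeType">
    <xsd:sequence>
      <!-- Both elements now share one type instead of two identical ones -->
      <xsd:element name="PROJECT" type="NameType"
                   maxOccurs="unbounded"/>
      <xsd:element name="SUBORDINATE" type="NameType"
                   minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- The shared content model: a single NAME child of type string -->
  <xsd:complexType name="NameType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Any later change to NameType automatically applies to both PROJECT and SUBORDINATE, which is both the benefit and the risk of sharing.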
In some situations, we want to put restrictions on the usage of certain user types. We may wish to mandate that a user-defined type only be used in a particular context (say inside a particular element only).
For example, let us again consider NameType, discussed in the earlier section. We know that NameType is meant to contain an element called NAME. However, let us imagine that we want to split this NAME into two sub-elements, FIRST_NAME and LAST_NAME, only for subordinates. We obviously do not want such a split in the case of projects. However, because of the content model reuse, we would be forced to have it there as well! This is shown in Figure 4.18.
Figure 4.18 Problem in sharing content models
As we can see, a subordinate should have a first name and a last name but a project will not have first and last name. But because NameType is defined that way, we must use it as defined! We cannot stop the misuse of sharing content models in such situations.
In other words, content model sharing can also bring its own set of problems. This is because our intention of abstracting common features and reusing them can get misused in such situations.
The solution to such problems is the usage of anonymous types. In our particular example, we can do the following changes:
Therefore, our diagram would now change as shown in Figure 4.19.
Figure 4.19 Anonymous types
Note that NameType is now an anonymous type, because it is declared inside the PersonType. We cannot use NameType as a type anywhere (e.g., in PROJECT or in SUBORDINATE). It has no existence outside of PersonType. Therefore, it is anonymous.
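In XSD syntax, an anonymous type is written as a complexType without a name attribute, nested directly inside the element declaration that uses it. The sketch below assumes that PersonType describes a subordinate and that the split name lives inside a NAME element; FIRST_NAME, LAST_NAME, and PersonType come from the text, while the overall layout is an assumption because the figure is not reproduced here.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="SUBORDINATE" type="PersonType"/>

  <xsd:complexType name="PersonType">
    <xsd:sequence>
      <xsd:element name="NAME">
        <!-- Anonymous type: no name attribute, so nothing else can refer to it -->
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="FIRST_NAME" type="xsd:string"/>
            <xsd:element name="LAST_NAME" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

Because the nested type has no name, a PROJECT element declared elsewhere cannot pick it up by accident; it would need its own declaration.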
Sometimes, we want to allow text between elements. For example, suppose that we want to capture information about employee names. Then, we can think of the title, first name, middle name, and last name. Of these, we may mandate that the first name and the last name should be mandatory, whereas the other two are optional. As we know, we can define this in the form of an XML schema as shown in Figure 4.20.
Figure 4.20 Schema for capturing employee information
We can represent the same information by using mixed content. To do this, we need to add an attribute mixed with value true. When we do so, we can just keep the mandatory information as elements, and remove other elements that may have a corresponding value in the XML document. For example, here, we can remove the TITLE and MIDDLE_NAME elements from our schema, and instead declare the EMPLOYEE element to allow for mixed content. When we do so, the EMPLOYEE element must contain values for the first name and the last name, and can optionally have values for the title and the middle name.
The modified schema definition is shown in Figure 4.21.
Figure 4.21 Schema for capturing employee information (emp1.xsd)
The corresponding XML document is shown in Figure 4.22.
Figure 4.22 XML document for capturing employee information (emp1.xml)
We can now alter this definition by using mixed content. The modified schema definition is shown in Figure 4.23.
Figure 4.23 Schema with mixed content (emp2.xsd)
Note that we have dropped the TITLE and the MIDDLE_NAME elements. Also, we have added an attribute mixed with a value of true for the EmpType complex type. As a result, we cannot use the TITLE or the MIDDLE_NAME elements in our XML document. However, we can still specify the values for the title and the middle name elements as placeholders as shown in Figure 4.24. This is what mixed content allows us to do.
Figure 4.24 XML document with mixed content (emp2.xml)
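The mixed content idea can be sketched as follows. EMPLOYEE, EmpType, and the mixed attribute come from the text; the element names FIRST_NAME and LAST_NAME and the sample name are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="EMPLOYEE" type="EmpType"/>

  <!-- mixed="true" allows free text between the child elements -->
  <xsd:complexType name="EmpType" mixed="true">
    <xsd:sequence>
      <xsd:element name="FIRST_NAME" type="xsd:string"/>
      <xsd:element name="LAST_NAME" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

A matching document could then carry the title and the middle name as plain text around the mandatory elements:

```xml
<?xml version="1.0"?>
<EMPLOYEE xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="emp2.xsd">
  Mr <FIRST_NAME>Sachin</FIRST_NAME> Ramesh <LAST_NAME>Tendulkar</LAST_NAME>
</EMPLOYEE>
```

Note that the schema does not check the free text in any way; only FIRST_NAME and LAST_NAME are validated.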
Thus far, we have been dealing with schemas that mandate a specific order of elements inside an XML document. That is, we have seen cases where element A must follow element B inside a document. In reality, this is not always the case. At times, we just want to make sure that an element exists inside an XML document; where it appears is not so important. This is where grouping of elements comes into the picture.
The schema syntax provides support for three grouping constructs that also govern the sequence of elements inside an XML document. These three constructs are:
The xsd:all grouping specifies that all the elements in a group must occur at the most once, but their ordering is not significant.
The xsd:choice grouping allows us to specify that only one element from the group can appear. Alternatively, we can also specify that out of n elements in a group, m should appear in any order.
The xsd:sequence grouping mandates that every element in a group must appear (exactly once, unless minOccurs or maxOccurs specify otherwise), and in the same order in which the elements are listed.
Let us discuss these now.
When we use xsd:all, we mean that an element may occur. If it occurs, it must occur only once. The order of elements is not significant.
Consider the example shown in Figure 4.25.
Figure 4.25 Usage of all
In our example, the complex type NAME contains elements FIRST_NAME and LAST_NAME. Both FIRST_NAME and LAST_NAME must occur exactly once. Their order is not significant. This is because they are contained inside the <xsd:all> tag.
We must mention that we can also specify a value of zero for minOccurs or maxOccurs. That is, we can allow an element to not occur at all. In that sense, all is a misnomer. For instance, we can change the declaration of FIRST_NAME to the following:
<xsd:element name = "FIRST_NAME" type = "xsd:string"
minOccurs = "0" maxOccurs = "1"/>
We now allow FIRST_NAME to be missing from the XML document. This is perfectly all right.
We also need to note that we cannot specify an arbitrary number of occurrences in the case of all. That is, both minOccurs and maxOccurs can have only a value of zero or one. We cannot, for instance, say minOccurs = 2 and maxOccurs = 4. This is illegal.
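Putting these points together, a sketch of an all group with an optional FIRST_NAME might look like the following. NAME, FIRST_NAME, and LAST_NAME come from the text; declaring NAME as a top-level element with an inline type is an assumption made for brevity.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="NAME">
    <xsd:complexType>
      <!-- Children may appear in any order, each at most once -->
      <xsd:all>
        <!-- Optional: may be missing from the document -->
        <xsd:element name="FIRST_NAME" type="xsd:string"
                     minOccurs="0" maxOccurs="1"/>
        <!-- Mandatory: defaults to exactly one occurrence -->
        <xsd:element name="LAST_NAME" type="xsd:string"/>
      </xsd:all>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Under this sketch, a NAME with LAST_NAME before FIRST_NAME is valid, and so is a NAME containing only LAST_NAME; a NAME with two FIRST_NAME elements is not.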
Exercise
Write an XML schema and show the corresponding XML document for the following: It should contain information about a credit card so that the credit card can be validated.
Solution
XML schema (card.xsd)
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="CreditCard">
<xsd:complexType>
<xsd:all>
<xsd:element name="CardType" type="xsd:string"/>
<xsd:element name="CardNumber" type="xsd:string"/>
<xsd:element name="CardHolder" type="xsd:string"/>
<xsd:element name="CardValidTill" type="xsd:string"/>
</xsd:all>
</xsd:complexType>
</xsd:element>
</xsd:schema>
XML document (card.xml)
<?xml version = "1.0"?>
<CreditCard xmlns:xsi = "http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation = "card.xsd">
<CardHolder>Sonia Kapoor </CardHolder>
<CardType> Visa </CardType>
<CardValidTill>Feb–2010</CardValidTill>
<CardNumber> 1234567890123456 </CardNumber>
</CreditCard>
Note: The order of elements in the XML document is different from the one specified in the schema. As we have mentioned earlier, this is allowed in the case of all.
We know that in the case of a DTD, we can use the pipe (|) symbol to signify selection. In schema, the corresponding functionality is achieved by using the xsd:choice syntax. When we embed more than one element inside a choice boundary, exactly one of them must occur in the XML document.
For example, suppose that we want to store the information about the result of examination as Pass or Fail along with the percentage of marks obtained. Clearly, only one of these should be allowed. We can make use of the choice element, as shown in Figure 4.26.
Figure 4.26 Using the choice syntax
Of course, usually, we will not store just the result alone. It would be in the context of (i.e., a part of) some element, such as Student. For now, we have ignored that possibility. But, we can easily modify our XML schema and document to reflect this. Figure 4.27 illustrates the modified schema and the XML document.
Figure 4.27 A complete example using choice
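As the two figures are not reproduced here, a choice group inside a larger element can be sketched as follows. The element names STUDENT, NAME, RESULT, PASS, and FAIL, and the idea of storing the percentage as the content of PASS or FAIL, are assumptions made for illustration and may differ from the original figures.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="STUDENT">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="NAME" type="xsd:string"/>
        <xsd:element name="RESULT">
          <xsd:complexType>
            <!-- Exactly one of PASS or FAIL must appear -->
            <xsd:choice>
              <xsd:element name="PASS" type="xsd:decimal"/>
              <xsd:element name="FAIL" type="xsd:decimal"/>
            </xsd:choice>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

With this sketch, a RESULT containing a single PASS element with the value 72.5 is valid, whereas a RESULT containing both PASS and FAIL, or neither, is not.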
Here is an exercise to understand this further.
Exercise
Write an XML schema and show the corresponding XML document for storing information about lunch. It should consist of a starter, a main course, and a dessert. There should be options in each of the categories.
Solution
XML schema (lunch.xsd)
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="Lunch">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Starter">
<xsd:complexType>
<xsd:choice>
<xsd:element name="Soup" type="xsd:string"/>
<xsd:element name="Juice" type="xsd:string"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
<xsd:element name="MainCourse">
<xsd:complexType>
<xsd:choice>
<xsd:element name="VegLunch" type="xsd:string"/>
<xsd:element name="NonVegLunch" type="xsd:string"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
<xsd:element name="Dessert">
<xsd:complexType>
<xsd:choice>
<xsd:element name="IceCream" type="xsd:string"/>
<xsd:element name="FruitSalad" type="xsd:string"/>
</xsd:choice>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>
XML document (lunch.xml)
<?xml version="1.0"?>
<Lunch xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="lunch.xsd">
<Starter>
<Juice>Apple</Juice>
</Starter>
<MainCourse>
<VegLunch>Thali</VegLunch>
</MainCourse>
<Dessert>
<IceCream>Vanilla</IceCream>
</Dessert>
</Lunch>
An xsd:sequence element allows us to group a set of sub-elements. These sub-elements must appear in the same sequence in the XML document, as declared in the schema. We have seen many examples of this earlier, while declaring complex elements. We can add the minOccurs and maxOccurs attributes either to the individual sub-elements or to the main grouping element to control the number of occurrences of the individual sub-elements, or that of the group.
Let us consider an example. Consider that we want to maintain information about the batting of a team in a cricket match. We know that there can be at the most 11 batsmen. For every batsman, we will maintain details such as his name, details of dismissal (how out, fielder, bowler), and the number of runs scored. We will maintain this for two innings. A typical entry for one player would be as follows:
Sachin Tendulkar
Innings 1: c Inzamam-ul-Haq b Shoaib Akhtar 103
Innings 2: not out 201
Let us design a schema to maintain this sort of information. Remember that at the most 11 batsmen can bat. Every batsman can have up to two innings.
Figure 4.28 illustrates the resulting schema.
Figure 4.28 Schema for representing batting details for a team in a cricket match
An XML document corresponding to this schema (with data for one batsman) is shown in Figure 4.29.
Figure 4.29 XML document containing batting details for a team in a cricket match
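Since the two figures are not reproduced here, the following sketch shows how the occurrence limits described above could be combined with nested sequences. All element and type names (TEAM, BATSMAN, INNINGS, HOW_OUT, FIELDER, BOWLER, RUNS, and the type names) are assumptions; only the limits of 11 batsmen and two innings come from the text.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="TEAM">
    <xsd:complexType>
      <xsd:sequence>
        <!-- At most 11 batsmen can bat -->
        <xsd:element name="BATSMAN" type="BatsmanType"
                     minOccurs="1" maxOccurs="11"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="BatsmanType">
    <xsd:sequence>
      <xsd:element name="NAME" type="xsd:string"/>
      <!-- Up to two innings per batsman -->
      <xsd:element name="INNINGS" type="InningsType"
                   minOccurs="0" maxOccurs="2"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="InningsType">
    <xsd:sequence>
      <xsd:element name="HOW_OUT" type="xsd:string"/>
      <!-- Fielder and bowler are absent when the batsman is not out -->
      <xsd:element name="FIELDER" type="xsd:string" minOccurs="0"/>
      <xsd:element name="BOWLER" type="xsd:string" minOccurs="0"/>
      <xsd:element name="RUNS" type="xsd:nonNegativeInteger"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

For the sample entry above, the first INNINGS would carry a HOW_OUT, a FIELDER, a BOWLER, and RUNS of 103, while the second would carry only a HOW_OUT of not out and RUNS of 201.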
We will not discuss sequences further, since we have covered them in detail.
So far, we have focused on the structure of an XML document with reference to schemas. We have not discussed the possibilities that exist with individual elements. For example, when we speak about the marks of a student, we generally believe that this would be a positive integer of up to three digits with a maximum value of 100. When we talk about a year, it is a four digit positive integer with some sensible value. Many such examples can be given.
In the case of DTDs, there is no way to specify this fine-grained detail about elements and their data types. In contrast, schemas allow us to provide many more details about such things.
XML schemas offer 44 built-in simple types.
We can roughly classify the XML schema simple types into seven categories, as shown in Figure 4.30.
Figure 4.30 Simple types as per schema specifications
We widely use numbers. The XML schema specifications provide support for a wide range of numeric data types. In practice, we make use of only a few of these. However, for the sake of completeness, Table 4.2 summarises the various numeric data types.
Data type name | Meaning | Examples |
---|---|---|
xsd:float | 32-bit, same as Java's float data type | 0, 12345.6789 |
xsd:double | 64-bit, same as Java's double data type | 0, 45.89E-2, 123456789.56789 |
xsd:decimal | Arbitrary precision, same as java.math.BigDecimal | 87200.29, -3.1415292 |
xsd:integer | Arbitrarily large or small integer, same as java.math.BigInteger | -7890000000000000, 723712637236839210123 |
xsd:nonPositiveInteger | Integer less than or equal to 0 | 0, -1, -2, -3, … |
xsd:negativeInteger | Integer less than 0 | -1, -2, -3, … |
xsd:nonNegativeInteger | Integer greater than or equal to 0 | 0, 1, 2, 3, … |
xsd:positiveInteger | Integer greater than 0 | 1, 2, 3, … |
xsd:long | 8-byte, 2's complement integer, similar to Java's long data type | -5612367128398213, 0, -19, 15, 915402742 |
xsd:int | 4-byte, 2's complement integer, similar to Java's int data type | -615251, 0, 15310012 |
xsd:short | 2-byte, 2's complement integer, similar to Java's short data type | -32768 to +32767 |
xsd:byte | 1-byte, 2's complement integer, similar to Java's byte data type | -128 to +127 |
xsd:unsignedLong | 8-byte unsigned integer | 0 to 18446744073709551615 |
xsd:unsignedInt | 4-byte unsigned integer | 0 to 4294967295 |
xsd:unsignedShort | 2-byte unsigned integer | 0 to 65535 |
xsd:unsignedByte | 1-byte unsigned integer | 0 to 255 |
Table 4.2 Simple numeric data types supported by XML schema
In this section, we shall discuss the various date and time-related data types. These data types are quite common in database products. These data types are used to represent a variety of date and time formats. A generic rule is that wherever applicable, the formats contain year, followed by month, followed by day, followed by hours, and so on. Note that “g” means Gregorian.
Table 4.3 lists the various time data types.
Data type name | Meaning | Examples |
---|---|---|
xsd:dateTime | Date and time in the format YYYY-MM-DDTHH:MM:SS | 2006-01-20T06:05:00 |
xsd:date | A specific date in the YYYY-MM-DD format | 2006-01-20, 1973-04-07 |
xsd:time | A specific time of day in the HH:MM:SS format | 06:05:00, 17:30:00 |
xsd:gDay | A day in a month | ---01, ---02, …, ---31 |
xsd:gMonth | A month in a year | --01, --02, …, --12 |
xsd:gYear | A year | 2006, 1973 |
xsd:gYearMonth | A specific month in a specific year | 2006-01, 1973-04 |
xsd:gMonthDay | A date without year | --01-20, --04-07 |
xsd:duration | Length of time, in the format PnYnMnDTnHnMnS | P2006Y01M20DT06H11M00S |
Table 4.3 Simple time data types supported by XML schema
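To show how the numeric and date/time types listed above are used in practice, here is a small sketch. The element names (EXAM_RECORD and its children) and the hypothetical file name exam.xsd are assumptions invented for this illustration; only the type names come from the tables.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="EXAM_RECORD">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="MARKS" type="xsd:nonNegativeInteger"/>
        <xsd:element name="PERCENTAGE" type="xsd:decimal"/>
        <xsd:element name="EXAM_DATE" type="xsd:date"/>
        <xsd:element name="START_TIME" type="xsd:time"/>
        <xsd:element name="ACADEMIC_YEAR" type="xsd:gYear"/>
        <xsd:element name="TIME_ALLOWED" type="xsd:duration"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

A document that should satisfy this sketch is shown next; replacing 87 with text such as Eighty, or 2006-01-20 with 20-01-2006, should make it invalid.

```xml
<?xml version="1.0"?>
<EXAM_RECORD xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:noNamespaceSchemaLocation="exam.xsd">
  <MARKS>87</MARKS>
  <PERCENTAGE>87.50</PERCENTAGE>
  <EXAM_DATE>2006-01-20</EXAM_DATE>
  <START_TIME>06:05:00</START_TIME>
  <ACADEMIC_YEAR>2006</ACADEMIC_YEAR>
  <TIME_ALLOWED>PT3H</TIME_ALLOWED>
</EXAM_RECORD>
```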
We have discussed many XML data types in DTDs. XML data types in schemas are quite similar to those in the DTDs. However, there are also four additional data types in schemas under this category, as compared to DTDs.
Table 4.4 depicts the XML data types.
Data type name | Meaning | Examples |
---|---|---|
xsd:ID | A unique value for an element or an attribute | T1, M90, G101-100-Y6, Nine |
xsd:IDREF | Value of another ID type defined elsewhere in the document | T1, M90, G101-100-Y6, Nine |
xsd:ENTITY | An XML name, declared as an unparsed entity in a DTD | Bips, Graph10, PICTURE5 |
xsd:NOTATION | Usually indicates a file format | PDF, TIF, GIF, JPEG |
xsd:IDREFS | Reference to a list of ID names | T1, M90, G101-100-Y6, Nine |
xsd:ENTITIES | List of ENTITY names | Bips, Graph10, PICTURE5 |
xsd:NMTOKEN | A single name token (no white space allowed) | 67, how, are, you, 1910 |
xsd:NMTOKENS | A list of NMTOKEN types | 67 how are you 1910 |
xsd:language | Language name from a list of valid values | en, en-US, en-GB, fr, ara |
xsd:Name | XML name, with or without colons | Student, employee, Team:Player |
xsd:QName | Prefixed name | xsd:element, Team:Player |
xsd:NCName | Local name without colons | Student, employee, player, salary |
Table 4.4 XML data types supported by XML schema
We have extensively used the xsd:string data type. It allows any string value of any length. The internal representation of these strings is in Unicode. Apart from this, there are two more string data types, as shown in Table 4.5.
Data type name | Meaning | Examples |
---|---|---|
xsd:string | A Unicode character-based string of any length | Sachin Tendulkar is the best!, Amitabh Bachchan needs to be saluted, Mahatma Gandhi was the Father of the Nation, Hi there!, Your password is protected as ******, 2006 |
xsd:normalizedString | A string in which all the carriage returns, linefeeds, and tabs are replaced with a single blank (space) character | Hello XML, This is news to me!, red pepper |
xsd:token | Same as above, but in addition, all leading and trailing spaces are trimmed and consecutive spaces are converted into a single space | Bips, Graph10, PICTURE5 |
Table 4.5 String data types supported by XML schema
Usually, XML is meant to carry text. However, at times, it must also support binary data. The problem with binary data is that it can have byte patterns that are illegal. This is because some characters (i.e., byte patterns) such as null have a different meaning, and they cannot be a part of the XML content. Therefore, mechanisms are needed to encode such illegal characters into a legal form before such characters can be considered as a part of the XML document. Two standards for doing this are prevalent: hexadecimal conversion, and base-64 encoding. XML schemas support both of these. A detailed explanation of these standards is outside of the scope of this text. Nevertheless, we will provide a crisp overview.
The other two data types are Boolean and URI.
Deriving the custom simple types from the basic simple types is a powerful feature of XML schemas. We can use this feature to come up with data types that are specific to our application, but are unlikely to be available as basic simple data types.
For example, let us imagine that we want to store the publication year of all the books in a library, and we do not wish to register books that were published before 1970. In that case, we can use a derived type that restricts the set of legal values to a minimum of 1970.
There are three techniques for deriving types, as shown in Figure 4.31.
Figure 4.31 Deriving new simple types in XML
Let us discuss these now.
Restriction allows us to select a subset of values allowed by the base type.
We can use an xsd:restriction element as a child of an xsd:simpleType element to create a new type based on an existing simple type. The base attribute specifies the existing type to which the restriction applies.
Figure 4.32 shows an example. Here, we have simply specified that the publishing year of a book has a type of xsd:gYear.
Figure 4.32 Example of restriction
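A minimal sketch of such a restriction might look as follows. It assumes that the new type is named publishingYear (the name used later in this section); the rest of the schema is kept to the bare minimum and is illustrative only.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A new simple type derived from xsd:gYear.
       No facets are specified yet, so any valid year is accepted. -->
  <xsd:simpleType name="publishingYear">
    <xsd:restriction base="xsd:gYear"/>
  </xsd:simpleType>

</xsd:schema>
```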
Thus, we have restricted the allowed values for the publishingYear element to that of a year data type. Of course, this is not the only thing we can do. We can enhance this restriction further, by also defining a range of allowed years. For this purpose, we need to make use of facets.
A facet allows us to specify more restrictions than what a basic type allows.
For example, to restrict the publishing year so that the books must be published in or after 1970, we can use a facet called minInclusive. The minInclusive facet specifies the minimum value that an element can have. The resulting restriction is shown in Figure 4.33.
Figure 4.33 Example of restriction
Now, 1970, 1971… 2006 are all examples of a legal book publishing year. But a year less than these is illegal. For example, 1969 is not an allowed value.
Now, publishingYear is itself a type. That is, it can be used as a type to define another element.
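The following sketch shows how the minInclusive facet could be applied, and how the resulting type could then be used to declare an element. The element name PUBLISHED is hypothetical and chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Years earlier than 1970 are rejected by the minInclusive facet -->
  <xsd:simpleType name="publishingYear">
    <xsd:restriction base="xsd:gYear">
      <xsd:minInclusive value="1970"/>
    </xsd:restriction>
  </xsd:simpleType>

  <!-- The derived type is used like any built-in type
       (PUBLISHED is a hypothetical element name) -->
  <xsd:element name="PUBLISHED" type="publishingYear"/>

</xsd:schema>
```

With this sketch, <PUBLISHED>1970</PUBLISHED> would be accepted, whereas <PUBLISHED>1969</PUBLISHED> would be rejected.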
Like minInclusive, there are a number of other facets. Table 4.6 summarises them.
Facet | Description |
---|---|
xsd:minInclusive | The minimum value that all the instances of this type must be greater than or equal to |
xsd:maxInclusive | The maximum value that all the instances of this type must be less than or equal to |
xsd:minExclusive | The minimum value that all the instances of this type must be greater than |
xsd:maxExclusive | The maximum value that all the instances of this type must be less than |
xsd:enumeration | A list of allowed values |
xsd:whiteSpace | How white space is treated in this element |
xsd:pattern | A pattern with which the contents of the element are compared |
xsd:length | The length of a string, items in a list, or bytes in binary data |
xsd:minLength | The minimum length |
xsd:maxLength | The maximum length |
xsd:totalDigits | The maximum number of digits allowed in the element |
xsd:fractionDigits | The maximum number of digits allowed in the fractional part of the element |
Table 4.6 List of facets
Of course, not all facets make sense for all data types. It is meaningless, for instance, to apply a fractionDigits facet to an integer data type, because integers simply cannot have a fractional part! Therefore, the above table needs to be used carefully in conjunction with the appropriate data types.
In order to better understand the use and applicability of facets, we will now discuss them in the suitable matching contexts.
The three main facets that can be applied to strings are with reference to the length of a string. These three string facets are xsd:length, xsd:minLength, and xsd:maxLength. By using these facets, we can control the length of a string.
For example, suppose that we want to create a type for the salutation of a person. Let us also imagine that we want to restrict this to one of Mr, Ms, or Mrs. Therefore, the minimum length of this element is two, and the maximum is three. Based on this, we can apply our facets as shown in Figure 4.34.
Figure 4.34 Example of minLength and maxLength facets (person1.xsd)
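A schema along these lines could be sketched as follows. The SALUTATION element comes from the discussion; the root element PERSON, the NAME element, and the type name salutationType are assumptions made only to keep the example complete.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- PERSON and NAME are hypothetical; only SALUTATION is discussed in the text -->
  <xsd:element name="PERSON">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="SALUTATION" type="salutationType"/>
        <xsd:element name="NAME" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- A string of at least two and at most three characters -->
  <xsd:simpleType name="salutationType">
    <xsd:restriction base="xsd:string">
      <xsd:minLength value="2"/>
      <xsd:maxLength value="3"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Note that the length facets constrain only the number of characters. Any string of two or three characters would pass this check, not just Mr, Ms, or Mrs.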
The XML document shown in Figure 4.35 is based on the schema person1.xsd. Since it obeys the rules of the facets, it is valid. Note that the salutation is Mr. This consists of two characters, which is acceptable per our facet rules.
Figure 4.35 Example of an XML document conforming to facet restrictions
However, the XML document shown in Figure 4.36 is not valid, because it violates the restrictions specified by the facets of the schema. We have changed Mr to Prof in the SALUTATION element now.
Figure 4.36 Example of an XML document violating the facet restrictions
Similarly, we can use the xsd:length facet to specify the exact length of an element. We will not show an example of this, and leave it as an exercise to the reader.
The white space facet allows us to specify how we want to deal with white spaces. It does not specify restrictions, unlike the other 11 facets.
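The xsd:whiteSpace facet allows three possible values: preserve (white space is left unchanged), replace (each tab, line feed, and carriage return is replaced with a space), and collapse (after replacement, contiguous spaces are collapsed into a single space, and leading and trailing spaces are removed).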
Figure 4.37 shows an example of a schema containing this facet. Note that we have specified a value of collapse for this facet. This means that we wish to transform white spaces in our XML document (i.e., all contiguous spaces, tabs, line feeds, and carriage returns) into a single space.
Figure 4.37 Example of the xsd:whiteSpace facet in a schema (poem.xsd)
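As a rough sketch, a schema using this facet could look like the following. The element name POEM and the type name poemType are assumptions suggested by the file name poem.xsd; they are not taken from the original figure.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- POEM and poemType are hypothetical names -->
  <xsd:element name="POEM" type="poemType"/>

  <!-- collapse: runs of spaces, tabs, line feeds, and carriage returns
       are normalised to a single space before validation -->
  <xsd:simpleType name="poemType">
    <xsd:restriction base="xsd:string">
      <xsd:whiteSpace value="collapse"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```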
Figure 4.38 shows the corresponding XML document. We have deliberately introduced plenty of spaces, tabs, and blank lines to show how the xsd:whiteSpace facet value of collapse in the schema helps us clear white space and change it into a single space.
Figure 4.38 XML document containing many white spaces (poem.xml)
To see the impact of the xsd:whiteSpace = “collapse” facet, we need to open our XML document in a browser. When we do that, the facet gets applied and the resulting XML document looks as shown in Figure 4.39. As we can see, all the white spaces are crunched into a single space character.
Figure 4.39 Result of applying the whiteSpace facet (poem.xml)
There are two main number facets. The xsd:totalDigits facet specifies the maximum number of digits in a number. The xsd:fractionDigits, on the other hand, specifies the maximum number of digits in the fractional part (i.e., in the part to the right of the decimal point).
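The two number facets can be combined on a decimal type, as in the following sketch. The type name amountType and the chosen limits are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- amountType is a hypothetical name.
       At most five digits in all, of which at most two may follow the decimal point. -->
  <xsd:simpleType name="amountType">
    <xsd:restriction base="xsd:decimal">
      <xsd:totalDigits value="5"/>
      <xsd:fractionDigits value="2"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Under these limits, 123.45 is acceptable, whereas 1.234 (three fractional digits) and 123456 (six digits in total) are not.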
The enumeration facet is similar to the choice construct in the case of a DTD.
The enumeration facet in XML schemas allows us to specify a list of possible values for an element. The XML document corresponding to this schema must have one of these values.
Let us consider an example. Suppose that we want to maintain information about a book on computer science. One of the details that we want to maintain is the category of this book from a list of possible categories. Figure 4.40 shows the resulting XML schema.
Figure 4.40 Schema example using the enumeration facet
As we can see, our schema declaration specifies an enumeration facet for the book category. This allows us to specify a number of possible book category values, in this case, five. It should now be clear that our XML document based on this schema must have one of these values in the CATEGORY element. Otherwise, it would be considered an illegal XML document.
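A schema of this kind could be sketched as follows. The CATEGORY element comes from the discussion; the BOOK and TITLE elements, the type name categoryType, and the five category values are hypothetical and serve only to illustrate the syntax.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- BOOK and TITLE are hypothetical element names -->
  <xsd:element name="BOOK">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="TITLE" type="xsd:string"/>
        <xsd:element name="CATEGORY" type="categoryType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Only the five listed values are allowed (values are illustrative) -->
  <xsd:simpleType name="categoryType">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="Programming"/>
      <xsd:enumeration value="Databases"/>
      <xsd:enumeration value="Networking"/>
      <xsd:enumeration value="Operating Systems"/>
      <xsd:enumeration value="Security"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```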
We can also use the enumeration facet with other data types, such as integer, NMTOKEN, date, etc.
We encounter situations commonly where we need to specify that the value of an XML element must start with something, or end with something, or should have some specific characters at a particular location, etc. Some of the common situations where this will apply are as follows:
The pattern facet allows us to deal with such requirements.
We will first see an example and then go into the syntactical details of this facet. Suppose that we want to define a three-digit book code, to start with. The resulting schema declaration is shown in Figure 4.41.
Figure 4.41 Example of a pattern facet – 1
The full schema (bookcode1.xsd) and the corresponding XML document (bookcode1.xml) are shown in Figure 4.42.
Figure 4.42 Example of a pattern facet – 2
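A minimal sketch of a schema for a three-digit book code is given below. The element name BOOK_CODE and the type name bookCodeType are assumptions; only the idea of three numeric digit positions is taken from the discussion.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- BOOK_CODE and bookCodeType are hypothetical names -->
  <xsd:element name="BOOK_CODE" type="bookCodeType"/>

  <!-- Each \p{Nd} stands for exactly one decimal digit -->
  <xsd:simpleType name="bookCodeType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\p{Nd}\p{Nd}\p{Nd}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

A document containing <BOOK_CODE>123</BOOK_CODE> would satisfy this sketch, whereas <BOOK_CODE>12A</BOOK_CODE> or <BOOK_CODE>1234</BOOK_CODE> would not.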
Let us first understand the facet declaration. Our facet declaration is:
<xsd:pattern value="\p{Nd}\p{Nd}\p{Nd}"/>
Here, \p introduces a character class, and Nd inside the curly brackets stands for a decimal digit. As a result, together, our facet declaration portion of \p{Nd} indicates one numeric digit. Now, our pattern facet consists of three such declarations, i.e., it looks as \p{Nd}\p{Nd}\p{Nd}. Clearly, this means three numeric digit positions.
Therefore, what if we now want to represent a pin code of six digits? Clearly, our pattern facet would be as follows:
<xsd:pattern value="\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Nd}"/>
Note that there are six numeric digit positions defined now.
There are many interesting things we can do with pattern facets. For example, consider the following pattern facet:
<xsd:pattern value="AB\p{Nd}"/>
Now, we are saying that the element corresponding to this declaration consists of three positions. The first two positions must contain the upper case letters A and B. The third position must contain a numeric digit.
Following are some of the valid XML elements corresponding to this pattern facet.
AB1
AB0
AB9
Following are some of the invalid XML elements. The reasons are specified in the brackets.
890 (Must start with AB)
ABT (Third position must contain a number)
Ab9 (Second position must contain an upper case B)
Based on this understanding, let us summarise the various pattern facet options, as shown in Table 4.7. Here, X and Y should be interpreted as generic symbols, which, in real life, could be replaced by any string. Similarly, m and n are integers, which would be replaced by actual number of occurrences.
Table 4.7 Main regular expression symbols for XML schema
We have omitted some less significant symbols from the list.
Earlier, we had stated the significance of the pattern facet \p{}. Let us dwell on it further. In the context of this pattern, we can specify a number of things inside the curly brackets. What we specify there determines what the corresponding element in the XML document can contain. For example, if we have an N there, we can have a numeric digit, as we have seen previously. What are the other options? Figure 4.43 summarises this.
Figure 4.43 Various pattern facet classes
The pattern facet classes allow us to define a large number of patterns quite comprehensively. However, we do not require all of them in most practical situations. For example, we would encounter the L and the N quite commonly, but not the M and the Z. Regardless, we would list them for the sake of completeness, as shown in Table 4.8.
Pattern abbreviation (L) | Contains | Examples |
---|---|---|
L | All alphabets | a, b, c, …, A, B, C, … |
Lu | Uppercase alphabets | A, B, C, … |
Ll | Lowercase alphabets | a, b, c, … |
Lt | Title case alphabets | << Not English >> |
Lm | Modifier letters (e.g., superscript) | m, k, u |
Lo | Other letters | Japanese, etc. |
Table 4.8 (A) Letters pattern facet classes
Pattern abbreviation (M) | Contains | Examples |
---|---|---|
M | All marks | |
Mn | Non-spacing marks | <<Not English >> |
Mc | Spacing combining marks | <<Not English >> |
Me | Enclosing marks | << Million sign >> |
Table 4.8 (B) Marks pattern facet classes
Pattern abbreviation (N) | Contains | Examples |
---|---|---|
N | All numbers | 0, 1, 2, …, I, II, III, …, ½, ¾, … |
Nd | Decimal digits | 0, 1, 2, … |
Nl | Numbers based on letters | I, II, III, … |
No | Other numbers | ¼, 2/3, … |
Table 4.8 (C) Numbers pattern facet classes
Pattern abbreviation (P) | Contains | Examples |
---|---|---|
P | All punctuations | ( ) [ ] { } @ |
Pc | Connectors | << Not relevant >> |
Pd | Dashes | Hyphens, etc |
Ps | Starting punctuations | ( [ { |
Pe | Ending punctuations | ) ] } |
Pi | Initial quotation marks | ' “ |
Pf | Final quotation marks | ' " |
Po | Other punctuations | ? ! . |
Table 4.8 (D) Punctuations pattern facet classes
Pattern abbreviation (Z) | Contains | Examples |
---|---|---|
Z | All Separators | – |
Zs | Space | – |
Zl | Line separators | – |
Zp | Paragraph separators | – |
Table 4.8 (E) Separators pattern facet classes
Pattern abbreviation (S) | Contains | Examples |
---|---|---|
S | All symbols | ©☺ |
Sm | Mathematical symbols | ≤≥ ∞μ |
Sc | Currencies | €£¥ |
So | Other symbols | ®..«† |
Table 4.8 (F) Symbols pattern facet classes
Pattern abbreviation (C) | Contains | Examples |
---|---|---|
C | All others | – |
Cc | Control characters | – |
Cf | Format characters | – |
Co | Private use characters | – |
Cn | Unassigned | – |
Table 4.8 (G) Other pattern facet classes
Let us consider an exercise.
Problem
Suppose that we want to create a pattern facet to represent an amount of the pattern $99.99. In other words, this is a currency of up to a maximum of $99.99. Explain which pattern facet classes we should use, and in what manner.
Solution
Step 1: We need to represent the currency. Therefore, we will use the pattern facet \p{Sc}.
Step 2: We now need to keep a provision for two decimal digit positions. Therefore, we will use the pattern \p{Nd}\p{Nd}.
Step 3: Now we have a decimal point. This is represented as \. (the backslash makes the dot a literal character).
Step 4: Now we have two more decimal digits for the fractional part. We can use \p{Nd} again.
From the above steps, our resulting pattern is \p{Sc}\p{Nd}\p{Nd}\.\p{Nd}\p{Nd}.
Let us now put our exercise into action by providing the full schema and the corresponding XML document. Figure 4.44 shows the schema declaration for a book, including its price and the corresponding XML document.
Figure 4.44 Example of currency pattern facet
Note that the price of the book satisfies the pattern facet requirements. If you, instead, specify a value such as, say, $400.20 or $40.200, etc., it would not be acceptable, and an error will be flagged.
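The pattern derived in the exercise could be placed in a schema as sketched below. The element names BOOK, TITLE, and PRICE and the type name priceType are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- BOOK, TITLE, and PRICE are hypothetical element names -->
  <xsd:element name="BOOK">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="TITLE" type="xsd:string"/>
        <xsd:element name="PRICE" type="priceType"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- \p{Sc}: one currency symbol; \p{Nd}: one decimal digit; \. : a literal point -->
  <xsd:simpleType name="priceType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\p{Sc}\p{Nd}\p{Nd}\.\p{Nd}\p{Nd}"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

Because \p{Sc} stands for any currency symbol, this sketch would accept <PRICE>$45.50</PRICE> as well as <PRICE>€45.50</PRICE>. If only the dollar sign should be allowed, the class could be replaced by the escaped literal \$.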
Let us have one more exercise before we conclude pattern facets.
Problem
How can we represent a telephone number such as (9120)22907048 using pattern facets?
Solution
Step 1: We need to represent the opening bracket. Therefore, we will use the pattern facet \p{Ps}.
Step 2: We now need to keep a provision for four decimal digit positions. Therefore, we will use the pattern \p{Nd}\p{Nd}\p{Nd}\p{Nd}.
Step 3: Now we need to close the opened bracket. Therefore, we will use the pattern facet \p{Pe}.
Step 4: Now we have eight more decimal digits. We can use \p{Nd} 8 times. However, let us imagine that we do not know how many decimal digit positions we require; but we require at least one. Then we can use the pattern facet \p{Nd}+.
From the above steps, our resulting pattern is \p{Ps}\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Pe}\p{Nd}+.
The resulting schema and XML document are shown in Figure 4.45.
Figure 4.45 Example of brackets facet
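A schema using the telephone pattern from the exercise could be sketched as follows. The element name PHONE and the type name phoneType are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- PHONE and phoneType are hypothetical names -->
  <xsd:element name="PHONE" type="phoneType"/>

  <!-- Opening punctuation, four digits, closing punctuation, one or more digits -->
  <xsd:simpleType name="phoneType">
    <xsd:restriction base="xsd:string">
      <xsd:pattern value="\p{Ps}\p{Nd}\p{Nd}\p{Nd}\p{Nd}\p{Pe}\p{Nd}+"/>
    </xsd:restriction>
  </xsd:simpleType>

</xsd:schema>
```

With this sketch, <PHONE>(9120)22907048</PHONE> would be accepted. Note that \p{Ps} and \p{Pe} stand for any opening and closing punctuation, so square or curly brackets would pass as well. Where only round brackets are wanted, the escaped literals \( and \) could be used instead.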
We have discussed restrictions in detail. Restrictions are widely used for creating new simple types. We need to remember, however, that this is not the only mechanism for creating new simple types.
Unions allow us to combine simple types to create new simple types.
Suppose we want to represent information about a student. Let us imagine that a student can be identified uniquely either by roll number (a numeric type) or name (a string). Further, our XML document should support either of these identification mechanisms. In that case, we can create a new simple type that can have either the roll number or the student name. Thus, we are creating a union of the student's roll number, and the name. This can be depicted in a schema as shown in Figure 4.46.
Figure 4.46 Example of union in a schema
The corresponding XML document is shown in Figure 4.47.
Figure 4.47 Union in use in an XML document
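A union of this kind could be sketched as follows. The element name STUDENT and the type name studentIdType are assumptions, as is the choice of xsd:positiveInteger for the roll number.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- A value is valid if it matches any one of the member types:
       a roll number (assumed here to be a positive integer) or a name (a string) -->
  <xsd:simpleType name="studentIdType">
    <xsd:union memberTypes="xsd:positiveInteger xsd:string"/>
  </xsd:simpleType>

  <!-- STUDENT is a hypothetical element name -->
  <xsd:element name="STUDENT" type="studentIdType"/>

</xsd:schema>
```

Both <STUDENT>101</STUDENT> and <STUDENT>Sachin</STUDENT> would be valid here. The member types are tried in the order in which they are listed, so 101 would be treated as a roll number and Sachin as a name.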
A list type allows the creation of a list of a particular simple type.
For example, suppose that we want a list of employee numbers to be created as follows.
<EMP_LIST> 9662 10000 10190 9939 </EMP_LIST>
Then we can use the xsd:list type in conjunction with an xsd:simpleType. An xsd:list has an attribute called itemType, which allows us to specify the type of each of the items in the list. This is shown as follows for the above XML content.
<xsd:simpleType name="EmpList">
<xsd:list itemType="xsd:int"/>
</xsd:simpleType>
Figure 4.48 shows both the complete schema and the corresponding XML document. We have considered two cases: a valid XML document, and an invalid XML document. The invalid or illegal XML document contains a string in the list. This is not allowed, as our list definition in the schema clearly allows only integers. The addition of a string violates this rule.
Figure 4.48 List example
Although lists themselves may not be useful, we can create more interesting lists by restricting them. For example, we can alter the earlier list by mandating that the list can contain the employee numbers of only five employees. The modified schema and the corresponding XML document are shown in Figure 4.49.
Figure 4.49 Modified list example
If we reduce or add one more employee number from or to the list, the XML document would become illegal. We must have exactly five employee numbers.
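One possible way of expressing this rule is to restrict the list type with the xsd:length facet, as in the following sketch. The type name FiveEmpList is hypothetical; EmpList and EMP_LIST are taken from the earlier example.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- The unrestricted list of employee numbers -->
  <xsd:simpleType name="EmpList">
    <xsd:list itemType="xsd:int"/>
  </xsd:simpleType>

  <!-- For a list type, xsd:length counts the items in the list
       (FiveEmpList is a hypothetical name) -->
  <xsd:simpleType name="FiveEmpList">
    <xsd:restriction base="EmpList">
      <xsd:length value="5"/>
    </xsd:restriction>
  </xsd:simpleType>

  <xsd:element name="EMP_LIST" type="FiveEmpList"/>

</xsd:schema>
```

Under this sketch, <EMP_LIST>9662 10000 10190 9939 10500</EMP_LIST> would be valid, whereas a list of four or six employee numbers would not.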
We know that an empty element in XML is one that cannot have child elements or data of its own. We have declared empty elements in a DTD by using the EMPTY content type. In the case of a schema, we declare an element as empty by not specifying a child of the form xsd:sequence, xsd:all, or xsd:choice.
For example, we can have the following declaration:
<xsd:complexType name="ThisIsEmptyExample">
</xsd:complexType>
So far, we have ignored attributes in the context of schemas. We will now take a look at them. The declaration of attributes is similar to that of elements. We use the attribute keyword to define an attribute. For example, suppose that we want to declare an attribute named designation. Then, its declaration would be as follows:
<xsd:attribute name="designation"/>
We can also add a data type to an attribute. For example, the above declaration can be modified as follows:
<xsd:attribute name="designation" type="xsd:string"/>
An attribute occurs only once, so we cannot use minOccurs or maxOccurs. However, we can specify whether we (a) must have an attribute (required), (b) may have an attribute (optional), or (c) cannot have an attribute (prohibited). These three possibilities are illustrated in Figure 4.50.
Figure 4.50 Possibilities about the presence or absence of an attribute
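The three possibilities are expressed through the use attribute of an attribute declaration, as in the following sketch. The complex type name and the attribute names department and nickname are hypothetical.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- EmpAttrExample, department, and nickname are hypothetical names -->
  <xsd:complexType name="EmpAttrExample">
    <!-- (a) must be present -->
    <xsd:attribute name="designation" type="xsd:string" use="required"/>
    <!-- (b) may be present; this is also the behaviour when use is omitted -->
    <xsd:attribute name="department" type="xsd:string" use="optional"/>
    <!-- (c) cannot be present -->
    <xsd:attribute name="nickname" type="xsd:string" use="prohibited"/>
  </xsd:complexType>

</xsd:schema>
```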
We can also assign a fixed or default value to an attribute. Figure 4.51 shows examples of these two.
Figure 4.51 Specifying fixed or default values for an attribute
In the first case, the use of the attribute is required. It also has a fixed value. In other words, this is a mandatory attribute, which has a constant value. The corresponding XML document can neither drop this attribute nor change the attribute value.
In the second case, the use of the attribute is optional, and it has a default value. This means that the attribute may be absent from the XML document, in which case the default value specified in the schema is assumed. Alternatively, the attribute may be present, either with the same value as the default or with another value.
If we specify a default value for an attribute, then its use must be optional.
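Fixed and default values could be declared along the following lines. The type name, the attribute names country and status, and their values are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- All names and values in this type are hypothetical -->
  <xsd:complexType name="FixedDefaultExample">
    <!-- Mandatory attribute whose value must always be India -->
    <xsd:attribute name="country" type="xsd:string" use="required" fixed="India"/>
    <!-- Optional attribute; when it is absent, the value active is assumed -->
    <xsd:attribute name="status" type="xsd:string" use="optional" default="active"/>
  </xsd:complexType>

</xsd:schema>
```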
Recall that attributes have no independent existence. An attribute is relevant only in the context of an element. Let us now see how we can attach an attribute to an element. For this purpose, we need to declare an attribute inside an element's declaration. However, if we want to attach an attribute to an element, the element must have a complex type. A simple element can only have some content as the body of the element, and therefore, cannot have attributes.
Let us now consider an example. Suppose that we want to maintain the information about an employee in terms of the following: employee ID, name, designation and whether the employee is confirmed. We can model the first three as elements, and the last as the attribute. This is shown in Figure 4.52.
Figure 4.52 Defining an attribute in a schema
Note that the attribute is defined as a part of a complex type, named EmpType. This complex type (i.e., EmpType) is the data type of our root element EMPLOYEE. Thus, the attribute emp_confirmed is also associated with our EMPLOYEE root element.
Let us now create a sample XML document corresponding to our schema. This is shown in Figure 4.53.
Figure 4.53 Using an attribute in an XML document
Note how we have used the attribute emp_confirmed. We have associated it with the root element EMPLOYEE. This is in perfect agreement with our earlier schema definition.
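A schema of this shape could be sketched as follows. EMPLOYEE, EmpType, and emp_confirmed come from the discussion; the child element names and the choice of xsd:boolean for the attribute are assumptions.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <xsd:element name="EMPLOYEE" type="EmpType"/>

  <xsd:complexType name="EmpType">
    <!-- Child element names are hypothetical -->
    <xsd:sequence>
      <xsd:element name="EMP_ID" type="xsd:int"/>
      <xsd:element name="EMP_NAME" type="xsd:string"/>
      <xsd:element name="DESIGNATION" type="xsd:string"/>
    </xsd:sequence>
    <!-- Attribute declarations follow the content model;
         the boolean type is an assumption -->
    <xsd:attribute name="emp_confirmed" type="xsd:boolean" use="required"/>
  </xsd:complexType>

</xsd:schema>
```

A matching document would then begin with <EMPLOYEE emp_confirmed="true"> and contain the three child elements in the order given.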
As we had mentioned earlier, the decision regarding whether to model something as a child element or an attribute, depends on the kind of application and the designer's view. For example, suppose that we want to create a schema to store information about a product. To start with, we do not want to store anything but the name of the product. Then, we have two choices: (a) Define a sub element called as NAME, or (b) Use an attribute called as name. Accordingly, the XML documents would differ, as shown in Figure 4.54.
Figure 4.54 Defining content as a child element versus as an attribute in a schema
Let us examine these two cases.
Similar to how we can group elements, we can also group attributes. An attribute group is a set of attributes. It can be used on a set of elements. How is this achieved?
If an element has several attributes, then we can group them and provide a reference of this group to the concerned element.
This approach provides two benefits:
An example of attribute grouping is shown in Figure 4.55. As shown, a group of attributes is created here for the attributes corresponding to an employee element.
Figure 4.55 Grouping attributes
As we can see, grouping attributes involves two declarations. The attributeGroup name declaration creates a group of attributes with the specified name (in this case, empDetails) and lists the attributes that we want to group together. The attributeGroup ref declaration, placed inside the complex type of the concerned element, then refers to this group by name, thereby attaching all its attributes to the element.
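The two declarations could be sketched as follows. The group name empDetails comes from the discussion; the attribute names inside the group and the EMPLOYEE element are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Definition of the group (attribute names are hypothetical) -->
  <xsd:attributeGroup name="empDetails">
    <xsd:attribute name="emp_id" type="xsd:int" use="required"/>
    <xsd:attribute name="designation" type="xsd:string"/>
  </xsd:attributeGroup>

  <!-- Reference to the group from the element's complex type -->
  <xsd:element name="EMPLOYEE">
    <xsd:complexType>
      <xsd:attributeGroup ref="empDetails"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

The same group could be referenced from the complex types of other elements as well, so the attribute declarations need to be written only once.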
True or False Questions
Multiple Choice Questions
Detailed Questions
Exercises
True or False Questions
1. False | 2. True | 3. False | 4. True |
5. True | 6. False | 7. True | 8. False |
9. True | 10. True |
Multiple Choice Questions
1. c | 2. d | 3. c | 4. c |
5. a | 6. c | 7. a | 8. b |
9. c | 10. d |