Chapter 2. Background

This chapter discusses some background information that is crucial to understanding the rest of the book. This material is not intended to be a thorough introduction but rather a quick refresher. If you are already familiar with these topics, feel free to move on to the next chapter. The topics reviewed in this chapter are XML concepts (including basics, schemas, Infoset and XPath) and Web concepts (URIs, HTTP and MIME).

XML

This section briefly reviews the core concepts of XML. Basic reviews of eXtensible Markup Language (XML), Document Type Definitions (DTDs), XML Schema, RelaxNG (pronounced relaxing), XML Namespaces, XML Infoset, and XPath are provided to ensure that you are familiar with these technologies.

XML Basics

The world of Web services builds heavily on the core set of XML [XML] specifications. XML is actually not a language, but a metalanguage for defining new languages. XML is platform independent and is defined in terms of Unicode, which allows it to represent content from many natural languages. Because of these factors and because virtually every software vendor supports XML, it has rapidly become the de facto format for data interchange between disparate entities.

XML provides a small set of core concepts for defining new languages:

  • Elements—An XML element is a named construct that has a set of attributes and some children. The children of an element can be other elements, literal text, comments, and a few other types. Elements are written using angle brackets:

    <foo> is the start tag of element foo, and </foo> is its end tag.

  • Attributes—Attributes are name-value pairs that are associated with an element. An element can have any number of attributes and is written as follows:

    <foo name1="value1" name2="value2" …>
    
  • Comments—Comments are enclosed within "<!--" and "-->" character sequences and are meant for the processor to ignore.

  • Literal text—Elements can contain character sequences consisting of Unicode characters. A key quality of XML is that all characters contained within XML documents are represented in Unicode. By using Unicode, XML can store characters from almost any language. Therefore, XML is internationalized by definition.

  • Document—An XML document is a unit of XML packaging that consists of exactly one element (called the document element) and might contain comments and a few other items.

Using these concepts, you can define new languages simply by deciding on a set of element names, their valid content, and the kinds of literal text that are permissible as attribute values and element content. A key improvement of XML over SGML (its precursor) was the notion of well-formed, but not necessarily valid, documents. A document is said to be well formed if it adheres to all the XML syntax rules. A well-formed document is also valid if it conforms to a DTD, an XML Schema, or a definition written in some other document structure definition language. By introducing the concept of well-formedness, the creators of XML enabled its rapid adoption: applications could use XML as a syntax without needing a DTD or an XML Schema and without having to validate the XML structure.
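The difference between well formed and valid can be sketched with the parser in Python's standard library, which checks well-formedness only and does not validate against a DTD or schema. This is a minimal illustration, not part of the XML specifications; the order and item vocabulary and the helper is_well_formed are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary, invented for this illustration.
WELL_FORMED = '<order id="42"><!-- a comment --><item qty="2">Widget</item></order>'
NOT_WELL_FORMED = '<order id="42"><item>Widget</order>'  # <item> is never closed


def is_well_formed(text: str) -> bool:
    """Return True if text parses as XML; no DTD or schema is consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(WELL_FORMED)    # the document element
item = root.find("item")             # a child element of the document element
print(root.tag, root.attrib)         # element name and its attributes
print(item.text, item.get("qty"))    # literal text and one attribute value
print(is_well_formed(WELL_FORMED), is_well_formed(NOT_WELL_FORMED))
```

Note that the first document is accepted even though no definition of the order vocabulary exists anywhere; that is exactly what well formed but not validated means.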

The set of elements that defines the SOAP message format is an example of an XML vocabulary. WSDL is another example. Thus, most of the Web service specifications define one or more XML languages.

DTDs, XML Schema, and RelaxNG

As indicated earlier, it is simple to define a new XML language. How do you declare the structure of an XML language? That is, how do you indicate what the document element of an XML document must be, what elements it can have as children, what attributes the elements have, and so on? That is the role of DTDs, XML Schema [XML Schema], and RelaxNG [RelaxNG].

DTDs

DTDs were created for SGML (the precursor to XML) to declare the structure of SGML documents. When XML was designed, the DTD mechanism was the de facto mechanism for defining the structure of XML documents. Using a non-XML syntax, the DTD language allows you to define the names of elements, the number of times an element occurs within another, and so on.

People used DTDs widely during the early days of XML because DTDs were the only standard structure definition language. However, with the advent of XML Schema, the use of DTDs has declined rapidly, and they are primarily of historical interest at this point.

XML Schema

XML Schema is a document structuring and type definition specification that the World Wide Web Consortium (W3C) developed. Using an XML syntax, XML Schema allows you to specify XML structures as DTDs do, but in a much more powerful manner. In addition to structure, XML Schema also allows you to specify data types. It defines a set of primitive data types that you can use to define attribute and element values and recursive type constructors for defining arbitrarily complex type structures. XML Schema is powerful and sometimes daunting, but it has wide industry support, and tools are now available that make it quite simple to define schemas.
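Because a schema is itself an XML document, any XML parser can read it. The sketch below shows a small schema that declares both structure and data types, and then lists its declarations using Python's standard library. The order vocabulary is hypothetical, and the standard library only parses the schema here; actually validating documents against it would require a third-party tool.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"  # the XML Schema namespace name

# Hypothetical schema: an <order> element with a required integer "id"
# attribute and one or more <item> children that contain strings.
SCHEMA = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string" maxOccurs="unbounded"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:integer" use="required"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

schema = ET.fromstring(SCHEMA)

# Collect (name, type) pairs for every element and attribute declaration.
elements = [(e.get("name"), e.get("type")) for e in schema.iter(f"{{{XSD_NS}}}element")]
attributes = [(a.get("name"), a.get("type")) for a in schema.iter(f"{{{XSD_NS}}}attribute")]

print(elements)    # "order" has an anonymous complex type, so its type is None
print(attributes)
```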

All of the Web services specifications that this book discusses have associated schemas written using XML Schema.

RelaxNG

RelaxNG is a merger of two schema languages: TREX by James Clark and Relax by Makoto Murata. TREX (Tree Regular Expressions for XML) uses regular expressions to define only the structure of XML. Relax is a tree-automata-based regular expression language. Thus, RelaxNG focuses only on defining the structure of XML documents, not their types. The premise is that XML Schema has become complex partly because of mixing structure and typing declaration capabilities.

Although RelaxNG is indeed technically solid, its adoption in the industry has been weak due to the proliferation of XML Schema.

XML Namespaces

Because you can easily define XML languages by choosing a set of element and attribute names and defining how they relate to each other, many XML vocabularies are available. If you want to combine two such vocabularies, however, you might run into trouble with name conflicts. Two or more languages might have chosen the same element names.

Enter XML Namespaces [XML Namespaces], which is a way to scope element and attribute names to a namespace so that their usage is always unique, regardless of their context.

Thus, XML Namespaces introduces the concept of qualified names (QNames). A QName is a combination of a namespace name and a local name. The namespace name scopes the local name.
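As a brief sketch of how QNames resolve a name conflict, the following Python example combines two vocabularies that both define a title element. The namespace names under example.com are hypothetical. Python's ElementTree writes a QName as {namespace name}local name, which makes the scoping visible.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace names for two vocabularies that both define <title>.
BOOK_NS = "http://example.com/ns/book"
PERSON_NS = "http://example.com/ns/person"

DOC = f"""\
<book:book xmlns:book="{BOOK_NS}" xmlns:person="{PERSON_NS}">
  <book:title>An Example Book</book:title>
  <person:title>Dr.</person:title>
</book:book>
"""

root = ET.fromstring(DOC)

# Each tag is a QName: the namespace name scopes the local name "title".
qnames = [child.tag for child in root]
print(qnames)

# Look up one specific "title" by mapping a prefix to its namespace name.
book_title = root.find("b:title", {"b": BOOK_NS}).text
print(book_title)

# The prefix is only shorthand; the (namespace name, local name) pair matters.
same_qname = ET.QName(BOOK_NS, "title").text == qnames[0]
print(same_qname)
```

The lookup uses the prefix b rather than book, which works because a prefix is only a local abbreviation for the namespace name.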

XML Infoset

Although XML is commonly understood in terms of its “angle bracket” form, it is the XML Infoset [XML Infoset] that defines the fundamentals of XML. The XML Infoset defines the underlying information model of XML. In other words, it abstracts from the angle bracket syntax and defines what information is contained in an element, an attribute, and so on.

The XML Infoset defines a set of information items that correspond to the syntactic constructs of XML. The important information items are as follows:

  • Element information item (EII)—An EII is the abstraction of an element. EII properties include a list of children (which might be other EIIs, character information items, and so on), a list of attributes, the name and namespace name, and so on.

  • Attribute information item (AII)—An AII is an abstraction of an attribute. Thus, an AII has as properties its name and namespace name, its value, and so on.

  • Character information item (CII)—A CII represents literal text found as element content.

If an XML Schema has been used to validate a document, the Infoset might be augmented to form the Post Schema Validation Infoset (PSVI). The PSVI basically has additional properties containing the type information and the validation status of each element.

Although many people think of the angle bracket form when they think of XML, the real action of XML lies at the Infoset level. The familiar angle brackets are simply a serialization of the Infoset. Two such serializations are already defined: XML 1.0 and XML 1.1, both of which use the familiar angle bracket form.
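To illustrate that the angle brackets are only a serialization, the sketch below parses two documents that differ textually (quote style, attribute order, whitespace inside tags, and the empty-element form) and shows that they yield the same elements, attributes, and character content. The vocabulary and the helper summarize are invented for the example, and the ElementTree data model is only an approximation of the Infoset, not a complete implementation of it.

```python
import xml.etree.ElementTree as ET

# Two different character sequences, invented for this illustration.
DOC_A = '<msg kind="greeting" lang="en"><body>hello</body><sig/></msg>'
DOC_B = "<msg lang='en'   kind='greeting' ><body>hello</body><sig></sig></msg>"


def summarize(elem):
    """Reduce an element to its name, attributes, text, and children."""
    return (
        elem.tag,                         # element name
        sorted(elem.attrib.items()),      # attributes form an unordered set
        elem.text or "",                  # character content, if any
        [summarize(child) for child in elem],
    )


same_text = DOC_A == DOC_B
same_items = summarize(ET.fromstring(DOC_A)) == summarize(ET.fromstring(DOC_B))
print(same_text, same_items)
```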

Going into the future, serializations of the XML Infoset likely will not use Unicode text characters as XML 1.0 and XML 1.1 do. Rather, binary serializations—which are extremely space efficient and fast to parse—are beginning to appear. See, for example, [W3C Binary XML]. The emergence of such serializations allows you to address some of the biggest criticisms of XML: that it is bloated and slow to process.

Many of the Web service specifications have taken this abstract concept one step further. SOAP 1.2 and WSDL 2.0, for example, are now defined at an abstract level, and their XML representation is simply a serialization of the abstraction. Therefore, you can use SOAP 1.2 and WSDL 2.0 and never serialize them using XML!

DOM, SAX, and So On

Document Object Model (DOM), Simple API for XML (SAX), and so on, are programming APIs for accessing the Infoset. Because this book does not cover programming aspects of Web services, this topic is not discussed further.

XPath

The XML Path Language (XPath) [XPath] was designed as part of XSL [XSL] as an addressing mechanism to identify elements, attributes, and other information items within XML documents. Using XPath, you can use a familiar file path notation to identify locations within XML documents. You can use XPath as a query language to select and extract a set of items or as an addressing mechanism to identify a specific location.

XPath has been hugely successful and has been used in numerous settings, including in BPEL4WS, as you will see later. XPath is also the basis of XQuery [XQuery], a full query language for XML.
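The file path flavor of XPath can be sketched with Python's standard library, assuming a made-up orders document. Note that ElementTree supports only a limited subset of XPath, so this shows the addressing idea rather than the full language.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, invented for this illustration.
DOC = """\
<orders>
  <order id="1"><item>Widget</item><item>Gadget</item></order>
  <order id="2"><item>Gizmo</item></order>
</orders>
"""

root = ET.fromstring(DOC)

# Path steps separated by "/" walk from parent to child, like a file path.
all_items = [e.text for e in root.findall("./order/item")]

# A predicate in square brackets selects by attribute value.
order2_items = [e.text for e in root.findall("./order[@id='2']/item")]

# A positional predicate picks the first item of each order.
first_items = [e.text for e in root.findall("./order/item[1]")]

# ".//" searches all descendants, regardless of depth.
anywhere = [e.text for e in root.findall(".//item")]

print(all_items, order2_items, first_items, anywhere)
```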

World Wide Web

The Web is built on a few core standards that are universally adopted. This section briefly reviews uniform resource identifiers (URIs), HyperText Transfer Protocol (HTTP), and Multipurpose Internet Mail Extensions (MIME).

URIs

A URI [URI] is a format for identifying an abstract or concrete resource, which can be basically anything with an identity. Uniform Resource Locators (URLs) are a popular form of URIs—basically those URIs that can be de-referenced to obtain a representation of the resource identified by the URI. Although recent revisions of the URI specification do away with the URL concept and treat all URIs the same, the basic idea of some URIs being de-referenceable and others not still exists.

It is important to note, however, that even URIs that look de-referenceable (for example, http://this-is-a-fake-name.com/foo) might not be. Thus, the only assertion you can make about a URI is that it identifies some resource and might be de-referenceable to obtain a representation of that resource.
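As a small sketch of this point, the following Python example splits two URIs into their syntactic components. Parsing is purely syntactic and never contacts the network, so it reveals nothing about whether a URI can be de-referenced. The urn:example identifier is made up for the illustration.

```python
from urllib.parse import urlsplit

# The first URI is the one used in the text; the second is a made-up URN.
http_uri = urlsplit("http://this-is-a-fake-name.com/foo")
urn_uri = urlsplit("urn:example:order:42")

# Both are well-formed URIs with a scheme; only syntax is examined here.
print(http_uri.scheme, http_uri.netloc, http_uri.path)
print(urn_uri.scheme, urn_uri.netloc, urn_uri.path)
```

The http URI has a host component that suggests a network location, yet nothing in its syntax guarantees that a representation can actually be retrieved from it.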

Many Web service specifications use URIs to identify components defined by those specifications. Some specifications, however, use QNames, and the question of whether URIs or QNames should identify things is a frequent debate within the Web community. The use of QNames to identify components received a boost from the XML Schema specification because it uses QNames to identify the types and elements that a given schema defines. QNames allow you to formalize the fact that a set of definitions is related because the definitions share a namespace name. If you were to use URIs to name a group of related items, you would need to understand the structure of the URI so that you could recover the relationship. However, significant parts of URIs (depending on the URI scheme) are supposed to be opaque except to their creator.

The debate of URIs versus QNames continues. Unfortunately, it has carried over into the world of Web services as well: some of the Web service specifications use URIs for naming (WS-Policy in particular), whereas many others use only QNames.

HTTP

HTTP [HTTP] is the communication protocol that transfers representations of resources on the Web. The use of HTTP and URIs on the Web has been modeled by an architectural style called Representational State Transfer (REST) [F00].

Web services (SOAP in particular) use HTTP as a transport (rather than transfer) protocol to carry SOAP messages from one endpoint to another. The use of HTTP in this manner has been quite controversial at times because SOAP can be carried on many transport protocols and does not really adhere to the HTTP semantics. However, the widespread deployment of HTTP means that SOAP will continue to use HTTP as a transport protocol for the foreseeable future, despite its “breaking of the religion” of HTTP as an application level transfer protocol rather than a transport protocol.
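Using HTTP as a transport typically means that a SOAP message travels as the body of an HTTP POST. The sketch below builds such a request with Python's standard library but does not send it. It assumes SOAP 1.1 conventions (the text/xml media type and a SOAPAction header); the endpoint, the GetQuote operation, and its namespace are hypothetical.

```python
import urllib.request

SOAP11_ENVELOPE_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ENDPOINT = "http://example.com/stockquote"   # hypothetical endpoint
QUOTES_NS = "http://example.com/ns/quotes"   # hypothetical application namespace

# The SOAP message: an envelope whose body carries the application payload.
envelope = f"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="{SOAP11_ENVELOPE_NS}">
  <soap:Body>
    <m:GetQuote xmlns:m="{QUOTES_NS}">
      <m:symbol>XYZ</m:symbol>
    </m:GetQuote>
  </soap:Body>
</soap:Envelope>
""".encode("utf-8")

# HTTP merely carries the message: the POST body is the SOAP envelope.
request = urllib.request.Request(
    ENDPOINT,
    data=envelope,
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": f'"{QUOTES_NS}/GetQuote"',
    },
    method="POST",
)

# The request is only constructed here; nothing is sent over the network.
sent_headers = {name.lower(): value for name, value in request.header_items()}
print(request.get_method(), request.full_url)
print(sent_headers)
```

Every message goes to the same endpoint with the same HTTP method, and the operation is named inside the envelope rather than by the URI and method, which is the crux of the controversy described above.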

MIME

MIME [MIME] is a standard that was originally developed to address the problem of sending nontext content as e-mail attachments. MIME’s generality and power were so attractive that HTTP also adopted it as the mechanism to type HTTP messages, thereby gaining MIME’s advantages for transmitting multipart content.

MIME defines a set of media types that you can use to tag the type of some media being sent or received as a binary sequence of bytes. The SOAP with Attachments specification (see Chapter 4, “SOAP”) uses HTTP’s MIME transport capability to define a packaging model for how to transmit messages that involve SOAP messages and attachments of various binary forms conforming to MIME media types.
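The packaging idea can be sketched with Python's standard email package: a multipart/related MIME package whose first part is the SOAP message and whose second part is a binary attachment, each tagged with its media type. The envelope content, the Content-ID values, and the attachment bytes are placeholders invented for the illustration, and the sketch only approximates what a SOAP with Attachments implementation would produce.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# The outer package: multipart/related, whose root part is of type text/xml.
package = MIMEMultipart("related", type="text/xml")

# Part 1: the SOAP message itself, tagged with the text/xml media type.
soap_part = MIMEText(
    '<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    "<soap:Body/></soap:Envelope>",
    "xml",
    "utf-8",
)
soap_part["Content-ID"] = "<envelope@example.com>"      # hypothetical identifier

# Part 2: an opaque binary attachment, tagged application/octet-stream.
attachment = MIMEApplication(b"\x00\x01\x02\x03", "octet-stream")
attachment["Content-ID"] = "<attachment@example.com>"   # hypothetical identifier

package.attach(soap_part)
package.attach(attachment)

part_types = [part.get_content_type() for part in package.get_payload()]
print(package.get_content_type(), package.get_param("type"))
print(part_types)
```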

Summary

This chapter has given you a brief review of some of the key technologies that the Web services platform builds on. If you need more information about any of these topics, there are many excellent books available.
